Deep learning training is generally carried out in the cloud or on high-performance computing platforms, often using multiple graphics cards to accelerate the process.
Inference can also be carried out in the cloud, which works well for non-time-critical workflows.
However, inference is now commonly carried out on a device local to the data being analysed, which significantly reduces the time taken to generate a result (e.g. recognising the face of someone on a watch list).
Clearly, for real-time applications such as facial recognition or the detection of defective products on a production line, the result must be generated as quickly as possible, so that a person of interest can be identified and tracked, or a faulty product promptly rejected.
This is where AI Inference at the Edge makes sense: installing a low-power computer with an integrated inference accelerator close to the source of data results in a much faster response time.
Compared with cloud inference, inference at the edge can reduce the time to a result from a few seconds to a fraction of a second.
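As a rough illustration of that gap, the sketch below times a local inference against a round trip to a cloud endpoint. The ONNX model file, endpoint URL, and input shape are placeholder assumptions rather than a reference setup.

```python
import time

import numpy as np
import onnxruntime as ort
import requests

# Hypothetical input: a single 224x224 RGB video frame.
frame = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Edge inference: the model runs on the local device itself.
session = ort.InferenceSession("model.onnx")  # assumed model file
input_name = session.get_inputs()[0].name

start = time.perf_counter()
session.run(None, {input_name: frame})
print(f"edge inference:   {(time.perf_counter() - start) * 1000:.1f} ms")

# Cloud inference: the same frame is sent over the network instead.
start = time.perf_counter()
requests.post("https://example.com/infer",  # assumed endpoint
              data=frame.tobytes(), timeout=10)
print(f"cloud round trip: {(time.perf_counter() - start) * 1000:.1f} ms")
```

On a typical setup the second figure is dominated by network latency and payload transfer, which is exactly the cost that moving inference to the edge removes.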
For applications like those above, that difference determines whether a person of interest can still be tracked, or whether a faulty product has already left the line.
To give the computer carrying out inference the necessary performance, without resorting to an expensive and power-hungry CPU or GPU, an inference accelerator card or a specialist inference platform can be the perfect solution.
Utilising accelerators based on Intel Movidius, Nvidia Jetson, or a specialist FPGA has the potential to significantly reduce both the cost and the power consumption per inference ‘channel’.
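As a sketch of what targeting one of these accelerators can look like in code, the snippet below loads a model onto an Intel Movidius VPU through Intel's OpenVINO runtime. The model path and input shape are assumptions, and the MYRIAD device plugin is only available in OpenVINO releases up to 2022.x.

```python
import numpy as np
from openvino.runtime import Core

core = Core()
# Assumed model already converted to OpenVINO IR format (model.xml + model.bin).
model = core.read_model("model.xml")

# "MYRIAD" targets an Intel Movidius VPU such as the Neural Compute Stick 2;
# swapping in "CPU" runs the same model without the accelerator.
compiled = core.compile_model(model, device_name="MYRIAD")

# Hypothetical input: a single 224x224 RGB frame.
frame = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Compiled models are callable; results are keyed by output node.
result = compiled([frame])
scores = result[compiled.output(0)]
print(scores.shape)
```

Switching between the host CPU and the accelerator is a one-line change here, which makes it straightforward to compare cost and power per inference 'channel' across devices.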