Training of deep learning models is generally carried out in the cloud or on high-performance computing platforms, often utilising multiple graphics cards to accelerate the process.
Inference can also be carried out in the cloud, which works well for non-time-critical workflows.
However, inference is increasingly carried out on a device local to the data being analysed, which significantly reduces the time taken to generate a result (e.g. recognising the face of someone on a watch list).
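As a minimal sketch of what local, on-device inference looks like in practice, the Python snippet below runs a network through ONNX Runtime on the device's own CPU, with no network round trip. The model file name, input shape, and the face-recognition use case are illustrative assumptions, not taken from the source.

```python
import numpy as np
import onnxruntime as ort

# Hypothetical model file: a face-recognition network exported to ONNX.
session = ort.InferenceSession(
    "face_embedder.onnx",
    providers=["CPUExecutionProvider"],  # run entirely on the local device
)

input_name = session.get_inputs()[0].name

# Stand-in for a preprocessed camera frame: one 224x224 RGB image in
# NCHW layout (assumed input shape for this sketch).
frame = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Because the model executes locally, the result is available within
# milliseconds of capture rather than after a cloud round trip.
(embedding,) = session.run(None, {input_name: frame})
print(embedding.shape)
```

The same session API works unchanged if a hardware-accelerated execution provider is available on the device, which is typically how latency is driven down further at the edge.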