A middle ground for computer vision
How did Edge Impulse manage to create such a compact ML model? In a blog post, principal machine learning engineer Matthew Kelcey and senior developer relations engineer Louis Moreau noted that image classification is simpler than object detection.
The former takes an image as input and outputs the type of object, and works even on microcontrollers. Object detection, on the other hand, outputs information such as the class, object count, position, and size, and is “hardly” run on MCUs due to its complexity.
FOMO adopts a middle ground by returning the centroid location of objects, but not their size. This makes it possible to identify and count objects and determine their relative positions without excessive overhead.
“The FOMO model provides a variant in between; a simplified version of object detection that is suitable for many use cases where the position of the objects in the image is needed but when a large or complex model cannot be used due to resource constraints on the device,” they wrote.
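To make the centroid-only idea concrete, here is a minimal Python sketch of how a grid of per-cell scores could be turned into object positions and a count. It is an illustration, not Edge Impulse's implementation: the grid values, the 8-pixel cell size, the 0.5 threshold, and the function name centroids_from_grid are all assumptions made for this example.

```python
from typing import List, Tuple

Grid = List[List[float]]


def centroids_from_grid(
    grid: Grid, cell_size: int, threshold: float = 0.5
) -> List[Tuple[int, int, float]]:
    """Return (x, y, score) for every cell whose score meets the threshold.

    x and y are the pixel coordinates of the cell's centre. No width or
    height is produced, mirroring a centroid-only output.
    """
    detections = []
    for row_idx, row in enumerate(grid):
        for col_idx, score in enumerate(row):
            if score >= threshold:
                x = col_idx * cell_size + cell_size // 2
                y = row_idx * cell_size + cell_size // 2
                detections.append((x, y, score))
    return detections


# Hypothetical 3x3 score grid covering a 24x24 image (assumed cell size: 8 px).
grid = [
    [0.1, 0.9, 0.0],
    [0.0, 0.2, 0.0],
    [0.8, 0.0, 0.1],
]

detections = centroids_from_grid(grid, cell_size=8)
print(len(detections), detections)
```

With these made-up scores, two cells pass the threshold, so the sketch reports two objects along with their approximate positions and nothing about their size.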
According to the developers, FOMO also performs better on a large number of small objects than MobileNet-SSD and YOLOv5, two other object detection algorithms.
Limitations of FOMO
As an ML model designed to run on microcontrollers, FOMO does suffer from some shortcomings that another model running on a more powerful system is unlikely to encounter.
For instance, Kelcey and Moreau acknowledged that FOMO will not detect distinct objects whose centroids “overlap” and so occupy the same cell in the output. Moreover, FOMO works best when all objects are of a similar size.
Reading between the lines, FOMO is probably better suited to monitoring a video feed of a manufacturing line than, say, counting shoppers at a shopping mall.
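The same-cell limitation can be illustrated with a few lines of Python. This is a simplified sketch under assumed values: the object coordinates, the cell sizes, and the helper occupied_cells are hypothetical, and the code only shows the geometry of the problem rather than how FOMO itself is built.

```python
from typing import List, Set, Tuple


def occupied_cells(
    centroids: List[Tuple[int, int]], cell_size: int
) -> Set[Tuple[int, int]]:
    """Map each (x, y) pixel centroid to its (row, col) grid cell."""
    return {(y // cell_size, x // cell_size) for x, y in centroids}


# Three hypothetical objects; the first two sit close together.
objects = [(3, 3), (6, 5), (20, 20)]

# With an assumed 8-pixel cell, the two nearby objects share cell (0, 0),
# so a one-detection-per-cell output can report at most two objects.
cells = occupied_cells(objects, cell_size=8)
print(len(objects), "objects ->", len(cells), "cells")
```

In this toy example, shrinking the assumed cell size to 4 pixels puts each object in its own cell, which hints at why closely packed objects are harder for a coarse output grid.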
On the plus side, FOMO is compatible with any MobileNetV2 model and can leverage transfer learning to work with existing models. This gives FOMO “the capabilities to scale from the smallest microcontrollers all the way to full gateways or GPUs.”
More detailed documentation about FOMO can be found here.
Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose.
Image credit: iStockphoto / pisittar