A Practical Comparison of Face Detection and Recognition Tools

Original Article here.

As an IT company, Diatom Enterprises has been producing custom software for 13 years. Over the past year, however, we have become deeply interested in IoT, AI and robotics, and we chose the robot Pepper as a platform for integrating all of Diatom’s developments and bringing them into the business environment.

Recently, we needed face detection and recognition for one of our experimental projects on the robot Pepper and ran into several challenges with this feature. Pepper’s built-in face detection and recognition functions have several issues:

  • Lengthy face detection process – it takes up to 15 seconds to detect a person’s face.

  • Unstable face recognition – in good lighting conditions, it succeeds 6 times out of 10; in low light conditions, 4 times out of 10.

We decided to find a way to work around these main disadvantages of Pepper.

Our basis for the new approach was to use a person-tracking feature on Pepper. This feature indicates when there is a person around. Once we know that a person is in front of Pepper, we use Pepper’s video stream to take a picture, assuming that the person’s face should be there.
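The trigger logic described above can be sketched roughly as follows. This is only an illustration, not the code running on Pepper: `person_present` and `capture_frame` are hypothetical callables standing in for Pepper’s person-tracking signal and video stream, and the timeout and polling interval are assumed values.

```python
import time
from typing import Callable, Optional


def capture_when_person_present(
    person_present: Callable[[], bool],   # hypothetical: wraps the person-tracking signal
    capture_frame: Callable[[], bytes],   # hypothetical: grabs one image from the video stream
    timeout_s: float = 10.0,              # assumed value, not from the article
    poll_interval_s: float = 0.1,         # assumed value, not from the article
) -> Optional[bytes]:
    """Wait until a person is reported, then take one picture.

    Returns the image bytes, or None if nobody shows up before the timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if person_present():
            # A person is in front of the robot, so we assume a face is in view.
            return capture_frame()
        time.sleep(poll_interval_s)
    return None
```

The point of this design is that no face detection runs on the robot at all: the picture is taken as soon as a person is reported, and the image is handed to the recognition step as-is.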

The next step is to recognize the face. We extended a Microsoft web API for face recognition to pre-learn new faces from images. Once we upload new face images to the Microsoft Face API, the person is ready to be recognized.

Using our web API, we upload a picture taken by Pepper to the Microsoft Face API service and, if the face is recognized, receive JSON data about the person in response. The response can include the name, age, emotion, gender and facial features such as glasses, moustache, beard and sideburns. Pepper then uses this information on its own.
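To make the response handling concrete, here is a minimal sketch of how such a JSON reply could be parsed. The exact shape of our wrapper’s response is not shown in this article, so every field name below (`recognized`, `name`, `attributes`, `facialHair` and so on) and the sample values are assumptions made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RecognizedPerson:
    name: str
    age: Optional[float]
    gender: Optional[str]
    emotion: Optional[str]
    glasses: Optional[str]
    facial_hair: Dict[str, float]


def parse_recognition_response(body: str) -> Optional[RecognizedPerson]:
    """Parse a (hypothetical) wrapper response; return None if nobody was recognized."""
    data = json.loads(body)
    if not data.get("recognized"):
        return None
    attrs = data.get("attributes", {})
    return RecognizedPerson(
        name=data["name"],
        age=attrs.get("age"),
        gender=attrs.get("gender"),
        emotion=attrs.get("emotion"),
        glasses=attrs.get("glasses"),
        facial_hair=attrs.get("facialHair", {}),
    )


# Made-up sample payload; the field names are assumptions, not the real schema.
SAMPLE = """{
  "recognized": true,
  "name": "Anna",
  "attributes": {
    "age": 31,
    "gender": "female",
    "emotion": "happiness",
    "glasses": "ReadingGlasses",
    "facialHair": {"moustache": 0.0, "beard": 0.0, "sideburns": 0.0}
  }
}"""

person = parse_recognition_response(SAMPLE)
print(person.name, person.age, person.emotion)
```

Returning `None` for an unrecognized face keeps the calling code simple: the robot either gets a complete record to work with or knows it has to treat the person as a stranger.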

Once we had evaluated this method of face detection and recognition, we decided to look for other available solutions as well. We found two basic working approaches: a web-based API service for face recognition, or a computer-hosted application that uses a facial recognition tool. We tried a few of the popular tools available for face detection and recognition.

Below is a short summary of our results.


The Microsoft Face API is a web-based service for face recognition and detection. We created our own wrapper for the available Microsoft Face API methods. The wrapper has some additional functionality we needed for it to work with Pepper.

This method produced the following results:

– Hybrid approach: face detection is on Pepper (computer); recognition takes place over the web API service.
– Face detection and recognition now take up to six seconds: two seconds to take the picture on Pepper and three to four seconds to transfer it over the internet, recognize it and send the result back to Pepper.
– Overall time to detect and recognize a person: three to seven seconds.
– Face recognition is now very stable: 18 of 20.
– Cost: MS Face API is $1.50 per 1,000 transactions for 0–1,000,000 transactions. Face storage costs $0.50 per 1,000 images per month. See more here.
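For budgeting, the prices quoted above can be turned into a rough monthly estimate. The sketch below assumes linear proration (real billing may round differently) and only covers the 0–1,000,000 transaction tier, because that is the only tier quoted here; the function and constant names are ours.

```python
TRANSACTION_PRICE_PER_1000 = 1.50   # USD, 0–1,000,000 transactions tier (from the article)
STORAGE_PRICE_PER_1000 = 0.50       # USD per 1,000 stored face images, per month
TIER_LIMIT = 1_000_000              # prices above this tier are not given in the article


def estimate_monthly_cost(transactions: int, stored_faces: int) -> float:
    """Rough monthly cost in USD, assuming linear proration of both prices."""
    if transactions < 0 or stored_faces < 0:
        raise ValueError("counts must be non-negative")
    if transactions > TIER_LIMIT:
        raise ValueError("pricing above 1,000,000 transactions is not covered here")
    transaction_cost = transactions / 1000 * TRANSACTION_PRICE_PER_1000
    storage_cost = stored_faces / 1000 * STORAGE_PRICE_PER_1000
    return transaction_cost + storage_cost


# Example: 10,000 recognition calls and 2,000 stored face images in a month.
print(estimate_monthly_cost(10_000, 2_000))  # 16.0
```

For a single robot that recognizes a few hundred visitors a day, the transaction fee dominates and the storage fee stays small.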


This approach runs as a standalone application on a computer. The computer needs a camera connected to it; WebIP cameras also work well for this.

We used a Windows-based desktop application to detect and recognize faces. The face detection is very stable and can detect a face at a distance of up to four meters. The face recognition uses a proprietary database, and each person can have several faces stored in it. Face recognition is fast but, unfortunately, very unstable, so it cannot be used in production projects.

It produced the following results:

– Hybrid: face detection is local computer-hosted; face recognition is over a web service.
– Face detection: one second.
– Face detection stability: 18 of 20.
– Face recognition: one second.
– Face recognition stability: 16 of 20.
– Working distance to detect and recognize faces: up to four meters.
– Overall time to detect and recognize a person: two seconds.
– Cost: a commercial license costs $399 for a single developer or $799 for a whole work group of 25 developers. See more here.
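To compare the stability figures side by side, it helps to convert the "k of n" counts into percentages. The sketch below reads each figure as successes out of attempts, which is our interpretation of the numbers above; the labels are ours. With only 10 to 20 attempts per test, the percentages are rough indications rather than precise measurements.

```python
from typing import Dict, Tuple

# (successes, attempts) as quoted in this article; labels are illustrative.
RESULTS: Dict[str, Tuple[int, int]] = {
    "Pepper built-in recognition, good light": (6, 10),
    "Pepper built-in recognition, low light": (4, 10),
    "Microsoft Face API recognition": (18, 20),
    "Desktop application detection": (18, 20),
    "Desktop application recognition": (16, 20),
}


def success_rate(successes: int, attempts: int) -> float:
    """Return the success rate as a percentage."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 100 * successes / attempts


for label, (successes, attempts) in RESULTS.items():
    rate = success_rate(successes, attempts)
    print(f"{label}: {rate:.0f}% ({successes} of {attempts})")
```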
