Object Detection in Chatbots and Beyond

by Mila Slesar

This blog has covered chatbots frequently. We have written about how to build a chatbot using a bot development framework, how to create a chatbot user interface with special attention to conversation flow and language, and even about handing over chatbots built on a specific chatbot platform.

An actual case of building a chatbot was still to be written. In this post, we’ll show how the alternative-spaces team made one for Telegram and Facebook Messenger. Since the demo project involved an image object detection API, we’ll dedicate the first chapter to image processing and object detection. The second covers Dogbi, alternative-spaces’ object detection app. If you’re interested in building a chatbot in Python using object recognition, we hope you will find it useful.

Object Detection Technology

The technology is related to image processing and computer vision and is used in face detection and recognition, video object co-segmentation, and similar tasks. It deals with detecting instances of objects of a certain class in digital images or videos. All items in a class have particular features and an input image can be compared with a specific object model. For example, shape-based object detection uses the items’ similar shapes to classify them.

Object detection methods are based on either classical machine learning or deep learning. The classical ML-based approach first defines the class’s features and then uses a support vector machine or another technique for the classification. Deep learning techniques can detect an object in an image without the features being defined by hand. They are typically based on convolutional neural networks.

In the project mentioned above, the alternative-spaces team used the TensorFlow Object Detection API. It’s a research library whose object detectors follow the deep learning approach. The API lets developers build, train, and deploy object detection models for various uses.

The creation of an object detection application starts with assembling a dataset, a collection of images with labels. ImageNet is a valuable service for machine learning and object recognition projects. Moreover, it provides a solution for another need – the bounding box necessary for specifying the location of the object in each image.

Alternatively, use the open-source LabelImg or other tools. Whatever you choose, it must provide you with a folder of .jpg (data) images and .xml (label) files. The latter are eventually converted to .csv format.
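The .xml to .csv conversion can be sketched as follows. This is a minimal illustration, assuming the label files are in the Pascal VOC layout that LabelImg writes by default; the function names and the CSV column order are our own assumptions, not something prescribed by the project.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

# Column order is an assumption made for this sketch.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def rows_from_annotation(root):
    """Return one CSV row per <object> element of a Pascal VOC annotation."""
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append([
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ])
    return rows


def convert_folder(xml_dir, csv_path):
    """Collect every .xml label file in xml_dir into a single CSV file."""
    rows = []
    for xml_file in sorted(Path(xml_dir).glob("*.xml")):
        rows.extend(rows_from_annotation(ET.parse(xml_file).getroot()))
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return len(rows)  # number of bounding boxes written
```

Calling convert_folder("annotations", "labels.csv") (both paths are placeholders) would write one line per bounding box, so an image with two labeled objects produces two rows.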

If you wish to avoid spending weeks training an object detection model from scratch on a high-end graphics processing unit, use one of the available pre-trained models. Choose one, download it, and retrain it with your custom dataset, replacing that model’s classes with yours. When you stop TensorFlow training, you can export the latest checkpoint file to a graph file and perform live inference with it.

The resulting model can be used in many ways, e.g., for building a real-time iOS object recognition application or, converted to TensorFlow Lite format, for building an app for Android. The alternative-spaces team used it for a chatbot, which is described below.

Dogbi: alternative-spaces’ Dog Breed Identification Chatbot

The goal was to create a chatbot app that can identify dog breeds from a photo. The user sends a photo of a dog, and Dogbi analyzes the image and responds with an estimate of similarity. The input image does not necessarily show only a dog, so the system’s task is twofold:

1) detect a dog in a picture;

2) guess the breed of the dog using the available knowledge.
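The two steps can be sketched in plain Python. This is not Dogbi’s actual code: the detect and classify callables, the detection dictionary layout, and the 0.5 score threshold are all hypothetical stand-ins for whatever detector and breed classifier you plug in.

```python
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixels


def best_dog_box(detections: List[dict], min_score: float = 0.5) -> Optional[Box]:
    """Step 1: pick the most confident 'dog' detection, or None if there is none."""
    dogs = [d for d in detections if d["label"] == "dog" and d["score"] >= min_score]
    if not dogs:
        return None
    return max(dogs, key=lambda d: d["score"])["box"]


def top_breeds(probabilities: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Step 2: keep the k most likely breeds, highest probability first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def identify(image, detect: Callable, classify: Callable) -> str:
    """Run both steps and format a chatbot reply.

    detect(image) is assumed to return a list of dicts with the keys
    'label', 'score', and 'box'; classify(image, box) is assumed to
    return a mapping from breed name to probability.
    """
    box = best_dog_box(detect(image))
    if box is None:
        return "No dog found in this picture."
    guesses = top_breeds(classify(image, box))
    return ", ".join(f"{breed}: {prob:.0%}" for breed, prob in guesses)
```

Keeping the two steps behind separate callables means the reply logic can be tested with fake models before a real detector or classifier is available.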

Dogbi uses a pre-trained model for dog breed recognition. The team took advantage of the TensorFlow model and ready-made instructions. (Here you can find the scripts and data for reproducing the breed classification model training, analysis, and inference.) The bot API is written in Python.

You basically have to download the Inception model (a deep neural network pre-trained by Google) and the Stanford Dogs dataset. The latter contains about 20K images of 120 dog breeds, with class labels and bounding boxes from ImageNet, and is intended for fine-grained image classification.

NB: Assuming that the accuracy of a trained object detection model is directly proportional to the number of images, the team initially used 2-5K images for each dog breed. The dataset was downloaded with a script crawling large sets on ImageNet. The script spent some ten hours validating, converting, and resizing the files. Surprisingly, a model trained on a smaller dataset (100-200 images per breed) turned out to work much better. The result may depend on the dataset quality and the weights in the model.

On top of the Inception model, you have to build a dog breed classification neural network model, train it, and then freeze it. The training time depends on the depth of your model and the number of epochs. A CSV file with predicted vs. actual breed should be used to analyze precision on the training data. The frozen model can then classify an image that is either available on the filesystem or downloadable as an HTTP resource.

The project can be dockerized with the premade docker-compose file and Dockerfile in our repository, which build the Docker image and run the application.
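Analyzing such a CSV file can be sketched with the standard library alone. The column names "actual" and "predicted" are assumptions for this example; adjust them to whatever header your training script writes.

```python
import csv
from collections import Counter


def precision_report(rows, actual_col="actual", predicted_col="predicted"):
    """Return (overall accuracy, per-breed precision) for prediction rows.

    Precision for a breed is the share of images predicted as that breed
    whose actual label is that breed.
    """
    predicted_total, predicted_correct = Counter(), Counter()
    total = correct = 0
    for row in rows:
        total += 1
        predicted_total[row[predicted_col]] += 1
        if row[predicted_col] == row[actual_col]:
            correct += 1
            predicted_correct[row[predicted_col]] += 1
    accuracy = correct / total if total else 0.0
    precision = {
        breed: predicted_correct[breed] / count
        for breed, count in predicted_total.items()
    }
    return accuracy, precision


def report_from_csv(csv_path):
    """Read a CSV file with a header row and compute the report."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return precision_report(csv.DictReader(handle))
```

Breeds with low precision are the ones the model over-predicts, which is a useful hint about where the dataset needs more or cleaner images.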

Here’s how you can build a chatbot for Telegram on your own:

1) Fill out const_empty.py in telegram_bot directory and rename it to const.py;

2) Generate a legit secret key for settings_without_key.py (add SECRET_KEY = ‘YOUR KEY’) and rename it to settings.py;

3) Copy the static files into AI directory: TensorFlow graph and breeds.csv should be placed in root/AI directory, and static header images (one per breed, a picture which displays it best) – in root/AI/static/media/ directory;

4) Build a docker image using docker-compose build and launch it with docker-compose up;

5) Forward the 8005 and 8006 ports to a default 80 port of your server.
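For the secret key step, one way to produce a random value is Python’s secrets module. This is a sketch under the assumption that settings.py simply expects a long random string in SECRET_KEY; the helper name, the 50-character length, and the alphabet are our choices.

```python
import secrets
import string

# Letters, digits, and a few symbols; quote characters are left out so the
# key can be pasted between quotes in settings.py without escaping.
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*(-_=+)"


def generate_secret_key(length=50):
    """Return a cryptographically strong random string of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # Prints a line that can be copied into settings.py.
    print(f"SECRET_KEY = {generate_secret_key()!r}")
```

Generate the key once per deployment and keep it out of version control.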

That’s it! Dogbi is running in a Docker container on your server!

Final Thoughts

The function and the model used in Dogbi are simple, but don’t think the technology can be used only to analyze photos or create chatbots. Object detection using Python can recognize foods or work in a system that tells what it sees in real time, to name a few uses.

Object detection has been used for vehicle detection, security systems, and self-driving cars. The use cases should be extended to image segmentation and distance estimation soon. Real-time object recognition on the camera screen (e.g., to help visually impaired individuals) and more challenging applications and goals lie ahead.

Content created by our partner, Onix-systems.