Jibo, a social robot with speech recognition skills

by Anna Umanenko


A storyteller that entertains kids, a personal assistant that keeps family members up to date on appointments and activities, or a friendly companion to talk to: this is what Jibo, billed as the world’s first social robot, is about. Strong communication skills are central to this family-oriented, emotionally engaging machine.

Jibo differs from other AI robots in his ability to respond quickly and intelligently, recognize speech and faces, and carry out all sorts of tasks requested by a user. Jibo’s “natural” speaking skills are expected to reshape home robot technology and become a “game-changing” solution. It is natural speech processing that brings these new communication skills to life.

What natural speech processing involves

Natural speech recognition technology was developed to let a robot’s computing system analyze a user’s speech and understand their needs from the spoken word. Natural speech processing is a core area of artificial intelligence and relies heavily on machine learning.

Natural language processing software is used to perform the following tasks:

  • Speech segmentation.

Recognition algorithms segment speech into smaller parts, so that the grammatical structure of a sentence becomes clear to the robot.

  • Deep analytics.

This feature relies on advanced speech processing techniques that extract meaningful information from large data sets. Deep analytics is used to process complex queries over both unstructured and semi-structured data.

  • Language translation.

Natural speech recognition software applies machine translation algorithms to render information automatically into another language.

  • Extraction of entity names.

Speech recognition technology can tell different kinds of entities apart, identifying when it hears a first name, a last name, an email, an address, etc.

  • Summarizing.

This technology makes it possible to condense long pieces of text into shorter summaries.
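
To make three of the tasks above more concrete (segmentation, entity extraction, and summarizing), here is a minimal sketch that works on a text transcript. It is only an illustration under simple assumptions: the function names, the stopword list, and the regular expressions are hypothetical, it handles English text only, and it is not how Jibo or any real speech platform implements these features.

```python
import re
from collections import Counter

# Illustrative assumptions only: these patterns and names are hypothetical
# and are not taken from Jibo or any speech recognition product.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")
WORD_RE = re.compile(r"[a-z']+")
STOPWORDS = {"the", "a", "an", "to", "is", "and", "of", "at", "in", "on", "me", "my"}


def segment(transcript: str) -> list[str]:
    """Segmentation: split a transcript into sentences at ., ! or ?"""
    return [s.strip() for s in SENTENCE_END_RE.split(transcript) if s.strip()]


def extract_emails(transcript: str) -> list[str]:
    """Entity extraction: pull out anything that looks like an email address."""
    return EMAIL_RE.findall(transcript)


def content_words(text: str) -> list[str]:
    """Lowercase the text and drop very common words."""
    return [w for w in WORD_RE.findall(text.lower()) if w not in STOPWORDS]


def summarize(transcript: str, max_sentences: int = 1) -> list[str]:
    """Summarizing: keep the sentences whose words are most frequent overall."""
    sentences = segment(transcript)
    freq = Counter(content_words(transcript))

    def score(sentence: str) -> int:
        return sum(freq[w] for w in content_words(sentence))

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Return the selected sentences in their original order.
    return [s for s in sentences if s in top]


if __name__ == "__main__":
    text = (
        "Remind me about the dentist on Friday. "
        "Send the dentist address to anna@example.com. Thanks!"
    )
    print(segment(text))
    print(extract_emails(text))
    print(summarize(text))
```

Real systems replace each of these rules with trained models, but the division of work is the same: split the input, pick out the entities, then rank what matters most.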

What are Jibo’s communication skills?

Jibo’s speech recognition skills rely on embedded software that, unlike other existing speech processing technologies, does not depend on an Internet connection or cloud services. Sensory Inc. made Jibo the first licensee of its TrulyNatural™ machine learning platform.

The platform is built on a neural network with deep learning, so users can talk to the device naturally, as if it were a real person. The level of accuracy is reported to be very high: the error rate for TrulyNatural is 8% for one million entries, which is reportedly the best result currently available when compared with cloud-based solutions.
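
The article does not say how the 8% figure was measured. One common way to score a speech recognizer is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the recognized text into the reference text, divided by the number of words in the reference. The sketch below assumes that metric purely for illustration; the function name and sample phrases are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: WER is used here only as an example metric; the source
    does not state which error measure the 8% figure refers to.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    spoken = "remind me to call mom at five"
    recognized = "remind me to call tom at five"
    print(f"WER: {word_error_rate(spoken, recognized):.2%}")
```

Under this metric, an 8% error rate would mean roughly 8 word-level mistakes per 100 reference words.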

This platform is particularly well suited to low-power computer systems like Jibo. Speech analysis is performed locally, which helps keep personal information secure, since it is not shared or sent over the Internet.

The TrulyNatural™ speech recognition technology for Jibo combines rich vocabulary knowledge with deep speech-understanding algorithms, so users can control the device and any of its apps by voice, even in noisy surroundings. The robot’s speech software also recognizes which parts of a text need to be stressed, adding emotion to the way the robot communicates. Jibo’s speech recognition platform is designed for high performance, accuracy, and responsiveness, so that there is no noticeable delay in processing or pausing in speech.



Content created by our partner, Onix-systems.
