Hellobot — how we created our own voice solution

Today’s voice technologies make it possible to hold a casual phone conversation with a machine. They can stand in for a human voice in conversations built around repeatable questions and answers. With Hellobot, iteo’s voice solution, a user’s queries can be processed comprehensively, drawing on a wide range of AI-based functionalities and integrations with various systems.

A few words of introduction

The basis of every conversation with a bot is a well-designed scenario. Its complexity depends mainly on the business goals we want to achieve. A scenario can be compared to a decision tree: each question asked by the bot and each answer given by the user moves the conversation further through subsequent components called actions, each responsible for a different task. Actions cover the basic abilities of speaking and listening, recognizing intentions, extracting entities, sending HTTP requests to various external systems, as well as GDPR messaging, redirecting a call, or reporting conversation results. Because scenarios are modular, repeating elements can be reused, which significantly shortens implementation time.
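The decision-tree idea can be illustrated with a minimal Python sketch. It is not Hellobot's actual implementation: the `Action` class, the `run_scenario` function, and the action names are all hypothetical, chosen only to show how reusable actions can be chained by the outcome each one returns.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    """One step of a scenario (hypothetical structure, for illustration only)."""
    name: str
    run: Callable[[dict], str]  # receives the conversation state, returns an outcome label
    transitions: Dict[str, str] = field(default_factory=dict)  # outcome -> next action name


def run_scenario(actions: Dict[str, Action], start: str, state: dict,
                 max_steps: int = 50) -> List[str]:
    """Walk the tree from `start` until an outcome has no follow-up action."""
    visited: List[str] = []
    current: Optional[str] = start
    while current is not None and len(visited) < max_steps:
        action = actions[current]
        visited.append(action.name)
        outcome = action.run(state)
        current = action.transitions.get(outcome)
    return visited


def say(text: str) -> Callable[[dict], str]:
    """Reusable 'speaking' action: stores the statement in the transcript."""
    def _run(state: dict) -> str:
        state.setdefault("transcript", []).append(text)
        return "done"
    return _run


def check_consent(state: dict) -> str:
    """Branch on the user's answer (assumed to be recognized already)."""
    return "yes" if state.get("answer") == "yes" else "no"


ACTIONS = {
    "gdpr_notice": Action("gdpr_notice", say("This call is recorded."),
                          {"done": "check_consent"}),
    "check_consent": Action("check_consent", check_consent,
                            {"yes": "report_result", "no": "redirect_call"}),
    "report_result": Action("report_result", say("Thank you, your answer was saved.")),
    "redirect_call": Action("redirect_call", say("Connecting you to a consultant.")),
}

print(run_scenario(ACTIONS, "gdpr_notice", {"answer": "yes"}))
# ['gdpr_notice', 'check_consent', 'report_result']
```

Because each action is a small, self-contained unit, the same `say` helper or consent check could be plugged into several scenarios, which is the kind of reuse described above.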

Hellobot’s broad set of functionalities places it in the category of advanced voice systems capable of complex speech processing. The most essential functions include:

  1. Speaking and listening
  2. One statement — many pieces of information
  3. Unique conversation
  4. Address verification
  5. Payment handling
  6. Sex recognition
  7. External REST API integration

1. Speaking and listening

At particular moments in the conversation, the bot speaks specific, predefined statements prepared by a specialist. They can have any content and length. The main tools used for this are Google Text-to-Speech and Azure Text-to-Speech, which provide a similar set of functionalities: spelling, reading dates, pausing while speaking, and many more. Creators can also adjust parameters such as speaking speed or voice type. The clear advantages of both solutions are Polish language support, a reasonable price, and the reliability of mature, well-developed tools.
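Features such as spelling, date reading, pauses, and speaking speed are typically expressed in SSML, the markup that both text-to-speech services accept. The sketch below builds such a statement with Python's standard library. The function name `build_ssml`, the sample sentence, and the default values are assumptions made for illustration, and each service has its own requirements for the outer `speak` and `voice` elements, so the output may need adjusting before it is sent to a particular provider.

```python
import xml.etree.ElementTree as ET


def build_ssml(order_code: str, date_dmy: str,
               rate: str = "medium", pause_ms: int = 400) -> str:
    """Build an SSML statement that spells a code, pauses, and reads a date."""
    speak = ET.Element("speak")

    # <prosody rate="..."> controls the speaking speed of everything inside it.
    prosody = ET.SubElement(speak, "prosody", rate=rate)
    prosody.text = "Your order number is "

    # interpret-as="characters" asks the engine to spell the code out.
    spelled = ET.SubElement(prosody, "say-as", {"interpret-as": "characters"})
    spelled.text = order_code

    # <break> inserts a pause of the given length.
    pause = ET.SubElement(prosody, "break", time=f"{pause_ms}ms")
    pause.tail = " It will be delivered on "

    # interpret-as="date" with format="dmy" reads the value as day, month, year.
    date = ET.SubElement(prosody, "say-as", {"interpret-as": "date", "format": "dmy"})
    date.text = date_dmy
    date.tail = "."

    return ET.tostring(speak, encoding="unicode")


print(build_ssml("AB12", "10-09-2021", rate="slow"))
```

Building the markup with an XML library rather than string concatenation keeps the statement well-formed even when the inserted text contains characters such as `&` or `<`.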

The answer given to the bot is extremely important, and Google Speech-to-Text is the speech recognition tool that brings the best results. Thanks to its reliable recognition, speech is converted to text accurately, which has a huge impact on the whole conversation.

2. One statement — many pieces of information

The diagram below shows the bot acquiring two pieces of information about a pizza from a single statement: ‘size’ and ‘type’. The voicebot then asks about ‘dough’, because this information is missing. In the second example, the statement contains no information about the ordered food, so the bot asks general questions.

Once the wit.ai application context is prepared and the network is correctly trained, a user’s statement can be categorized to extract the relevant information, which has a real impact on the whole conversation. A properly designed conversation scenario can acquire many pieces of information at the very beginning of the talk, which significantly shortens the ordering process. As a result, the conversation with the bot becomes more natural, because it gives the impression that the machine understands the speaker’s needs and remembers the information it has acquired. One simple question, “Hi, how can I help you?”, makes the conversation more open and casual than a linear scenario built from many schematic questions.
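The pizza example can be sketched in Python as a simple slot-filling step. The extracted entities are represented here as a plain dictionary, which is an assumption made for illustration and not the actual wit.ai response format. The slot names follow the example above, while the question texts and the function name `next_question` are hypothetical.

```python
from typing import Dict, Optional

# Pieces of information the order needs (taken from the pizza example).
REQUIRED_SLOTS = ["size", "type", "dough"]

# Hypothetical follow-up questions, one per missing slot.
QUESTIONS = {
    "size": "What size should the pizza be?",
    "type": "Which pizza would you like?",
    "dough": "Would you like thin or thick dough?",
}

# Asked when the statement carried no usable information at all.
GENERAL_QUESTION = "What would you like to order?"


def next_question(entities: Dict[str, str]) -> Optional[str]:
    """Return the next question to ask, or None when the order is complete."""
    known = {slot: value for slot, value in entities.items() if slot in REQUIRED_SLOTS}
    if not known:
        return GENERAL_QUESTION
    for slot in REQUIRED_SLOTS:
        if slot not in known:
            return QUESTIONS[slot]
    return None


# First example: size and type were recognized, so only the dough is missing.
print(next_question({"size": "large", "type": "margherita"}))
# Would you like thin or thick dough?

# Second example: nothing was recognized, so the bot asks a general question.
print(next_question({}))
# What would you like to order?
```

The bot only asks about what is still missing, so a user who states everything in one sentence is never asked the same thing twice.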

3. Unique conversation

4. Address verification

This functionality can be used in many different situations. Correctly recognizing and saving an address makes it possible to determine the place of a meeting or delivery, and the implemented validation helps rule out potential errors or misunderstandings.

5. Payment handling

6. Sex recognition

7. External REST API integration

Summing up

human-centric software design & development. check out our website: www.iteo.com