Hellobot — how we created our own voice solution

Aug 13, 2021

Today’s voice technologies make it possible to hold a casual phone call with a machine. They can stand in for a human voice in conversations built around repeatable questions and answers. Hellobot, iteo’s voice solution, processes users’ queries end to end and offers a wide range of AI-based functionalities and integrations with other systems.

A few words of introduction

Our clients can use many functions that improve the conversation with the bot and let them carry out complicated scenarios that match the company’s business requirements. A voicebot is a good fit for many different industries. Its only limitation is the client’s creativity, and clients are always supported by a team of experienced developers.

The basis of every conversation with a bot is a proper scenario. Its complexity depends mainly on the business goals we want to achieve. A scenario can be compared to a decision tree: each question asked by the bot and each answer given by the user moves the conversation forward through subsequent components called actions. Actions cover the basic abilities of speaking and listening, recognizing intents, extracting entities, sending HTTP requests to external systems, as well as GDPR messaging, redirecting a call, and reporting conversation results. Because scenarios are modular, repeating elements can be reused, which considerably shortens implementation time.
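The decision-tree idea can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the `Action` class, the `run_scenario` function, and the action names are hypothetical and do not show Hellobot’s internal design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    """One step of a scenario: the bot says a prompt, then routes on the answer."""
    name: str
    prompt: str
    # Maps the caller's answer to the name of the next action (None ends the call).
    route: Callable[[str], Optional[str]]


def run_scenario(actions: Dict[str, Action], start: str, answers: List[str]) -> List[str]:
    """Walk the tree with scripted answers and return the names of visited actions."""
    path: List[str] = []
    remaining = list(answers)
    current: Optional[str] = start
    while current is not None:
        action = actions[current]
        path.append(action.name)
        # An empty string stands for "no further input from the caller".
        answer = remaining.pop(0) if remaining else ""
        current = action.route(answer)
    return path


# Hypothetical scenario: a greeting that branches into ordering or a handoff.
ACTIONS = {
    "greet": Action("greet", "Hi, how can I help you?",
                    lambda a: "order" if "pizza" in a.lower() else "handoff"),
    "order": Action("order", "Which size would you like?", lambda a: "goodbye"),
    "handoff": Action("handoff", "I am redirecting your call.", lambda a: None),
    "goodbye": Action("goodbye", "Thank you, goodbye.", lambda a: None),
}

path = run_scenario(ACTIONS, "greet", ["I'd like a pizza", "large"])
```

Because each action is a self-contained unit, the same "goodbye" or "handoff" step could be reused across scenarios, which is the kind of reuse that modularity makes possible.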

Hellobot’s wide range of functionalities places it in the category of advanced voice systems capable of complex speech processing. The most essential functions include:

Speaking and listening
One statement — many pieces of information
Unique conversation
Address verification
Payment handling
Sex recognition
External REST API integration

1. Speaking and listening

It’s every voice solution’s basic task. Implementing and using it correctly is the first step to a scenario’s success.

At particular moments of the conversation the bot says specific, predefined statements prepared by a specialist. They can have any content and length. The main tools used for this are Google Text-to-Speech and Azure Text-to-Speech, which provide a similar set of features: spelling, reading dates, pausing while speaking, and many more. Creators can also set parameters such as speaking rate or voice type. Both solutions support the Polish language, are reasonably priced, and are mature, reliable tools.
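Both services accept SSML markup, where spelling, dates, pauses, and speaking rate are expressed with `say-as`, `break`, and `prosody` elements. The sketch below only builds such a string and does not call either service. The helper names are our own, and each vendor differs in the wrapper and attributes it requires (Azure, for instance, also expects version and language attributes on `speak` and a `voice` element), so treat it as a starting point.

```python
from xml.sax.saxutils import escape


def plain(text: str) -> str:
    """Escape ordinary text so characters like '&' stay valid inside SSML."""
    return escape(text)


def spell(text: str) -> str:
    """Ask the synthesizer to read the text character by character."""
    return f'<say-as interpret-as="characters">{escape(text)}</say-as>'


def date(value: str, fmt: str = "dmy") -> str:
    """Ask the synthesizer to read the value as a date (day, month, year by default)."""
    return f'<say-as interpret-as="date" format="{fmt}">{escape(value)}</say-as>'


def pause(milliseconds: int) -> str:
    """Insert a silent break of the given length."""
    return f'<break time="{int(milliseconds)}ms"/>'


def speak(*fragments: str, rate: str = "medium") -> str:
    """Wrap already-built fragments in a speak element with a speaking rate."""
    return f'<speak><prosody rate="{rate}">{"".join(fragments)}</prosody></speak>'


ssml = speak(
    plain("Your order number is "), spell("AB12"), pause(300),
    plain(" It will arrive on "), date("13-08-2021"),
    rate="slow",
)
```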

The answer given to the bot is extremely important, and in our experience Google Speech-to-Text is the speech recognition tool that brings the best results. Accurate speech-to-text conversion has a huge impact on the whole conversation.

2. One statement — many pieces of information

Nowadays, formulating and asking a question and then recognizing the answer correctly is not enough. A user talks to a bot to solve a specific problem, which requires the acquired information to be understood and interpreted properly. To meet clients’ needs, Hellobot is equipped with AI mechanisms provided by wit.ai. They allow categorizing statements and extracting particular pieces of information.

The diagram below shows acquiring two pieces of information about a pizza: ‘size’ and ‘type’. Additionally, the voicebot asks about ‘dough’ because this information is lacking. In the second example, there’s no information about the ordered food, so the bot asks general questions.

After the wit.ai application context is prepared and the model is trained correctly, a user’s statement can be categorized and the relevant information extracted, which directly shapes the rest of the conversation. With a properly designed scenario, the bot can acquire many pieces of information at the very beginning of the talk, which significantly shortens the ordering process. As a result, the conversation becomes more natural, because it gives the impression that the machine understands the speaker’s needs and remembers what it has learned. One simple question, “Hi, how can I help you?”, makes the conversation more open and casual than a linear scenario built from many schematic questions.
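The pizza example can be expressed as simple slot filling: keep what the first statement already provided and ask only about what is missing. The sketch assumes the entities have already been extracted by the language-understanding service and flattened into a plain dictionary. That shape, the slot names, and the question texts are simplifications for illustration, not the actual wit.ai response format.

```python
from typing import Dict, Optional

# Slots the order needs, in the order the bot would ask about them.
REQUIRED_SLOTS = ("size", "type", "dough")

SLOT_QUESTIONS = {
    "size": "Which size would you like?",
    "type": "Which pizza would you like?",
    "dough": "Thin or thick dough?",
}

# Asked when the statement carried no usable information at all.
GENERAL_QUESTION = "What would you like to order?"


def next_question(entities: Dict[str, str]) -> Optional[str]:
    """Return the next question to ask, or None when every slot is filled."""
    filled = {k: v for k, v in entities.items() if k in REQUIRED_SLOTS and v}
    if not filled:
        return GENERAL_QUESTION
    for slot in REQUIRED_SLOTS:
        if slot not in filled:
            return SLOT_QUESTIONS[slot]
    return None
```

For a statement that yielded `{"size": "large", "type": "margherita"}`, the function returns only the dough question, so the caller is never asked again for something they already said.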

3. Unique conversation

The main reason people don’t like talking to bots is a belief that they’re dull and monotonous. Repeated conversations with the same bot usually lead to aversion, because the caller hears the same statements over and over again. To address this, the Hellobot team prepared a functionality that makes each conversation unique. When designing the bot’s statements, an author can prepare many different versions of a statement instead of a single fixed one, and the bot picks one of them at random. With an adequately large base of statements, each conversation becomes unique, because the probability of repeating exactly the same lines is quite low. A speaker who talks with the bot several times is not bored by it and can wait for further answers with interest.
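A minimal sketch of the idea, with hypothetical names: each prompt has several variants and one is drawn at random. Skipping the variant used last time is our own addition, not something the description above requires. The second function shows why repeats become unlikely: with independent uniform choices, the chance of hearing an identical conversation twice is the product of one over the number of variants for each prompt.

```python
import math
import random
from fractions import Fraction
from typing import Optional, Sequence

GREETINGS = (
    "Hi, how can I help you?",
    "Hello, what can I do for you today?",
    "Good morning, how may I help?",
)


def pick_statement(variants: Sequence[str], last: Optional[str] = None) -> str:
    """Pick a random variant, avoiding the one used last time when possible."""
    candidates = [v for v in variants if v != last] or list(variants)
    return random.choice(candidates)


def repeat_probability(variant_counts: Sequence[int]) -> Fraction:
    """Chance that two conversations use exactly the same variant for every prompt."""
    return Fraction(1, math.prod(variant_counts))
```

For example, a scenario with three prompts and three variants each gives `repeat_probability([3, 3, 3])`, which is 1/27, and the figure drops quickly as more variants are added.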

4. Address verification

This functionality adds a whole new dimension to our voice solution. During a conversation, Hellobot is able to recognize an address and validate it with Azure Maps. If the address is correct, the bot continues the conversation. If it contains errors or is incomplete, the bot can ask for the missing parts, such as a street or a building number.

This functionality can be used in many different situations. Recognizing the address correctly and saving it makes it possible to determine the place of a meeting or delivery, and the built-in validation helps prevent errors and misunderstandings.
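The control flow around validation might look like the sketch below. It deliberately does not call Azure Maps: `geocoder_confirms` is a hypothetical callable standing in for whatever lookup the real system performs, and the field names and step labels are our own assumptions.

```python
from typing import Callable, Dict, List

# Parts of the address the bot must hear before it attempts validation.
REQUIRED_FIELDS = ("street", "building_number")


def missing_fields(address: Dict[str, str]) -> List[str]:
    """List required fields that are absent or blank in the recognized address."""
    return [f for f in REQUIRED_FIELDS if not str(address.get(f) or "").strip()]


def next_step(address: Dict[str, str],
              geocoder_confirms: Callable[[Dict[str, str]], bool]) -> str:
    """Decide what the bot does next with a recognized address."""
    missing = missing_fields(address)
    if missing:
        # Ask only for the first missing part; the rest is asked on later turns.
        return f"ask:{missing[0]}"
    if not geocoder_confirms(address):
        # The address is complete but the external service could not confirm it.
        return "ask:repeat_address"
    return "continue"
```

Checking for missing parts before calling the external service keeps the follow-up question specific ("which building number?") instead of making the caller repeat the whole address.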

5. Payment handling

During the pandemic, the ability to limit unnecessary contact with other people is extremely important and highly valued. That’s why Hellobot is equipped with mechanisms for processing online payments. This is useful in many everyday situations, such as paying for an order or settling an amount due. The process is simple: when users choose online payment, the bot sends an SMS with a link to their phone. After clicking it, they can pay with the most popular online payment tools. The money goes straight to the recipient, which makes the process considerably more convenient.

6. Sex recognition

The bot is able to determine the speaker’s sex from his or her statement. This makes the scenario more personalized, because the user hears properly inflected sentences. Recognition is based on the statement’s transcription rather than on the sound of the voice, which avoids misclassifying a man with a high-pitched voice or a woman with a low-pitched one.
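How a transcription can reveal this is easiest to see in Polish, where first-person past-tense and conditional verb forms differ by gender (for example "zamówiłem" versus "zamówiłam", or "chciałbym" versus "chciałabym"). The heuristic below is only our illustration of that linguistic fact. It is an assumption, not a description of how Hellobot actually performs the recognition.

```python
import re
from typing import Optional

# First-person singular endings: masculine -łem / -łbym, feminine -łam / -łabym.
MASCULINE = re.compile(r"\b\w+(?:łem|łbym)\b")
FEMININE = re.compile(r"\b\w+(?:łam|łabym)\b")


def guess_gender(transcript: str) -> Optional[str]:
    """Guess the speaker's grammatical gender from verb endings; None if unclear."""
    text = transcript.lower()
    masculine_hits = len(MASCULINE.findall(text))
    feminine_hits = len(FEMININE.findall(text))
    if masculine_hits > feminine_hits:
        return "male"
    if feminine_hits > masculine_hits:
        return "female"
    # No gendered forms, or a tie: better to stay neutral than to guess.
    return None
```

Returning `None` when the statement has no gendered form lets the scenario fall back to neutral phrasing instead of addressing the caller incorrectly.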

7. External REST API integration

During a conversation, Hellobot uses data fetched on the fly from the client’s external systems through HTTP requests. Likewise, it informs the client’s system about the conversation’s results. This provides the necessary data easily, conveniently, and efficiently. There are many cases in which such communication supports the process performed by the bot. The need to fetch information can arise from the conversation itself, or fetching can be the first step that, depending on the data, adjusts the conversation to a specific user. The communication is two-way: Hellobot can report the outcome of a conversation straight away or trigger the key processes that the scenario depends on.
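Reporting a conversation result to a client system could be sketched with the Python standard library as follows. The URL layout, the payload fields, and the function name are hypothetical, since every client system defines its own API. The example only builds the request, so it can be inspected without touching the network.

```python
import json
import urllib.request


def build_result_request(base_url: str, call_id: str, result: dict) -> urllib.request.Request:
    """Build a POST request that reports one conversation's result as JSON."""
    body = json.dumps({"call_id": call_id, "result": result}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/calls/{call_id}/result",  # hypothetical endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_result_request(
    "https://crm.example.com/", "42", {"status": "order_placed", "size": "large"}
)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` with a timeout. A fetch at the start of a call would follow the same pattern with a GET request, with the response used to tailor the scenario to the caller.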

Summing up

Hellobot makes it possible to have an open conversation driven by artificial intelligence that can analyze and interpret the user’s statements. The solution is scalable and can process many calls simultaneously, which is how the bot copes with constant load. Relying on proven solutions from leading external providers lets it handle complex scenarios from many areas of everyday life. Its modular design shortens the whole implementation process and allows constant development in response to market requirements and the newest trends in the voice tech industry.