Better understand the technology behind our voice assistant
HiJiffy’s conversational AI has been excelling in processing text messages and responding to all kinds of guest queries in the same form. As we always keep an eye on trends in communications, we have been innovating and developing our Aplysia OS to bring powerful new features to guests and hoteliers.
Communicating through voice notes is gaining popularity at an astounding pace: as many as seven billion of them are sent daily on WhatsApp alone. In line with that, the expectation that businesses provide voice assistance is also growing. With the latest technological advances at HiJiffy, AI-powered voice assistants are now also available in hospitality.
What are voice assistants?
Voice assistants are already a reality in various industries, and people use them in everyday life for convenience, comfort and time-saving. For example, they can:
Make a call, send a message, and receive, open, and read messages;
Search for news, weather predictions, currency, and definitions;
Make notes and reminders;
Schedule and reschedule events;
Set an alarm, make a screen brighter, turn on/off Wi-Fi, or play music, among other standard screen functions;
Display the route from point A to point B in navigation searches;
Navigate through leisure: find fun things to do in the city, movies to watch, and weekend getaway destinations.
Voice assistants or voicebots are a subset of conversational agents powered by artificial intelligence (AI) that can interpret natural human speech and answer with an artificial (yet human-sounding) voice. Voice assistants can hold conversations and provide answers using voice recognition, artificial intelligence, and natural language processing (NLP).
Say hello to the new voice in your hotel
The mission of HiJiffy is to better connect hotels with their guests by developing the most advanced conversational AI for hospitality. Our voice assistant understands guest requests made through audio format, such as check-in and check-out times or hotel spa and restaurant opening hours. The voice assistant will be able to use its existing knowledge to deliver answers not only by text but also through voice messages.
A functional voice assistant system needs four key elements: Speech-to-Text (STT), Text-to-Speech (TTS), decision making, and the underlying architecture. Other, smaller processes enhance the system.
Our voice assistant is based on the architecture of HiJiffy and Aplysia OS, creating a solid foundation that will allow users to access the tools and features already existing in our Guest Communications Hub.
Our voice assistant handles each request in the following steps:
Receiving audio from the user.
Transforming the audio into text (STT).
Predicting the best response to the text (Decision making).
Transforming the response into audio (TTS).
Returning the answer to the user.
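The five steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function names, the placeholder answers, and the trick of treating audio bytes as UTF-8 text are all hypothetical stand-ins, not HiJiffy's or Aplysia OS's actual API.

```python
# Hypothetical stand-ins for real STT, decision-making and TTS components.
# Names, answers and behaviour are illustrative assumptions only.

FAQ_ANSWERS = {
    "check-in": "Here is our check-in information.",
    "check-out": "Here is our check-out information.",
}
FALLBACK = "Sorry, I did not understand that."


def speech_to_text(audio: bytes) -> str:
    """Stub STT: pretend the audio bytes are the UTF-8 transcript."""
    return audio.decode("utf-8")


def decide_response(text: str) -> str:
    """Stub decision making: answer the first known topic found in the text."""
    lowered = text.lower()
    for topic, answer in FAQ_ANSWERS.items():
        if topic in lowered:
            return answer
    return FALLBACK


def text_to_speech(text: str) -> bytes:
    """Stub TTS: encode the reply as bytes in place of a real waveform."""
    return text.encode("utf-8")


def handle_voice_message(audio: bytes) -> bytes:
    """Run steps 2 to 4; receiving and returning audio are steps 1 and 5."""
    text = speech_to_text(audio)
    reply = decide_response(text)
    return text_to_speech(reply)
```

Keeping each stage behind its own function means that a real STT or TTS engine could replace a stub without changing the rest of the flow.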
Speech-to-Text (STT)
Turning audio files or spoken input from a microphone into text is known as speech-to-text. An ideal STT system should be able to “perceive” the given input (audio), “recognise” the spoken words, and then pass the recognised words on as the final text.
Among the many available models and variants, we describe here a generic one that is widely used. It is a classical statistical approach consisting of three key components.
Extraction of features
Obtaining different features from the audio, such as power, pitch, and vocal tract configuration. In this way, it is possible to identify the essential parts of the audio and separate them from background noise and irrelevant information.
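As a minimal sketch of one such feature, the snippet below computes the power of short frames of a signal and keeps only the frames above a threshold. The synthetic signal, the frame size, and the threshold are assumptions chosen for illustration; production systems typically use richer features than power alone.

```python
import math


def frame_power(samples, frame_size):
    """Mean squared amplitude of each non-overlapping frame."""
    powers = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        powers.append(sum(s * s for s in frame) / frame_size)
    return powers


def active_frames(powers, threshold):
    """Indices of frames whose power exceeds the threshold."""
    return [i for i, p in enumerate(powers) if p > threshold]


# Synthetic signal (assumption): quiet hum, louder tone, quiet hum again.
SAMPLE_RATE = 8000
quiet = [0.01 * math.sin(2 * math.pi * 50 * n / SAMPLE_RATE) for n in range(800)]
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]
signal = quiet + tone + quiet

powers = frame_power(signal, frame_size=400)
speech_like = active_frames(powers, threshold=0.01)
```

Here `speech_like` holds the indices of the two frames covering the louder tone, which stand in for the parts of the audio worth passing on to the acoustic model.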
Acoustic model
Turning the extracted features into a statistical parametric speech model, predicting which unit of speech each segment of the waveform corresponds to, typically at the phoneme or character level.
Language model
Determining which word combinations are plausible. The language model uses grammar principles and the probabilities that specific words appear together in sentences.
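A toy bigram model shows how such probabilities can help choose between candidate transcripts. The tiny corpus and the candidate sentences are made up for illustration, and add-one smoothing is just one simple choice; real language models are trained on far more text.

```python
from collections import Counter


def train_bigram_counts(sentences):
    """Count single words and adjacent word pairs, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def sentence_score(sentence, unigrams, bigrams):
    """Product of add-one smoothed bigram probabilities."""
    words = ["<s>"] + sentence.lower().split()
    vocab_size = len(unigrams)
    score = 1.0
    for prev, word in zip(words, words[1:]):
        score *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return score


# Made-up training sentences (assumption).
corpus = [
    "what time is check in",
    "what time is check out",
    "what time is breakfast",
]
unigrams, bigrams = train_bigram_counts(corpus)

# Two candidate transcripts that might sound alike.
candidates = ["what time is check in", "what dime is check inn"]
best = max(candidates, key=lambda c: sentence_score(c, unigrams, bigrams))
```

The candidate made of word pairs seen in the corpus scores higher, so `best` is the first sentence.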
There are other approaches available; this is just an example to demonstrate how to get a text from audio.
Text-to-Speech (TTS)
The inverse of speech-to-text conversion is text-to-speech, a process that models natural language and converts text into speech for audio presentation. Most recent TTS models share the following structure:
Text preprocessing and normalisation
The preparatory step for the input text. The text is converted into linguistic features of the target language, in the form of a vector that is fed into the acoustic model.
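A minimal sketch of this step, assuming English input and a character-level vector: the symbol set, the digit-by-digit expansion, and the function names are illustrative choices, not the preprocessing of any particular TTS model.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# Assumed symbol inventory: space plus lowercase ASCII letters.
SYMBOLS = " abcdefghijklmnopqrstuvwxyz"
SYMBOL_TO_ID = {ch: i for i, ch in enumerate(SYMBOLS)}


def normalise(text):
    """Lowercase, spell out digits one by one, and drop other symbols."""
    text = text.lower()
    # Naive expansion: "11" becomes "one one", not "eleven".
    text = re.sub(r"\d", lambda m: " " + DIGIT_WORDS[int(m.group())] + " ", text)
    # Anything outside a-z and space (punctuation, accents) becomes a space.
    text = re.sub(r"[^a-z ]", " ", text)
    return " ".join(text.split())


def to_vector(text):
    """Map the normalised text to a list of integer symbol ids."""
    return [SYMBOL_TO_ID[ch] for ch in normalise(text)]
```

For example, `normalise("Check-out is at 11 AM!")` gives `"check out is at one one am"`, and `to_vector` turns that string into the integer sequence an acoustic model could take as input.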
Acoustic model
Conversion of the preprocessed/normalised text into a sequence of waveform blocks which will then create the voice of the voice assistant.
In this way, a computer can reproduce voice through text. Technology has advanced so much that it is possible to clone voices; for instance, to generate a voice that sounds exactly like yours so that a voice assistant can use it.
Decision Making
To decide the most appropriate answer to the user’s message, one must first grasp the substance of the message that the user has sent. Like HiJiffy’s chatbot, the voice assistant will employ Aplysia’s NLP and our optimised hospitality models to determine the best answer to the guest’s request.
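To make the idea concrete, here is a deliberately simple sketch that matches a transcribed message to the closest known question by word overlap (Jaccard similarity). It is a swapped-in toy technique, not Aplysia's NLP; the intent names, example questions, and threshold are assumptions.

```python
import re

# Hypothetical intents, each with one example question.
INTENT_EXAMPLES = {
    "check_in_time": "what time is check in",
    "spa_hours": "when does the spa open",
    "restaurant_hours": "when does the restaurant open",
}


def tokens(text):
    """Set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_intent(message, min_score=0.2):
    """Return the intent with the highest word overlap, or None if too low."""
    msg = tokens(message)
    best, best_score = None, 0.0
    for intent, example in INTENT_EXAMPLES.items():
        ex = tokens(example)
        union = msg | ex
        score = len(msg & ex) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = intent, score
    return best if best_score >= min_score else None
```

With these examples, `best_intent("What time can I check in?")` returns `"check_in_time"`, while a message that shares no words with any example returns `None`, which a real system could treat as a cue to hand over to a human agent.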
Architecture
The voice assistant is built on the HiJiffy architecture, so all the processes and functionality that our conversational AI currently provides in the chatbot will also be available in the voice assistant. One of them is the ability to determine the language spoken by the user.
Some minor operations improve our voice assistant’s functionality in addition to the main ones mentioned above; for example, choosing a specific voice for your hotel’s voice assistant to best match the brand.
In conclusion, the HiJiffy architecture, distinguished for its reliability and quality, and the Aplysia OS, which delivers all the underlying innovation, will serve as the foundation for our voice assistant.