Speech recognition is one of the most difficult areas of programming and, at the same time, one of the most desirable features of modern applications. One potential use is in call-center systems, where it can automate the mundane work of operators.
The main question is how to make a machine understand natural language, transform it into text, and return an adequate response. Many companies, such as Google, Amazon, IBM, Apple, and Microsoft, are heavily involved in building the best mechanisms for these tasks. Some time ago, Amazon announced Amazon Lex, a service for creating conversational interfaces for applications.
We actively use AWS products in many of our projects and know their specifics, so even though Lex was still quite raw (new and poorly documented), we were determined to crack it. We hoped this would be one more success story to add to our overall expertise.
However, our experience turned out to be mixed. If you plan to use Lex for an automatic speech recognition (ASR) bot, you may find this article helpful as a word of caution.
It all started with a request from one of our existing clients that sells cosmetics in the U.S. via teleshopping. The challenge was to develop a contact center with an automatic speech recognition bot that would reduce the company’s routine work of taking orders.
Previously, all calls were received and recorded by the IVR system. Then, the call center operators listened to the recordings to extract important data such as payment and delivery details (card number, customer name, and address) and manually entered them into the company CRM. Since these tasks were time-consuming and did not require any logical thinking, it made sense to delegate this part of the job to machines.
As all servers of the existing system (including CRM) were already on Amazon, it was quite logical to continue building the system with other AWS products.
In general, Amazon offers a huge range of services that simplify developers’ lives by providing ready-made solutions for storage, databases, machine learning, AR and VR, IoT, and so on. Some tasks that we used to do manually (such as server configuration) can now be done almost instantly with Amazon services.
For the purposes of this project, we’ve selected a combination of three Amazon products: Connect, Lambda and Lex.
The first thing we did was create a contact center using Amazon Connect, a platform specifically designed for building contact centers with all the necessary general functionality, even without programming skills. Using the drag-and-drop interface, you can quickly build a contact flow (which defines how the system routes incoming calls). It looks like a block diagram that can be easily modified:
Example of Amazon Connect Flow
With Amazon Connect, we completed the following tasks:
Built interactive voice response (IVR) workflows. Depending on the scenario, each flow can redirect the user to other subflows based on the user’s responses, which makes the flows easier to manage.
Example of workflows list in Amazon Connect
Created and managed queues (routing by agent skills, prioritizing calls, forwarding to the next available agent, placing callers on hold, etc.).
Created different roles and permissions, for example, agent, manager, or administrator.
Using another Amazon service, AWS Lambda, we integrated the contact center with the customer relationship management (CRM) system and storage.
Example of User Management in Amazon Connect
Among the benefits and features of Amazon Connect are:
Creating as many agents as your business needs
Claiming phone numbers (up to ten numbers)
Paying only for the minutes Amazon Connect is actually used to engage with customers.
As we have already mentioned, we used Lambda to integrate the contact center with the existing customer relationship management system.
In our project, we used Lambda to transfer the data collected in the conversation with the bot from Amazon Connect to the CRM and quickly create a new Lead. As the conversation with the bot goes on, we save each new piece of information to the Lead profile.
Lambda runs isolated code on demand, which let us extend the contact center with custom logic. For example, we needed Lambda to verify the customer’s card data, because Amazon Connect cannot do this by itself.
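To make the idea more concrete, here is a minimal sketch of such a Lambda handler. It assumes the usual shape of an Amazon Connect invocation event (contact data under `Details.ContactData`, values passed from the flow under `Details.Parameters`). The `LEADS` dictionary and the `upsert_lead` helper are hypothetical stand-ins for the real CRM, and the parameter names (`firstName`, `lastName`) are illustrative only.

```python
# Hypothetical in-memory stand-in for the CRM. A real handler would call
# the CRM's own API here instead of writing to a dictionary.
LEADS = {}


def upsert_lead(contact_id, fields):
    """Create the Lead on first use and merge in any non-empty fields."""
    lead = LEADS.setdefault(contact_id, {})
    lead.update({key: value for key, value in fields.items() if value})
    return lead


def lambda_handler(event, context):
    """Entry point invoked from an Amazon Connect contact flow (sketch)."""
    details = event["Details"]
    contact_id = details["ContactData"]["ContactId"]
    # Values collected so far by the bot, forwarded by the flow block.
    params = details.get("Parameters", {})

    lead = upsert_lead(contact_id, params)

    # Return a flat map of string keys to string values, which the
    # contact flow can read back as external attributes.
    return {"leadId": contact_id, "savedFields": ",".join(sorted(lead))}
```

Calling the handler once per collected piece of data keeps the Lead profile up to date even if the caller hangs up halfway through the conversation.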
Lambda function block in the Amazon Connect flow
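As an illustration of the kind of custom logic that fits in such a block, the sketch below checks a card number with the Luhn checksum. This is an assumption for the sake of example: the article does not say which checks were actually used, and a Luhn check only catches mistyped or misheard digits, not invalid or stolen cards. The `cardNumber` parameter name is hypothetical.

```python
def luhn_valid(card_number: str) -> bool:
    """Return True if the digits in card_number pass the Luhn checksum."""
    digits = [int(ch) for ch in card_number if "0" <= ch <= "9"]
    if not 12 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def lambda_handler(event, context):
    """Sketch of a handler called from the contact flow with a card number."""
    card = event["Details"]["Parameters"].get("cardNumber", "")
    # Flat string values so the contact flow can branch on the result.
    return {"cardValid": "true" if luhn_valid(card) else "false"}
```

The flow can then branch on the returned `cardValid` value, for example asking the caller to repeat the number when the check fails.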
The key benefits of Lambda include:
Lets you focus on coding and forget about server management
Lets you write code in Java, Node.js, C#, and Python and use any third-party library
Charges only for the requests served and the compute time required to run your code
Scales up automatically under increased load.
Finally, to supply the contact center with a conversational interface, we needed to use a combination of Amazon Connect and Lex. This would allow us to replace the existing IVR with a system based on automatic speech recognition.
What is Lex in AWS?
Amazon Lex is a service intended to create chatbots for mobile devices, web applications, and chat platforms like Facebook Messenger, Kik, and Slack.
As stated on the official AWS website, “you supply just a few example phrases and Amazon Lex builds a complete natural language model through which your user can interact using voice and text, to ask questions, get answers, and complete sophisticated tasks.”
An additional capability of Amazon Lex chatbots is speech recognition, which is based on the same machine learning algorithms as Alexa.
Amazon Connect integrates Lex blocks into the flow scheme. During the conversation, after a user makes a request, the bot should be able to offer the appropriate next steps to continue the dialog and complete the order. In Lex terminology, the action the user wants to perform is called an intent. For example, if a user says: “I want to cancel my order”, the bot should provide an adequate answer like: “What is the number of your order?” and wait for the response.
The system expects to get a definite type of answer to fulfill the intent. These answers are stored in slots. For more accurate recognition, each slot has its own type. For our purposes, the slots were assigned for collecting the customer data (First Name, Last Name, Address, Card Number) sufficient to process an order.
When the Lex bot has collected the necessary data, it returns the data to Amazon Connect, which in turn passes it to the CRM with the help of Lambda.
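The relationship between an intent and its slots can be modeled in a few lines of plain Python. This is only a conceptual sketch, not the Lex API: the `Slot` and `Intent` classes, the `PlaceOrder` intent name, and the slot type labels are all hypothetical and do not correspond to Lex built-in type names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    name: str
    slot_type: str  # illustrative label, not a real Lex slot type
    prompt: str


@dataclass
class Intent:
    name: str
    slots: List[Slot]
    values: Dict[str, str] = field(default_factory=dict)

    def next_slot(self) -> Optional[Slot]:
        """Return the first slot that still has no value, if any."""
        for slot in self.slots:
            if slot.name not in self.values:
                return slot
        return None

    def fill(self, slot_name: str, value: str) -> None:
        """Store a recognized answer in the named slot."""
        if slot_name not in {slot.name for slot in self.slots}:
            raise KeyError(slot_name)
        self.values[slot_name] = value

    def is_fulfilled(self) -> bool:
        """The intent is ready once every slot has a value."""
        return self.next_slot() is None


def make_order_intent() -> Intent:
    """Build the four-slot order intent described in the text."""
    return Intent(
        name="PlaceOrder",
        slots=[
            Slot("FirstName", "first_name", "What is your first name?"),
            Slot("LastName", "last_name", "What is your last name?"),
            Slot("Address", "address", "What is your delivery address?"),
            Slot("CardNumber", "digits", "What is your card number?"),
        ],
    )
```

The bot keeps prompting for `next_slot()` until `is_fulfilled()` returns true, at which point the collected values can be handed over for processing.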
So, How Do You Use Amazon Lex Chatbots?
In the scheme below, you can see one of the flows where the bot is intended to get the user info. For example, a new customer calls the contact center.
When the customer is connected to the Lex bot (1), the bot asks the customer to say or spell their first name, initiating the “GetName” intent (2).
If Lex recognizes the name, it saves it in the attributes (3) and repeats the name to get the user’s confirmation.
If the name is not confirmed, Lex initiates step 2 again (4).
If the name is confirmed, the bot moves on to the Last Name intent (5). After that, Amazon Connect invokes a Lambda function (6) to save the first and last name in the CRM, creating a new Lead.
If the bot does not identify the name (7), it transfers the user to another flow or hangs up (8).
Example of Amazon Connect workflow with Lex bot
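The numbered steps above can be sketched as a small loop in plain Python. This is an illustration of the control flow only, not Amazon Connect or Lex code. The `simple_recognize` stub stands in for speech recognition, `save_lead` stands in for the Lambda call, and the limit of three attempts per name is an assumption that the original flow does not specify.

```python
def simple_recognize(utterance):
    """Stub recognizer: returns a cleaned-up name, or None if nothing was heard."""
    return utterance.strip().title() or None


def run_name_flow(utterances, recognize, save_lead, max_attempts=3):
    """Walk through the first-name and last-name steps of the flow.

    utterances is the sequence of caller replies, alternating between
    a name and a yes/no confirmation. Returns "saved" or "fallback".
    """
    replies = iter(utterances)
    collected = {}
    for slot in ("FirstName", "LastName"):
        for _ in range(max_attempts):
            value = recognize(next(replies, ""))   # step 2: ask and listen
            if value is None:                      # step 7: nothing recognized
                return "fallback"                  # step 8: other flow or hang up
            # Step 3: repeat the name back and wait for confirmation.
            confirmation = next(replies, "").strip().lower()
            if confirmation == "yes":
                collected[slot] = value            # step 5: move to the next name
                break
            # Step 4: not confirmed, so ask again.
        else:
            return "fallback"                      # assumed retry limit reached
    save_lead(collected)                           # step 6: Lambda saves the Lead
    return "saved"
```

For example, `run_name_flow(["ann", "no", "anna", "yes", "lee", "yes"], simple_recognize, print)` asks for the first name twice before moving on to the last name.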
In theory, it sounds clear! But what can you expect in practice?
What’s Wrong with Lex?
When we finally assembled the whole scheme with Amazon Connect, Lambda, and Lex, everything worked correctly except the Amazon Lex bot. It could ask the customer questions, but it had trouble understanding the answers. It worked well only with dates, numbers, and “yes/no” questions but did not adequately capture names and, more importantly for delivery, addresses. Instead of the correct text, it returned a set of weird symbols in these fields. We wrote to Amazon Support about this problem but did not receive any additional assistance.
We did our best to configure the bot, given that we had poor official Lex documentation and very few technical articles at our disposal.
As a workaround, we decided to have the bot record the conversation so that the information could be processed manually later. But recording is possible only when the flow redirects the user to an operator, and since our project had no operator, this option was not applicable.
The most likely reasons Lex did not work as expected were that the technology was still raw and that Lex was designed primarily for chatbot functionality, that is, for recognizing text rather than voice.
Thus, after a long struggle, we decided to set Lex aside until the technology matures.
What Are the Alternatives?
Of course, Lex was not the only possible solution. But, as the CRM was already deployed on Amazon servers, it was reasonable to use other AWS products for new functionalities. We relied on Amazon’s reputation and our experience with their services.
Now, we’ve learned of a new service that has recently been announced: Amazon Transcribe, created specifically to transcribe audio and video files into text. Thanks to deep learning technology, the output text is enriched with punctuation and formatting to make it easier to read. For now, it supports only English and Spanish, but other languages are promised soon.
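As a rough idea of how a recorded call might be submitted for transcription, the sketch below builds the request parameters for a transcription job. We have not used this in the project described here. The job name, bucket, and file path are hypothetical, and the helper itself (`build_transcription_request`) is our own illustration rather than part of any AWS library.

```python
import os

# Media formats accepted by this sketch.
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac"}


def build_transcription_request(job_name, s3_uri, language_code="en-US"):
    """Build keyword arguments for a transcription job (sketch).

    The media format is inferred from the file extension of the S3 URI.
    """
    media_format = os.path.splitext(s3_uri)[1].lstrip(".").lower()
    if media_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported media format: {media_format!r}")
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": s3_uri},
    }


# Hypothetical recording location, for illustration only.
request = build_transcription_request(
    "order-123", "s3://example-bucket/calls/order-123.wav"
)
```

With boto3 installed and AWS credentials configured, these parameters could be passed on as `boto3.client("transcribe").start_transcription_job(**request)`.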
This technology will significantly help in such cases as telephone calls, interviews, meetings, creating subtitles for video content, etc.
Natural language understanding is one of the most challenging IT subjects today. The hardest thing about automatic speech recognition (ASR) is to understand what the user actually means and transcribe it into text. Amazon is tackling this challenge by providing solutions that let developers easily and quickly create conversational bots to reduce the mundane tasks of multi-channel contact centers.
At Greenice, we are always eager to learn new technologies and work on non-trivial tasks. Of course, there is always a risk when trying new solutions, because of their unstable behavior and lack of technical documentation. So it is always “a journey with an unpredictable end.”
Using Amazon Lex with Amazon Connect may be reasonable only for its chatting abilities. But we would not recommend relying on Lex in the creation of ASR systems as it definitely is not tailored for this, in spite of declared ambitions. Maybe that’s why Amazon is rushing to release their Amazon Transcribe to fill this gap.
Nevertheless, this situation with Lex did not stop us from working with Amazon. We have been using AWS services in most of our projects for the last two years, and all our previous experience with Amazon was highly positive. Among the products that we really love and have vast experience with are Amazon S3, Elasticsearch, EC2, Lambda, RDS, ElastiCache, Connect, and some others. These services save the team time and effort in building highly scalable, secure, fast, and flexible applications. Amazon Lex became our first and, hopefully, last disappointment with AWS.
Co-Author: Svyatoslav Mordashev, Team Leader at Greenice and main specialist in Amazon Web Services.