Using AWS Comprehend and AWS Elasticsearch for NLP

Vivek Padia • Aug 27, 2020

Introduction

In this article, I’ve implemented a program that works as a quote suggestion system. For this exercise, I used a programming quotes API hosted on Heroku. Ideally, the software returns similar quotes when a user likes a quote, and different quotes when a user dislikes one.

The Approach

To begin with, I needed a way to measure the relative value of a given quote against the other quotes under consideration. To find the most relevant quote, I measured the sentiment of each quote using the following AWS services:

AWS Comprehend to measure the sentiment of the sentence
AWS Elasticsearch to store resultant data

Additionally, I used the boto3 library for Python to connect to AWS and use its services.

Sentiment analysis

Sentiment analysis uses machine learning to classify text by the emotion it expresses. Companies use it to gauge user sentiment towards their products by analyzing ratings and comments on social media.

Consider the following example:

“The best minds of my generation are thinking about how to make people click ads.” is classified as Neutral sentiment, because without context the statement on its own implies neither positive nor negative emotion.
“A program that produces incorrect results twice as fast is infinitely slower.” is classified as Negative sentiment, because slower programs are undesirable and therefore evoke negative emotion.
“Walking on water and developing software from a specification are easy if both are frozen.” is classified as Positive sentiment.

There are multiple NLP methods for finding the sentiment of text. Instead of building one from scratch, I used the pre-trained models offered by AWS Comprehend.

AWS Comprehend

AWS Comprehend is a natural language processing service that analyzes text to extract a variety of insights. It offers several pre-trained models that can be used directly. Below is an example of the model I used for sentiment analysis.

The two lines in this example use the boto3 library for Python to connect to the AWS Comprehend service. The resulting client provides a detect_sentiment method that calls the sentiment analysis API.
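The original snippet is not reproduced in this text. As a minimal sketch of what such a call might look like: boto3.client("comprehend") and detect_sentiment(Text=..., LanguageCode=...) are the boto3 calls, while the helper names make_comprehend_client and detect_quote_sentiment, the region, and the "en" language code are assumptions made for illustration.

```python
def make_comprehend_client(region_name="us-east-1"):
    """Create a Comprehend client. The region is an assumption; use your own."""
    import boto3  # imported lazily so the helper below can be tried without AWS access

    return boto3.client("comprehend", region_name=region_name)


def detect_quote_sentiment(client, quote, language_code="en"):
    """Return (label, scores) for one quote via Comprehend's detect_sentiment."""
    response = client.detect_sentiment(Text=quote, LanguageCode=language_code)
    # "Sentiment" is the overall label; "SentimentScore" holds one score per class.
    return response["Sentiment"], response["SentimentScore"]
```

With valid AWS credentials configured, detect_quote_sentiment(make_comprehend_client(), "some quote") would send the quote to the service. Passing the client in as an argument also makes it easy to swap in a stub while testing.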

For the snippet used, I received the following response:

Note that the response contains a confidence score for every sentiment class (Positive, Negative, Neutral, and Mixed). The class with the highest score is reported as the Sentiment of the text.
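To make the shape of that response concrete, here is a small sketch. The Sentiment and SentimentScore keys follow the detect_sentiment response format, but the numbers are invented for illustration and are not the output I received. The dominant_sentiment helper is a hypothetical name.

```python
# Illustrative response only: the scores below are made up, not real output.
sample_response = {
    "Sentiment": "NEUTRAL",
    "SentimentScore": {
        "Positive": 0.05,
        "Negative": 0.10,
        "Neutral": 0.80,
        "Mixed": 0.05,
    },
}


def dominant_sentiment(response):
    """Pick the class with the highest score from a detect_sentiment response."""
    scores = response["SentimentScore"]
    top = max(scores, key=scores.get)
    # Score keys are capitalized ("Neutral"); the overall label is upper case.
    return top.upper(), scores[top]


label, score = dominant_sentiment(sample_response)
```

In this sketch, label should agree with the Sentiment field of the same response, which is the relationship described above.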

AWS Elasticsearch

Elasticsearch stores each quote together with its corresponding sentiment so that it can be compared with other quotes. It is a document-oriented NoSQL data store and search engine. Before a document is stored, the quote is merged with the Sentiment returned by AWS Comprehend. Then, whenever the user likes a quote, queries are run to find the most relevant quote based on text Sentiment.

The following code uses the boto3 library to connect to the Elasticsearch service, which lets us run queries against the database:
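The original connection code is not reproduced in this text. One common pattern, which is an assumption here and not necessarily what I used, is to take credentials from boto3, sign requests with the requests_aws4auth package, and issue queries through the elasticsearch Python client (7.x API). The helper names build_es_kwargs and connect_to_elasticsearch, and the host and region arguments, are hypothetical.

```python
def build_es_kwargs(host, http_auth, connection_class):
    """Keyword arguments for an HTTPS connection to an Elasticsearch domain."""
    return {
        "hosts": [{"host": host, "port": 443}],
        "http_auth": http_auth,
        "use_ssl": True,
        "verify_certs": True,
        "connection_class": connection_class,
    }


def connect_to_elasticsearch(host, region):
    """Return a client whose requests are signed with the current AWS credentials."""
    # Third-party packages, imported lazily: boto3, elasticsearch (7.x), requests-aws4auth.
    import boto3
    from elasticsearch import Elasticsearch, RequestsHttpConnection
    from requests_aws4auth import AWS4Auth

    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(
        credentials.access_key,
        credentials.secret_key,
        region,
        "es",  # service name used when signing requests to the Elasticsearch service
        session_token=credentials.token,
    )
    return Elasticsearch(**build_es_kwargs(host, awsauth, RequestsHttpConnection))
```

The host is the domain endpoint without the https:// prefix. Keeping the keyword arguments in a separate function makes the connection settings easy to inspect without touching the network.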

The documents stored in Elasticsearch are listed below:
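The original listing is not reproduced in this text. As a rough illustration of how a quote could be merged with its Comprehend result before indexing, the sketch below builds one document. All field names (quote_id, quote, sentiment, positive, and so on) and the build_quote_document helper are hypothetical, and the scores in the example are invented.

```python
def build_quote_document(quote_id, text, comprehend_response):
    """Merge a quote with its detect_sentiment result into one flat document."""
    scores = comprehend_response["SentimentScore"]
    return {
        "quote_id": quote_id,
        "quote": text,
        "sentiment": comprehend_response["Sentiment"],
        # Per-class scores are stored as separate numeric fields for sorting.
        "positive": scores["Positive"],
        "negative": scores["Negative"],
        "neutral": scores["Neutral"],
        "mixed": scores["Mixed"],
    }


# Example with made-up scores (not real Comprehend output).
example_doc = build_quote_document(
    "1",
    "Walking on water and developing software from a specification are easy if both are frozen.",
    {
        "Sentiment": "POSITIVE",
        "SentimentScore": {"Positive": 0.7, "Negative": 0.05, "Neutral": 0.2, "Mixed": 0.05},
    },
)
```

With the 7.x Python client, such a document could then be stored with es.index(index="quotes", id=example_doc["quote_id"], body=example_doc), where the index name "quotes" is again an assumption.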

After adding all the quotes along with their detected sentiment to the database, I used queries to get the closest positive Sentiment quote. An example of one such query is below:
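The original query is not reproduced in this text. One way such a query could be written, as a sketch under stated assumptions, is to filter on positive quotes and sort them by how close their positive score is to that of the liked quote. The field names sentiment and positive are hypothetical, sentiment is assumed to be mapped as a keyword field, and each document's _id is assumed to equal its quote id.

```python
def build_closest_positive_query(target_positive, exclude_id=None, size=1):
    """Query body: POSITIVE quotes ordered by distance from a target positive score."""
    body = {
        "size": size,
        "query": {
            "bool": {
                # Assumes "sentiment" is a keyword field holding the Comprehend label.
                "filter": [{"term": {"sentiment": "POSITIVE"}}],
            }
        },
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "order": "asc",  # smallest distance first
                    "script": {
                        "lang": "painless",
                        "source": "Math.abs(doc['positive'].value - params.target)",
                        "params": {"target": target_positive},
                    },
                }
            }
        ],
    }
    if exclude_id is not None:
        # Leave out the quote the user just liked (assumes _id is the quote id).
        body["query"]["bool"]["must_not"] = [{"ids": {"values": [exclude_id]}}]
    return body
```

With the 7.x Python client, the body could be sent with es.search(index="quotes", body=build_closest_positive_query(0.9, exclude_id="1")). For a disliked quote, the same idea could be reversed by filtering on a different sentiment label or sorting in descending order.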

Conclusion

Getting NLP insights is easier and faster with AWS Comprehend. It takes only a few minutes to set up and runs on pre-trained models. Comprehend also offers several similar features for analyzing text using machine learning.

Vivek Padia

I work with Aubergine Solution as a Machine Learning engineer. We believe in having a problem-solving attitude. I have worked with several different technologies related to ML and integrating them with cloud-based services.