In recent years, sentiment analysis has become an increasingly popular tool in the world of data science. By analyzing the sentiment of customer reviews, businesses can gain valuable insights into how their products or services are perceived by the public. This can be incredibly helpful for identifying areas for improvement.
The use of natural language processing (NLP) techniques, such as those provided by the NLTK library, has made it possible to quickly and easily analyze large amounts of text data. By using these tools, I was able to uncover valuable insights and produce charts to visualize my findings. I hope you enjoy reading about my project as much as I enjoyed conducting it!
The project involved collecting customer feedback data from skytrax.com and using it to gain insights into the experiences of British Airways customers. The aim was to turn those reviews into findings that could inform data-driven decisions for the business. In this blog post, I'll share some of my findings and the methods I used to uncover them.
STEP 1. To collect the customer feedback data for my project, I used the Python library Beautiful Soup to scrape 20 paginated webpages at 100 reviews per page (up to 2,000 reviews in total). To traverse all 20 pages, I used a while loop that requested each page in turn, extracted the review text, and stored the results in a data frame.
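Here is a minimal sketch of what a scraping loop like this might look like. It is not the exact code from my project: the URL pattern, the `div.text_content` selector, and the function names are assumptions for illustration, so check them against the live site before relying on them.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Assumed URL pattern and CSS selector; verify both against the live site.
BASE_URL = "https://www.airlinequality.com/airline-reviews/british-airways"
REVIEW_SELECTOR = "div.text_content"


def fetch_page(page, page_size=100):
    """Download one paginated results page and return its HTML."""
    url = f"{BASE_URL}/page/{page}/?sortby=post_date%3ADesc&pagesize={page_size}"
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_reviews(html):
    """Extract the text of every review block found in one page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [node.get_text(" ", strip=True) for node in soup.select(REVIEW_SELECTOR)]


def scrape_reviews(total_pages=20, fetch=fetch_page):
    """Walk pages 1..total_pages with a while loop and collect all reviews."""
    reviews = []
    page = 1
    while page <= total_pages:
        reviews.extend(parse_reviews(fetch(page)))
        page += 1
    return reviews
```

The resulting list can then be loaded into a data frame, for example with `pandas.DataFrame({"reviews": scrape_reviews()})`, where the column name `reviews` is again just a placeholder. Passing the fetch function in as a parameter makes it possible to try the loop on saved HTML without hitting the website.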
STEP 2. Using Beautiful Soup and a while loop, I was able to collect all of the customer feedback data I needed in a single pass. With the raw reviews stored in a data frame, I had the starting point for the rest of the analysis.
STEP 3. Once I had collected the customer feedback data, the next step was to prepare it for analysis. The data was quite messy and contained a lot of unnecessary information, so I needed to perform some data cleaning in order to make it usable.
STEP 4. One of the key steps in preparing the data was text preprocessing, which involved removing unnecessary characters and words from the text. For example, I removed the checkmark symbol "✅" that appeared before some reviews, as well as the phrase "Trip Verified" that appeared at the beginning of others.
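A cleaning step along these lines could be sketched as follows. The function name is hypothetical, and the optional "|" separator after "Trip Verified" is an assumption about how the raw text is laid out rather than something I have confirmed for every review.

```python
import re

# Matches an optional checkmark, the phrase "Trip Verified", and an optional
# "|" separator at the start of a review (the separator is an assumption).
PREFIX_PATTERN = re.compile(r"^\s*✅?\s*Trip Verified\s*\|?\s*")


def clean_review(text):
    """Strip the verification prefix and stray checkmarks, then tidy spaces."""
    text = PREFIX_PATTERN.sub("", text)
    text = text.replace("✅", "")
    return " ".join(text.split())
```

Applied to a data frame, this would typically be something like `df["reviews"].apply(clean_review)`, with `reviews` standing in for whatever the text column is called.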
STEP 5. Text preprocessing left me with cleaner data that was easier to analyze. This was an important step, as it allowed me to focus on the content of the customer reviews rather than being distracted by extraneous information.
STEP 6. After I had cleaned up the customer feedback data, the next step was to use the Natural Language Toolkit (NLTK) to perform sentiment analysis on the reviews. To do this, I created an instance of SentimentIntensityAnalyzer from the nltk.sentiment package. This object scores the sentiment of each review, which in turn let me determine the overall sentiment of the data.
STEP 7. Once I had calculated the sentiment score for each review, I added a new column named "SENTIMENT" to the data frame to hold these scores. Each score is the analyzer's compound score, a value between -1 (most negative) and 1 (most positive) that summarizes the overall sentiment of the review.
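The scoring step could look roughly like this. Treat it as a sketch under a few assumptions: the helper names are mine, the lexicon download assumes network access on first use, and the NLTK import is kept inside `build_analyzer` so that the scoring function can be tried with any object that offers a `polarity_scores` method.

```python
def build_analyzer():
    """Create NLTK's VADER-based analyzer, fetching its lexicon if needed."""
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer()


def compound_scores(reviews, analyzer):
    """Return the compound score (between -1 and 1) for each review text."""
    return [analyzer.polarity_scores(text)["compound"] for text in reviews]
```

With the reviews in a data frame, the new column would then be filled with something like `df["SENTIMENT"] = compound_scores(df["reviews"], build_analyzer())`, where `reviews` is a placeholder column name.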
STEP 8. Next, I created another column in the data frame that sorted these scores into three categories: positive, negative, and neutral. If a review had a sentiment score greater than 0, it was classified as positive; if the score was less than 0, it was classified as negative; and if the score was exactly 0, it was classified as neutral.
STEP 9. I then used this data to create a pie chart that displayed the percentage of reviews in each sentiment category. The pie chart included labels and colors for each category, and I used the 'autopct' parameter to display the percentages in the chart. This made it easy to see at a glance how the sentiment of the customer feedback data was distributed.
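The rule above translates almost word for word into a small function. The function name is hypothetical; the thresholds are the ones described in this step.

```python
def label_sentiment(score):
    """Map a compound score to 'positive', 'negative', or 'neutral'."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

On a data frame this would usually be applied column-wise, for example `df["SENTIMENT"].apply(label_sentiment)`.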
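A pie chart like this can be drawn with matplotlib roughly as shown here. The function name, the color choices, and the title are assumptions for illustration; the 'autopct' format string is what prints the percentage on each slice.

```python
from collections import Counter

import matplotlib.pyplot as plt


def plot_sentiment_pie(labels):
    """Draw a pie chart of how many reviews fall into each sentiment category."""
    counts = Counter(labels)
    categories = ["positive", "negative", "neutral"]
    sizes = [counts.get(category, 0) for category in categories]
    colors = ["tab:green", "tab:red", "tab:gray"]  # illustrative colors

    fig, ax = plt.subplots()
    # autopct formats each slice's share as a percentage with one decimal.
    ax.pie(sizes, labels=categories, colors=colors, autopct="%1.1f%%")
    ax.set_title("Sentiment of customer reviews")
    return fig, ax
```

Calling `plot_sentiment_pie(df["CATEGORY"])` followed by `plt.show()` would display the chart, assuming the category labels live in a column called `CATEGORY` (a placeholder name).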
STEP 10. In the second part of my analysis, I focused on the key topics and words that were mentioned most frequently in the customer feedback data. To do this, I first downloaded a list of stopwords from the NLTK library and used it to filter out common words (such as "the", "and", and "was") from the reviews. This allowed me to focus on the more significant words and topics that were mentioned in the data.
STEP 11. Next, I created a frequency chart that showed the top 20 key words from the reviews, along with the number of times each word was mentioned. This chart made it easy to see which topics were most commonly discussed by customers, and which words were used most frequently to describe their experiences.
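Stopword filtering of this kind might be sketched as below. The regular-expression tokenizer and the helper names are assumptions made for the example, not necessarily how the project split the text into words, and the stopword download assumes network access on first use.

```python
import re


def load_stopwords():
    """Fetch NLTK's English stopword list (downloaded on first use)."""
    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords", quiet=True)
    return set(stopwords.words("english"))


def significant_words(text, stop_words):
    """Lowercase the text, split it into words, and drop the stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in stop_words]
```

Keeping the stopword list as a parameter means the same function works with NLTK's list or with a custom one, for instance if domain words such as "flight" should also be ignored.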
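Counting the most frequent words is a natural fit for `collections.Counter`. The sketch below assumes each review has already been reduced to a list of filtered words; the function name is hypothetical.

```python
from collections import Counter


def top_words(token_lists, n=20):
    """Return the n most frequent words across all reviews with their counts."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(n)
```

The returned list of (word, count) pairs can be handed straight to a bar chart, for example by unpacking it with `words, freqs = zip(*top_words(token_lists))` and plotting the two sequences against each other.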
STEP 12. The final analysis involved creating a word cloud for visualization. The 'wordcloud' library generates a word cloud from the words used in the reviews, and the result is then displayed using the matplotlib library. This is a simple way to visualize the most common words in the set of reviews, which gives a quick sense of what customers talk about most.
Through my analysis, I was able to determine the overall sentiment of the customer feedback data, as well as the key topics and words that were mentioned most frequently in the reviews. This information can be incredibly valuable to British Airways, as it provides insight into how customers feel about their experiences with the company and can help identify areas for improvement.
In conclusion, my data science project was a success. By scraping customer feedback data from a third-party source and using the Natural Language Toolkit (NLTK) to perform sentiment analysis, I was able to gain valuable insights into the experiences of British Airways customers.
Overall, I'm very happy with the results of my project, and I'm excited to see how my findings can be used to make data-driven decisions and improve British Airways' operations. I hope you enjoyed reading about my project and that you learned something new along the way!