Weeks 3 and 4 of my Outreachy'23 internship have flown by, and I can hardly believe how quickly time has passed😌. As a second-year undergraduate student at IGDTUW, I feel incredibly grateful to have been selected as an Outreachy'23 intern.
In this blog post, I will share my experiences and progress during these two weeks of my internship. If you're interested in learning more about Outreachy, I invite you to check out my previous blogs🚀
During these weeks, our main task was to develop a machine-learning model capable of classifying heartbeat sounds. Sumaya (my co-intern) and I decided to keep our initial goal simple: build a model that could distinguish between "normal" and "abnormal" heartbeat sounds.
To begin, I delved deeper into the topic of sound classification with YAMNet. YAMNet is a pre-trained deep neural network developed by Google, specifically designed for predicting audio events from a wide range of classes. It utilizes the MobileNetV1 depthwise-separable convolution architecture, known for its efficiency and accuracy in classifying audio events. YAMNet was trained on the extensive AudioSet corpus, which consists of over 2 million audio clips sourced from YouTube videos. (To know more, check out this repository - Link)
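To give you a feel for how easy it is to experiment with, here is a minimal sketch of loading YAMNet from TensorFlow Hub and running it on an audio clip. The file name is just a placeholder, and this is only how I tried it out, not part of our final pipeline.

```python
# Minimal sketch: running YAMNet from TensorFlow Hub on a mono 16 kHz waveform.
# "example_heartbeat.wav" is a placeholder file name for illustration.
import numpy as np
import librosa
import tensorflow_hub as hub

yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

# YAMNet expects a 1-D float32 waveform sampled at 16 kHz.
waveform, sr = librosa.load("example_heartbeat.wav", sr=16000, mono=True)

# scores: per-frame probabilities over 521 AudioSet classes,
# embeddings: per-frame 1024-dim features, spectrogram: log-mel spectrogram.
scores, embeddings, spectrogram = yamnet(waveform.astype(np.float32))
print(scores.shape, embeddings.shape)
```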
While exploring YAMNet, I came across an interesting project called "Heartbeat Audio Classification." This project was developed by Nittala Venkata Sai Aditya, Gandhi Disha, Saibhargav Tetali, Vishwak Venkatesh, and Soumith Reddy Palreddy as part of their Advanced Machine Learning course in the Master's program in Business Analytics at The University of Texas at Austin. The project aimed to find the machine learning model with the highest accuracy in classifying heartbeat sound files as normal or abnormal. Since this project aligned perfectly with our weekly goal, I decided to draw inspiration from their work and train our models accordingly.
For the project, I obtained data from Peter Bentley's "Classifying Heart Sounds Challenge." This dataset included 585 labeled audio files and 247 unlabeled audio files, sourced from a clinical trial using a digital stethoscope and the iStethoscope Pro iPhone app. Since we focused on building our machine learning model, only the labeled audio files were utilized for training.
The dataset consisted of five major heart sound classes: normal, murmurs, extra heart sounds, extrasystole, and artifacts. These classes represented different characteristics of heartbeat sounds, such as distinct lub-dub patterns, whooshing or rumbling sounds, irregular rhythms, or non-heartbeat sounds. To gain insights from the data, I conducted exploratory data analysis and referred to the project team's blog, which provided valuable information about the various classes and their characteristics.
From the amplitude waveplots of the different classes, some interesting observations emerged -
Normal heart sounds exhibited a consistent distribution of amplitudes with a clear lub-dub pattern. In contrast, murmur heart sounds displayed less consistency and had additional sound waves between the lub and dub, indicating the presence of whooshing sounds. Extrasystole heart sounds had higher amplitudes and irregularities between the sound waves, indicating irregular heart rhythms. Extra heart sounds (extrahls) had irregular patterns compared to normal heart sounds, with a few high-amplitude sound waves representing galloping sounds associated with certain heart conditions. Artifact class waveplots showed a wide range of different sounds, including feedback squeals, echoes, speech, music, and noise, which were unrelated to heartbeats.
Combining all the waveplots, it became evident that extrasystole heart sounds had higher amplitudes than the other classes, and that all the abnormal classes exhibited irregular rhythms compared to normal heart sounds.
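For anyone who wants to reproduce this kind of EDA, here is roughly how the amplitude waveplots can be drawn with librosa. The file paths below are placeholders, not the actual dataset file names.

```python
# Sketch of the amplitude waveplots used for EDA; file paths are placeholders.
# Requires librosa >= 0.9 for librosa.display.waveshow.
import librosa
import librosa.display
import matplotlib.pyplot as plt

examples = {
    "normal": "data/normal_example.wav",
    "murmur": "data/murmur_example.wav",
}

fig, axes = plt.subplots(len(examples), 1, figsize=(10, 5), sharex=True)
for ax, (label, path) in zip(axes, examples.items()):
    y, sr = librosa.load(path, sr=None)      # keep the original sample rate
    librosa.display.waveshow(y, sr=sr, ax=ax)
    ax.set_title(label)
plt.tight_layout()
plt.show()
```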
During the analysis, I also discovered that our dataset suffered from class imbalance, where the number of normal heartbeat sound samples far exceeded the other classes. This class imbalance posed a challenge as it could introduce bias in the model, leading to inaccurate results and an insufficient understanding of the underlying patterns distinguishing between classes.
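A quick way to confirm the imbalance is to count the samples per class. Here is a tiny sketch, assuming the labels live in a pandas DataFrame column named "label" (the file path and column name are illustrative).

```python
# Quick check of the class distribution; path and column name are illustrative.
import pandas as pd

df = pd.read_csv("labels.csv")
print(df["label"].value_counts())                       # normal dominates the rest
print(df["label"].value_counts(normalize=True).round(3))  # same counts as fractions
```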
To address this issue, we employed data augmentation techniques. With only 585 audio files for training, we decided to increase the dataset size by generating synthetic data using two popular methods: adding noise and changing pitch and speed.
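Below is a sketch of what those two augmentation methods look like with librosa and NumPy. The parameter values (noise factor, semitone shift, stretch rate) are illustrative rather than the exact settings we used.

```python
# Sketch of the two augmentation methods: noise injection and pitch/speed changes.
# Parameter values and the file path are illustrative.
import numpy as np
import librosa

def add_noise(y, noise_factor=0.005):
    """Inject white Gaussian noise into the waveform."""
    noise = np.random.randn(len(y))
    return y + noise_factor * noise

def change_pitch(y, sr, n_steps=2):
    """Shift the pitch up by n_steps semitones without changing duration."""
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

def change_speed(y, rate=1.1):
    """Stretch the waveform in time (rate > 1 speeds it up)."""
    return librosa.effects.time_stretch(y, rate=rate)

y, sr = librosa.load("data/normal_example.wav", sr=None)
augmented = [add_noise(y), change_pitch(y, sr), change_speed(y)]
```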
Once we augmented the dataset and extracted the relevant features, we obtained 1,755 rows and 162 features. These features were essential for enabling our machine-learning models to understand the audio data. To extract the features, we utilized the librosa package in Python, a powerful tool for music and audio analysis. This package provided us with features such as Zero Crossing Rate (ZCR), Chroma, Mel Frequency Cepstral Coefficients (MFCC), RMS (Root Mean Square), and Melspectrogram, which captured different aspects of the sound waves and aided our machine learning model.
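For concreteness, here is how such a per-file feature vector can be assembled with librosa, averaging each feature over time. The exact combination below (1 ZCR + 12 chroma + 20 MFCC + 1 RMS + 128 mel bands = 162) is my reconstruction of how a 162-dimensional vector can be obtained; treat it as a sketch rather than our exact configuration.

```python
# Sketch: one 162-dimensional feature vector per audio file,
# averaging each librosa feature over time.
import numpy as np
import librosa

def extract_features(path):
    y, sr = librosa.load(path, sr=None)
    zcr    = np.mean(librosa.feature.zero_crossing_rate(y=y).T, axis=0)          # 1
    stft   = np.abs(librosa.stft(y))
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr).T, axis=0)        # 12
    mfcc   = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).T, axis=0)       # 20
    rms    = np.mean(librosa.feature.rms(y=y).T, axis=0)                          # 1
    mel    = np.mean(librosa.feature.melspectrogram(y=y, sr=sr).T, axis=0)        # 128
    return np.hstack([zcr, chroma, mfcc, rms, mel])                               # (162,)
```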
With the augmented dataset and extracted features in hand, we proceeded to train and evaluate several machine learning models on the training and test sets (a short training sketch follows the list). The models included:
❄️Random Forest Classifier: This ensemble learning method constructs multiple decision trees and produces a majority vote for classification. We used it as our baseline model.
❄️Light Gradient Boosting Machine (LightGBM): As boosting techniques generally outperform bagging techniques, we opted for LightGBM, a faster version of Gradient Boosting Machine (GBM).
❄️CatBoost: Another boosting technique that supports categorical features and provides fast predictions.
❄️1-D Convolutional Neural Network (CNN): CNNs excel at audio and image data analysis. We trained two CNN models, one with ReLU activation functions and another with tanh activation functions, to introduce nonlinearity.
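Here is a compressed sketch of how the tree-based models can be trained and scored on the extracted features. `X` and `y` are assumed to be the feature matrix (stacked `extract_features` outputs) and the labels from the earlier steps; the 80/20 split and default hyperparameters are illustrative.

```python
# Compressed sketch of training and scoring the tree-based models.
# X: (n_samples, 162) feature matrix from the feature-extraction step (assumed);
# y: the corresponding class labels (assumed).
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "LightGBM": LGBMClassifier(random_state=42),
    "CatBoost": CatBoostClassifier(verbose=0, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(f"{name}: {accuracy_score(y_test, preds):.2%}")
```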
All five models were trained both with and without upsampling of the training dataset. Additionally, the multi-class classification problem was converted into a binary (normal vs. abnormal) classification problem.
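A sketch of those two preprocessing variants is shown below: upsampling the minority classes to the size of the largest class, and collapsing the labels to normal vs. abnormal. Column names are illustrative, and mapping every non-normal class to "abnormal" is my simplification.

```python
# Sketch: upsampling minority classes and collapsing labels to normal/abnormal.
# Column names and the normal-vs-everything-else mapping are illustrative.
import pandas as pd
from sklearn.utils import resample

def upsample(train_df, label_col="label"):
    """Resample every class up to the size of the largest class."""
    max_size = train_df[label_col].value_counts().max()
    parts = [
        resample(group, replace=True, n_samples=max_size, random_state=42)
        for _, group in train_df.groupby(label_col)
    ]
    return pd.concat(parts).sample(frac=1, random_state=42)  # shuffle the result

def to_binary(labels):
    """Map the five heart-sound classes to 'normal' vs. 'abnormal'."""
    return labels.apply(lambda c: "normal" if c == "normal" else "abnormal")
```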
Upon evaluating the models' performance on the test set, we found that the Random Forest and LightGBM models outperformed the others, achieving accuracies of 81% and 82%, respectively, once the class imbalance was treated.
Throughout this internship experience, I have learned a great deal. As a beginner in machine learning, I initially struggled to understand the codebase. However, thanks to the well-written blog and documented GitHub repository, I was able to overcome these challenges and grasp the concepts effectively.✨
As I move forward in my internship journey, I look forward to further improving our heartbeat sound classification model and exploring advanced techniques to enhance its accuracy and versatility. Additionally, I aim to create a user-friendly interface that will make our model accessible to medical professionals and researchers in the field.
In conclusion, weeks 3 and 4 of my Outreachy'23 internship have been incredibly enlightening🤩. I have acquired knowledge in sound classification with YAMNet, applied it to the domain of heartbeat sounds, and developed a machine learning model with promising results. I am excited to continue working on this project, facing new challenges, and witnessing its progression in the coming weeks💫.
If you're interested in following my internship journey, connect with me on my socials and be sure to check out my previous blogs, where I share insights into the Outreachy program and my experiences as an intern.
Stay tuned for more updates as I delve deeper into the world of machine learning and tackle new obstacles!