DEV Community

git-leo-here

TDoC 2024 - Day 2: Basics of Audio Processing, Mel Spectrograms, and Librosa

TDoC 2024 - Day 2: Introduction to CLI Tools and Audio Processing

Overview

Welcome to Day 2 of TDoC 2024! Today we covered command-line interface (CLI) tools and audio processing fundamentals: building CLI tools with argparse and manipulating audio files with the numpy and librosa libraries. Below is a walkthrough of the concepts, code, and applications from the session.


What Are CLI Tools and Why Are They Important?

Definition

A Command-Line Interface (CLI) is a text-based interface where users can issue commands to perform specific tasks.

Advantages of CLI Tools

  1. Lightweight: No graphical interface overhead.
  2. Flexible: Perfect for automation and batch processing.
  3. Efficient: Faster for experienced users compared to GUIs.

Examples of Popular CLI Tools

  • git
  • curl
  • pip

Basics of Command-Line Argument Parsing

Command-line arguments are parameters passed to a script during execution.

For example:

python script.py --input "data.txt" --output "result.txt"
  • --input and --output: Options.
  • "data.txt" and "result.txt": Values for the respective options.
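
To make the split between options and values concrete, here is a minimal sketch of how that command line is tokenized before the script sees it. It uses the standard-library shlex module to mimic POSIX shell splitting; the variable names are illustrative only.

```python
import shlex

# The example command, as it would be typed in a POSIX-style shell.
command = 'python script.py --input "data.txt" --output "result.txt"'

# shlex.split mimics shell tokenization: split on whitespace, strip the quotes.
tokens = shlex.split(command)

# The interpreter name is not passed on; the script receives the rest as sys.argv.
argv = tokens[1:]
print(argv)
```

Note that the quotes never reach the script: each option and each value arrives as a separate plain string.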

Popular Python Libraries for Parsing Command-Line Arguments:

  1. argparse: Standard library module for robust CLI tool creation.
  2. click: Simplifies parsing with decorators and better user experience.
  3. typer: A modern library built on click, ideal for rapid development.

Creating a CLI Tool with argparse

Below is a basic template for building a CLI tool using argparse:

import argparse

def main():
    parser = argparse.ArgumentParser(description="A sample CLI tool.")
    parser.add_argument("--input", type=str, required=True, help="Path to the input file.")
    parser.add_argument("--output", type=str, required=True, help="Path to save the output file.")

    args = parser.parse_args()

    # Access arguments
    print(f"Input File: {args.input}")
    print(f"Output File: {args.output}")

if __name__ == "__main__":
    main()

Run the Script

python script.py --input "data.txt" --output "result.txt"
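
The same parser can also be exercised without a shell, which is handy for quick tests. The sketch below assumes the template's two options; build_parser is a hypothetical helper name introduced only for this illustration.

```python
import argparse

def build_parser():
    # Same two required options as the template above.
    parser = argparse.ArgumentParser(description="A sample CLI tool.")
    parser.add_argument("--input", type=str, required=True, help="Path to the input file.")
    parser.add_argument("--output", type=str, required=True, help="Path to save the output file.")
    return parser

# Passing a list to parse_args bypasses sys.argv entirely.
args = build_parser().parse_args(["--input", "data.txt", "--output", "result.txt"])
print(f"Input File: {args.input}")
print(f"Output File: {args.output}")
```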

Audio Processing Basics

Definition

Audio processing involves the analysis and manipulation of sound signals. Applications include:

  • Speech synthesis
  • Music production
  • Machine learning (e.g., voice recognition, audio classification)

Key Operations

  1. Time Domain Analysis: Examining the waveform of audio signals over time.
  2. Frequency Domain Analysis: Decomposing signals into their frequency components (e.g., Fourier Transforms).
  3. Effects and Transformations: Modifying audio properties like speed, pitch, and reverberation.
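
As a rough illustration of the first two operations, the sketch below builds a synthetic tone with numpy and inspects it in both domains. The sampling rate and the 440 Hz tone are assumptions chosen for the example, not values from the session.

```python
import numpy as np

sr = 8000                                 # assumed sampling rate in Hz
t = np.arange(sr) / sr                    # sample times for one second of audio
signal = np.sin(2 * np.pi * 440.0 * t)    # synthetic 440 Hz sine tone

# Time domain: look at the waveform itself, e.g. its peak amplitude.
peak = np.max(np.abs(signal))

# Frequency domain: the FFT magnitude shows which frequencies are present.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
dominant = freqs[np.argmax(spectrum)]

print(f"Peak amplitude: {peak:.2f}, dominant frequency: {dominant:.1f} Hz")
```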

Mel Spectrograms

What is a Mel Spectrogram?

  • A Mel Spectrogram visualizes the spectrum of frequencies in an audio signal over time, mapped to the Mel scale (a perceptual scale approximating human pitch perception).
  • Applications: Speech synthesis, music analysis, audio classification.
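
To give a feel for the Mel scale, here is a small sketch using one common conversion formula (the HTK-style formula). This is an assumption for illustration: librosa's default conversion uses a different (Slaney-style) variant, so its numbers will differ somewhat.

```python
import math

def hz_to_mel(freq_hz):
    # HTK-style formula: roughly linear at low frequencies, logarithmic higher up.
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of the formula above.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The same 100 Hz step covers far more mels at low frequencies than at high ones,
# mirroring how listeners resolve pitch more finely in the low range.
low_step = hz_to_mel(200.0) - hz_to_mel(100.0)
high_step = hz_to_mel(5100.0) - hz_to_mel(5000.0)
print(f"100->200 Hz: {low_step:.1f} mel, 5000->5100 Hz: {high_step:.1f} mel")
```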

Generate a Mel Spectrogram with Librosa

Here’s how to compute and visualize a Mel Spectrogram using librosa:

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load an audio file
audio_file = 'example.wav'
audio, sr = librosa.load(audio_file, sr=None)

# Compute Mel Spectrogram
mel_spec = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=2048, hop_length=512, n_mels=128)

# Convert to decibels for better visualization
mel_spec_db = librosa.power_to_db(mel_spec, ref=np.max)

# Plot the Mel Spectrogram
plt.figure(figsize=(10, 4))
librosa.display.specshow(mel_spec_db, sr=sr, hop_length=512, x_axis='time', y_axis='mel', cmap='viridis')
plt.colorbar(format='%+2.0f dB')
plt.title('Mel Spectrogram')
plt.tight_layout()
plt.show()
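
It can help to know what shape to expect from the computation above. The sketch below estimates it from the parameters alone, assuming librosa's default centered framing (center=True); expected_mel_shape is a hypothetical helper, not a librosa function.

```python
def expected_mel_shape(n_samples, n_mels=128, hop_length=512):
    # With centered framing, one frame starts every hop_length samples,
    # plus one extra frame for the padded start.
    n_frames = 1 + n_samples // hop_length
    return (n_mels, n_frames)

# Three seconds of audio at an assumed 22050 Hz sampling rate.
shape = expected_mel_shape(22050 * 3)
print(shape)
```

Rows correspond to Mel bands and columns to time frames, which is why the plot uses a Mel y-axis and a time x-axis.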

Code Walkthrough: CLI Tool for Audio Manipulation

This section breaks down the essential components of the CLI tool for audio manipulation, including the setup, audio loading, effects application, saving processed audio, and error handling.


1. Command-Line Interface Setup

Here’s a basic template for setting up the CLI tool using argparse:

import argparse
import librosa
import soundfile as sf

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Audio Manipulation CLI Tool")
    parser.add_argument("input_file", type=str, help="Path to the input audio file")
    parser.add_argument("output_file", type=str, help="Path to save the processed audio file")
    parser.add_argument("--effect", type=str, choices=["speed", "pitch", "reverse", "echo"], required=True, help="Type of audio effect to apply")
    parser.add_argument("--value", type=float, default=1.0, help="Magnitude of the effect (e.g., speed factor, pitch steps, echo delay)")

    args = parser.parse_args()

    manipulate_audio(input=args.input_file, output=args.output_file, effect=args.effect, value=args.value)

Arguments

  • input_file: Path to the input audio file.
  • output_file: Path to save the processed audio file.
  • --effect: Type of audio effect to apply (speed, pitch, reverse, or echo).
  • --value: Parameter controlling the magnitude of the effect (default = 1.0).
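
One possible refinement, not part of the session code, is to validate --value while parsing. The sketch below uses a custom argparse type; positive_float is a hypothetical helper. It would suit the speed and echo effects but not pitch, which legitimately accepts negative values, so a real tool would need to apply the check per effect.

```python
import argparse

def positive_float(text):
    # Convert the raw string and reject anything that is not a number above zero.
    try:
        value = float(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{text!r} is not a number")
    if value <= 0:
        raise argparse.ArgumentTypeError(f"{text!r} must be greater than 0")
    return value

parser = argparse.ArgumentParser(description="Audio Manipulation CLI Tool")
parser.add_argument("--value", type=positive_float, default=1.0,
                    help="Magnitude of the effect (must be positive in this sketch)")

args = parser.parse_args(["--value", "1.5"])
print(args.value)
```

With this in place, argparse prints a usage error and exits before any audio is loaded when the value is invalid.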

2. Loading Audio

The librosa library is used for audio loading and manipulation.

audio, sr = librosa.load(input_file, sr=None)
  • audio: The waveform data of the audio file.
  • sr: The sampling rate of the audio file. By passing sr=None, the original sampling rate of the file is preserved.
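
The relationship between the two return values can be sketched without any audio file. Here a silent numpy array stands in for the waveform that librosa.load would return, and the sampling rate is an assumed value.

```python
import numpy as np

sr = 22050                                   # assumed sampling rate in Hz
audio = np.zeros(sr * 2, dtype=np.float32)   # stand-in for a loaded mono waveform

# Number of samples divided by samples-per-second gives the duration.
duration_seconds = len(audio) / sr

# The highest frequency a signal can represent is half the sampling rate.
nyquist_hz = sr / 2

print(f"{duration_seconds:.1f} s, Nyquist limit {nyquist_hz:.0f} Hz")
```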

3. Applying Effects

Effects are applied based on the value of the --effect argument.

Effect Options

Speed Adjustment

   if effect == 'speed':
       audio = librosa.effects.time_stretch(audio, rate=value)
  • Increases or decreases playback speed using librosa.effects.time_stretch().
  • value: Speed factor (e.g., 1.5 for faster, 0.5 for slower).
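
As a rough guide to what the speed factor does to the length of the clip, the sketch below computes the expected duration after stretching. stretched_duration is a hypothetical helper for illustration; the actual resampling is done by librosa, which also requires a positive rate.

```python
def stretched_duration(duration_seconds, rate):
    # A rate above 1 shortens the clip; a rate below 1 lengthens it.
    if rate <= 0:
        raise ValueError("rate must be a positive number")
    return duration_seconds / rate

print(stretched_duration(10.0, 2.0))   # faster playback, shorter clip
print(stretched_duration(10.0, 0.5))   # slower playback, longer clip
```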

Pitch Shifting

   elif effect == 'pitch':
       audio = librosa.effects.pitch_shift(audio, sr=sr, n_steps=value)
  • Shifts the pitch of the audio using librosa.effects.pitch_shift().
  • value: Number of steps to shift, in semitones by default (positive for higher pitch, negative for lower). Fractional values are allowed.
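
To relate the step count to actual frequencies, here is a small sketch assuming 12 equal-tempered steps per octave (semitones). semitone_ratio is a hypothetical helper introduced for the example.

```python
def semitone_ratio(n_steps):
    # Each semitone multiplies frequency by the twelfth root of two.
    return 2.0 ** (n_steps / 12.0)

a4 = 440.0  # reference pitch in Hz, chosen for illustration
print(f"+2 steps:  {a4 * semitone_ratio(2):.2f} Hz")
print(f"+12 steps: {a4 * semitone_ratio(12):.2f} Hz")
print(f"-12 steps: {a4 * semitone_ratio(-12):.2f} Hz")
```

A shift of 12 steps doubles the frequency (one octave up), and a shift of -12 halves it.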

Reversing Audio

   elif effect == 'reverse':
       audio = audio[::-1]
  • Reverses the audio signal by flipping the waveform array.
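
The slicing trick can be tried on a tiny array. The sketch also shows the multi-channel case as an assumption: if a file were loaded with mono=False, librosa would return an array shaped (channels, samples), and the reversal would need to run along the last axis.

```python
import numpy as np

# Mono: a one-dimensional array of samples.
mono = np.array([1.0, 2.0, 3.0, 4.0])
mono_reversed = mono[::-1]

# Stereo stand-in, shaped (channels, samples): reverse time, not channel order.
stereo = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
stereo_reversed = stereo[:, ::-1]

print(mono_reversed)
print(stereo_reversed)
```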

Echo Effect

   elif effect == 'echo':
       echo = librosa.util.fix_length(audio, size=(len(audio) + int(sr * value)))
       echo[-len(audio):] += audio * 0.6
       audio = echo
  • Creates an echo effect by padding the audio with silence and mixing in a single delayed copy at 60% of the original amplitude.
  • value: Delay duration (in seconds) for the echo.
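
The echo logic can be checked on a toy signal using numpy alone. In this sketch np.pad stands in for librosa.util.fix_length, and add_echo plus the tiny sampling rate are illustrative assumptions that keep the arrays readable.

```python
import numpy as np

def add_echo(audio, sr, delay_seconds, decay=0.6):
    # Convert the delay from seconds to a whole number of samples.
    delay_samples = int(sr * delay_seconds)
    # Pad the end with silence so the delayed copy has room to ring out.
    out = np.pad(audio, (0, delay_samples))
    # Mix in the attenuated copy, shifted later by delay_samples.
    out[delay_samples:] += decay * audio
    return out

sr = 4                                   # toy sampling rate
audio = np.array([1.0, 0.0, 0.0, 0.0])   # a single click
result = add_echo(audio, sr, delay_seconds=0.5)
print(result)
```

The click reappears two samples later at 0.6 of its original amplitude, and the output is longer than the input by the delay.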

4. Saving Processed Audio

The soundfile library (sf) is used to save the processed audio to the output file.

sf.write(output_file, audio, sr)
  • output_file: File path to save the processed audio.
  • audio: The manipulated waveform data.
  • sr: The sampling rate.
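
One optional safeguard before saving, not part of the session code: effects such as echo add signals together, so samples can exceed the [-1, 1] range and may clip or distort when written to an integer PCM format. The sketch below scales the signal down only when needed; normalize_peak and the 0.99 ceiling are assumptions for illustration.

```python
import numpy as np

def normalize_peak(audio, peak=0.99):
    # Leave quiet or empty signals untouched; scale loud ones down to the ceiling.
    max_abs = np.max(np.abs(audio)) if audio.size else 0.0
    if max_abs <= peak:
        return audio
    return audio * (peak / max_abs)

loud = np.array([0.5, -1.6, 0.8])
safe = normalize_peak(loud)
print(np.max(np.abs(safe)))
```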

5. Error Handling and Success Messages

To provide feedback, you can wrap the logic in a try-except block.

def manipulate_audio(input, output, effect, value):
    try:
        # Load at the file's native sampling rate
        audio, sr = librosa.load(input, sr=None)
        if effect == "pitch":
            # pitch_shift returns a new array, so the result must be assigned
            audio = librosa.effects.pitch_shift(audio, sr=sr, n_steps=value)
        elif effect == "reverse":
            audio = audio[::-1]
        elif effect == "echo":
            # Pad with silence, then mix in a copy delayed by `value` seconds
            echo = librosa.util.fix_length(audio, size=len(audio) + int(sr * value))
            echo[-len(audio):] += 0.6 * audio
            audio = echo
        elif effect == "speed":
            audio = librosa.effects.time_stretch(audio, rate=value)
        sf.write(output, audio, sr)
        print(f"Audio file saved with filename {output} with effect {effect}")

    except Exception as e:
        print(f"Error: {e}")

Features of Error Handling

  • Success Messages: Informs the user when the file is successfully processed.
  • Error Messages: Prints the exception text for any issue during execution, such as an invalid file path. Unsupported effect names are rejected earlier by argparse through the choices list.
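
A possible extension, sketched under assumptions, is to send errors to stderr and return a non-zero exit code so that shell scripts can detect failures. The run helper and the specific exit-code values are hypothetical, and open() stands in for the audio-loading step.

```python
import sys

def run(task):
    # Run a zero-argument callable and translate failures into exit codes.
    try:
        task()
    except FileNotFoundError as exc:
        print(f"Error: file not found: {exc}", file=sys.stderr)
        return 1
    except Exception as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 2
    return 0

ok_code = run(lambda: None)
missing_code = run(lambda: open("definitely_missing_file_xyz.wav", "rb"))
print(ok_code, missing_code)
```

In a real tool the returned code would typically be passed to sys.exit() at the end of the main block.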

Run the CLI Tool

  1. Install dependencies:
   pip install librosa soundfile
  2. Execute the tool:
   python audio_tool.py input.wav output.wav --effect pitch --value 2

The complete code is given below:

import argparse
import librosa
import soundfile as sf

def manipulate_audio(input, output, effect, value):
    try:
        # Load at the file's native sampling rate
        audio, sr = librosa.load(input, sr=None)
        if effect == "pitch":
            # pitch_shift returns a new array, so the result must be assigned
            audio = librosa.effects.pitch_shift(audio, sr=sr, n_steps=value)
        elif effect == "reverse":
            audio = audio[::-1]
        elif effect == "echo":
            # Pad with silence, then mix in a copy delayed by `value` seconds
            echo = librosa.util.fix_length(audio, size=len(audio) + int(sr * value))
            echo[-len(audio):] += 0.6 * audio
            audio = echo
        elif effect == "speed":
            audio = librosa.effects.time_stretch(audio, rate=value)
        sf.write(output, audio, sr)
        print(f"Audio file saved with filename {output} with effect {effect}")

    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Audio Manipulation CLI Tool")
    parser.add_argument("input_file", type=str, help="Path to the input audio file")
    parser.add_argument("output_file", type=str, help="Path to save the processed audio file")
    parser.add_argument("--effect", type=str, choices=["speed", "pitch", "reverse", "echo"], required=True, help="Type of audio effect to apply")
    parser.add_argument("--value", type=float, default=1.0, help="Magnitude of the effect (e.g., speed factor, pitch steps, echo delay)")
    args = parser.parse_args()
    manipulate_audio(input=args.input_file, output=args.output_file, effect=args.effect, value=args.value)

Examples

Effect          Command
Speed Up        python audio_tool.py input.wav output_speed.wav --effect speed --value 1.5
Pitch Shift     python audio_tool.py input.wav output_pitch.wav --effect pitch --value 3
Reverse Audio   python audio_tool.py input.wav output_reverse.wav --effect reverse
Add Echo        python audio_tool.py input.wav output_echo.wav --effect echo --value 0.5

What We Achieved Today

By the end of Day 2, participants gained:

  1. An understanding of audio basics (waveforms, frequency, sampling rate).
  2. Experience working with Librosa for audio loading, analysis, and manipulation.
  3. Skills to create CLI tools for automating tasks with user-friendly interfaces.
  4. Knowledge of applying audio effects like:
    • Speed adjustment
    • Pitch shifting
    • Reversing
    • Echo addition


Your Feedback Matters!

We’d love to hear your experiences and challenges. Share your questions or results in the comments. Happy coding! 🚀
