git-leo-here

Posted on Dec 16, 2024

TDoC 2024 - Day 2: Basics of Audio Processing, Mel Spectrograms, and Librosa

#machinelearning #tutorial #beginners

TDoC 2024 - Day 2: Introduction to CLI Tools and Audio Processing

Overview

Welcome to Day 2 of TDoC 2024! Today we covered command-line interface (CLI) tools and audio processing fundamentals: building CLI tools with argparse, and using the numpy and librosa libraries to manipulate audio files. Below is a walkthrough of the concepts, code, and applications from the session.

What Are CLI Tools and Why Are They Important?

Definition

A Command-Line Interface (CLI) is a text-based interface where users can issue commands to perform specific tasks.

Advantages of CLI Tools

Lightweight: No graphical interface overhead.
Flexible: Perfect for automation and batch processing.
Efficient: Faster for experienced users compared to GUIs.

Examples of Popular CLI Tools

git
curl
pip

Basics of Command-Line Argument Parsing

Command-line arguments are parameters passed to a script during execution.

For example:

python script.py --input "data.txt" --output "result.txt"

--input and --output: Options.
"data.txt" and "result.txt": Values for the respective options.

Popular Python Libraries for Parsing Command-Line Arguments:

argparse: Standard library module for robust CLI tool creation.
click: Simplifies parsing with decorators and better user experience.
typer: A modern library built on click, ideal for rapid development.
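
Whichever library you pick, it starts from the same input: the list of tokens the shell passes to the script. The sketch below is only an illustration of that step. It uses the standard-library shlex module to split the example command roughly the way a POSIX shell would; the variable names are made up for this example.

```python
import shlex

command = 'python script.py --input "data.txt" --output "result.txt"'

# Split the command line roughly the way a POSIX shell would (quotes are removed).
tokens = shlex.split(command)

# Everything after the interpreter name is what the script sees as sys.argv.
argv = tokens[1:]

# Pair each option with the value that follows it.
options = dict(zip(argv[1::2], argv[2::2]))

print(argv)     # the script name followed by options and values
print(options)  # option -> value mapping
```

A parsing library does this pairing for you, and adds type conversion, validation, and a generated --help message on top.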

Creating a CLI Tool with `argparse`

Below is a basic template for building a CLI tool using argparse:

import argparse

def main():
    parser = argparse.ArgumentParser(description="A sample CLI tool.")
    parser.add_argument("--input", type=str, required=True, help="Path to the input file.")
    parser.add_argument("--output", type=str, required=True, help="Path to save the output file.")

    args = parser.parse_args()

    # Access arguments
    print(f"Input File: {args.input}")
    print(f"Output File: {args.output}")

if __name__ == "__main__":
    main()

Run the Script

python script.py --input "data.txt" --output "result.txt"

Audio Processing Basics

Definition

Audio processing involves the analysis and manipulation of sound signals. Applications include:

Speech synthesis
Music production
Machine learning (e.g., voice recognition, audio classification)

Key Operations

Time Domain Analysis: Examining the waveform of audio signals over time.
Frequency Domain Analysis: Decomposing signals into their frequency components (e.g., Fourier Transforms).
Effects and Transformations: Modifying audio properties like speed, pitch, and reverberation.
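
As a rough illustration of the first two operations, the sketch below builds a synthetic tone with numpy, looks at it in the time domain, and then uses a Fourier transform to find its strongest frequency. The 8000 Hz sampling rate and the 440 Hz tone are assumptions chosen for the example, not values from the session.

```python
import numpy as np

sr = 8000                                  # assumed sampling rate in Hz
t = np.arange(sr) / sr                     # sample times for one second of audio
signal = np.sin(2 * np.pi * 440.0 * t)     # time-domain waveform of a 440 Hz tone

# Time domain: basic facts about the waveform itself.
duration_s = len(signal) / sr
peak = float(np.max(np.abs(signal)))

# Frequency domain: magnitude of the real FFT and the frequency of each bin.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
dominant_hz = float(freqs[np.argmax(spectrum)])

# A simple transformation: reversing the waveform keeps its length.
reversed_signal = signal[::-1]

print(f"Duration: {duration_s:.2f} s, peak amplitude: {peak:.2f}")
print(f"Dominant frequency: {dominant_hz:.1f} Hz")
```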

Mel Spectrograms

What is a Mel Spectrogram?

A Mel Spectrogram visualizes the spectrum of frequencies in an audio signal over time, mapped to the Mel scale (a perceptual scale approximating human pitch perception).
Applications: Speech synthesis, music analysis, audio classification.
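
To get a feel for the Mel scale, here is a small sketch of one common Hz-to-mel conversion, the HTK-style formula m = 2595 * log10(1 + f / 700). This is an assumption made for illustration: librosa uses a different (Slaney-style) variant by default, so its numbers will not match these exactly.

```python
import math

def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to mels with the HTK-style formula (one of several variants)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The same 100 Hz step covers fewer mels at high frequencies than at low ones.
low_step = hz_to_mel(1100) - hz_to_mel(1000)
high_step = hz_to_mel(8100) - hz_to_mel(8000)

for f in (250, 500, 1000, 2000, 4000, 8000):
    print(f"{f:>5} Hz -> {hz_to_mel(f):7.1f} mel")
print(f"100 Hz step near 1 kHz: {low_step:.1f} mel, near 8 kHz: {high_step:.1f} mel")
```

The shrinking step size is the point of the scale: it spends more resolution on low frequencies, where human hearing distinguishes pitch more finely.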

Generate a Mel Spectrogram with Librosa

Here’s how to compute and visualize a Mel Spectrogram using librosa:

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load an audio file
audio_file = 'example.wav'
audio, sr = librosa.load(audio_file, sr=None)

# Compute Mel Spectrogram
mel_spec = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=2048, hop_length=512, n_mels=128)

# Convert to decibels for better visualization
mel_spec_db = librosa.power_to_db(mel_spec, ref=np.max)

# Plot the Mel Spectrogram
plt.figure(figsize=(10, 4))
librosa.display.specshow(mel_spec_db, sr=sr, hop_length=512, x_axis='time', y_axis='mel', cmap='viridis')
plt.colorbar(format='%+2.0f dB')
plt.title('Mel Spectrogram')
plt.tight_layout()
plt.show()
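
It helps to know what shape to expect from the spectrogram array. The sketch below estimates it from the parameters used above, assuming librosa's default centered framing, where the number of frames is 1 + n_samples // hop_length. The helper name and the 3-second, 22050 Hz clip are hypothetical.

```python
def mel_spectrogram_shape(n_samples: int, hop_length: int = 512, n_mels: int = 128) -> tuple:
    """Expected (n_mels, n_frames) shape, assuming centered framing."""
    n_frames = 1 + n_samples // hop_length
    return (n_mels, n_frames)

sr = 22050                 # assumed sampling rate in Hz
n_samples = 3 * sr         # a hypothetical 3-second clip
shape = mel_spectrogram_shape(n_samples)
seconds_per_frame = 512 / sr

print(f"Expected shape: {shape}")
print(f"Each frame advances by about {seconds_per_frame * 1000:.1f} ms")
```

Each row is one Mel band and each column is one time frame, so a smaller hop_length gives finer time resolution at the cost of a wider array.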

Code Walkthrough: CLI Tool for Audio Manipulation

This section breaks down the essential components of the CLI tool for audio manipulation, including the setup, audio loading, effects application, saving processed audio, and error handling.

1. Command-Line Interface Setup

Here’s the CLI setup using argparse. It calls manipulate_audio, which is defined in step 5:

import argparse
import librosa
import soundfile as sf

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Audio Manipulation CLI Tool")
    parser.add_argument("input_file", type=str, help="Path to the input audio file")
    parser.add_argument("output_file", type=str, help="Path to save the processed audio file")
    parser.add_argument("--effect", type=str, choices=["speed", "pitch", "reverse", "echo"], required=True, help="Type of audio effect to apply")
    parser.add_argument("--value", type=float, default=1.0, help="Magnitude of the effect (e.g., speed factor, pitch steps, echo delay)")

    args = parser.parse_args()

    manipulate_audio(input=args.input_file, output=args.output_file, effect=args.effect, value=args.value)

Arguments

input_file: Path to the input audio file.
output_file: Path to save the processed audio file.
--effect: Type of audio effect to apply (speed, pitch, reverse, or echo).
--value: Parameter controlling the magnitude of the effect (default = 1.0).
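
You can try these arguments without touching any audio by passing a list to parse_args instead of reading the real command line. The sketch below rebuilds the same parser; the file names and the "flanger" effect are made up to show what happens with a valid and an invalid call.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Audio Manipulation CLI Tool")
    parser.add_argument("input_file", type=str, help="Path to the input audio file")
    parser.add_argument("output_file", type=str, help="Path to save the processed audio file")
    parser.add_argument("--effect", type=str, choices=["speed", "pitch", "reverse", "echo"],
                        required=True, help="Type of audio effect to apply")
    parser.add_argument("--value", type=float, default=1.0, help="Magnitude of the effect")
    return parser

parser = build_parser()

# Valid call: --value is omitted, so the default of 1.0 is used.
args = parser.parse_args(["input.wav", "output.wav", "--effect", "reverse"])
print(args.input_file, args.output_file, args.effect, args.value)

# Invalid call: "flanger" is not in choices, so argparse prints a usage
# message to stderr and exits by raising SystemExit.
try:
    parser.parse_args(["input.wav", "output.wav", "--effect", "flanger"])
    rejected_code = None
except SystemExit as exc:
    rejected_code = exc.code

print(f"Exit code for the rejected call: {rejected_code}")
```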

2. Loading Audio

The librosa library is used for audio loading and manipulation.

audio, sr = librosa.load(input_file, sr=None)

audio: The waveform data of the audio file.
sr: The sampling rate of the audio file. By passing sr=None, the original sampling rate of the file is preserved.

3. Applying Effects

Effects are applied based on the value of the --effect argument.

Effect Options

Speed Adjustment

   if effect == 'speed':
       audio = librosa.effects.time_stretch(audio, rate=value)

Increases or decreases playback speed using librosa.effects.time_stretch().
value: Speed factor (e.g., 1.5 for faster, 0.5 for slower).
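
As a quick sanity check on the speed factor, the stretched clip should last roughly the original duration divided by the rate. The helper below is a hypothetical sketch of that arithmetic only; the exact output length from librosa may differ by a few samples.

```python
def stretched_duration(duration_s: float, rate: float) -> float:
    """Approximate duration after time stretching by the given speed factor."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return duration_s / rate

faster = stretched_duration(10.0, 1.5)   # shorter than the original
slower = stretched_duration(10.0, 0.5)   # longer than the original

print(f"10 s clip at 1.5x: about {faster:.2f} s")
print(f"10 s clip at 0.5x: about {slower:.2f} s")
```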

Pitch Shifting

   elif effect == 'pitch':
       audio = librosa.effects.pitch_shift(audio, sr=sr, n_steps=value)

Shifts the pitch of the audio using librosa.effects.pitch_shift().
value: Number of steps to shift, in semitones with librosa's default of 12 steps per octave (positive for higher pitch, negative for lower).
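
To relate the step count to actual frequencies: in 12-tone equal temperament, shifting by n semitones multiplies every frequency by 2 ** (n / 12). The sketch below works through a few values for a 440 Hz reference tone; the function name is an assumption for this example.

```python
def semitone_ratio(n_steps: float) -> float:
    """Frequency multiplier for a shift of n_steps semitones (12 per octave)."""
    return 2.0 ** (n_steps / 12.0)

a4 = 440.0  # reference tone in Hz
shifted = {steps: a4 * semitone_ratio(steps) for steps in (-12, -2, 0, 2, 12)}

for steps, freq in shifted.items():
    print(f"{steps:+3d} semitones -> {freq:7.2f} Hz")
```

So --value 12 moves the audio up a full octave, and --value -12 moves it down one.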

Reversing Audio

   elif effect == 'reverse':
       audio = audio[::-1]

Reverses the audio signal by flipping the waveform array.

Echo Effect

   elif effect == 'echo':
       echo = librosa.util.fix_length(audio, size=(len(audio) + int(sr * value)))
       echo[-len(audio):] += audio * 0.6
       audio = echo

Creates an echo by zero-padding the audio by the delay duration and adding a single delayed copy of the signal at 60% amplitude.
value: Delay duration (in seconds) for the echo.
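
The echo arithmetic is easier to follow on a tiny array. The sketch below reproduces the same idea with plain numpy, using np.pad as a stand-in for librosa.util.fix_length; the function name and the toy 4 Hz sampling rate are assumptions for illustration.

```python
import numpy as np

def add_echo(audio: np.ndarray, sr: int, delay_s: float, gain: float = 0.6) -> np.ndarray:
    """Append silence for the delay, then mix in one delayed, quieter copy."""
    delay_samples = int(sr * delay_s)
    out = np.pad(audio.astype(float), (0, delay_samples))  # zero-pad at the end
    out[delay_samples:] += gain * audio                     # delayed copy
    return out

# A single click followed by silence, at a toy sampling rate of 4 Hz.
audio = np.array([1.0, 0.0, 0.0, 0.0])
result = add_echo(audio, sr=4, delay_s=0.5)

print(result)  # the click appears again 2 samples later at 60% amplitude
```

Where the original and the delayed copy overlap, the sum can go beyond the usual [-1, 1] range, so you may want to scale the result down before saving it.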

4. Saving Processed Audio

The soundfile library (sf) is used to save the processed audio to the output file.

sf.write(output_file, audio, sr)

output_file: File path to save the processed audio.
audio: The manipulated waveform data.
sr: The sampling rate.

5. Error Handling and Success Messages

To provide feedback, you can wrap the logic in a try-except block.

def manipulate_audio(input, output, effect, value):
    try:
        # Load the audio at its original sampling rate
        audio, sr = librosa.load(input, sr=None)

        if effect == "pitch":
            # pitch_shift returns a new array, so the result must be assigned
            audio = librosa.effects.pitch_shift(audio, sr=sr, n_steps=value)
        elif effect == "reverse":
            audio = audio[::-1]
        elif effect == "echo":
            # Zero-pad by the delay, then add a delayed copy at 60% amplitude
            echo = librosa.util.fix_length(audio, size=len(audio) + int(sr * value))
            echo[-len(audio):] += 0.6 * audio
            audio = echo
        elif effect == "speed":
            audio = librosa.effects.time_stretch(audio, rate=value)

        sf.write(output, audio, sr)
        print(f"Audio file saved with filename {output} with effect {effect}")

    except Exception as e:
        print(f"Error: {e}")

Features of Error Handling

Success Messages: Informs the user when the file is successfully processed.
Error Messages: Prints the exception raised during execution, such as an invalid file path. Unsupported effect names are rejected earlier by argparse through the choices list.
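
If you want the tool to be friendlier to shell scripts, one option is to catch specific exceptions, print to stderr, and return an exit code. The sketch below is a hypothetical pattern, not part of the session code: run_safely and fake_task are made-up names, and fake_task only stands in for the real audio function so the example runs without any audio files.

```python
import sys

def run_safely(task, *args) -> int:
    """Run task(*args) and translate failures into messages and exit codes."""
    try:
        task(*args)
    except FileNotFoundError as exc:
        print(f"Error: file not found: {exc}", file=sys.stderr)
        return 1
    except ValueError as exc:
        print(f"Error: invalid value: {exc}", file=sys.stderr)
        return 2
    except Exception as exc:  # last resort for anything unexpected
        print(f"Unexpected error: {exc}", file=sys.stderr)
        return 3
    return 0

def fake_task(path: str) -> None:
    # Hypothetical stand-in for the audio function: just tries to open the file.
    with open(path, "rb"):
        pass

missing_code = run_safely(fake_task, "definitely_missing_file.wav")
ok_code = run_safely(lambda: None)

print(f"Missing file -> exit code {missing_code}, successful run -> exit code {ok_code}")
```

In a real script the returned code could be passed to sys.exit so that callers can tell success from failure.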

Run the CLI Tool

Install dependencies:

   pip install librosa soundfile

Execute the tool:

   python audio_tool.py input.wav output.wav --effect pitch --value 2

The complete code is given below:

import argparse
import librosa
import soundfile as sf

def manipulate_audio(input, output, effect, value):
    try:
        # Load the audio at its original sampling rate
        audio, sr = librosa.load(input, sr=None)

        if effect == "pitch":
            # pitch_shift returns a new array, so the result must be assigned
            audio = librosa.effects.pitch_shift(audio, sr=sr, n_steps=value)
        elif effect == "reverse":
            audio = audio[::-1]
        elif effect == "echo":
            # Zero-pad by the delay, then add a delayed copy at 60% amplitude
            echo = librosa.util.fix_length(audio, size=len(audio) + int(sr * value))
            echo[-len(audio):] += 0.6 * audio
            audio = echo
        elif effect == "speed":
            audio = librosa.effects.time_stretch(audio, rate=value)

        sf.write(output, audio, sr)
        print(f"Audio file saved with filename {output} with effect {effect}")

    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Audio Manipulation CLI Tool")
    parser.add_argument("input_file", type=str, help="Path to the input audio file")
    parser.add_argument("output_file", type=str, help="Path to save the processed audio file")
    parser.add_argument("--effect", type=str, choices=["speed", "pitch", "reverse", "echo"], required=True, help="Type of audio effect to apply")
    parser.add_argument("--value", type=float, default=1.0, help="Magnitude of the effect (e.g., speed factor, pitch steps, echo delay)")
    args = parser.parse_args()
    manipulate_audio(input=args.input_file, output=args.output_file, effect=args.effect, value=args.value)

Examples

Effect	Command
Speed Up	`python audio_tool.py input.wav output_speed.wav --effect speed --value 1.5`
Pitch Shift	`python audio_tool.py input.wav output_pitch.wav --effect pitch --value 3`
Reverse Audio	`python audio_tool.py input.wav output_reverse.wav --effect reverse`
Add Echo	`python audio_tool.py input.wav output_echo.wav --effect echo --value 0.5`

What We Achieved Today

By the end of Day 2, participants gained:

An understanding of audio basics (waveforms, frequency, sampling rate).
Experience working with Librosa for audio loading, analysis, and manipulation.
Skills to create CLI tools for automating tasks with user-friendly interfaces.
Knowledge of applying audio effects like:
- Speed adjustment
- Pitch shifting
- Reversing
- Echo addition

Resources

Your Feedback Matters!

We’d love to hear your experiences and challenges. Share your questions or results in the comments. Happy coding! 🚀
