DEV Community

[Discuss] Using machine learning to process audio files?

Jochem Stoel ・2 min read

I use Audacity and various VST plugins, which need a DAW or other host that supports them, to manually remove background noise from audio files (spoken recordings). By background noise I mean the subtle static that you pretty much always get with affordable microphones and phones. Selecting a range of "silence" in the file (nobody talking, just noise) as a so-called noise profile improves the results. I repeat the same procedure until the background noise is entirely gone, sometimes at the cost of the voice. The quality of the result depends mostly on the quality of the file and the amount of noise.

Noise in this context does not mean random interruptions from the environment, such as traffic; it always refers to the static.
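To make the noise-profile idea concrete, here is a minimal sketch of spectral subtraction using only NumPy. It is not what Audacity or any particular plugin does internally; it is one common textbook approach to the same idea. The function names (`stft`, `istft`, `reduce_noise`) and the `strength` and `floor` parameters are made up for illustration, and the sketch assumes mono audio already loaded as a float array, with a separate slice that contains only the static.

```python
import numpy as np

def stft(x, frame=1024, hop=256):
    """Short-time Fourier transform with a Hann window (frames x bins)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, frame=1024, hop=256):
    """Inverse STFT by windowed overlap-add, normalised by the window energy."""
    window = np.hanning(frame)
    frames = np.fft.irfft(spec, n=frame, axis=1) * window
    length = hop * (len(frames) - 1) + frame
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + frame] += f
        norm[i * hop:i * hop + frame] += window ** 2
    return out / np.maximum(norm, 1e-8)

def reduce_noise(signal, noise_clip, strength=1.5, floor=0.05):
    """Subtract the average noise magnitude from every frequency bin.

    noise_clip: samples containing only the static (the "noise profile").
    strength:   hypothetical knob, how many times the profile to subtract.
    floor:      hypothetical knob, fraction of the original magnitude that
                is always kept, to limit artifacts.
    """
    noise_mag = np.abs(stft(noise_clip)).mean(axis=0)  # average per bin
    spec = stft(signal)
    mag, phase = np.abs(spec), np.angle(spec)
    cleaned = np.maximum(mag - strength * noise_mag, floor * mag)
    return istft(cleaned * np.exp(1j * phase))
```

Calling `reduce_noise(audio, audio[start:end])` with a speech-free slice returns the processed samples. Raising `strength` removes more static but starts to eat into the voice, which matches the trade-off described above. Reading and writing the audio files is left out on purpose.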

The results are often unusually good; it is amazing what these tools can do when you apply them correctly. Correctly, however, also means manually. The parameters are a little different for every single file, so to process audio in bulk I need to automate the procedure somehow. I have been looking for command line utilities or libraries to do that. There are some tools out there, but none of them delivers good results. I even tried loading VST plugins in a headless VST host with predefined parameters, but that is hacky and crap.
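One part of the manual procedure that looks automatable is picking the "silence" range. A rough sketch, assuming every recording contains at least one stretch of about half a second where nobody talks: slide a window over the file and take the one with the least energy. The function name `quietest_segment` and its parameters are hypothetical, and the quietest stretch is only a guess at pure static, not a guarantee.

```python
import numpy as np

def quietest_segment(signal, sample_rate, seconds=0.5, hop_seconds=0.05):
    """Return (start, end) sample indices of the lowest-energy window."""
    win = int(seconds * sample_rate)
    hop = int(hop_seconds * sample_rate)
    if win <= 0 or hop <= 0 or len(signal) < win:
        raise ValueError("window and hop must be positive and fit in the signal")
    # A running sum of squared samples gives each window's energy cheaply.
    squared = np.asarray(signal, dtype=float) ** 2
    energy = np.concatenate(([0.0], np.cumsum(squared)))
    starts = np.arange(0, len(signal) - win + 1, hop)
    window_energy = energy[starts + win] - energy[starts]
    best = int(starts[np.argmin(window_energy)])
    return best, best + win
```

The returned slice, `audio[start:end]`, could then serve as the noise profile for whatever noise reduction step follows. Files where someone talks from the first sample to the last would still need manual attention.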


PhonicMind

PhonicMind advertises itself as an "Online AI Vocal Extractor": an online tool that extracts the vocal track from an audio file, using "artificial intelligence" to improve results. It is a commercial product (you have to pay to use it). There is also a repository, https://github.com/andabi/music-source-separation, that separates singing voice from music using deep neural networks in TensorFlow. I did not look into whether the two are related, and I do not care.


Lightbulb

That brings me to the lightbulb: machine learning for background noise removal? I suspect (pretty much assume) that detecting and removing static from audio in bulk is much simpler than what PhonicMind attempts to do. Perhaps simple enough to do myself.

How would I go about prototyping a solution? I have a general understanding of machine learning and know about TensorFlow, but I have never actually used it for anything. If at all possible, I prefer a solution that requires the least amount of studying, so if you happen to know of a command line utility or framework that I did not find, that is probably even better.

Thanks.

Discussion (2)

Darkø Tasevski

Spotify uses machine learning to analyze music, and the way they do it is fascinating. Maybe not what you're asking for, but it is an interesting read anyway:

medium.com/s/story/spotifys-discov...

Jochem Stoel Author

You are right, not what I am looking for.