What is Whisper AI?
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. It was created by OpenAI, the company behind ChatGPT and DALL-E. Whisper is a multitask model: besides transcribing audio files into text in many languages, it can also translate non-English speech into English. Although it is still in development, it has the potential to be an effective tool for numerous applications.
What is Google Colab?
Python code can be executed online for free using Google Colab. It is a cloud-based Jupyter Notebook environment that doesn't need to be installed. Colab provides a number of features, such as:
- The ability to run Python code in a web browser. This means that you don't need to install any software on your computer in order to use Colab to develop and run Python programmes.
- Use of Google's cloud computing and storage capabilities. This means that you don't have to rely on your own computer's resources when using Colab to run long or resource-intensive Python programmes.
- The ability to share and collaborate. You can work on projects in real time by sharing your Colab notebooks with other users.
Why Google Colab?
For Whisper or other Python projects, you may prefer to use Google Colab rather than your personal computer for a number of reasons.
- Google Colab is free to use, so you avoid the cost of buying and maintaining a capable machine.
- Google Colab provides access to powerful GPUs that speed up Python workloads, including machine learning models such as Whisper.
- Google Colab is accessible from anywhere with an internet connection because it is cloud-based.
- You can collaborate on projects with others in Google Colab's collaborative environment.
Using Google Colab does have some possible disadvantages, though:
- Google Colab occasionally runs slowly, especially when usage is at its highest.
- The storage capacity of Google Colab is constrained.
Step-By-Step Guide
Setup
The following command downloads and installs the latest commit of Whisper from its GitHub repository:
!pip install git+https://github.com/openai/whisper.git
To update the package to the latest version, run:
!pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
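Whisper also requires the ffmpeg command-line tool, which is available through most package managers. On Colab's Ubuntu-based runtime you can install it with apt (the -y flag answers the confirmation prompt automatically, since a notebook cell cannot respond to it):
!sudo apt update && sudo apt install -y ffmpeg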
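Before moving on, it can help to confirm that the setup worked. The cell below is a minimal sketch, not part of Whisper itself: the helper names `ffmpeg_available` and `list_gpus` are made up for this post, and the GPU check assumes an NVIDIA GPU runtime that exposes the `nvidia-smi` tool.

```python
import shutil
import subprocess
from typing import List


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def list_gpus() -> List[str]:
    """Return one 'name, total memory' line per NVIDIA GPU, or [] if none is visible."""
    if shutil.which("nvidia-smi") is None:
        # No NVIDIA tooling on PATH: most likely a CPU-only runtime.
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        # nvidia-smi exists but could not talk to a GPU or driver.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


print("ffmpeg found:", ffmpeg_available())
print("GPUs:", list_gpus() or "none (CPU runtime)")
```

If no GPU is listed, you can switch the notebook to a GPU runtime from Colab's runtime settings; Whisper still runs on a CPU, only more slowly.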
Usage (Command-line based)
Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
---|---|---|---|---|---|
tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
base | 74 M | base.en | base | ~1 GB | ~16x |
small | 244 M | small.en | small | ~2 GB | ~6x |
medium | 769 M | medium.en | medium | ~5 GB | ~2x |
large | 1550 M | N/A | large | ~10 GB | 1x |
Recommended: medium
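If you are unsure which size fits your runtime, the table can be turned into a small lookup. This is only a sketch: the VRAM figures are the approximate values from the table, and the function name `largest_model_that_fits` is made up for illustration, so treat the result as a starting point rather than a guarantee.

```python
from typing import Optional

# Approximate VRAM needs in GB, copied from the table above (smallest to largest).
MODEL_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def largest_model_that_fits(available_vram_gb: float, english_only: bool = False) -> Optional[str]:
    """Return the largest model whose approximate VRAM need fits, or None if none does."""
    best = None
    for name, needed_gb in MODEL_VRAM_GB.items():
        if needed_gb <= available_vram_gb:
            best = name  # dict order is smallest to largest, so the last match wins
    if best is None:
        return None
    # Every size except large has an English-only ".en" variant.
    if english_only and best != "large":
        return best + ".en"
    return best


print(largest_model_that_fits(6))                     # medium
print(largest_model_that_fits(6, english_only=True))  # medium.en
```

A larger model is slower as well as more accurate, so even when large fits, medium can be the more practical choice for long recordings.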
The following command will transcribe speech in audio files, using the medium model:
!whisper "[Add your audio file, Example: english.wav]" --model medium
If you leave out the --model option, Whisper falls back to its default model (small at the time of writing), which works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:
!whisper "[Add your language-specific audio file, Example: japanese.wav]" --language [Add language, Example: Japanese]
Adding --task translate will translate the speech into English:
!whisper "[Add your language-specific audio file, Example: japanese.wav]" --language [Add language, Example: Japanese] --task translate
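If you need to run these commands for several files, it may be easier to assemble them in Python. The sketch below only builds the argument list from the options shown above (--model, --language and --task); `build_whisper_command` is a hypothetical helper written for this post, not something the Whisper package provides.

```python
import shlex
from typing import List, Optional


def build_whisper_command(
    audio_path: str,
    model: str = "medium",
    language: Optional[str] = None,
    task: str = "transcribe",
) -> List[str]:
    """Build the argument list for the whisper command-line tool."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    cmd = ["whisper", audio_path, "--model", model]
    if language:
        # Only pass --language when the caller specifies one.
        cmd += ["--language", language]
    if task == "translate":
        # Transcription is the default, so the flag is only needed for translation.
        cmd += ["--task", "translate"]
    return cmd


cmd = build_whisper_command("japanese.wav", language="Japanese", task="translate")
print(shlex.join(cmd))
# whisper japanese.wav --model medium --language Japanese --task translate
```

Because the result is a list rather than one string, it can be handed to `subprocess.run(cmd, check=True)` without worrying about spaces or quotes in file names.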
Run the following to view all available options:
!whisper --help
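Whisper can also be called from Python inside a notebook cell, which is handy when you want to post-process the result. The following is a rough sketch: `whisper.load_model` and `model.transcribe` come from the package's Python interface, while `format_timestamp`, `format_segments` and `transcribe_file` are helper names invented for this example. It assumes the result contains a "segments" list whose entries have "start", "end" and "text" fields.

```python
from typing import Dict, Iterable, List


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    minutes, ms = divmod(total_ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments: Iterable[Dict]) -> List[str]:
    """Turn transcription segments into '[start --> end] text' lines."""
    return [
        f"[{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}] {seg['text'].strip()}"
        for seg in segments
    ]


def transcribe_file(audio_path: str, model_name: str = "medium") -> List[str]:
    """Transcribe one audio file and return timestamped lines."""
    import whisper  # imported here so the helpers above work without the package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return format_segments(result["segments"])


# Example with hand-written segments, so no audio file or model download is needed:
sample = [{"start": 0.0, "end": 2.5, "text": " Hello there."}]
print(format_segments(sample))
# ['[00:00.000 --> 00:02.500] Hello there.']
```

In a Colab cell you would then call something like `transcribe_file("english.wav")` after uploading the audio file to the session.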
Outro
Thank you for reading this blog post. I sincerely hope you found it useful and enlightening. Please feel free to leave any questions or comments in the space provided below. I'd be delighted to hear from you.
If you liked this article, please share it with your friends and followers.
Once more, thanks for reading! I value your support.