Setup on Google Cloud Console
- Start your free trial on Google Cloud Platform. A credit card is required to sign up for the trial.
- Create a project (e.g. My-Project) or select an existing one.
- Enable the Cloud Speech-to-Text API for that project.
- Create a service account: IAM -> Service Accounts -> Create Service Account.
- Download a private key for the service account as JSON.
- Create a bucket: Google Cloud Storage -> Browse.
Installations on Local System
Run these commands from a terminal.
Install the Google Cloud Storage gem:
gem install google-cloud-storage
Install the Google Cloud Speech gem:
gem install google-cloud-speech
Install ffmpeg:
sudo apt-get install ffmpeg   # Ubuntu
brew install ffmpeg           # macOS
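Before running the script, it can help to confirm that ffmpeg is actually on the PATH. The following is a minimal sketch using only the Ruby standard library. The helper name find_executable is hypothetical and not part of the original script, and it assumes a Unix-like system (Ubuntu or macOS) where executables have no file extension.

```ruby
# Hypothetical helper: returns the full path of an executable found on PATH,
# or nil when no directory on PATH contains it.
def find_executable(name, path_env = ENV.fetch("PATH", ""))
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Possible use at the top of the script:
#   abort "ffmpeg not found on PATH" unless find_executable("ffmpeg")
```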
Ruby Code
require "google/cloud/speech"
require "google/cloud/storage"
# Google Cloud project ID
project_id = "google_cloud_project_id"
# Path to the downloaded service account key file
key_file = "file_name.json"
Convert video to audio
Convert the video file to an audio file using ffmpeg.
What is FLAC?
FLAC stands for Free Lossless Audio Codec, an audio format similar to MP3, but lossless, meaning that audio is compressed in FLAC without any loss in quality.
We will run two commands: the first extracts the audio track as FLAC, and the second downmixes it to a single (mono) channel with -ac 1.
system "ffmpeg -i video.mp4 audio_temp.flac"
system "ffmpeg -i audio_temp.flac -ac 1 audio_final.flac"
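As an alternative sketch, the two ffmpeg steps above can probably be combined into one pass. This is an assumption rather than the author's method: the helper names flac_command and convert_to_flac are hypothetical, and the single pass is assumed to give an equivalent mono FLAC file. The flags used are standard ffmpeg options: -y overwrites an existing output file, -vn drops the video stream, and -ac 1 sets one audio channel.

```ruby
# Hypothetical helper: builds the ffmpeg argument list for a one-step
# video-to-mono-FLAC conversion.
def flac_command(input_path, output_path)
  ["ffmpeg", "-y", "-i", input_path, "-vn", "-ac", "1", output_path]
end

# Hypothetical helper: runs the conversion and raises if ffmpeg fails.
# Passing the arguments as a list avoids shell interpretation, so file
# names containing spaces are handled safely.
def convert_to_flac(input_path, output_path)
  system(*flac_command(input_path, output_path)) or
    raise "ffmpeg failed for #{input_path}"
  output_path
end

# Possible use:
#   convert_to_flac("video.mp4", "audio_final.flac")
```

Unlike a bare system call with a string, this version stops the script when the conversion fails instead of continuing with a missing file.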
Upload audio to Google Storage
First, access the project through the Storage API. Then create a new file in the bucket, which will be a copy of audio_final.flac.
Note: Here I'm using the first bucket in Google Cloud Storage. If you have more than one bucket, you can select whichever one you want.
storage = Google::Cloud::Storage.new project: project_id, keyfile: key_file
bucket_name = storage.buckets.first.name
puts bucket_name
bucket = storage.bucket bucket_name
local_file_path = 'audio_final.flac'
file = bucket.create_file local_file_path, 'audio_cloud.flac'
puts "Uploaded #{file.name}"
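If the project has several buckets, picking one by name is safer than relying on the first entry. The sketch below is one way to do that. The helper pick_bucket is hypothetical, and it assumes the storage object responds to buckets and bucket(name) as in the script above, with bucket(name) returning nil when no bucket has that name.

```ruby
# Hypothetical helper: returns the named bucket, or the first bucket when no
# name is given. Raises if nothing suitable is found.
def pick_bucket(storage, name = nil)
  bucket = name ? storage.bucket(name) : storage.buckets.first
  label = name ? "bucket #{name.inspect}" : "any bucket"
  raise ArgumentError, "Could not find #{label}" if bucket.nil?
  bucket
end

# Possible use in the script ("my-audio-bucket" is a placeholder name):
#   bucket = pick_bucket(storage, "my-audio-bucket")
```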
Translate Audio to Text
Now we'll convert the audio that we uploaded to Cloud Storage into text.
Reference the file with a URI of the form gs://bucket-name/file-name. Set language_code to match the spoken language, for example "de-DE" for a German-language video.
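# Note: this call style matches older releases of the google-cloud-speech gem;
# newer releases may expose a different client API.
speech = Google::Cloud::Speech.new
# Build the gs:// URI from the bucket and file used in the upload step
storage_path = "gs://#{bucket_name}/#{file.name}"
config = { encoding: :FLAC,
           language_code: "en-US" }
audio = { uri: storage_path }
operation = speech.long_running_recognize config, audio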
audio_text = ''
puts "Operation started"
if !operation.nil?
  operation.wait_until_done!
  raise operation.results.message if operation.error?
  results = operation.response.results
  results.each do |result|
    audio_text << result.alternatives.first.transcript
    puts "Transcription: #{result.alternatives.first.transcript}"
  end
end
puts audio_text
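The loop above appends each transcript directly to audio_text, so neighbouring segments may run together, depending on how the API splits the results. A minimal sketch of a more defensive join follows. The helper join_transcripts is hypothetical, and it only assumes that each result responds to alternatives and each alternative to transcript, as in the code above.

```ruby
# Hypothetical helper: joins the top alternative of every result with single
# spaces, skipping results that have no alternatives or an empty transcript.
def join_transcripts(results)
  results
    .map { |result| result.alternatives.first }
    .compact
    .map { |alternative| alternative.transcript.strip }
    .reject(&:empty?)
    .join(" ")
end

# Possible use in the script:
#   audio_text = join_transcripts(operation.response.results)
```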
Before running this script, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the downloaded key file.
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/file_name.json"
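A forgotten or mistyped export is an easy mistake to make, so the script could check the variable before calling any Google API. The following is a sketch under that assumption. The helper credentials_path is hypothetical and is not part of the original code or of the Google gems.

```ruby
# Hypothetical helper: returns the key file path from the environment, or
# raises ArgumentError when the variable is unset or the file is missing.
def credentials_path(env = ENV)
  path = env["GOOGLE_APPLICATION_CREDENTIALS"]
  if path.nil? || path.empty?
    raise ArgumentError, "GOOGLE_APPLICATION_CREDENTIALS is not set"
  end
  raise ArgumentError, "Key file not found: #{path}" unless File.file?(path)
  path
end

# Possible use at the top of the script:
#   puts "Using credentials from #{credentials_path}"
```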
Now run this script.
ruby translate_video_to_text.rb
You can find the complete code on Github.