
Add AWS Transcribe to a Spring Boot App

Balvinder Singh (balvinder294). Originally published at tekraze.com

Amazon Transcribe uses a deep learning process called automatic speech recognition (ASR) to convert speech to text quickly and accurately. It can be used to transcribe customer service calls, automate closed captioning and subtitling, and generate metadata for media assets to create a fully searchable archive.

| Also Read | Add Amazon Comprehend To Spring Boot

Steps for Integration

1. Create an AWS account for Transcribe and set up credentials

2. Set up the SDK for Transcribe and S3 (required for upload)

<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3 -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-s3</artifactId>
    <version>1.11.759</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-transcribe -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-transcribe</artifactId>
    <version>1.11.759</version>
</dependency>

3. Create a Java service class to hold the code

4. Initialize clients for Transcribe and S3

AmazonTranscribe transcribeClient() {
    log.debug("Initialize Transcribe Client");
    // Static credentials built from the configured access key and secret key.
    BasicAWSCredentials awsCreds = new BasicAWSCredentials(awsAccessKey, awsSecretKey);
    AWSStaticCredentialsProvider awsStaticCredentialsProvider = new AWSStaticCredentialsProvider(awsCreds);
    return AmazonTranscribeClientBuilder.standard().withCredentials(awsStaticCredentialsProvider)
            .withRegion(awsRegion).build();
}

AmazonS3 s3Client() {
    log.debug("Initialize AWS S3 Client");
    BasicAWSCredentials awsCreds = new BasicAWSCredentials(awsAccessKey, awsSecretKey);
    AWSStaticCredentialsProvider awsStaticCredentialsProvider = new AWSStaticCredentialsProvider(awsCreds);
    return AmazonS3ClientBuilder.standard().withCredentials(awsStaticCredentialsProvider).withRegion(awsRegion)
            .build();
}

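Both factory methods above build a new client on every call. AWS SDK clients are thread-safe and intended to be long-lived, so one option is to build each client once and reuse it. The sketch below shows the idea with a generic lazy holder. `Lazy` is a hypothetical helper name, not part of the AWS SDK or Spring; in a Spring Boot app a singleton `@Bean` would typically play the same role.

```java
import java.util.function.Supplier;

/** Hypothetical helper: builds a value on first use and reuses it afterwards. */
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private T value;          // guarded by "this"
    private boolean created;  // guarded by "this"

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public synchronized T get() {
        if (!created) {
            // First call: run the factory once and remember the result.
            value = factory.get();
            created = true;
        }
        return value;
    }
}
```

In the service, a field such as `new Lazy<>(this::transcribeClient)` could then hand out the same client to every method that needs it.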
5. File upload/delete methods for S3. Skip these if the file is already in S3.

public void uploadFileToAwsBucket(MultipartFile file) {
    log.debug("Upload file to AWS Bucket {}", file);
    // Object key: the original file name with spaces removed, lower-cased.
    String key = file.getOriginalFilename().replaceAll(" ", "").toLowerCase();
    try {
        s3Client().putObject(bucketName, key, file.getInputStream(), null);
    } catch (SdkClientException | IOException e) {
        log.error("Upload to AWS Bucket failed for key {}", key, e);
    }
}

public void deleteFileFromAwsBucket(String fileName) {
    log.debug("Delete File from AWS Bucket {}", fileName);
    // Same key rule as in uploadFileToAwsBucket.
    String key = fileName.replaceAll(" ", "").toLowerCase();
    s3Client().deleteObject(bucketName, key);
}

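Upload, delete, and the later transcription call all have to agree on the object key. A small helper keeps that rule in one place. This is only a sketch: `S3Keys.fromFilename` is a hypothetical name, and the normalization mirrors the `replaceAll(" ", "").toLowerCase()` used above (with `Locale.ROOT` added so the result does not depend on the server's default locale).

```java
import java.util.Locale;

/** Hypothetical helper that derives the S3 object key from an uploaded file name. */
final class S3Keys {
    private S3Keys() {
    }

    /** Removes spaces and lower-cases the name, as the upload method does. */
    static String fromFilename(String originalFilename) {
        if (originalFilename == null || originalFilename.trim().isEmpty()) {
            throw new IllegalArgumentException("file name must not be empty");
        }
        return originalFilename.replace(" ", "").toLowerCase(Locale.ROOT);
    }
}
```

For example, `S3Keys.fromFilename("My Demo Video.MP4")` yields `mydemovideo.mp4`.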
| Also Read | Things you need to know to be a Developer

6. Start Transcription Process method

StartTranscriptionJobResult startTranscriptionJob(String key) {
    log.debug("Start Transcription Job By Key {}", key);
    // Point Transcribe at the uploaded object in S3.
    Media media = new Media().withMediaFileUri(s3Client().getUrl(bucketName, key).toExternalForm());
    // A random suffix gives repeated uploads of the same file distinct job names.
    String jobName = key.concat(RandomString.make());
    StartTranscriptionJobRequest startTranscriptionJobRequest = new StartTranscriptionJobRequest()
            .withLanguageCode(LanguageCode.EnUS).withTranscriptionJobName(jobName).withMedia(media);
    return transcribeClient().startTranscriptionJob(startTranscriptionJobRequest);
}
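The job name above is built straight from the file name, which may contain characters Transcribe does not accept. As far as I recall from the API reference, job names are limited to letters, digits, `.`, `_` and `-`, with at most 200 characters; treat those limits as an assumption and check the current documentation. A minimal sketch of a sanitizing helper (`JobNames.forKey` is a hypothetical name, and it uses a UUID suffix instead of `RandomString`):

```java
import java.util.UUID;

/** Hypothetical helper that turns an S3 key into a unique, safe job name. */
final class JobNames {
    // Assumed limit for TranscriptionJobName; verify against the API reference.
    private static final int MAX_LENGTH = 200;

    private JobNames() {
    }

    static String forKey(String key) {
        // Replace every character outside the assumed allowed set.
        String safe = key.replaceAll("[^0-9a-zA-Z._-]", "_");
        // "-" plus a 36-character UUID keeps names unique across runs.
        String suffix = "-" + UUID.randomUUID();
        int room = MAX_LENGTH - suffix.length();
        if (safe.length() > room) {
            safe = safe.substring(0, room);
        }
        return safe + suffix;
    }
}
```

`JobNames.forKey(key)` could stand in for `key.concat(RandomString.make())` in the method above.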

7. Get Transcription Job Results Method

GetTranscriptionJobResult getTranscriptionJobResult(String jobName) {
    log.debug("Get Transcription Job Result By Job Name : {}", jobName);
    GetTranscriptionJobRequest getTranscriptionJobRequest = new GetTranscriptionJobRequest()
            .withTranscriptionJobName(jobName);
    // Poll until the job reaches a terminal state (COMPLETED or FAILED).
    while (true) {
        GetTranscriptionJobResult getTranscriptionJobResult = transcribeClient()
                .getTranscriptionJob(getTranscriptionJobRequest);
        String status = getTranscriptionJobResult.getTranscriptionJob().getTranscriptionJobStatus();
        if (TranscriptionJobStatus.COMPLETED.name().equalsIgnoreCase(status)) {
            return getTranscriptionJobResult;
        }
        if (TranscriptionJobStatus.FAILED.name().equalsIgnoreCase(status)) {
            return null;
        }
        // QUEUED or IN_PROGRESS: wait before asking again, so the loop never spins.
        try {
            Thread.sleep(15000);
        } catch (InterruptedException e) {
            log.debug("Interrupted Exception {}", e.getMessage());
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return null;
        }
    }
}
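The polling loop above waits for as long as the job takes. A bounded variant gives up after a deadline instead. The sketch below takes the status from a `Supplier<String>` so it can be run without AWS; `JobPoller`, `awaitTerminalStatus`, and the `"TIMED_OUT"` label are names made up for this example.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Hypothetical helper: polls a status source until it is terminal or a timeout elapses. */
final class JobPoller {
    private JobPoller() {
    }

    /** Returns "COMPLETED" or "FAILED" as reported, or "TIMED_OUT" once the deadline passes. */
    static String awaitTerminalStatus(Supplier<String> statusSource, Duration interval, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            String status = statusSource.get();
            if ("COMPLETED".equalsIgnoreCase(status) || "FAILED".equalsIgnoreCase(status)) {
                return status;
            }
            if (System.nanoTime() - deadline >= 0) {
                return "TIMED_OUT";
            }
            try {
                // Any non-terminal status (QUEUED, IN_PROGRESS) waits before the next poll.
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the job", e);
            }
        }
    }
}
```

In the service, the status source could be a lambda that calls `getTranscriptionJob` and returns `getTranscriptionJobStatus()`, with an interval of 15 seconds as in the original loop.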

8. Download Transcription Result method to fetch result from URI

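TranscriptionResponseDTO downloadTranscriptionResponse(String uri) {
    log.debug("Download Transcription Result from Transcribe URI {}", uri);
    OkHttpClient okHttpClient = new OkHttpClient()
            .newBuilder()
            .connectTimeout(60, TimeUnit.SECONDS)
            .writeTimeout(60, TimeUnit.SECONDS)
            .readTimeout(60, TimeUnit.SECONDS)
            .build();
    Request request = new Request.Builder().url(uri).build();
    // try-with-resources closes the HTTP response once the body has been read.
    try (Response response = okHttpClient.newCall(request).execute()) {
        String body = response.body().string();
        // The listing is cut off at this point in the source. Assumption: the JSON body
        // is mapped onto the DTO, shown here with Jackson's ObjectMapper.
        return new ObjectMapper().readValue(body, TranscriptionResponseDTO.class);
    } catch (IOException e) {
        log.error("Could not download the transcription result", e);
        return null;
    }
}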

9. Delete Transcription Job method. Call it once processing is done; otherwise the job is deleted automatically after 90 days.

void deleteTranscriptionJob(String jobName) {
    log.debug("Delete Transcription Job from amazon Transcribe {}", jobName);
    DeleteTranscriptionJobRequest deleteTranscriptionJobRequest = new DeleteTranscriptionJobRequest()
            .withTranscriptionJobName(jobName);
    transcribeClient().deleteTranscriptionJob(deleteTranscriptionJobRequest);
}

10. Combined method extractSpeechTextFromVideo to get the transcription result as a DTO

public TranscriptionResponseDTO extractSpeechTextFromVideo(MultipartFile file) {
    log.debug("Request to extract Speech Text from Video : {}", file);
    uploadFileToAwsBucket(file);
    // Must match the key built in uploadFileToAwsBucket (spaces removed, lower-cased).
    String key = file.getOriginalFilename().replaceAll(" ", "").toLowerCase();
    StartTranscriptionJobResult startTranscriptionJobResult = startTranscriptionJob(key);
    String transcriptionJobName = startTranscriptionJobResult.getTranscriptionJob().getTranscriptionJobName();
    GetTranscriptionJobResult getTranscriptionJobResult = getTranscriptionJobResult(transcriptionJobName);
    deleteFileFromAwsBucket(key);
    if (getTranscriptionJobResult == null) {
        // No result: the job did not complete, so there is no transcript to download.
        deleteTranscriptionJob(transcriptionJobName);
        return null;
    }
    String transcriptFileUriString = getTranscriptionJobResult.getTranscriptionJob().getTranscript().getTranscriptFileUri();
    TranscriptionResponseDTO transcriptionResponseDTO = downloadTranscriptionResponse(transcriptFileUriString);
    deleteTranscriptionJob(transcriptionJobName);
    return transcriptionResponseDTO;
}

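If a step in the combined method throws, the uploaded file and the job can be left behind. One way to guard against that is to put the cleanup in a `finally` block. The sketch below shows only the control flow: the `Steps` interface is a hypothetical stand-in for the service methods in this post, so the example runs without AWS.

```java
/** Hypothetical sketch of the combined flow with cleanup that always runs. */
final class TranscriptionFlow {
    /** Stand-ins for the service methods described in this post. */
    interface Steps {
        void upload(String key);
        String startJob(String key);              // returns the job name
        String awaitAndDownload(String jobName);  // returns the transcript text
        void deleteFile(String key);
        void deleteJob(String jobName);
    }

    private TranscriptionFlow() {
    }

    static String run(Steps steps, String key) {
        steps.upload(key);
        String jobName = null;
        try {
            jobName = steps.startJob(key);
            // Download before the job is deleted in the finally block.
            return steps.awaitAndDownload(jobName);
        } finally {
            // Runs on success and on failure alike.
            steps.deleteFile(key);
            if (jobName != null) {
                steps.deleteJob(jobName);
            }
        }
    }
}
```

The transcript is downloaded inside the `try` so that the job is only deleted after its result has been fetched, matching the order used in the combined method.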
11. You can now use the methods above to process a video or audio file and extract the text from its speech. The complete code is here: Link to Gist

For the response DTO, see: Link to Gist

Some references were taken from Edgardo Genini's comment on Stack Overflow.

| Also Read | Text Editors for Code

I hope the code helps you. If it does, please show your support by writing in the comments below. Thanks for reading.

Originally published At Tekraze.com

Posted by Balvinder Singh (@balvinder294)

Full stack developer and DevOps engineer working remotely at Dehaze.io. Founder and blogger at Tekraze.com, here to share my journey of code and experiences to help other coders and give back to the dev community.
