I recently published a Python package that ingests a text file and creates a text-to-speech rendering of that text using AWS Polly. Part of the challenge is that there is a character limit for each generated recording. Simply slicing a long string into 250-character chunks could work, but then you run the risk of breaking up words: when the audio chunks are reassembled, a word like "the" becomes "tuh" and "he" when heard across the boundary. This package was partly written to handle this case by ending each chunk at a word boundary, which produces chunks of varying length rather than an even split.
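To make the problem concrete, here is a minimal sketch of the naive fixed-width slicing. The helper name naive_chunks, the sample sentence, and the tiny limit of 8 are made up for illustration and are not part of the package.

```python
def naive_chunks(text, limit):
    """Slice text into fixed-width pieces, ignoring word boundaries."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


text = "the quick brown fox jumps over the lazy dog"
chunks = naive_chunks(text, 8)
print(chunks)
# ['the quic', 'k brown ', 'fox jump', 's over t', 'he lazy ', 'dog']
```

The second "the" ends up as "t" at the end of one chunk and "he" at the start of the next, which is the split you would hear across the audio boundary.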
After installing the package:
pip3 install polly-textfile-cli
and running something like:
polly-textfile --path input.txt --name output-name
inside the package, the text file is first split into a list of individual words:
def fileChunkList(filePath, limit):
    with open(filePath, 'r') as file:
        data = file.read()
    # naive alternative (can split words mid-way):
    #lines = [data[i:i+limit] for i in range(0, len(data), limit)]
    # split on any whitespace so words on adjacent lines are not glued together
    lines_in = data.split()
    lines = constructSentences(lines_in, limit)
    return lines
and then constructSentences(lines_in, limit) regroups the words into the segments that Polly will render into audio:
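def constructSentences(words, limit):
    ss = []  # completed chunks
    s = []   # words in the chunk currently being built
    for w in words:
        # the +1 accounts for the space that joins w to the current chunk
        if not s or len(" ".join(s)) + 1 + len(w) <= limit:
            s.append(w)
        else:
            ss.append(" ".join(s))
            s = [w]
    # flush the final, partially filled chunk
    if s:
        ss.append(" ".join(s))
    return ss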
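For comparison, the standard library's textwrap.wrap does a similar greedy packing of whole words and can serve as a quick cross-check. This is a swapped-in technique, not what the package does, and the helper name wrap_chunks and the sample input are hypothetical.

```python
import textwrap


def wrap_chunks(text, limit):
    """Pack whole words into chunks of at most `limit` characters.

    Hypothetical helper, not part of polly-textfile-cli.
    """
    return textwrap.wrap(
        text,
        width=limit,
        break_long_words=False,   # never cut a word in half
        break_on_hyphens=False,   # keep hyphenated words together
    )


text = "the quick brown fox jumps over the lazy dog"
chunks = wrap_chunks(text, 15)
print(chunks)
# ['the quick brown', 'fox jumps over', 'the lazy dog']
```

Note that with break_long_words=False, a single word longer than the limit is emitted as an over-length chunk rather than being split.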
So if limit is 250, before adding a new word to a "sentence" (a string with a maximum length of limit), constructSentences() checks whether that length would be exceeded. If it would, the current sentence is added to the list and a new one is started. The result is the lines list in fileChunkList(), which ends up being the text script for the recordings created in the next function:
from io import FileIO

def createChunkAudio(id, linesList):
    partsIdList = []
    # enumerate from 1 so every chunk, including the last, gets a part number
    for i, line in enumerate(linesList, start=1):
        resp = streamAudio(line)
        stream = resp['AudioStream']
        with FileIO("%s-part-%s.mp3" % (id, i), 'w') as file:
            # copy the audio stream to disk piece by piece
            for chunk in stream.iter_chunks():
                file.write(chunk)
        partsIdList.append(file.name)
    return partsIdList
where, for each item in the lines list (at most 250 characters long), an mp3 file (i.e. ${whatever-output-name}-part-1.mp3) is created by passing the text to the streamAudio() function in the above loop, which is just the one-off call to Polly to create the audio stream:
from boto3 import client

def streamAudio(inString):
    # one synthesize_speech call per text chunk
    polly = client("polly", "us-east-2")
    response = polly.synthesize_speech(
        Text=inString,
        OutputFormat="mp3",
        VoiceId="Matthew")
    return response
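As a rough sketch of how the synthesis call and the file write fit together, the helper below takes the client as a parameter so it can be exercised without AWS credentials. The names synthesize_to_file and FakePolly are hypothetical and not part of the package; with a real client you would pass something like boto3.client("polly", region_name="us-east-2") instead of the stand-in.

```python
import io
import os
import tempfile


def synthesize_to_file(polly_client, text, out_path, voice_id="Matthew"):
    """Synthesize `text` and write the returned MP3 bytes to `out_path`.

    Hypothetical helper: `polly_client` is assumed to behave like a boto3
    Polly client.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # AudioStream is a file-like streaming body; read() returns its bytes.
    audio = response["AudioStream"].read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path


class FakePolly:
    """Stand-in client so the sketch runs offline (an assumption for testing)."""

    def synthesize_speech(self, Text, OutputFormat, VoiceId):
        return {"AudioStream": io.BytesIO(b"fake-mp3:" + Text.encode("utf-8"))}


out_dir = tempfile.mkdtemp()
path = synthesize_to_file(
    FakePolly(), "hello world", os.path.join(out_dir, "demo-part-1.mp3")
)
print(path)
```

Passing the client in, rather than creating it inside the function, also means one client can be reused across all chunks.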
At this point, for a text file that needed to be split across 3 segments, you've created mp3 files like output-part-1.mp3, output-part-2.mp3, and output-part-3.mp3. That is not terribly convenient, so the last step is to combine them, using the list of paths to the audio chunks that the above functions created:
import os

def concatPartsAudio(pathList, id):
    print(pathList)
    # ffmpeg's concat protocol takes its inputs as one "|"-separated string
    cmdStr = "concat:" + "|".join(pathList)
    print(cmdStr)
    concat = os.system("ffmpeg -i '%s' -acodec copy '%s.mp3'" % (cmdStr, id))
    # stat exits with a non-zero status if the combined file was not written
    s = os.system("stat '%s.mp3'" % (id))
    return s
This could be done any number of ways, depending on your preferred audio output settings, but in the simplest form, we're just concatenating each of the files into output.mp3.
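One possible variation, sketched under the assumption that ffmpeg is on your PATH, is to build the command as an argument list and hand it to subprocess.run, which sidesteps shell quoting problems with unusual file names. The helper name build_concat_command is hypothetical.

```python
import subprocess


def build_concat_command(paths, out_name):
    """Return the ffmpeg argument list that joins `paths` into out_name.mp3."""
    concat_input = "concat:" + "|".join(paths)
    return ["ffmpeg", "-i", concat_input, "-acodec", "copy", out_name + ".mp3"]


cmd = build_concat_command(
    ["output-part-1.mp3", "output-part-2.mp3", "output-part-3.mp3"], "output"
)
print(cmd)

# To actually run it (requires ffmpeg and the part files to exist):
# subprocess.run(cmd, check=True)
```

With check=True, subprocess.run raises an exception if ffmpeg exits with an error, so a separate stat call is not needed to detect a failed merge.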