Sunil Aleti

Posted on Mar 6, 2021

Building an Audio generator for DEV.to blogposts

#python #showdev #productivity


I'm the kind of person who prefers listening to reading, and I feel more productive when I listen. dev.to doesn't have an audio feature yet (it would be helpful if they added one natively).

So I built a tool that takes the URL of any dev.to blog post and outputs audio, which you can also download.

Tool: https://audioblogs.herokuapp.com/

You can see the tool in action in the video below.

The tool scrapes the article/blog post and generates audio using the gTTS module.

Modules used:

Requests - will allow us to send HTTP requests to get HTML pages
BeautifulSoup - will help us parse the HTML pages
gTTS - converts the given text into audio, which can be saved as an MP3 file
Streamlit - A Python library that makes it easy to create and share beautiful, custom web apps

Let's begin:

Import all the necessary modules

import streamlit as st
import requests
from bs4 import BeautifulSoup
from gtts import gTTS

Getting the contents of a webpage into a variable

results = requests.get("https://dev.to/sunilaleti/building-a-pdf-locker-gui-application-4l67")
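Before fetching, it can help to check that the input really looks like a dev.to post URL. The sketch below is one way to do that with the standard library only. The helper name `is_dev_to_post_url` and the assumption that post URLs have the form `https://dev.to/<author>/<slug>` are illustrative and not part of the original tool.

```python
from urllib.parse import urlparse


def is_dev_to_post_url(url: str) -> bool:
    """Return True if url looks like https://dev.to/<author>/<slug>.

    Hypothetical helper: the two-segment path rule is an assumption
    about how dev.to post URLs are shaped.
    """
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.hostname != "dev.to":
        return False
    # Keep only the non-empty path segments, e.g. ["sunilaleti", "some-slug"]
    segments = [part for part in parsed.path.split("/") if part]
    return len(segments) == 2
```

A check like this could run before `requests.get`, so that an obviously wrong input is rejected without making a network call.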

To make the content easier to work with, we parse it with BeautifulSoup and store the result in the soup variable

soup = BeautifulSoup(results.text, "html.parser")

We use soup.find to get the title of the blog post. You can find the class names by inspecting the page elements in your browser

Article=soup.find("div",{"class":"crayons-article__header__meta"}).find('h1').get_text()

To get the name of the author

Author=soup.find("div",{"class":"crayons-article__subheader"}).find('a').get_text()

To get the article content

text=soup.find("div", {"id": "article-body"}).find_all(['p','h1','h2','h3','h4','h5','h6','ol','ul'])

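The content we get back still contains HTML tags, so we need to strip them and keep only the text.

# List that collects the plain text of every scraped element
blog = []

def remove_html_tags(text):
    for item in text:
        try:
            # get_text() drops the markup and returns only the text
            blog.append(item.get_text())
        except AttributeError:
            # Skip anything that is not a tag
            pass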
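To see what this step does, here is a small self-contained sketch that runs on an inline HTML snippet instead of a live page. The snippet and the helper name `extract_text_blocks` are made up for illustration; only the `article-body` id and the idea of passing a list of tag names to `find_all` come from the code above.

```python
from bs4 import BeautifulSoup

# Tags whose text we want to read aloud (a shortened version of the list above)
READABLE_TAGS = ["p", "h1", "h2", "h3", "ol", "ul"]


def extract_text_blocks(html: str) -> list:
    """Return the text of each readable element, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("div", {"id": "article-body"})
    if body is None:
        # The page does not look like an article
        return []
    # get_text(" ", strip=True) joins nested strings with single spaces
    return [tag.get_text(" ", strip=True) for tag in body.find_all(READABLE_TAGS)]


sample = (
    '<div id="article-body">'
    "<h2>Intro</h2>"
    "<p>Hello <b>world</b></p>"
    "<pre><code>x = 1</code></pre>"
    "<ul><li>one</li><li>two</li></ul>"
    "</div>"
)
print(extract_text_blocks(sample))
```

Note that the `pre` block is skipped because it is not in the tag list, which is why code snippets do not end up in the audio.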

After scraping and cleaning the content, we create the audio with the gTTS module.
gTTS also supports other languages, such as French and Spanish.

from gtts import gTTS

# Join the scraped text blocks into a single string for gTTS
Text = " ".join(blog)

language = 'en'
myobj = gTTS(text=Text, lang=language, slow=False)

# Save the converted audio to an MP3 file named "Audio.mp3"
myobj.save("Audio.mp3")
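The string handed to gTTS is just the intro sentence followed by every text block, joined with spaces. A minimal sketch of that assembly step, assuming a hypothetical helper named `build_narration` (the intro wording matches the one used in the full source):

```python
def build_narration(title: str, author: str, blocks: list) -> str:
    """Build the single string that will be passed to gTTS.

    Hypothetical helper: collapses stray whitespace in the title and
    author, and drops empty blocks so they do not add extra spaces.
    """
    clean_title = " ".join(title.split())
    clean_author = " ".join(author.split())
    intro = f"This blogpost {clean_title} is written by {clean_author}"
    parts = [intro] + [block.strip() for block in blocks if block and block.strip()]
    return " ".join(parts)


narration = build_narration("\n  My Post \n", " Jane Doe ", ["First paragraph.", "", " Second one. "])
print(narration)
```

The result could then be passed as the `text` argument, for example `gTTS(text=narration, lang='en', slow=False)`.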

This web app is built with Streamlit and deployed on Heroku

Source Code:

import streamlit as st
import requests
from bs4 import BeautifulSoup
from gtts import gTTS 
def app():
    st.set_page_config(page_title="Audio Blog",page_icon="🎧")
    st.title("Generate Audio for dev.to blogposts")
    url=st.text_area("Enter any DEV.TO blog url ").strip()
    if st.button("submit"):
        if len(url)!=0:
            with st.spinner('Miracles take time to happen \n Just kidding 😂 \n Generating audio..'):
                # A timeout keeps the app from hanging on an unresponsive server
                results = requests.get(url, timeout=10)
                soup = BeautifulSoup(results.text, "html.parser") 
                blog=[]
                try:
                    Article=soup.find("div",{"class":"crayons-article__header__meta"}).find('h1').get_text()
                    #st.write(Article)
                    Author=soup.find("div",{"class":"crayons-article__subheader"}).find('a').get_text()
                    #st.write(Author)
                    intro="This blogpost {article} is written by {author}".format(article = Article, author = Author)
                    blog.append(intro)
                    text=soup.find("div", {"id": "article-body"}).find_all(['p','h1','h2','h3','h4','h5','h6','ol','ul'])
                    def remove_html_tags(text):
                        for item in text:
                            try:
                                blog.append(item.get_text())
                            except:
                                pass

                    remove_html_tags(text)
                    # Join all text blocks into one string for gTTS
                    Text = " ".join(blog)

                    myobj = gTTS(text=Text, lang='en', slow=False) 

                    myobj.save("Audio.mp3")
                    # Read the file back and make sure it gets closed
                    with open('Audio.mp3', 'rb') as audio_file:
                        audio_bytes = audio_file.read()
                    st.success("Play or download the audio")
                    st.audio(audio_bytes, format='audio/mp3')
                except Exception:
                    st.error("Enter a valid url")
        else:
            st.error("Enter a valid url")


if __name__ == "__main__":
    app()
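One possible variation: since every request writes to the same Audio.mp3, two visitors generating audio at the same time could overwrite each other's file. gTTS objects provide a `write_to_fp` method, so the audio can be kept in memory instead of on disk. A minimal sketch, where the helper name `audio_bytes_from` is hypothetical:

```python
import io


def audio_bytes_from(tts) -> bytes:
    """Return the MP3 data produced by a gTTS-like object.

    Works with any object that has a write_to_fp(file_object) method,
    which gTTS instances do. Nothing is written to disk.
    """
    buffer = io.BytesIO()
    tts.write_to_fp(buffer)
    return buffer.getvalue()
```

In the app, this would replace the save/open/read steps with something like `st.audio(audio_bytes_from(myobj), format='audio/mp3')`.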

Let me know if you run into any issues, and feel free to send a pull request if you have anything to add.

aletisunil / AudioBlog

Top comments (2)

vamsi pavan mahesh gunturu • Mar 7 '21

This is cool, but I'm afraid it won't be very helpful for dev.to posts, because the tool ignores code blocks. Even if it did read code blocks, I'm not sure I could follow along by "listening" to code. It would be super helpful if we could point it at news websites, though.

Sunil Aleti • Mar 7 '21

I agree with you, but I can still get an overview of what the post is about.
