Building an Audio generator for DEV.to blogposts

Sunil Aleti

I'm the kind of person who prefers listening to reading, and I find I'm more productive when I listen. dev.to doesn't have a native audio feature (it would be helpful if they added one).

So I built a tool that takes any dev.to blog post URL as input and outputs audio, which you can also download.

Tool: https://audioblogs.herokuapp.com/

The video below shows the tool in action.


The tool works in two steps: it scrapes the blog post, then generates audio from the text with the gTTS module.

Modules used:

  • Requests - will allow us to send HTTP requests to get HTML pages
  • BeautifulSoup - will help us parse the HTML pages
  • gTTS - converts text into audio, which can be saved as an MP3 file
  • Streamlit - A Python library that makes it easy to create and share beautiful, custom web apps

Let's begin:

Import all the necessary modules:

import streamlit as st
import requests
from bs4 import BeautifulSoup
from gtts import gTTS 

Getting the contents of a webpage into a variable

results = requests.get("https://dev.to/sunilaleti/building-a-pdf-locker-gui-application-4l67")

To make the HTML easy to navigate, we parse it with BeautifulSoup and store the result in the soup variable

soup = BeautifulSoup(results.text, "html.parser")

We use soup.find to get the title of the blog post. You can find the class names by inspecting the page elements in your browser

Article=soup.find("div",{"class":"crayons-article__header__meta"}).find('h1').get_text()

To get the name of the author

Author=soup.find("div",{"class":"crayons-article__subheader"}).find('a').get_text()

To get the article content

text=soup.find("div", {"id": "article-body"}).find_all(['p','h1','h2','h3','h4','h5','h6','ol','ul'])
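The three lookups above all chain soup.find(...).find(...). soup.find returns None when nothing matches, so a renamed class on the page shows up as an AttributeError. Here is a minimal sketch of a None-safe lookup, run against made-up HTML that only mimics the class names used above. The helper name find_text is hypothetical and not part of the original tool.

```python
from bs4 import BeautifulSoup

# Made-up HTML that only mimics the class names and id used in this post
SAMPLE_HTML = """
<div class="crayons-article__header__meta"><h1>  My Post  </h1></div>
<div class="crayons-article__subheader"><a href="/jane">Jane Doe</a></div>
<div id="article-body"><h2>Intro</h2><p>First paragraph.</p></div>
"""


def find_text(soup, attrs, child):
    """Return the stripped text of `child` inside the div matching `attrs`, or None."""
    container = soup.find("div", attrs)
    if container is None:
        return None
    node = container.find(child)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
title = find_text(soup, {"class": "crayons-article__header__meta"}, "h1")
author = find_text(soup, {"class": "crayons-article__subheader"}, "a")
missing = find_text(soup, {"class": "no-such-class"}, "h1")
print(title, "|", author, "|", missing)
```

With a helper like this, the app could show a more specific message (for example, "could not find the title") instead of a generic error.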

The result is a list of HTML elements, tags included, so we extract just the text from each one.

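blog = []  # collects the plain text of each element

def remove_html_tags(text):
    # Append the text of each scraped element to blog, dropping the HTML tags
    for item in text:
        try:
            blog.append(item.get_text())
        except AttributeError:
            # Skip anything that is not a tag with text
            pass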
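One detail worth checking when the text is going to be spoken: get_text() with no arguments joins the strings of nested tags with nothing in between, so list items written without whitespace between them come out glued together. A small sketch on a made-up fragment, using the separator and strip arguments of get_text:

```python
from bs4 import BeautifulSoup

# Made-up fragment standing in for part of an article body
html = (
    '<div id="article-body">'
    "<p>Use <b>gTTS</b> here.</p>"
    "<ul><li>One</li><li>Two</li></ul>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")
elements = soup.find("div", {"id": "article-body"}).find_all(["p", "ul"])

# Default behaviour: nested strings are concatenated directly
glued = [el.get_text() for el in elements]

# With a separator and strip=True, each nested string is trimmed and
# the pieces are joined with a single space
spaced = [el.get_text(" ", strip=True) for el in elements]

print(glued)
print(spaced)
```

The trade-off is that inline tags also get a space on each side, which can leave a stray space before punctuation. That should be harmless for speech.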

After scraping and cleaning the content, we create the audio with the gTTS module.
gTTS also supports other languages, such as French and Spanish.

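from gtts import gTTS 

# Join the cleaned pieces collected in blog into one string
Text = " ".join(blog)

language = 'en'  # other codes such as 'fr' or 'es' also work
myobj = gTTS(text=Text, lang=language, slow=False) 

# Save the converted audio to an mp3 file named "Audio.mp3"
myobj.save("Audio.mp3") 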
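Saving to a fixed file name works fine for one user. On a shared deployment, two requests at the same time could overwrite each other's Audio.mp3. Below is a minimal sketch that keeps the audio in memory instead, using gTTS's write_to_fp method. The helper name synthesize_to_bytes and its tts_factory parameter are hypothetical (not part of the original tool); the parameter exists only so the helper can be tried without a network call.

```python
import io


def synthesize_to_bytes(text, lang="en", tts_factory=None):
    """Return MP3 bytes for `text` without writing a file to disk.

    tts_factory is a hypothetical hook: any callable that accepts
    text=, lang=, slow= and returns an object with write_to_fp(fp).
    When omitted, gTTS is used.
    """
    if tts_factory is None:
        # Imported lazily so the helper can be exercised offline with a stand-in
        from gtts import gTTS
        tts_factory = gTTS
    buffer = io.BytesIO()
    tts_factory(text=text, lang=lang, slow=False).write_to_fp(buffer)
    return buffer.getvalue()


# Possible use inside the Streamlit app (needs network access to reach the TTS service):
# st.audio(synthesize_to_bytes(Text), format='audio/mp3')
```

st.audio already receives raw bytes in the app, so the returned value could be passed to it directly.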

This web app is built with Streamlit and deployed on Heroku.

Source Code:

import streamlit as st
import requests
from bs4 import BeautifulSoup
from gtts import gTTS 
def app():
    st.set_page_config(page_title="Audio Blog",page_icon="🎧")
    st.title("Generate Audio for dev.to blogposts")
    url=st.text_area("Enter any DEV.TO blog url ").strip()
    if st.button("submit"):
        if len(url)!=0:
            with st.spinner('Miracles take time to happen \n Just kidding 😂 \n Generating audio..'):
                results = requests.get(url)
                soup = BeautifulSoup(results.text, "html.parser") 
                blog=[]
                try:
                    Article=soup.find("div",{"class":"crayons-article__header__meta"}).find('h1').get_text()
                    #st.write(Article)
                    Author=soup.find("div",{"class":"crayons-article__subheader"}).find('a').get_text()
                    #st.write(Author)
                    intro="This blogpost {article} is written by {author}".format(article = Article, author = Author)
                    blog.append(intro)
                    text=soup.find("div", {"id": "article-body"}).find_all(['p','h1','h2','h3','h4','h5','h6','ol','ul'])
                    def remove_html_tags(text):
                        for item in text:
                            try:
                                blog.append(item.get_text())
                            except:
                                pass

                    remove_html_tags(text)
                    # Join all collected pieces into one string for gTTS
                    Text = " ".join(blog)

                    myobj = gTTS(text=Text, lang='en', slow=False) 

                    myobj.save("Audio.mp3")
                    # Read the file back and close it as soon as we are done
                    with open('Audio.mp3', 'rb') as audio_file:
                        audio_bytes = audio_file.read()
                    st.success("Play or download the audio")
                    st.audio(audio_bytes, format='audio/mp3')
                except Exception:
                    st.error("Enter a valid url")
        else:
            st.error("Enter a valid url")


if __name__ == "__main__":
    app()
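The app above reports "Enter a valid url" only after the request or the scraping has failed. Here is a rough sketch of checking the URL up front, assuming only dev.to links of the form /author/post-slug should be accepted. That rule is an assumption of this sketch, and is_devto_url is a hypothetical helper, not part of the original tool.

```python
from urllib.parse import urlparse


def is_devto_url(url):
    """Return True if `url` looks like a dev.to post link (assumed shape: /author/slug)."""
    parsed = urlparse(url.strip())
    # Keep only the non-empty path segments, e.g. ["sunilaleti", "some-post-slug"]
    segments = [part for part in parsed.path.split("/") if part]
    return (
        parsed.scheme in ("http", "https")
        and parsed.netloc.lower() in ("dev.to", "www.dev.to")
        and len(segments) >= 2
    )


print(is_devto_url("https://dev.to/sunilaleti/building-a-pdf-locker-gui-application-4l67"))
print(is_devto_url("https://example.com/some/post"))
```

Calling this before requests.get would let the app reject other sites right away instead of fetching them first.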

Let me know if you run into any issues, and feel free to send a pull request if you have anything to add.

Discussion (2)

vamsi pavan mahesh gunturu

This is cool, but I am afraid it won't be helpful for dev.to posts, because the tool ignores code blocks. Even if it did read code blocks, I am not sure I could follow along by "listening" to code. It would be super helpful if we could point it at news websites, etc.

Sunil Aleti Author

I agree with you, but I can still get an overview of what the post is about.