I'm the kind of person who prefers listening to reading, and I feel more productive when I listen. dev.to doesn't have an audio feature (it would be helpful if they added one natively).
So I built a tool that takes any dev.to blog post URL as input and outputs audio that you can also download.
Tool: https://audioblogs.herokuapp.com/
You can see how the tool works in the video below.
The tool scrapes the article and generates audio from its text using the gTTS module.
Modules used:
- Requests - will allow us to send HTTP requests to get HTML pages
- BeautifulSoup - will help us parse the HTML pages
- gTTS - converts the given text into audio, which can be saved as an MP3 file
- Streamlit - A Python library that makes it easy to create and share beautiful, custom web apps
Let's begin:
Import all the necessary modules:
import streamlit as st
import requests
from bs4 import BeautifulSoup
from gtts import gTTS
Get the contents of the web page into a variable:
results = requests.get("https://dev.to/sunilaleti/building-a-pdf-locker-gui-application-4l67")
To make the content easy to work with, we parse it with BeautifulSoup and store the result in the soup variable:
soup = BeautifulSoup(results.text, "html.parser")
We use soup.find to get the title of the blog post. You can find the class names by inspecting the page's elements in your browser:
Article=soup.find("div",{"class":"crayons-article__header__meta"}).find('h1').get_text()
To get the author's name:
Author=soup.find("div",{"class":"crayons-article__subheader"}).find('a').get_text()
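soup.find walks the parsed tree and returns the first matching element, or None when nothing matches, which is why chaining .find('h1') on a missing div raises an AttributeError. To see the two lookups above without fetching a page, here is a minimal sketch that mimics them with only the standard library's html.parser. The class FirstTagTextInClass, the helper first_text, and the sample markup are hypothetical; only the two class names come from the code above, and dev.to's real markup may differ or change.

```python
from html.parser import HTMLParser


class FirstTagTextInClass(HTMLParser):
    """Hypothetical stdlib analogue of
    soup.find("div", {"class": div_class}).find(inner).get_text():
    collects the text of the first <inner> tag nested in a <div> whose
    class attribute contains div_class."""

    def __init__(self, div_class, inner):
        super().__init__()
        self.div_class = div_class
        self.inner = inner
        self.div_depth = 0      # > 0 while inside the matching div
        self.in_inner = False   # True while inside the target tag
        self.done = False       # stop after the first match, like find()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1  # a div nested inside the match
            elif self.div_class in (dict(attrs).get("class") or "").split():
                self.div_depth = 1
        elif tag == self.inner and self.div_depth:
            self.in_inner = True

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag == self.inner and self.in_inner:
            self.in_inner = False
            self.done = True
        elif tag == "div" and self.div_depth:
            self.div_depth -= 1

    def handle_data(self, data):
        if self.in_inner:
            self.parts.append(data)

    def text(self):
        return "".join(self.parts)


def first_text(html, div_class, inner):
    parser = FirstTagTextInClass(div_class, inner)
    parser.feed(html)
    parser.close()
    return parser.text()


# Sample markup shaped like the lookups above (not real dev.to HTML)
html = """
<div class="crayons-article__header__meta">
  <h1>My Sample Post</h1>
</div>
<div class="crayons-article__subheader">
  <a href="/someone">Some Author</a>
</div>
"""

title = first_text(html, "crayons-article__header__meta", "h1")
author = first_text(html, "crayons-article__subheader", "a")
print(title, "by", author)  # My Sample Post by Some Author
```

Unlike BeautifulSoup, this sketch returns an empty string instead of raising when the div is missing; the app above relies on the exception to show its "Enter a valid url" message.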
To get the article content:
text=soup.find("div", {"id": "article-body"}).find_all(['p','h1','h2','h3','h4','h5','h6','ol','ul'])
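find_all searches all descendants, so tags in that list can match inside each other: if a list item wraps its text in a `<p>`, both the `<ul>` and the inner `<p>` are returned, and the same sentence would be read aloud twice. Below is a minimal stdlib sketch, assuming you already have the article body's HTML as a string, that keeps the text of each outermost matching block exactly once. BlockTextCollector and the extra space after each `</li>` are my own choices for illustration, not part of the original tool.

```python
from html.parser import HTMLParser

# Same tag list the scraper passes to find_all
BLOCK_TAGS = ("p", "h1", "h2", "h3", "h4", "h5", "h6", "ol", "ul")


class BlockTextCollector(HTMLParser):
    """Collect the text of each outermost block tag exactly once, so a <p>
    nested inside a <ul> is not read a second time."""

    def __init__(self, block_tags=BLOCK_TAGS):
        super().__init__()
        self.block_tags = set(block_tags)
        self.open_blocks = 0   # how many matching tags we are inside
        self.current = []      # text pieces of the block being read
        self.blocks = []       # finished block texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.block_tags:
            self.open_blocks += 1

    def handle_endtag(self, tag):
        if not self.open_blocks:
            return
        if tag == "li":
            self.current.append(" ")  # keep list items from running together
        elif tag in self.block_tags:
            self.open_blocks -= 1
            if self.open_blocks == 0:
                # collapse whitespace and store the finished block
                self.blocks.append(" ".join("".join(self.current).split()))
                self.current = []

    def handle_data(self, data):
        if self.open_blocks:
            self.current.append(data)


article_html = (
    "<h2>Intro</h2>"
    "<p>Hello <strong>dev.to</strong> readers.</p>"
    "<ul><li><p>one</p></li><li>two</li></ul>"
    "<pre><code>print('hi')</code></pre>"
)
collector = BlockTextCollector()
collector.feed(article_html)
collector.close()
print(collector.blocks)  # ['Intro', 'Hello dev.to readers.', 'one two']
```

As with the original tag list, `<pre>` is not included, so code blocks are skipped.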
find_all returns HTML elements, not plain text, so we strip the tags and keep only the text of each element:
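blog = []

def remove_html_tags(text):
    # get_text() keeps only the text inside each tag and drops the markup
    for item in text:
        try:
            blog.append(item.get_text())
        except AttributeError:
            # skip anything that is not a tag
            pass

remove_html_tags(text)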
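get_text() returns only the text nodes of an element and, by default, joins them with no separator, so the items of a list can run together ("one" and "two" become "onetwo"). BeautifulSoup's get_text also accepts a separator and a strip flag. The sketch below imitates those two options with the standard library's html.parser so the effect is easy to see; TextOnly and strip_tags are hypothetical helpers, not part of the tool.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep every text node and drop the tags themselves."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; becomes &
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(fragment, separator="", strip=False):
    """Rough imitation of BeautifulSoup's get_text(separator, strip)."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()
    parts = parser.parts
    if strip:
        # trim each text node and drop the ones that were only whitespace
        parts = [p.strip() for p in parts if p.strip()]
    return separator.join(parts)


items = ["<h2>Intro</h2>", "<p>Tom &amp; Jerry</p>", "<ul><li>one</li><li>two</li></ul>"]
print([strip_tags(i) for i in items])             # ['Intro', 'Tom & Jerry', 'onetwo']
print([strip_tags(i, " ", True) for i in items])  # ['Intro', 'Tom & Jerry', 'one two']
```

In the scraper itself, the equivalent change would be calling item.get_text(" ", strip=True) inside remove_html_tags.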
After scraping and cleaning the content, we create the audio with the gTTS module.
gTTS also supports other languages, such as French and Spanish, through its lang parameter.
from gtts import gTTS

# Join the cleaned text pieces into one string for gTTS
Text = " ".join(blog)
language = 'en'
myobj = gTTS(text=Text, lang=language, slow=False)
# Save the converted audio in an mp3 file named "Audio.mp3"
myobj.save("Audio.mp3")
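save() writes to a fixed file name, so on a deployed app two visitors converting posts at the same time could overwrite each other's Audio.mp3. gTTS objects also provide write_to_fp(), which writes the MP3 data to any file-like object, and st.audio accepts raw bytes, so the file can be skipped. A minimal sketch, assuming you pass in a factory that builds the TTS object; synthesize_to_bytes and FakeTTS are hypothetical names, and FakeTTS only stands in for gTTS so the sketch runs offline.

```python
import io


def synthesize_to_bytes(text, make_tts):
    """Render `text` to MP3 bytes in memory instead of a shared file.

    make_tts is any callable returning an object with write_to_fp(fp),
    e.g. lambda t: gTTS(text=t, lang="en", slow=False).
    """
    buffer = io.BytesIO()
    make_tts(text).write_to_fp(buffer)
    return buffer.getvalue()


class FakeTTS:
    """Offline stand-in for gTTS: 'writes' the text itself as bytes."""

    def __init__(self, text):
        self.text = text

    def write_to_fp(self, fp):
        fp.write(self.text.encode("utf-8"))


audio_bytes = synthesize_to_bytes("Hello dev.to", FakeTTS)
print(audio_bytes)  # b'Hello dev.to'

# In the app (requires gtts and network access):
# audio_bytes = synthesize_to_bytes(Text, lambda t: gTTS(text=t, lang="en", slow=False))
# st.audio(audio_bytes, format="audio/mp3")
```

Passing the factory in keeps the function testable without network access; in the real app you would pass a lambda that builds a gTTS object.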
This web app is built with Streamlit and deployed on Heroku.
Source Code:
import streamlit as st
import requests
from bs4 import BeautifulSoup
from gtts import gTTS


def app():
    st.set_page_config(page_title="Audio Blog", page_icon="🎧")
    st.title("Generate Audio for dev.to blogposts")
    url = st.text_area("Enter any DEV.TO blog url ").strip()
    if st.button("submit"):
        if len(url) != 0:
            with st.spinner('Miracles take time to happen \n Just kidding 😂 \n Generating audio..'):
                results = requests.get(url)
                soup = BeautifulSoup(results.text, "html.parser")
                blog = []
                try:
                    Article = soup.find("div", {"class": "crayons-article__header__meta"}).find('h1').get_text()
                    # st.write(Article)
                    Author = soup.find("div", {"class": "crayons-article__subheader"}).find('a').get_text()
                    # st.write(Author)
                    intro = "This blogpost {article} is written by {author}".format(article=Article, author=Author)
                    blog.append(intro)
                    text = soup.find("div", {"id": "article-body"}).find_all(['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'ol', 'ul'])

                    def remove_html_tags(text):
                        # keep only the text inside each tag
                        for item in text:
                            try:
                                blog.append(item.get_text())
                            except AttributeError:
                                pass

                    remove_html_tags(text)
                    Text = ""
                    for ele in blog:
                        Text += ele + " "
                    myobj = gTTS(text=Text, lang='en', slow=False)
                    myobj.save("Audio.mp3")
                    # read the file back; the with-block closes it afterwards
                    with open('Audio.mp3', 'rb') as audio_file:
                        audio_bytes = audio_file.read()
                    st.success("Play or download the audio")
                    st.audio(audio_bytes, format='audio/mp3')
                except Exception:
                    # e.g. find() returned None because the page is not a dev.to post
                    st.error("Enter a valid url")
        else:
            st.error("Enter a valid url")


if __name__ == "__main__":
    app()
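The app only checks that the text box isn't empty, so anything else (a typo, a link to another site) is fetched anyway and only fails later inside the try block. A stricter pre-check could look at the URL's shape before calling requests.get. This sketch assumes post URLs look like https://dev.to/<user>/<slug>, as in the example URL earlier in this post; looks_like_devto_post is a hypothetical helper, and the rule is a heuristic, not dev.to's official URL scheme.

```python
from urllib.parse import urlparse


def looks_like_devto_post(url):
    """Heuristic: accept http(s)://dev.to/<user>/<slug> (assumed shape)."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.netloc.lower() not in ("dev.to", "www.dev.to"):
        return False
    # ignore empty segments so a trailing slash is still accepted
    segments = [s for s in parsed.path.split("/") if s]
    return len(segments) == 2


print(looks_like_devto_post(
    "https://dev.to/sunilaleti/building-a-pdf-locker-gui-application-4l67"))  # True
print(looks_like_devto_post("https://example.com/a/b"))  # False
```

In the app, `if looks_like_devto_post(url):` could replace `if len(url)!=0:` so that malformed input gets the error message without a network request.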
Let me know if you run into any issues, and feel free to send a pull request if you have anything to add.
Top comments (2)
This is cool, but I'm afraid it won't be helpful for dev.to posts, because the tool ignores code blocks. Even if it did include them, I'm not sure I could follow along by "listening" to code. It would be super helpful, though, if we could point it at news websites and the like.
I agree with you, but I can still get an overview of what a post is about.