DEV Community

Simple Wikipedia Search App with Streamlit 🐍🕸️💻

Hey there! 👋 I recently built a small web app that searches for Wikipedia articles and displays them in a chat-like interface. I used Streamlit to build the app and BeautifulSoup for web scraping. Here's how I did it, so you can try it out too!

What You Need

Before we dive in, make sure you have these Python libraries installed:

  • streamlit: To build the web app.
  • requests: To send requests to websites and get data.
  • beautifulsoup4: To scrape and parse the HTML content.

You can install them using pip:

pip install streamlit requests beautifulsoup4

The Code Explained

1. Setting Up the App

First, I imported the necessary libraries and set up the basic configuration for the Streamlit app.

import streamlit as st
import requests
from bs4 import BeautifulSoup
import time
import random

st.set_page_config(page_title="WikiStream", page_icon="ℹ")

2. Adding Themes and Chat Interface

I added an option in the sidebar for users to switch between Light and Dark themes. I also set up a basic chat interface where the user can enter a topic and see the responses.

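theme = st.sidebar.selectbox("Choose a theme", ["Light", "Dark"])
if theme == "Dark":
    # Inject a small CSS override for the dark theme
    st.markdown("""
    <style>
    .stApp {
        background-color: #2b2b2b;
        color: white;
    }
    </style>
    """, unsafe_allow_html=True)

# Keep the chat history across reruns
if 'messages' not in st.session_state:
    st.session_state.messages = []

# Replay the earlier messages on every rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])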
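
If you'd like to keep the theme styling separate from the Streamlit calls, here's a minimal sketch of one way to do it. The names `THEME_COLORS` and `build_theme_css` are my own placeholders, not part of the app; only the Dark colors come from the code above.

```python
# Hypothetical palette: only the Dark colors are taken from the post.
THEME_COLORS = {
    "Dark": {"background-color": "#2b2b2b", "color": "white"},
}


def build_theme_css(theme):
    """Return a <style> block for the theme, or "" when no override is needed."""
    colors = THEME_COLORS.get(theme)
    if not colors:
        return ""
    rules = "\n".join(f"    {prop}: {value};" for prop, value in colors.items())
    return "<style>\n.stApp {\n" + rules + "\n}\n</style>"
```

In the app you would pass the result to `st.markdown(css, unsafe_allow_html=True)` whenever it is non-empty.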

3. Generating and Fetching Wikipedia Links

Next, I created a function to generate a Google search link based on the user's input. Then, I scraped the search results to find the actual Wikipedia link and fetched the content from that page.

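def generate_link(prompt):
    if prompt:
        return "" + prompt.replace(" ", "+") + "+wiki"
    return None

def generating_wiki_link(link):
    res = requests.get(link)
    soup = BeautifulSoup(res.text, 'html.parser')
    for sp in soup.find_all("div"):
        anchor = sp.find('a')
        if anchor is None or not anchor.get('href'):
            continue  # skip divs without a usable link
        href = anchor.get('href')
        if ('' in href):
            # drop the first 7 characters and anything after the first '&'
            actual_link = href[7:].split('&')[0]
            return scraping_data(actual_link)
    return iter(())  # nothing matched: hand back an empty iterator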
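
Here's a rough sketch of the same two steps (building the search link and pulling the Wikipedia address out of a result link) using only `urllib.parse`. Treat it as an illustration under assumptions: `SEARCH_BASE`, `build_search_url`, and `extract_wikipedia_url` are hypothetical names, the base URL is my guess at a typical Google search endpoint, and the `/url?q=...` link shape is an assumed format that may change.

```python
from urllib.parse import parse_qs, quote_plus, urlparse

# Assumption: a typical Google search endpoint, not taken from the post.
SEARCH_BASE = "https://www.google.com/search?q="


def build_search_url(prompt):
    """Return a search URL for the prompt, or None when the prompt is empty."""
    if not prompt or not prompt.strip():
        return None
    # quote_plus escapes special characters and turns spaces into '+'
    return SEARCH_BASE + quote_plus(prompt.strip() + " wiki")


def extract_wikipedia_url(href):
    """Return the Wikipedia URL behind a result link, or None if it is not one.

    Accepts direct links as well as redirect-style links of the assumed
    form "/url?q=<target>&...".
    """
    if href.startswith("/url?"):
        targets = parse_qs(urlparse(href).query).get("q", [])
        href = targets[0] if targets else ""
    host = urlparse(href).netloc
    if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
        return href
    return None
```

Parsing the query string avoids relying on a fixed character offset, and checking the host name keeps unrelated links out.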

4. Scraping Wikipedia Content

This is where the content gets extracted from Wikipedia. I used BeautifulSoup to grab all the text from the page, clean it up, and display it at a speed chosen by the user.

def scraping_data(link):
    res = requests.get(link)
    soup = BeautifulSoup(res.text, 'html.parser')
    # Collect the text of every paragraph, one per line
    corpus = ""
    for i in soup.find_all('p'):
        corpus += i.text
        corpus += '\n'
    corpus = corpus.strip()
    # Strip citation markers [1] through [499]
    for i in range(1, 500):
        corpus = corpus.replace('[' + str(i) + ']', " ")

    speed = st.sidebar.slider("Text Speed", 0.1, 1.0, 0.2, 0.1)

    # Stream the text word by word, pausing between words
    for i in corpus.split():
        yield i + " "
        time.sleep(speed)  # delay in seconds, taken from the slider
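
The cleanup and the word-by-word streaming can be tried without Streamlit or a network connection. Here's a small sketch that swaps the replace loop for a regular expression; `strip_citations` and `stream_words` are hypothetical helper names, and unlike the loop above the pattern matches any number in brackets, not just 1 to 499.

```python
import re


def strip_citations(text):
    """Replace numeric citation markers such as [12] with a single space."""
    return re.sub(r"\[\d+\]", " ", text)


def stream_words(text):
    """Yield the text one word at a time, each followed by a space."""
    for word in text.split():
        yield word + " "
```

Because `stream_words` is a generator, the caller decides how fast to consume it, for example by sleeping between chunks.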

5. Getting a Random Wikipedia Topic

I added a fun feature that lets you fetch a random Wikipedia article. It's great for those moments when you just want to learn something new without having to think of a topic.

def get_random_wikipedia_topic():
    url = ""
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    return soup.find('h1', {'id': 'firstHeading'}).text
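
If the page has no `firstHeading` element, the lookup above fails with an `AttributeError`. Here's a sketch of a more defensive title extraction. To keep it runnable without extra packages it uses the standard library's `html.parser` instead of BeautifulSoup, and `HeadingExtractor` and `extract_title` are hypothetical names of my own.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collect the text inside the first <h1 id="firstHeading"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._done and dict(attrs).get("id") == "firstHeading":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        if self._inside:
            self.parts.append(data)


def extract_title(html):
    """Return the heading text, or None when the page has no such heading."""
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    title = "".join(parser.parts).strip()
    return title or None
```

Returning `None` lets the caller show a friendly message instead of crashing when the page layout differs from what was expected.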

6. Handling User Input and Displaying Content

Finally, I handled the user input and displayed the content in a chat-like interface. I also added options to clear the chat history and summarize the last response.

# Read the topic from the chat box (the placeholder text is illustrative)
prompt = st.chat_input("Enter a topic")

if st.sidebar.button("Get Random Wikipedia Topic"):
    random_topic = get_random_wikipedia_topic()
    st.sidebar.write(f"Random Topic: {random_topic}")
    prompt = random_topic

if prompt:
    link = generate_link(prompt)
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""
        for chunk in generating_wiki_link(link):
            full_response += chunk
            # Show the text so far, followed by a cursor
            message_placeholder.markdown(full_response + "▌")
        message_placeholder.markdown(full_response)  # final text, no cursor
    st.session_state.messages.append({"role": "assistant", "content": full_response})

if st.sidebar.button("Clear Chat History"):
    st.session_state.messages = []

if st.sidebar.button("Summarize Last Response"):
    if st.session_state.messages and st.session_state.messages[-1]["role"] == "assistant":
        last_response = st.session_state.messages[-1]["content"]
        summary = " ".join(last_response.split()[:50]) + "..."
        st.sidebar.markdown("### Summary")
        st.sidebar.write(summary)
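
The "summary" here is just the first 50 words of the last response. Here's a small sketch of that truncation as a standalone function; `summarize` is a hypothetical name, and unlike the inline version it only appends "..." when something was actually cut off.

```python
def summarize(text, max_words=50):
    """Return the first max_words words, adding "..." only if text was cut."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + "..."
```

For example, `summarize("one two three", 2)` gives `"one two..."`, while a short text comes back unchanged apart from normalized spacing.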

Click the link below to start exploring:

Check out the code behind Wiki-Fetch on GitHub!

Happy browsing! 📚


And that's it! 🎉 I've built a simple yet functional Wikipedia search app using Streamlit and BeautifulSoup. This was a fun project to work on, and I hope you find it just as enjoyable to try out. If you have any questions or feedback, feel free to reach out. Happy coding! 🚀
