
Somil Gupta for AWS Community Builders


How I Used Bedrock Agents to Create a Tool — Medium2Markdown

Screenshot of the tool

Recently, I ran into a problem while building my personal blog, which uses Markdown for all written content. All of my existing posts were on Medium, and converting them to Markdown files by hand was taking a lot of time. So I built this project.

This application is a simple way to generate Markdown files from your Medium posts. It fetches HTML content from a URL and converts it to Markdown using an AWS Bedrock agent with any foundation model (FM) of your choice. It consists of two main components:

  1. A Flask-based API for fetching HTML content
  2. A Next.js application (using App Router) that handles the Bedrock integration for converting HTML to Markdown

Step 1 — Flask API for HTML Fetching

We use Selenium to scrape the HTML because a plain GET request for a Medium story page returns a partial result that doesn't contain the complete story content. Instead, we render the page in headless Chrome, wait for it to load, and then read the page's HTML, which gives us the complete story.

Below is a small API, built with Selenium and Flask, that takes a URL and returns the page's HTML.



from flask import Flask, jsonify, request
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import time

app = Flask(__name__)


def get_website_html(url):
    # Set up headless Chrome options (headless is needed on a server without a display)
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")

    # Initialize the WebDriver with ChromeDriverManager
    driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=chrome_options)
    try:
        driver.get(url)

        print("Pausing for 5 seconds to allow the page to load...")
        time.sleep(5)  # Give client-side rendering time to finish

        # Get the HTML content of the page
        html_content = driver.page_source
    finally:
        # Always close the browser, even if loading the page fails
        driver.quit()

    return html_content


@app.route('/get_html', methods=['POST'])
def get_html():
    print('Fetching the HTML content of a website...')

    # Get the URL from the request body
    # (silent=True returns None for a missing or non-JSON body, so we can send our own 400)
    data = request.get_json(silent=True)
    if not data or 'url' not in data:
        return jsonify({'error': 'URL is required in the request body'}), 400

    url = data['url']
    html_content = get_website_html(url)

    # Parse the HTML content with BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')

    # Return the HTML content as a JSON response
    return jsonify({'html': str(soup)})


if __name__ == '__main__':
    app.run(debug=True)
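The fixed five-second pause in get_website_html can be too short on a slow connection and wasteful on a fast one. A possible alternative, sketched here under assumptions, is to poll the page until a condition holds. The helper name wait_for_js is hypothetical, and the idea that a Medium story is wrapped in an `<article>` element is an assumption you should check against the real page. Selenium also ships its own WebDriverWait class for this purpose; the version below only shows the idea with the standard library.

```python
import time


def wait_for_js(driver, js_condition, timeout=15.0, poll=0.5):
    """Poll a JavaScript expression in the page until it is truthy.

    Returns True as soon as the condition holds, or False once `timeout`
    seconds have passed without it holding.
    """
    deadline = time.monotonic() + timeout
    while True:
        # execute_script is the WebDriver method for running JS in the page
        if driver.execute_script(f"return Boolean({js_condition});"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

Inside get_website_html, the call `wait_for_js(driver, "document.querySelector('article')")` could stand in for `time.sleep(5)`, with the selector adjusted to whatever element marks the story body.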



You can host this Flask server on an AWS EC2 instance or with any other cloud provider. Make sure to put some form of authorization in front of the API before exposing it.

Once we have the HTML, the next step is to turn it into clean Markdown. We will use a Bedrock agent for this, with Claude as the FM.
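One simple form of authorization is a shared bearer token. The following is a minimal sketch, not the approach used in the original project: the function name is_authorized and the idea of storing the token in an environment variable are assumptions made for illustration.

```python
import hmac


def is_authorized(auth_header, expected_token):
    """Return True when the header is 'Bearer <token>' and the token matches."""
    if not auth_header or not expected_token:
        return False
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many characters matched via timing
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

In the get_html route, this could be called with `request.headers.get("Authorization")` and a token read from the environment (for example a hypothetical `API_TOKEN` variable), returning a 401 response when it yields False.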


Step 2 — Set Up the AWS Bedrock Agent

Bedrock Agent Config
I created a basic agent in the Bedrock console with the prompt below. You can refine the prompt further for better results.



You are a helpful assistant who takes HTML as input and then parses it 
and returns the blog as a Markdown blog, and the blog should just contain 
the main content body without the title and subtitle. 
Also, remove the first image of the article from the markdown body, 
as we are putting that in the header.

* In the main content markdown, just keep the main body, remove the title 
and subtitle published date, etc.

On top of the markdown, add these things:
* decide on the title and description of the content
* categories can be travel or engineering
* remove the title and description from the main markdown body
* The image will be the first URL of the markdown blog

Sample to put on the top of the markdown
---
title: The Time When I Got Scammed in Georgia
description: A Reminder to Dodge Scams… Or Collect Them Like Souvenirs?
image: /images/blog/blog-post-4.1.png
date: 2024/6/28
authors:
  - nomadic_bug
categories:
  - travel
---



This prompt gets us a Markdown file in the desired format, with the metadata block on top.
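A model will not always follow the requested format, so it can help to check the result before saving it. The sketch below is an optional addition, not part of the original project: it splits the `---` delimited header from the body and reports which of the keys from the sample above are missing. The names split_front_matter and missing_keys are hypothetical.

```python
import re

# Header block delimited by '---' lines at the very start of the text
FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
REQUIRED_KEYS = ("title", "description", "image", "date", "authors", "categories")


def split_front_matter(markdown):
    """Return (header_text, body); raise ValueError if there is no header."""
    text = markdown.lstrip()
    match = FRONT_MATTER.match(text)
    if match is None:
        raise ValueError("no front matter block found")
    return match.group(1), text[match.end():]


def missing_keys(header):
    """List the required top-level keys that do not appear in the header."""
    present = {
        line.split(":", 1)[0].strip()
        for line in header.splitlines()
        if ":" in line and not line.startswith(" ")
    }
    return [key for key in REQUIRED_KEYS if key not in present]
```

If missing_keys returns a non-empty list, one option is to send the HTML to the agent again or to fix the header by hand.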


Step 3 — Next.js App with Bedrock Integration

I used Next.js with the App Router to create a basic UI for this project. Below is the main API route, which invokes the agent we created earlier in AWS Bedrock.

The complete code is available here.



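// Sketch of the API route (for example app/api/agent/route.ts; the path is illustrative).
// The client and command come from the @aws-sdk/client-bedrock-agent-runtime package.
import {
  BedrockAgentRuntimeClient,
  InvokeAgentCommand,
} from "@aws-sdk/client-bedrock-agent-runtime";
import { NextResponse } from "next/server";

// Initialize the Bedrock Agent Runtime client with AWS credentials
const client = new BedrockAgentRuntimeClient({
  region: "us-east-1",
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
  },
});

// Set up agent details
const agentId = "your-agent-id";
const agentAliasId = "your-agent-alias-id";

// Invoke the Bedrock agent and collect its streamed completion into one string
async function invokeBedrockAgent(sessionId: string, inputText: string): Promise<string> {
  const command = new InvokeAgentCommand({ agentId, agentAliasId, sessionId, inputText });
  const response = await client.send(command);

  let completion = "";
  if (response.completion) {
    const decoder = new TextDecoder("utf-8");
    // The completion arrives as a stream of events; text is in chunk.bytes
    for await (const event of response.completion) {
      if (event.chunk?.bytes) {
        completion += decoder.decode(event.chunk.bytes, { stream: true });
      }
    }
  }
  return completion;
}

// Main API handler
export async function POST(request: Request) {
  const { message, sessionId } = await request.json();

  if (!message || !sessionId) {
    return NextResponse.json(
      { error: "Please provide both a message and a sessionId." },
      { status: 400 }
    );
  }

  const response = await invokeBedrockAgent(sessionId, message);
  return NextResponse.json({ response, sessionId }, { status: 200 });
}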



Output screenshot

The complete code is available on my GitHub.

Conclusion

This HTML fetcher and Markdown converter is a prototype that turns web content into readable, editable Markdown. My goal was to make it work, and it does. There is room for improvement, but the project gave me a good idea of where to start.


Thanks for reading my story. If you want to read more stories like this, I invite you to follow me.

Till then, Sayonara! I wish you the best in your learning journey.
