Jonathan Bowman

Posted on Aug 8, 2020 • Edited on Aug 15, 2020

Getting Started with HTTPX, Part 1: Building a Python REST Client (Synchronous Version)

#python #httpx #pythonpoetry

HTTPX is a modern HTTP client library for Python. Its interface is similar to the old standby Requests, but it supports asynchronous HTTP requests, using Python's asyncio library (or trio). In other words, while your program is waiting for an HTTP request to finish, other work does not need to be blocked.

In this first article, we will build a synchronous client: each request completes before the next one starts. The second article builds tests for this client, Part 3 adapts it so that the requests overlap asynchronously, and Part 4 tests the asynchronous version.

To interact with HTTPX, let's build a mini-project called pypedia. It will be a command-line tool to list a few Python-related articles from Wikipedia.

Poetry eases Python project and dependency management, so I use it to get a project up and running quickly. If you are new to Poetry, you may appreciate an article in which I introduce it.

Setup with Poetry

poetry new --src pypedia
cd pypedia
poetry add httpx

Two functions and a command runner

In the src/pypedia directory, create a Python file called synchronous.py.

"""Proof-of-concept Wikipedia search tool."""
import logging
import time

import httpx

EMAIL = "your_email@provider"  # or Github URL or other identifier
USER_AGENT = {"user-agent": f"pypedia/0.1.0 ({EMAIL})"}

logging.basicConfig(filename="syncpedia.log", filemode="w", level=logging.INFO)
LOG = logging.getLogger("syncpedia")


def search(query, limit=100, client=None):
    """Search Wikipedia, returning the HTTP response for the query."""
    if client is None:
        # No client supplied: fall back to the module-level httpx.get()
        client = httpx
    LOG.info(f"Start query '{query}': {time.strftime('%X')}")
    url = "https://en.wikipedia.org/w/rest.php/v1/search/page"
    params = {"q": query, "limit": limit}
    response = client.get(url, params=params)
    LOG.info(f"End query '{query}': {time.strftime('%X')}")
    return response


def list_articles(queries):
    """Execute several Wikipedia searches."""
    with httpx.Client(headers=USER_AGENT) as client:
        # Build a list, not a generator: a generator would run lazily,
        # after the client has already been closed.
        responses = [search(query, client=client) for query in queries]
    results = (response.json()["pages"] for response in responses)
    return dict(zip(queries, results))


def run():
    """Command entry point."""
    queries = [
        "linksto:Python_(programming_language)",
        "incategory:Computer_programming",
        "incategory:Programming_languages",
        "incategory:Python_(programming_language)",
        "incategory:Python_web_frameworks",
        "incategory:Python_implementations",
        "incategory:Programming_languages_created_in_1991",
        "incategory:Computer_programming_stubs",
    ]
    results = list_articles(queries)
    for query, articles in results.items():
        print(f"\n*** {query} ***")
        for article in articles:
            print(f"{article['title']}: {article['excerpt']}")

In summary, the above has two significant functions and a command runner.

Using `Client.get()`

The search function accepts a reusable HTTPX Client instance and the query string, then performs a client GET request to the Wikipedia search endpoint.

The HTTPX Client is passed into the search() function as the client variable, so we can use methods like client.get(), passing two arguments: url and params. Whatever key:value pairs are in the params dict will make up the query string appended to the url, such as q (the Wikipedia search terms) or limit.
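To make the query-string step concrete, here is a minimal sketch that builds a comparable URL by hand. It uses the standard library's urlencode() as a stand-in rather than HTTPX itself, so treat the exact escaping as an approximation: HTTPX does its own encoding and may escape characters such as ":" differently.

```python
from urllib.parse import urlencode

# Same endpoint and parameter names as in search().
url = "https://en.wikipedia.org/w/rest.php/v1/search/page"
params = {"q": "incategory:Python_web_frameworks", "limit": 100}

# urlencode() is only a standard-library stand-in here; HTTPX performs
# its own encoding when it receives the params dict.
query_string = urlencode(params)
full_url = f"{url}?{query_string}"
print(full_url)
```

Each key in the dict becomes one key=value pair, and the pairs are joined with "&" after the "?".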

If no client is passed to search(), the request falls back to the module-level httpx.get(). This makes it easy to use search() by itself and to test it in isolation. Note that the fallback does not send the USER_AGENT header, because that header is only configured on the Client.

`httpx.Client` in a context manager

The list_articles() function opens an HTTPX Client as a context manager, so that cleanup is assured and automatic. It accepts one parameter, queries, and then iterates over that list, calling search() with every query. It does this inside the context manager. This way, all the client.get() calls in the search() function should benefit from the re-use of a single HTTP persistent connection.
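One subtlety is worth checking when collecting responses inside a with block: a generator expression is lazy, so its body does not run until something consumes it, which may be after the block has exited and the client has closed. The sketch below illustrates this without any network access. FakeClient is a hypothetical stand-in for httpx.Client, written only for this demonstration.

```python
class FakeClient:
    """Hypothetical stand-in for httpx.Client; makes no network calls."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.closed = True  # mimic the client closing when the block exits

    def get(self, query):
        if self.closed:
            raise RuntimeError("client has been closed")
        return f"response for {query}"


queries = ["a", "b"]

# Lazy: get() is not called inside the with block. It is first called
# when the generator is consumed, after the client has closed.
with FakeClient() as client:
    lazy = (client.get(q) for q in queries)
try:
    list(lazy)
    lazy_outcome = "ok"
except RuntimeError as err:
    lazy_outcome = f"failed: {err}"

# Eager: the list comprehension calls get() immediately, inside the block.
with FakeClient() as client:
    eager = [client.get(q) for q in queries]

print(lazy_outcome)
print(eager)
```

The takeaway is that requests meant to share one client should be evaluated eagerly, for example with a list comprehension, before the with block ends.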

For those familiar with Requests, this is the equivalent of the Requests Session object.

The HTTPX Advanced Usage guide has excellent rationale and instructions for using the Client.

Enable the command runner

The run() function executes whatever we want to have executed when called as a script. In this case, it creates a list of search terms, then sends the list to list_articles(), then parses and prints the result.

With Poetry, the entry point for a script is defined in pyproject.toml. So we add this to that file:

[tool.poetry.scripts]
syncpedia = "pypedia.synchronous:run"

So, the script syncpedia will call the run function of the synchronous submodule of the package pypedia.
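As a rough illustration of how a "module:function" string maps to a callable, the sketch below resolves such a string by hand. The load_entry_point() helper is hypothetical and greatly simplified compared with what the installed script actually does. It uses "json:dumps" as a stand-in that exists in every Python installation; in this project the string would be "pypedia.synchronous:run".

```python
from importlib import import_module


def load_entry_point(spec):
    """Resolve a 'package.module:function' string to the function object."""
    module_name, _, attr = spec.partition(":")
    module = import_module(module_name)
    return getattr(module, attr)


# "json:dumps" is only a stand-in for "pypedia.synchronous:run".
func = load_entry_point("json:dumps")
print(func({"ok": True}))
```

The part before the colon names the module to import, and the part after it names the attribute to call.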

Then install the project, so that Poetry creates the script:

poetry install

Synchronous execution

To run:

poetry run syncpedia

Assuming all works well, titles and excerpts of many Wikipedia articles should scroll by.

The calls to the Wikipedia API happened synchronously, in a sequence. One completed before the next began. This can be seen in the log file.

$ cat syncpedia.log
INFO:root:Start query 'linksto:Python_(programming_language)': 05:39:16
INFO:root:End query 'linksto:Python_(programming_language)': 05:39:17
INFO:root:Start query 'incategory:Computer_programming': 05:39:17
INFO:root:End query 'incategory:Computer_programming': 05:39:18
INFO:root:Start query 'incategory:Programming_languages': 05:39:18
INFO:root:End query 'incategory:Programming_languages': 05:39:19
INFO:root:Start query 'incategory:Python_(programming_language)': 05:39:19
INFO:root:End query 'incategory:Python_(programming_language)': 05:39:19
INFO:root:Start query 'incategory:Python_web_frameworks': 05:39:19
INFO:root:End query 'incategory:Python_web_frameworks': 05:39:20
INFO:root:Start query 'incategory:Python_implementations': 05:39:20
INFO:root:End query 'incategory:Python_implementations': 05:39:20
INFO:root:Start query 'incategory:Programming_languages_created_in_1991': 05:39:20
INFO:root:End query 'incategory:Programming_languages_created_in_1991': 05:39:20
INFO:root:Start query 'incategory:Computer_programming_stubs': 05:39:20
INFO:root:End query 'incategory:Computer_programming_stubs': 05:39:21

In the instance above, each call took 1 second or less, executing in a clear order.
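The ordering can also be checked programmatically. The following sketch parses the first four log lines shown above and confirms that Start and End events strictly alternate. The regular expression and the HH:MM:SS timestamp format are assumptions based on this sample output; time.strftime('%X') is locale-dependent, so other systems may log a different format.

```python
import re
from datetime import datetime

# First four lines of the sample log, copied verbatim.
LOG_LINES = """\
INFO:root:Start query 'linksto:Python_(programming_language)': 05:39:16
INFO:root:End query 'linksto:Python_(programming_language)': 05:39:17
INFO:root:Start query 'incategory:Computer_programming': 05:39:17
INFO:root:End query 'incategory:Computer_programming': 05:39:18
""".splitlines()

# Assumed line shape: "<Start|End> query '<query>': HH:MM:SS"
PATTERN = re.compile(r"(Start|End) query '(.+)': (\d\d:\d\d:\d\d)$")

events = []
for line in LOG_LINES:
    match = PATTERN.search(line)
    if match:
        kind, query, stamp = match.groups()
        events.append((kind, query, datetime.strptime(stamp, "%H:%M:%S")))

# Sequential execution means the events alternate Start, End, Start, End.
kinds = [kind for kind, _, _ in events]
is_sequential = kinds == ["Start", "End"] * (len(events) // 2)
total = (events[-1][2] - events[0][2]).total_seconds()
print(is_sequential, total)
```

If requests overlapped, two Start events would appear in a row and is_sequential would be False.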

In other words, everybody had to wait in line.

HTTPX has the ability to make calls asynchronously, in which each call does not need to wait its turn in line. This can potentially have performance benefits. We will explore the async possibilities later in the series.

First, though, we should not forget to write tests; the next article takes that up.

Top comments (1)

maraal • Aug 17 '22 • Edited

Hi Jonathan!

When I tried to run your code in python3.9 I got the following error:

Traceback (most recent call last):
  File "/home/goku/.cache/pypoetry/virtualenvs/pypedia-9zrL6fe4-py3.9/bin/syncpedia", line 5, in <module>
    run()
  File "/home/goku/learn/tutorial-httpx/pypedia/src/pypedia/synchronous.py", line 50, in run
    results = list_articles(queries)
  File "/home/goku/learn/tutorial-httpx/pypedia/src/pypedia/synchronous.py", line 35, in list_articles
    return dict(zip(queries, results))
  File "/home/goku/learn/tutorial-httpx/pypedia/src/pypedia/synchronous.py", line 33, in <genexpr>
    results = (response.json()["pages"] for response in responses)
  File "/home/goku/learn/tutorial-httpx/pypedia/src/pypedia/synchronous.py", line 28, in <genexpr>
    responses = (search(query, client=client) for query in queries)
  File "/home/goku/learn/tutorial-httpx/pypedia/src/pypedia/synchronous.py", line 20, in search
    response = client.get(url, params=params)
  File "/home/goku/.cache/pypoetry/virtualenvs/pypedia-9zrL6fe4-py3.9/lib/python3.9/site-packages/httpx/_client.py", line 1039, in get
    return self.request(
  File "/home/goku/.cache/pypoetry/virtualenvs/pypedia-9zrL6fe4-py3.9/lib/python3.9/site-packages/httpx/_client.py", line 815, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/home/goku/.cache/pypoetry/virtualenvs/pypedia-9zrL6fe4-py3.9/lib/python3.9/site-packages/httpx/_client.py", line 891, in send
    raise RuntimeError("Cannot send a request, as the client has been closed.")
RuntimeError: Cannot send a request, as the client has been closed.

To get the code to run, I had to replace the line

responses = (search(query, client=client) for query in queries)

with:

responses = [search(query, client=client) for query in queries]

Do you know why this is happening?

Thanks!

DEV Community
