HTTPX is a modern HTTP client library for Python. Its interface is similar to the old standby Requests, but it supports asynchronous HTTP requests, using Python's asyncio library (or trio). In other words, while your program is waiting for an HTTP request to finish, other work does not need to be blocked.
In this first article, we will build a synchronous client: each request completes before the next one starts. The second article will build tests for this client, Part 3 will adapt it so that the requests overlap asynchronously, and Part 4 will test the asynchronous version.
To interact with HTTPX, let's build a mini-project called pypedia. It will be a command-line tool to list a few Python-related articles from Wikipedia.
Poetry eases Python project and dependency management, so I use it to get a project up and running quickly. If you are new to Poetry, you may appreciate an article in which I introduce it.
Setup with Poetry
poetry new --src pypedia
cd pypedia
poetry add httpx
Two functions and a command runner
In the src/pypedia directory, create a Python file called synchronous.py.
"""Proof-of-concept Wikipedia search tool."""
import logging
import time

import httpx

EMAIL = "your_email@provider"  # or Github URL or other identifier
USER_AGENT = {"user-agent": f"pypedia/0.1.0 ({EMAIL})"}

logging.basicConfig(filename="syncpedia.log", filemode="w", level=logging.INFO)
LOG = logging.getLogger("syncpedia")


def search(query, limit=100, client=None):
    """Search Wikipedia, returning the HTTP response for the query."""
    if not client:
        # Fall back to the module-level API when no Client is supplied.
        client = httpx
    LOG.info(f"Start query '{query}': {time.strftime('%X')}")
    url = "https://en.wikipedia.org/w/rest.php/v1/search/page"
    params = {"q": query, "limit": limit}
    response = client.get(url, params=params)
    LOG.info(f"End query '{query}': {time.strftime('%X')}")
    return response


def list_articles(queries):
    """Execute several Wikipedia searches."""
    with httpx.Client(headers=USER_AGENT) as client:
        responses = (search(query, client=client) for query in queries)
        results = (response.json()["pages"] for response in responses)
        # results = (response.json() for response in responses)
        # The generators above are lazy, so build the dict inside the
        # `with` block, while the client is still open.
        return dict(zip(queries, results))


def run():
    """Command entry point."""
    queries = [
        "linksto:Python_(programming_language)",
        "incategory:Computer_programming",
        "incategory:Programming_languages",
        "incategory:Python_(programming_language)",
        "incategory:Python_web_frameworks",
        "incategory:Python_implementations",
        "incategory:Programming_languages_created_in_1991",
        "incategory:Computer_programming_stubs",
    ]
    results = list_articles(queries)
    for query, articles in results.items():
        print(f"\n*** {query} ***")
        for article in articles:
            print(f"{article['title']}: {article['excerpt']}")
In summary, the above has two significant functions and a command runner.
Using Client.get()
The search() function accepts the query string and a reusable HTTPX Client instance, then performs a GET request to the Wikipedia search endpoint.
The HTTPX Client is passed into the search() function as the client variable, so we can use methods like client.get(), passing two arguments: url and params. The key:value pairs in the params dict make up the query string appended to the URL, such as q (the Wikipedia search terms) or limit.
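To make the params-to-query-string step concrete, here is a rough sketch using only the standard library. The helper name build_url is hypothetical and not part of pypedia, and HTTPX does its own encoding internally, so the exact percent-encoding it produces may differ from what urlencode emits.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

URL = "https://en.wikipedia.org/w/rest.php/v1/search/page"


def build_url(query, limit=100):
    """Hypothetical helper: append `q` and `limit` to URL as a query string."""
    params = {"q": query, "limit": limit}
    return f"{URL}?{urlencode(params)}"


full_url = build_url("incategory:Python_web_frameworks", limit=5)
print(full_url)

# Decoding the query string recovers the original key:value pairs.
decoded = parse_qs(urlsplit(full_url).query)
print(decoded)
```

The point is only that each key in the dict becomes a key=value pair after the "?", joined with "&".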
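If no pre-existing client is passed to search(), the request falls back to the module-level httpx.get(). This makes it easy to use search() by itself and to test it in isolation. Note that this fallback does not send the custom USER_AGENT header, since that header is configured on the Client.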
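The optional client parameter also means a stand-in object can be injected, so search() can be exercised without touching the network. The following is a minimal sketch under stated assumptions: FakeClient and FakeResponse are hypothetical names invented for illustration, the payload is made-up test data, and search() is a trimmed copy with the logging and the httpx fallback left out.

```python
URL = "https://en.wikipedia.org/w/rest.php/v1/search/page"


class FakeResponse:
    """Hypothetical stand-in for an HTTPX response."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


class FakeClient:
    """Hypothetical stand-in for httpx.Client that records each call."""

    def __init__(self):
        self.calls = []

    def get(self, url, params=None):
        self.calls.append((url, params))
        # Made-up payload, shaped like the "pages" list the article reads.
        return FakeResponse({"pages": [{"title": "Flask", "excerpt": "stub"}]})


def search(query, limit=100, client=None):
    """Trimmed copy of search(): no logging, no httpx fallback."""
    params = {"q": query, "limit": limit}
    return client.get(URL, params=params)


fake = FakeClient()
response = search("incategory:Python_web_frameworks", limit=1, client=fake)
print(fake.calls)
print(response.json()["pages"][0]["title"])
```

Because the fake records its calls, a test can check both the URL and the params that search() built.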
httpx.Client in a context manager
The list_articles() function opens an HTTPX Client as a context manager, so that cleanup is assured and automatic. It accepts one parameter, queries, and then iterates over that list, calling search() with every query. It does this inside the context manager. This way, all the client.get() calls in the search() function should benefit from the re-use of a single persistent HTTP connection.
For those familiar with Requests, this is the equivalent of the Requests Session object.
The HTTPX Advanced Usage guide has excellent rationale and instructions for using the Client.
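One subtlety when combining a context manager with generator expressions: a generator does no work until it is consumed. If it is consumed after the with block has exited, the requests would be attempted on a client that is already closed. The sketch below illustrates this with a toy client built on the standard library; ToyClient and open_client are hypothetical stand-ins, not HTTPX objects, and the assumption is that a closed client refuses to send requests.

```python
from contextlib import contextmanager


class ToyClient:
    """Hypothetical client that refuses to work once closed."""

    def __init__(self):
        self.closed = False

    def get(self, query):
        if self.closed:
            raise RuntimeError("client has been closed")
        return f"result for {query}"


@contextmanager
def open_client():
    client = ToyClient()
    try:
        yield client
    finally:
        client.closed = True  # cleanup runs when the with block exits


def consumed_outside(queries):
    with open_client() as client:
        results = (client.get(q) for q in queries)  # nothing runs yet
    return list(results)  # runs now, after the client was closed


def consumed_inside(queries):
    with open_client() as client:
        results = (client.get(q) for q in queries)
        return list(results)  # runs while the client is still open


print(consumed_inside(["a", "b"]))
try:
    consumed_outside(["a", "b"])
except RuntimeError as error:
    print(f"failed: {error}")
```

Either consuming the generators inside the with block or switching to list comprehensions avoids the problem.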
Enable the command runner
The run() function executes whatever we want to have executed when called as a script. In this case, it creates a list of search terms, sends the list to list_articles(), then parses and prints the result.
With Poetry, the entry point for a script is defined in pyproject.toml, so we add this to that file:
[tool.poetry.scripts]
syncpedia = "pypedia.synchronous:run"
So, the script syncpedia will call the run function of the synchronous submodule of the package pypedia. Install the project so that the script is registered:
poetry install
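For intuition about the "module:function" notation, here is a rough sketch of how such a string can be resolved to a callable. This is an illustration only, not the actual launcher that gets generated; resolve_entry_point is a hypothetical helper, and a standard-library target (json:dumps) stands in for pypedia.synchronous:run so the snippet runs anywhere.

```python
import importlib


def resolve_entry_point(spec):
    """Hypothetical helper: turn 'package.module:attr' into the object it names."""
    module_name, _, attr = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# "json:dumps" plays the role of "pypedia.synchronous:run" here.
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```

The part before the colon is imported as a module, and the part after it is looked up as an attribute of that module and then called.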
Synchronous execution
To run:
poetry run syncpedia
Assuming all works well, titles and excerpts of many Wikipedia articles should scroll by.
The calls to the Wikipedia API happened synchronously, in a sequence. One completed before the next began. This can be seen in the log file.
$ cat syncpedia.log
INFO:root:Start query 'linksto:Python_(programming_language)': 05:39:16
INFO:root:End query 'linksto:Python_(programming_language)': 05:39:17
INFO:root:Start query 'incategory:Computer_programming': 05:39:17
INFO:root:End query 'incategory:Computer_programming': 05:39:18
INFO:root:Start query 'incategory:Programming_languages': 05:39:18
INFO:root:End query 'incategory:Programming_languages': 05:39:19
INFO:root:Start query 'incategory:Python_(programming_language)': 05:39:19
INFO:root:End query 'incategory:Python_(programming_language)': 05:39:19
INFO:root:Start query 'incategory:Python_web_frameworks': 05:39:19
INFO:root:End query 'incategory:Python_web_frameworks': 05:39:20
INFO:root:Start query 'incategory:Python_implementations': 05:39:20
INFO:root:End query 'incategory:Python_implementations': 05:39:20
INFO:root:Start query 'incategory:Programming_languages_created_in_1991': 05:39:20
INFO:root:End query 'incategory:Programming_languages_created_in_1991': 05:39:20
INFO:root:Start query 'incategory:Computer_programming_stubs': 05:39:20
INFO:root:End query 'incategory:Computer_programming_stubs': 05:39:21
In the instance above, each call took 1 second or less, executing in a clear order.
In other words, everybody had to wait in line.
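To check the "one after another" claim mechanically, the log can be parsed into per-query durations. The sketch below is a minimal illustration that assumes the timestamps are formatted as HH:MM:SS (the %X format is locale-dependent) and that a run does not cross midnight. The sample lines are copied from the log above; the durations function is a hypothetical helper.

```python
import re
from datetime import datetime

# Matches lines such as: INFO:root:Start query 'some query': 05:39:16
LINE = re.compile(r"^INFO:\w+:(Start|End) query '(.+)': (\d{2}:\d{2}:\d{2})$")

SAMPLE = """\
INFO:root:Start query 'linksto:Python_(programming_language)': 05:39:16
INFO:root:End query 'linksto:Python_(programming_language)': 05:39:17
INFO:root:Start query 'incategory:Computer_programming': 05:39:17
INFO:root:End query 'incategory:Computer_programming': 05:39:18
"""


def durations(log_text):
    """Hypothetical helper: map each query to its elapsed seconds."""
    started = {}
    elapsed = {}
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a Start/End line
        event, query, stamp = match.groups()
        moment = datetime.strptime(stamp, "%H:%M:%S")
        if event == "Start":
            started[query] = moment
        else:
            elapsed[query] = (moment - started[query]).total_seconds()
    return elapsed


for query, seconds in durations(SAMPLE).items():
    print(f"{query}: {seconds:.0f}s")
```

Since the log only has one-second resolution, the durations are coarse, but they are enough to see that no query starts before the previous one ends.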
HTTPX has the ability to make calls asynchronously, in which each call does not need to wait its turn in line. This can potentially have performance benefits. We will explore the async possibilities later in the series.
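As a small preview of what "not waiting in line" means, here is a toy sketch that uses only asyncio, with asyncio.sleep() standing in for the network wait. It makes no HTTP requests and is not the approach the later articles necessarily take; fake_request is a hypothetical name.

```python
import asyncio


async def fake_request(name, events):
    """Hypothetical stand-in for an HTTP call: record start, wait, record end."""
    events.append(f"start {name}")
    await asyncio.sleep(0.01)  # stand-in for waiting on the network
    events.append(f"end {name}")


async def main():
    events = []
    # gather() runs the three coroutines concurrently on one event loop.
    await asyncio.gather(*(fake_request(name, events) for name in ("a", "b", "c")))
    return events


events = asyncio.run(main())
print(events)
```

All three "requests" start before any of them ends, in contrast to the strictly alternating Start/End pairs in the synchronous log.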
For now, we must not forget to write tests; the next article takes that up.
Top comments (1)
Hi Jonathan!
When I tried to run your code in python3.9, I got an error. To get the code to run, I had to replace the line
responses = (search(query, client=client) for query in queries)
with:
responses = [search(query, client=client) for query in queries]
Do you know why this is happening?
Thanks!