In addition to great Python HTTP client tools such as Requests and HTTPX, the standard library itself supplies the necessary ingredients to make a working HTTP client for API calls. This tutorial shares how to construct and customize such a tool for your own scripts.
Consider installing a library
Before proceeding, I should note that in many cases, the approach in this article is not best practice. Instead, I highly recommend using a third-party Python library for features, security, and reliability.
Some suggested libraries:
- urllib3 is the dependency for many other tools, including requests. By itself, urllib3 is quite usable. It may be all you need.
- requests is ubiquitous and well documented.
- HTTPX has an interface almost identical to requests, but with the added benefit of asyncio support. You may be interested in a series of articles I wrote on using HTTPX both synchronously and asynchronously.
- pycurl is less popular as a Python library, but interfaces with the well-known libcurl.
- aiohttp has an asyncio-based HTTP client that is well-documented and well-liked.
If, however, you find yourself needing a solution that does not require external dependencies other than what is already available in the Python standard library, then you may wish to read on.
Summary code
import json
import typing
import urllib.error
import urllib.parse
import urllib.request
from email.message import Message


class Response(typing.NamedTuple):
    body: str
    headers: Message
    status: int
    error_count: int = 0

    def json(self) -> typing.Any:
        """
        Decode body's JSON.

        Returns:
            Pythonic representation of the JSON object
        """
        try:
            output = json.loads(self.body)
        except json.JSONDecodeError:
            output = ""
        return output


def request(
    url: str,
    data: dict = None,
    params: dict = None,
    headers: dict = None,
    method: str = "GET",
    data_as_json: bool = True,
    error_count: int = 0,
) -> Response:
    if not url.casefold().startswith("http"):
        raise urllib.error.URLError("Incorrect and possibly insecure protocol in url")
    method = method.upper()
    request_data = None
    headers = headers or {}
    data = data or {}
    params = params or {}
    headers = {"Accept": "application/json", **headers}

    if method == "GET":
        # GET requests carry their data in the query string
        params = {**params, **data}
        data = None

    if params:
        url += "?" + urllib.parse.urlencode(params, doseq=True, safe="/")

    if data:
        if data_as_json:
            request_data = json.dumps(data).encode()
            headers["Content-Type"] = "application/json; charset=UTF-8"
        else:
            request_data = urllib.parse.urlencode(data).encode()

    httprequest = urllib.request.Request(
        url, data=request_data, headers=headers, method=method
    )

    try:
        with urllib.request.urlopen(httprequest) as httpresponse:
            response = Response(
                headers=httpresponse.headers,
                status=httpresponse.status,
                body=httpresponse.read().decode(
                    httpresponse.headers.get_content_charset("utf-8")
                ),
            )
    except urllib.error.HTTPError as e:
        # Return errors as a normal Response rather than raising
        response = Response(
            body=str(e.reason),
            headers=e.headers,
            status=e.code,
            error_count=error_count + 1,
        )

    return response
You are certainly welcome to copy and use the above function, or browse or clone the GitHub repo.
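For a sense of how it reads in practice, here is a brief sketch of calling the function above against the JSONPlaceholder test API (introduced later in this article); the exact status codes and payloads will of course depend on the endpoint you use:

# Assumes the request() function and Response class defined above
response = request(
    "https://jsonplaceholder.typicode.com/posts",
    data={"title": "Hello", "body": "World", "userId": 1},
    method="POST",
)
print(response.status)  # 201 (Created) from this test API
print(response.json())  # decoded JSON body as a Python object

# A GET request with query-string parameters
posts = request("https://jsonplaceholder.typicode.com/posts", params={"_limit": 3})
print(len(posts.json()))  # 3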
If, however, you are reading this article for a do-it-yourself approach, I encourage you to build your own function that suits your needs. It may grow simpler or more flexible than the above.
Let's discuss the building blocks.
An introduction to urllib.request.urlopen()
The recommended high-level function for HTTP requests is urlopen(), available in the standard urllib.request module. Unlike the lower-level http.client module, urlopen() provides error handling, follows redirects, and provides convenience around headers and data.
An example:
from urllib.request import urlopen, Request

url = "https://jsonplaceholder.typicode.com/posts/1"
if not url.startswith("http"):
    raise RuntimeError("Incorrect and possibly insecure protocol in url")
httprequest = Request(url, headers={"Accept": "application/json"})

with urlopen(httprequest) as response:
    print(response.status)
    print(response.read().decode())
I highly appreciate and recommend JSONPlaceholder's free fake API, used above. It is useful for precisely what we are doing here: testing HTTP clients intended for API work.
Please note the security measures in the above code. Before passing a URL to urlopen(), make sure that it is a web URL and not a local file ("file:///"). If you want a wake-up call, try urlopen("file:///etc/passwd").read() on a Linux system (not in production code, though!) and see what happens. Of course, this protocol check is only necessary if the URL comes from user input. If you control the URL string and can ensure that it does not start with "file:", then that is a good thing. You may also be interested in another approach to hardening urlopen() by redefining the list of protocol handlers.
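For illustration, here is one rough sketch of that handler-restriction idea (my own interpretation, not necessarily the linked article's exact code): build an opener that only knows HTTP and HTTPS, so that "file:" and "ftp:" URLs fail with URLError instead of being opened.

import urllib.request

# An opener limited to HTTP(S); UnknownHandler raises URLError for
# any scheme the other handlers do not claim (file, ftp, data, ...).
opener = urllib.request.OpenerDirector()
for handler_class in (
    urllib.request.UnknownHandler,
    urllib.request.HTTPHandler,
    urllib.request.HTTPSHandler,
    urllib.request.HTTPDefaultErrorHandler,
    urllib.request.HTTPRedirectHandler,
    urllib.request.HTTPErrorProcessor,
):
    opener.add_handler(handler_class())

with opener.open("https://jsonplaceholder.typicode.com/posts/1") as response:
    print(response.status)

# opener.open("file:///etc/passwd")  # raises URLError: unknown url type: file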
Protocol checking aside, the urlopen() call is fairly simple, as you can see in the above example. I recommend using it with a with statement (a context manager) for tidiness, so that closing the response is handled automatically.
We passed a Request object to the urlopen() function. While we could simply pass a URL string, the Request object offers much more flexibility: we can specify the HTTP method (GET, POST, PUT, HEAD, DELETE), request headers, and request data.
The response returned by urlopen has four useful attributes:
- It has a file-like interface that can be read(), returning bytes
- url gives the URL of the resource retrieved, commonly used to determine whether a redirect was followed
- status returns the HTTP status code
- headers returns an EmailMessage object. This functions somewhat like a dict but with case-insensitive keys. It also has some helpful methods such as get_content_type() and get_content_charset(), both illustrated below. The get_all() method is another useful one, for when there may be multiple key/value pairs for the same header name. See the helpful Wikipedia article for a list of possible response headers.
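A quick look at those attributes, again using JSONPlaceholder:

from urllib.request import urlopen

with urlopen("https://jsonplaceholder.typicode.com/posts/1") as response:
    body = response.read()                                 # bytes
    print(response.url)                                    # final URL, after any redirects
    print(response.status)                                 # 200
    print(response.headers.get_content_type())             # application/json
    print(response.headers.get_content_charset("utf-8"))   # utf-8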
HTTP errors
Out of the box, urlopen handles redirects (status codes 301, 302, 303, or 307). Other than these codes, though, if the status code is not between 200 and 299 (HTTP "OK" codes according to RFC 2616), then an HTTPError exception is raised.
The HTTPError can be captured and analyzed with the appropriate try... except... block, such as this:
from urllib.error import HTTPError
from urllib.request import urlopen

try:
    urlopen("https://github.com/404")
except HTTPError as e:
    print(e.status)
    print(e.reason)
    print(e.headers.get_content_type())
The error (assigned to the "e" variable in the above) has the following useful properties:
- status to get the error code (such as 404)
- headers as an EmailMessage object. Again, this can be treated like a case-insensitive dict.
- reason with the text of the error
In the function at the top of this article, I catch and silence all errors, to make the response uniform, and pass error-handling responsibility downstream. However, this may not be desirable. Perhaps, instead of continuing no matter the error, you want to fail on anything other than a 401 or 429 error:
except urllib.error.HTTPError as e:
    if e.code in (401, 429):
        response = Response(
            body=str(e.reason),
            headers=e.headers,
            status=e.code,
            error_count=error_count + 1,
        )
    else:
        raise e
Of course, logic could be added to deal with errors as appropriate, depending on the status code.
Note the entirely optional auto-incrementing error_count attribute in my code. Sometimes, I wish to call the HTTP request function recursively. This allows the number of calls to be tracked, and dealt with downstream, hopefully preventing infinite recursion. For instance, I may want to catch 401 errors, parse the "Www-Authenticate" header for a token, then retry the request with the token. But if this fails repeatedly (say, 5 tries), it should stop. I could test for error_count >= 5 and raise an exception if so, meanwhile making sure to pass the current error_count back to the request function as a parameter, so it continues to be incremented appropriately.
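As a rough sketch of that idea (get_token() below is a hypothetical helper you would write to parse the "Www-Authenticate" header; it is not part of the standard library or the code above):

def request_with_auth_retry(url: str, headers: dict = None, error_count: int = 0) -> Response:
    # Call the request() function defined earlier, carrying error_count forward
    response = request(url, headers=headers, error_count=error_count)
    if response.status == 401:
        if response.error_count >= 5:
            raise RuntimeError("Giving up after repeated 401 responses")
        # Retry with a freshly obtained token; recursion keeps incrementing error_count
        auth_headers = {"Authorization": f"Bearer {get_token(response.headers)}"}
        return request_with_auth_retry(url, headers=auth_headers, error_count=response.error_count)
    return response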
An alternative way to customize error handling is to construct your own subclasses of BaseHandler, then build an OpenerDirector chain of handlers as appropriate. For instance, you could subclass BaseHandler and add a method http_error_401 to handle authorization as desired, then pass an instance of that custom class to build_opener(). Obviously, this requires a deeper dive into the opener innards.
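A minimal sketch of what that could look like (the handler below is hypothetical and does nothing useful yet; returning None from http_error_401 lets the default handlers take over and raise HTTPError as usual):

import urllib.request

class TokenRetryHandler(urllib.request.BaseHandler):
    def http_error_401(self, req, fp, code, msg, headers):
        # Inspect the headers, obtain a token, and re-issue the request here.
        print("Got a 401 for", req.full_url)
        return None  # fall through to the remaining error handlers

opener = urllib.request.build_opener(TokenRetryHandler())
# opener.open("https://example.com/protected")  # would route 401s through the handler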
A versatile Response object
I find it helpful to create a Python class that can contain the bits of the HTTP response that I care about. This could be a dict, but I like to add a method or two, such as a JSON decoder.
If using Python 3.7 or later, consider using a dataclass. An example:
import json
import typing
from dataclasses import dataclass
from email.message import Message


@dataclass(frozen=True)
class Response:
    body: str
    headers: Message
    status: int
    error_count: int = 0

    def json(self) -> typing.Any:
        """
        Decode body's JSON.

        Returns:
            Pythonic representation of the JSON object
        """
        try:
            output = json.loads(self.body)
        except json.JSONDecodeError:
            output = ""
        return output
Enabling frozen is strictly optional, and reflects my preference for this object being immutable (attributes can't be changed after initialization).
Another option, as demonstrated in the code at the beginning of the article, is a typed NamedTuple. I chose this for its immutability, ease of setup, and backwards compatibility.
Of course, a custom class will work, or attrs, or whatever container works for you.
Requests with data
Depending on the API with which you are interfacing, you may encounter various scenarios for accepting data. In each scenario, we can start with a Python dict and convert it into the required format.
The query string
Sometimes, data is passed in through the query string. To encode a Python dict as a query string that can be appended to the URL (after the "?"), use urllib.parse.urlencode:
from urllib.parse import urlencode
from urllib.request import urlopen
url = "https://jsonplaceholder.typicode.com/posts"
params = {"userId": 1, "_limit": 3}
url += "?" + urlencode(params, doseq=True, safe="/")
with urlopen(url) as response:
    print(response.read().decode())
While not relevant to the above request, I did pass two parameters to urlencode that I have found helpful:
- doseq will allow lists to be encoded as multiple parameters. For instance, if we passed in {"usernames": ["John Doe", "Jane Doe"]}, then the end result would be "usernames=John+Doe&usernames=Jane+Doe".
- safe defines the characters that will not be url-encoded. In some APIs I encounter, such as the Docker API, it is better to leave slashes unencoded, so I added that to the safe string. Adapt as you see fit. Both parameters are demonstrated in the snippet below.
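A short demonstration of both parameters:

from urllib.parse import urlencode

print(urlencode({"usernames": ["John Doe", "Jane Doe"]}, doseq=True))
# usernames=John+Doe&usernames=Jane+Doe

print(urlencode({"path": "/var/log"}, safe="/"))   # path=/var/log
print(urlencode({"path": "/var/log"}))             # path=%2Fvar%2Flog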
Sending data in the request body
Similarly, data can be encoded with urllib.parse.urlencode and then passed into the Request object via the data parameter:
from urllib.parse import urlencode
from urllib.request import Request, urlopen
url = "https://api.funtranslations.com/translate/yoda.json"
data = {"text": "HTTP POST calls are remarkably easy"}
postdata = urlencode(data).encode()
httprequest = Request(url, data=postdata, method="POST")
with urlopen(httprequest) as response:
    print(response.read().decode())
Sending JSON in the request body
Many APIs accept and even require the request parameters to be sent as JSON. In these cases, it is important to first encode the Python dict (or other object) as JSON, then set the "Content-Type" request header appropriately:
import json
from urllib.request import Request, urlopen

url = "https://jsonplaceholder.typicode.com/posts"
data = {
    "userid": "1001",
    "title": "POSTing JSON for Fun and Profit",
    "body": "JSON in the request body! Don't forget the content type.",
}
postdata = json.dumps(data).encode()
headers = {"Content-Type": "application/json; charset=UTF-8"}
httprequest = Request(url, data=postdata, method="POST", headers=headers)

with urlopen(httprequest) as response:
    print(response.read().decode())
In the above, we used Python's built-in JSON module to dump the data dict into a string, then encode it to bytes so it could then be handled as POST data.
We set the content type header to application/json. In addition, we specified the character encoding as UTF-8. Given that UTF-8 is the required JSON character encoding, this is redundant and probably unnecessary, but it never hurts to be explicit.
Parsing JSON in the response body
Because most APIs I use return JSON, and some return other formats such as XML unless JSON is specified, I typically set the Accept header in the request to application/json. If you are pulling other types of data, such as text/csv, you would want to tweak that header. Set it to */* if you don't care.
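For example (the same pattern as the earlier request, with the header spelled out):

from urllib.request import Request, urlopen

httprequest = Request(
    "https://jsonplaceholder.typicode.com/posts/1",
    headers={"Accept": "application/json"},  # swap in "text/csv" or "*/*" as needed
)
with urlopen(httprequest) as response:
    print(response.headers.get_content_type())  # application/json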
In the Response object we created earlier, there is an example of a JSON decoder, the result of which will likely be a Python dict or list, but could conceivably be a string or boolean, depending on what the server returns. Here is another example of similar functionality, but loading the JSON directly from the file-like response:
import json
from urllib.request import urlopen

url = "https://jsonplaceholder.typicode.com/posts?_limit=3"

with urlopen(url) as response:
    try:
        jsonbody = json.load(response)
    except json.JSONDecodeError:
        jsonbody = ""

print(jsonbody)
In the above example I decided to fail silently, offering an empty string when JSON decoding fails. If this is not desired, just use json.load() (or json.loads() from a string) and let any exceptions float up as they occur.
Other tricks or suggestions?
I am very curious if you use urlopen and how. Are there optimizations to the above that I am missing? Does this raise any questions or confusion? Feel free to post in the comments.
Top comments (3)
Great article!! Can we have def json(self) -> Optional[Dict] instead of def json(self) -> typing.Any in the Response class?

You certainly can, as long as you are 100% sure that all your API calls will only return a JSON object, not a number, string, boolean, or null. Since JSON supports all those types, one cannot be assured that only a JSON object (the equivalent of a Python dict) will be returned. I know of several APIs that may return an array instead of just an object, so I would be somewhat concerned with this approach. But if you know or control the endpoints and return values, then you can define this how you would like.
Is using a context manager necessary?