With the advent of Microservices™, ingress routing and routing between services have been in ever-increasing demand. I currently default to nginx for this - with no plausible reason or experience to back this decision, just because it seems to be the most used tool currently.
However, the often needed proxy_pass directive has driven me crazy because of its - to me unintuitive - behavior. So I decided to take notes on how it works and what is possible with it, and how to circumvent some of its quirks.
First, a note on https
By default, proxy_pass does not verify the certificate of the endpoint if it is https (how can this be the default behavior, really?!). This can be useful internally, but you usually want to make that choice very explicitly. And in case you use publicly routed endpoints, which I have done in the past, make sure to set proxy_ssl_verify to on. You can also authenticate against the upstream server that you proxy_pass to using client certificates and more; make sure to have a look at the available options at https://docs.nginx.com/nginx/admin-guide/security-controls/securing-http-traffic-upstream/.
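As a minimal sketch of explicit verification, the block below turns on certificate checking for an https upstream. The host name upstream.example.com and the CA bundle path are placeholders that do not come from this article; adjust them to your environment.

```nginx
location /webapp/ {
    # Hypothetical https upstream; replace with your own endpoint.
    proxy_pass https://upstream.example.com/api/;

    # Verify the upstream certificate against a trusted CA bundle.
    proxy_ssl_verify on;
    # Assumed path (typical on Debian/Ubuntu); point this at your own CA file.
    proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    proxy_ssl_verify_depth 2;

    # Send SNI so that name-based virtual hosts present the matching certificate.
    proxy_ssl_server_name on;
}
```

proxy_ssl_verify needs a CA to check against, so it is usually set together with proxy_ssl_trusted_certificate.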
A simple example
A proxy_pass is usually used when there is an nginx instance that handles many things and delegates some of those requests to other servers. One example is ingress in a Kubernetes cluster that spreads requests among the different microservices that are responsible for the specific locations. Or you can use nginx to directly deliver static files for a frontend, while some server-side rendered content or API is delivered by a WebApp such as ASP.NET Core or flask.
Let's imagine we have a WebApp running on http://localhost:5000 and want its /api/ endpoints to be available on http://localhost:8080/webapp/. Here's how we would do it in a minimal nginx.conf:
daemon off;
events {
}
http {
    server {
        listen 8080;
        location /webapp/ {
            proxy_pass http://127.0.0.1:5000/api/;
        }
    }
}
You can save this to a file, e.g. nginx.conf, and run it with nginx -c $(pwd)/nginx.conf.
Now, you can access http://localhost:8080/webapp/ and all requests will be forwarded to http://localhost:5000/api/.
Note how the /webapp/ prefix is "cut away" by nginx. That's how proxy_pass works when it is given a URI part (here /api/): nginx replaces the part of the request path that matched the location with that URI part and passes the rest on to the "upstream". "Upstream" is the name for whatever sits behind nginx.
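For contrast, here is a sketch of the same location with no URI part in proxy_pass, meaning nothing follows the host and port. In that case nginx replaces nothing and passes the request URI to the upstream as the client sent it:

```nginx
location /webapp/ {
    # Nothing after the port: the request URI is forwarded unchanged,
    # so /webapp/foo?bar=baz arrives upstream as /webapp/foo?bar=baz.
    proxy_pass http://127.0.0.1:5000;
}
```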
To slash or not to slash
Except for when you use variables in the proxy_pass upstream definition, as we will learn below, the location and upstream definition are very simply tied together. That's why you need to be aware of the slashes, because some strange things can happen when you don't get it right.
Here is a handy table that shows you how the request will be received by your WebApp, depending on how you write the location and proxy_pass declarations. Assume all requests go to http://localhost:8080:
location | proxy_pass | Request | Received by upstream |
---|---|---|---|
/webapp/ | http://localhost:5000/api/ | /webapp/foo?bar=baz | /api/foo?bar=baz |
/webapp/ | http://localhost:5000/api | /webapp/foo?bar=baz | /apifoo?bar=baz |
/webapp | http://localhost:5000/api/ | /webapp/foo?bar=baz | /api//foo?bar=baz |
/webapp | http://localhost:5000/api | /webapp/foo?bar=baz | /api/foo?bar=baz |
/webapp | http://localhost:5000/api | /webappfoo?bar=baz | /apifoo?bar=baz |
In other words: You almost always want a trailing slash on both, never want to mix with and without trailing slash, and only want both without a trailing slash when you want to concatenate a certain path component together (which I guess is quite rarely the case). Note how query parameters are preserved!
$uri and $request_uri
You have two ways to prevent the location from being cut off: First, you can simply repeat the location in the proxy_pass definition, which is quite easy:
location /webapp/ {
    proxy_pass http://127.0.0.1:5000/api/webapp/;
}
That way, your upstream WebApp will receive /api/webapp/foo?bar=baz in the above examples.
Another way to repeat the location is to use $uri or $request_uri. The difference is that $request_uri is the original request URI including the query parameters, while $uri is the normalized path without them:
location | proxy_pass | Request | Received by upstream |
---|---|---|---|
/webapp/ | http://localhost:5000/api$request_uri | /webapp/foo?bar=baz | /api/webapp/foo?bar=baz |
/webapp/ | http://localhost:5000/api$uri | /webapp/foo?bar=baz | /api/webapp/foo |
Note how in the proxy_pass definition, there is no slash between "api" and $request_uri or $uri. This is because a full URI will always include a leading slash, which would lead to a double-slash if you wrote "api/$uri".
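Written out as a location block, the first table row could look like the sketch below. It uses the literal address 127.0.0.1 instead of localhost as an assumption on my side, because a proxy_pass that contains variables needs a resolver for host names, which is covered further down.

```nginx
location /webapp/ {
    # $request_uri already starts with "/", so no slash follows "api".
    # /webapp/foo?bar=baz arrives upstream as /api/webapp/foo?bar=baz.
    proxy_pass http://127.0.0.1:5000/api$request_uri;
}
```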
Capture regexes
While this is not exclusive to proxy_pass, I find it generally handy to be able to use regexes to forward parts of a request to an upstream WebApp, or to reformat it. Example: Your public URI should be http://localhost:8080/api/cart/items/123, and your upstream API handles it in the form of http://localhost:5000/cart_api?items=123. In this case, or more complicated ones, you can use a regex to capture parts of the request URI and transform it into the desired format.
location ~ ^/api/cart/([a-z]*)/(.*)$ {
    proxy_pass http://127.0.0.1:5000/cart_api?$1=$2;
}
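An alternative sketch, which is not part of my original setup, is to do the transformation with a rewrite and keep proxy_pass free of any URI part. The rewritten URI is then what gets forwarded, and query arguments the client sent are appended after the new ones:

```nginx
location /api/cart/ {
    # "break" stops rewrite processing and keeps the request in this location.
    # /api/cart/items/123 becomes /cart_api?items=123.
    rewrite ^/api/cart/([a-z]+)/(.*)$ /cart_api?$1=$2 break;

    # No URI part here, so the rewritten URI is passed to the upstream.
    proxy_pass http://127.0.0.1:5000;
}
```

This keeps a plain prefix location, at the price of one extra directive.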
Use try_files with a WebApp as fallback
A use-case I came across was that I wanted nginx to handle all static files in a folder, and if the file is not available, forward the request to a backend. For example, this was the case for a Vue single-page-application (SPA) that is delivered through flask - because the master HTML needs some server-side tuning - and I wanted nginx to handle the static files instead of flask. (This is recommended by the official gunicorn docs.)
You might have everything for your SPA except for your index.html available at /app/wwwroot/, and http://localhost:5000/ will deliver your server-tuned index.html.
Here's how you can do this:
location /spa/ {
    root /app/wwwroot/;
    try_files $uri @backend;
}
location @backend {
    proxy_pass http://127.0.0.1:5000;
}
Note that you cannot specify any paths in the proxy_pass directive in the @backend location for some reason. Nginx will tell you:
nginx: [emerg] "proxy_pass" cannot have URI part in location given by regular expression, or inside named location, or inside "if" statement, or inside "limit_except" block in /home/daniel/projects/nginx_blog/nginx.conf:28
That's why your backend should receive any request and return the index.html for it, or at least for the routes that are handled by the frontend's router.
Let nginx start even when not all upstream hosts are available
One reason that I used 127.0.0.1 instead of localhost so far is that nginx is very picky about hostname resolution. For some unexplainable reason, nginx will try to resolve all hosts defined in proxy_pass directives on startup, and fail to start when their names cannot be resolved. However, especially in microservice environments, it is very fragile to require all upstream services to be available at the time the ingress, load balancer or some intermediate router starts.
You can circumvent nginx's requirement for all hosts to be available at startup by using variables inside the proxy_pass directives. HOWEVER, for some unfathomable reason, if you do so, you require a dedicated resolver directive to resolve these host names. For Kubernetes, you can use kube-dns.kube-system here. For other environments, you can use your internal DNS, or for publicly routed upstream services you can even use a public DNS such as 1.1.1.1 or 8.8.8.8.
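A minimal sketch of the resolver plus variable combination might look like this. The DNS server address and the service name are placeholders I made up for illustration; the optional valid parameter overrides the TTL from the DNS response.

```nginx
server {
    listen 8080;

    # Assumed DNS server address; use your cluster or internal DNS instead.
    # With valid=30s, cached answers are kept for 30 seconds regardless of their TTL.
    resolver 10.96.0.10 valid=30s;

    location /webapp/ {
        # The variable defers name resolution from startup to request time.
        # Hypothetical service name:
        set $upstream http://my-service.default.svc.cluster.local:5000;

        # No URI part in the variable, so the request URI is forwarded unchanged.
        proxy_pass $upstream;
    }
}
```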
Additionally, using variables in proxy_pass completely changes how URIs are passed on to the upstream. When just changing
proxy_pass http://localhost:5000/api/;
to
set $upstream http://localhost:5000;
proxy_pass $upstream/api/;
... which you might think should result in exactly the same, you might be surprised. The former will hit your upstream server with /api/foo?bar=baz with our example request to /webapp/foo?bar=baz. The latter, however, will hit your upstream server with /api/. No foo. No bar. And no baz. :-( That is because, once variables are involved, nginx passes the URI given in proxy_pass to the upstream as is and no longer appends the rest of the request.
We need to fix this by putting the request together from two parts: First, the path after the location prefix, and second the query parameters. The first part can be captured using the regex we learned above, and the second (query parameters) can be forwarded using the built-in variables $is_args and $args. If we put it all together, we will end up with a config like this:
daemon off;
events {
}
http {
    server {
        access_log /dev/stdout;
        error_log /dev/stdout;
        listen 8080;
        # My home router in this case:
        resolver 192.168.178.1;
        location ~ ^/webapp/(.*)$ {
            # Use a variable so that localhost:5000 might be down while nginx starts:
            set $upstream http://localhost:5000;
            # Put together the upstream request path using the captured component after the location path, and the query parameters:
            proxy_pass $upstream/api/$1$is_args$args;
        }
    }
}
While localhost is not a great example here, it works with your service's arbitrary DNS names, too. I find this very valuable in production, because having an nginx refuse to start because of a probably very unimportant service can be quite a hassle while wrangling a production issue. However, it makes the location directive much more complex. From a simple location /webapp/ with a proxy_pass http://localhost/api/ it has become this behemoth. I think it's worth it, though.
Better logging format for proxy_pass
To debug issues, or simply to have enough information at hand when investigating issues in the future, you can maximize the information about what is going on in your location that uses proxy_pass.
I found this handy log_format, which I enhanced with a custom variable $upstream, as we have defined above. If you always call your variables $upstream in all your locations that use proxy_pass, you can use this log_format and have often much needed information in your log:
log_format upstream_logging '[$time_local] $remote_addr - $remote_user - $server_name to: $upstream: $request upstream_response_time $upstream_response_time msec $msec request_time $request_time';
Here is a full example:
daemon off;
events {
}
http {
    log_format upstream_logging '[$time_local] $remote_addr - $remote_user - $server_name to: "$upstream": "$request" upstream_response_time $upstream_response_time msec $msec request_time $request_time';
    server {
        listen 8080;
        location ~ ^/webapp/(.*)$ {
            access_log /dev/stdout upstream_logging;
            set $upstream http://127.0.0.1:5000;
            # With a variable in proxy_pass the URI is passed as is, so rebuild path and query explicitly:
            proxy_pass $upstream/api/$1$is_args$args;
        }
    }
}
However, I have not found a way to log the actual URI that is forwarded to $upstream, which would be one of the most important things to know when debugging proxy_pass issues.
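If you do not want to depend on a custom variable being set in every location, a variant built only on nginx's own upstream variables is sketched below. The format name upstream_builtin is a made-up label. It still does not log the forwarded URI, but it shows which upstream address answered and with which status.

```nginx
# $upstream_addr and $upstream_status are built-in variables, so no "set" is required.
log_format upstream_builtin '[$time_local] $remote_addr - $server_name to: "$upstream_addr" '
                            '"$request" upstream_status $upstream_status '
                            'upstream_response_time $upstream_response_time request_time $request_time';
```

It is enabled the same way as before, for example with access_log /dev/stdout upstream_builtin; inside the location.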
Conclusion
I hope that you have found helpful information in this article that you can put to good use in your development and production nginx configurations.
Top comments (27)
hello, Daniel thanks for this post.
However, I am having trouble passing all paths from a request to the server.
example:
servername = 127.0.0.1;
location / {
proxy_pass 192.168.4.22:3000/;
}
I want all requests with dynamic paths coming from e.g 127.0.0.1/ be matched to the above location and passed to the proxy.
I get a 404 from the server when my URL looks like 127.0.0.1/foo/bar but when its just 127.0.0.1, it works fine.
is there something i am not doing right?
Hey Peters,
was it just a formatting issue in the comment, or are you missing the // in http://127.0.0.1 in the proxy_pass statement?
Hi Daniel,
It's a formating issue.
Hm... looks correct, though. Maybe the error lies elsewhere?
Any idea where to look?
I have two sites enabled. The default nginx site running on port 80 and my API site running on a different port.
Oh, then you forgot to provide the correct port in the URL. What you posted uses the default port 80, so it uses the default nginx site. Use http://127.0.0.1:<you-port>/foo/bar
I actually have the correct port in the file. The one I posted here is just an example. The IP is different
Hi Daniel,
Here is how the site looks
i am thinking, could the root there be the problem?
I have removed it and it's still the same thing.
Hi again Daniel,
I have been able to solve it.
I had to comment this line in the location
Thanks for your time. It's really appreciated
In my case I have react + node app + nginx setup and I want to run webflow website only on /blog/* routes. I have added location blog like this:
But in that case react + node routes and /blog/* routes work fine. Only issue is /blog route gives 404. Any suggestions on what can be done to fix it?
Heya Aditya!
Could you have a closer look at which server gives you the 404? i.e. does the server logs of myblog-blog.webflow.io show an incoming request that is answered with 404, or is nginx already giving the 404? Maybe you can tell from how the 404 page looks like.
404 page of webflow is displayed. Do you think this university.webflow.com/lesson/href... can help in my case? I am currently not using paid plan so cannot use href-prefix. So I wanted to know if there is something which I can do in nginx config file.
Daniel,
I ended up here by chance. However, I was so impressed by the good work you've done in organizing this article that I decided to join dev.io - so I could post this and hopefully contribute back to the community.
This article is both useful and inspiring. Thanks!
"with no plausible reason or experience to back this decision, just because it seems to be the most used tool currently" - we all do, mate. Props for being honest though!
Really useful article, I wish I'd had this when I started my first job.
Back then I just assumed nginx was magic.
I was completely clueless why the proxy_pass wasn't working with the "set directive". Thanks for sharing. :)
In fact, the "set directive" is also the only way to force nginx to resolve the domain name again (using the parameters defined in the resolver). Otherwise, it will be cached forever...
In some particular use-cases - especially when using static configuration - this is THE main reason why nginx is not production ready. One temporarily unresolved backend host can cause nginx farm not restartable. This is exact opposite behavior from very disliked nowadays: Apache httpd.
From the other hand for dynamic generated configuration this is a good tool, but has many good successors too: traefik, istio, linkerd, fabio...
There is another piece: missing a global configuration ProxyPreserveHost Off - you cannot do that globally in nginx. You need to specify it per each proxy_pass using proxy_set_header Host foo.bar. That makes config very unreadable.
Hi Daniel,
With regard to the actual URI that is forwarded, is it not $host that you need?
In the attached picture the yellow highlight is $host and the blue is the upstream $upstream.
Anyway, thanks for the post it really helped setting up our nginx proxy with decent logging.
Adrian
Hi - found this very useful, and have a question around the $upstream variable in the log format. Using nginx 1.24.0.
I couldn't get $server_name to: "$upstream_addr": to work - it barked with nginx: [emerg] unknown "upstream" variable
Based on the variables here - nginx.org/en/docs/stream/ngx_strea..., I changed it to $upstream_addr and it worked.
Entire log:
Anyone know if this is a variable change or I misunderstood what $upstream in the log format was?
Thank you for this article. I couldn't get a location regex to work with proxy_pass that would work with my upstream's routing; you explained why. (I ended up sticking the file name in an X- header and using via proxy_pass_request_headers.) Now I'm trying to figure out how to get the numeric filename that nginx assigned to my upload in my upstream JS.
Hi Daniel,
First of all, thanks for the post.
I am facing the same issue with nginx does not start when not all upstream hosts are available.
I was trying to reverse proxy the
public.domain.com/mydomain/dev06/api ===>
aws-mydomain-dev06.aws.org/api/
so I changed it to following,
it returns 404, not found
using string proxy_pass aws-mydomain-dev06.aws.org/; works fine
Is there something I am not doing right? Or where can I have a look to fine out more?
Regards