So you have followed the basic steps to get a service deployed into Kubernetes. Let’s say the service is Nginx, because why not …
$ kubectl create deployment ngx --image=nginx:latest
Let’s scale that up to 2, because running just 1 pod really is a bit too basic
$ kubectl scale deployment --replicas 2 ngx
You now have 2 Nginx pods running on your k8s “cluster” (I use the built-in k8s installation that comes with Docker Edge)
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
ngx-5cb59c856c-cmn8p 1/1 Running 0 52m
ngx-5cb59c856c-s65nh 1/1 Running 0 52m
Now that we have 2 pods, we need some way to access the service so we can check for ourselves that things work as expected before we make it publicly available via a LoadBalancer or Ingress. So we expose it as an internal service for now
$ kubectl expose deployment ngx --port 8080 --target-port 80
using
$ kubectl proxy &
you can then access your service (securely, it’s actually an encrypted tunnel) via
http://localhost:8001/api/v1/namespaces/default/services/ngx:8080/proxy/
Details of how that works are in the Kubernetes docs (first link under Credits & Resources), but in a nutshell, you can access any internal service via
http://localhost:8001/api/v1/namespaces/{namespace}/services/{service-name}:{port}/proxy/
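If you end up poking at several services this way, a tiny helper saves some typing. This is only a sketch: `proxy_url` is a made-up name, and it assumes `kubectl proxy` is listening on its default port 8001 unless you pass a different one. It follows the same layout as the working `ngx` URL above.

```shell
# proxy_url is a hypothetical helper, not part of kubectl.
# Arguments: namespace, service name, service port, optional local proxy port
# (defaults to 8001, which is what `kubectl proxy` listens on by default).
proxy_url() {
  printf 'http://localhost:%s/api/v1/namespaces/%s/services/%s:%s/proxy/\n' \
    "${4:-8001}" "$1" "$2" "$3"
}

# prints http://localhost:8001/api/v1/namespaces/default/services/ngx:8080/proxy/
proxy_url default ngx 8080
```

With the proxy running, something like `curl "$(proxy_url default ngx 8080)"` should then return the Nginx welcome page.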
So this is where you notice that one of the containers is behaving strangely: sometimes it seems unable to connect to a backend, or it just works intermittently (something’s wrong, basically). So you decide you’d like to run ngrep, tcpdump or strace to figure out what’s going on, but you also don’t want to modify the container image … so what can we do?
As long as you have access to the node running the container instance, you’re in luck. In this example we’ll just use the local Docker4Mac installation, but it works with any node running Docker containers.
Find out which node runs the troublesome container with
$ kubectl describe pod ngx-5cb59c856c-cmn8p
Name: ngx-5cb59c856c-cmn8p
Namespace: default
Node: docker-for-desktop/192.168.65.3
and log in to that node. I initially tried editing the pod with
$ kubectl edit pod ngx-5cb59c856c-cmn8p
to add a debug container, but saving that config fails because you cannot add or remove containers in an existing pod :(
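Going back to the node lookup for a second: if you want to script it rather than read the describe output by eye, something like this could work. It's a sketch, and `node_of_pod` is a made-up helper name; it just picks the part before the `/` on the `Node:` line.

```shell
# node_of_pod is a hypothetical helper: it reads `kubectl describe pod` output
# on stdin and prints the node name (the part before the "/" on the Node: line).
node_of_pod() {
  awk '$1 == "Node:" { split($2, parts, "/"); print parts[1]; exit }'
}

# Usage against a live cluster (not run here):
#   kubectl describe pod ngx-5cb59c856c-cmn8p | node_of_pod
# kubectl can also print the field directly:
#   kubectl get pod ngx-5cb59c856c-cmn8p -o jsonpath='{.spec.nodeName}'
```

The jsonpath variant is probably the more robust choice, since it does not depend on the human-readable describe layout.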
As we’re on the node that runs the container, however, we can build a custom debug container image (locally) and run it inside the same pid and network namespaces as the existing ngx-5cb59c856c-cmn8p.
The Dockerfile could be something like this (shamelessly copied/used from https://medium.com/@rothgar/how-to-debug-a-running-docker-container-from-a-separate-container-983f11740dc6)
FROM alpine
RUN apk update && apk add strace
CMD ["strace", "-p", "1"]
run
$ docker build -t strace .
and once the image is built (in reality, you’d build a consistent debug image and make it available to pull anytime from GCR or ECR or wherever) you run it with --privileged (yes, we don’t care about security in this example, so you’re able to do all the things and not have to fight permissions; see Justin Garrison’s post on how to do this more restrictively).
To attach to the pid and net namespaces, you need the container ID or name, which is easy to find with
$ docker ps | grep nginx
6b6e65ebc7c8 nginx "nginx -g 'daemon of…" About an hour ago Up About an hour k8s_nginx_ngx-5cb59c856c-cmn8p_default_402e0d53-a933-11e8-93cb-025000000001_0
e245d91ba045 nginx "nginx -g 'daemon of…" About an hour ago Up About an hour k8s_nginx_ngx-5cb59c856c-s65nh_default_36a3a1e7-a933-11e8-93cb-025000000001_0
So we’ll use the first one, which is 6b6e65ebc7c8 (you should obviously use the one that's causing trouble) …
$ docker run -ti --pid=container:6b6e65ebc7c8 --net=container:6b6e65ebc7c8 --privileged strace /bin/ash
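Grepping for the image name works fine with two containers, but you can also go from the pod name straight to the container ID. The sketch below assumes the container names follow the `k8s_<container>_<pod>_<namespace>_…` layout visible in the `docker ps` output above, and that the pod's pause container is named `k8s_POD_…` (that part is an assumption about the Docker runtime's naming, not something shown in this post). `container_for_pod` is a made-up helper name.

```shell
# container_for_pod is a hypothetical helper. It reads lines of
# "<container-id> <container-name>" on stdin and prints the ID of every
# container whose name looks like k8s_<container>_<pod>_..., skipping the
# pod's "POD" pause container (assumed naming, see the text above).
container_for_pod() {
  awk -v pod="$1" '
    { n = split($2, f, "_") }
    n >= 3 && f[1] == "k8s" && f[2] != "POD" && f[3] == pod { print $1 }
  '
}

# Usage on the node (not run here):
#   id=$(docker ps --format '{{.ID}} {{.Names}}' | container_for_pod ngx-5cb59c856c-cmn8p)
#   docker run -ti --pid=container:"$id" --net=container:"$id" --privileged strace /bin/ash
```

If the pod runs more than one container, this prints one ID per container, so you would still have to pick the one you care about.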
Once it’s running, you need the PID that is actually doing the work. PID 1 is the parent (master) Nginx process, but that is not processing any requests, it’s just managing the child processes
/ # ps -ef
PID USER TIME COMMAND
1 root 0:00 nginx: master process nginx -g daemon off;
6 101 0:00 nginx: worker process
44 root 0:00 /bin/ash
50 root 0:00 ps -ef
/ #
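With more than one worker you may not want to pick PIDs out of that list by hand. Here's a small sketch that does it for you; `worker_pids` is a made-up name, and it assumes the four-column BusyBox `ps -ef` layout shown above (PID, USER, TIME, COMMAND).

```shell
# worker_pids is a hypothetical helper: it reads BusyBox-style `ps -ef` output
# on stdin and prints the PID of every "nginx: worker process" line.
# Matching on fields (instead of a plain pattern) keeps the awk process itself
# from showing up as a false match when fed from a live `ps`.
worker_pids() {
  awk '$4 == "nginx:" && $5 == "worker" { print $1 }'
}

# Usage inside the debug container (not run here):
#   ps -ef | worker_pids
```

Each PID it prints can be passed to strace with its own `-p` option.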
Okay, so let’s do an strace on PID 6 as that is actually doing the work …
/ # strace -fp 6
strace: Process 6 attached
gettimeofday({tv_sec=1535294355, tv_usec=751153}, NULL) = 0
epoll_wait(8, [{EPOLLIN, {u32=2097902081, u64=139627189727745}}], 512, 61010) = 1
gettimeofday({tv_sec=1535294359, tv_usec=908313}, NULL) = 0
recvfrom(3, "GET / HTTP/1.1\r\nHost: localhost:"..., 1024, 0, NULL, NULL) = 213
stat("/usr/share/nginx/html/index.html", {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
open("/usr/share/nginx/html/index.html", O_RDONLY|O_NONBLOCK) = 11
fstat(11, {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
writev(3, [{iov_base="HTTP/1.1 200 OK\r\nServer: nginx/1"..., iov_len=238}], 1) = 238
sendfile(3, 11, [0] => [612], 612) = 612
write(5, "10.1.0.1 - - [26/Aug/2018:14:39:"..., 111) = 111
close(11) = 0
epoll_wait(8, [{EPOLLIN, {u32=2097902081, u64=139627189727745}}], 512, 65000) = 1
gettimeofday({tv_sec=1535294361, tv_usec=971440}, NULL) = 0
recvfrom(3, "GET / HTTP/1.1\r\nHost: localhost:"..., 1024, 0, NULL, NULL) = 213
stat("/usr/share/nginx/html/index.html", {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
open("/usr/share/nginx/html/index.html", O_RDONLY|O_NONBLOCK) = 11
fstat(11, {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
writev(3, [{iov_base="HTTP/1.1 200 OK\r\nServer: nginx/1"..., iov_len=238}], 1) = 238
sendfile(3, 11, [0] => [612], 612) = 612
write(5, "10.1.0.1 - - [26/Aug/2018:14:39:"..., 111) = 111
close(11) = 0
epoll_wait(8,
^Cstrace: Process 6 detached
<detached ...>
And there it is: 2 requests to this particular Nginx instance, without having to
- redeploy the Pod with an additional debug container (you could argue that this would be better, but you may not be able to reproduce the issue straight away and may need to run it for a long time, which costs resources)
- modify the Dockerfile in any way (install debug tools)
- change privileges on the running container; it can keep running in its more secure environment, while only the debug container has additional capabilities
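Counting requests by eye is fine for two of them. For a longer capture you could write the trace to a file with strace's `-o` option and count the request lines afterwards. A rough sketch follows; `count_requests` is a made-up name, and it only counts `recvfrom()` calls whose buffer starts with a GET request line, like the ones in the trace above.

```shell
# count_requests is a hypothetical helper: it reads strace output on stdin and
# prints how many recvfrom() calls received a buffer starting with "GET ".
# Other HTTP methods are deliberately ignored to keep the sketch short.
count_requests() {
  grep -c 'recvfrom([0-9]*, "GET '
}

# Usage inside the debug container (not run here):
#   strace -fp 6 -o /tmp/ngx.trace     # stop with Ctrl-C when done
#   count_requests < /tmp/ngx.trace
```

Keep in mind that strace truncates the strings it prints, so this only tells you that a request arrived, not what the full request looked like.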
The nice thing about this pattern is that you can build yourself a debug container that you can re-use to debug applications running on any node that runs Docker (ECS, on-prem K8S, EKS, AKS, GKE).
Happy debugging!
Credits & Resources:
- https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-services/#manually-constructing-apiserver-proxy-urls
- https://medium.com/@rothgar/how-to-debug-a-running-docker-container-from-a-separate-container-983f11740dc6
- https://medium.com/google-cloud/kubernetes-nodeport-vs-loadbalancer-vs-ingress-when-should-i-use-what-922f010849e0