Alex Leonhardt

Posted on Sep 2, 2019 • Originally published at Medium on Aug 26, 2018

access and debug your kubernetes service / docker container

#kubernetes #docker #troubleshooting #devops

so you have followed the basic steps to get a service deployed into kubernetes, let’s say the service is Nginx, because why not …

$ kubectl create deployment ngx --image=nginx:latest

let’s scale that up to 2, because, really, running just 1 pod is a bit too basic

$ kubectl scale deployment --replicas 2 ngx

you now have 2 Nginx pods running on your k8s “cluster” (I use the built-in k8s installation that comes with Docker Edge)

$ kubectl get pods

NAME READY STATUS RESTARTS AGE
ngx-5cb59c856c-cmn8p 1/1 Running 0 52m
ngx-5cb59c856c-s65nh 1/1 Running 0 52m

now that we have 2 pods, we need a way to reach the service so we can check for ourselves that things work as expected before making it publicly available via a LoadBalancer or Ingress, so we expose it as an internal service for now

$ kubectl expose deployment ngx --port 8080 --target-port 80

using

$ kubectl proxy &

you can then access your service (securely: kubectl proxy forwards your plain-HTTP requests on localhost to the API server over an authenticated, encrypted connection) via

http://localhost:8001/api/v1/namespaces/default/services/ngx:8080/proxy/

details of how that works are here, but in a nutshell, you can access any internal service via

http://localhost:8001/api/v1/namespaces/{namespace}/services/{service-name}:{port}/proxy/
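If you end up typing these URLs a lot, a tiny helper can build them for you. This is just a sketch that mirrors the concrete ngx URL above; `proxy_url` is a made-up name (not part of kubectl), and it assumes `kubectl proxy` is listening on its default localhost:8001.

```shell
# Build the kubectl-proxy URL for an internal service.
# Assumption: `kubectl proxy` runs on its default address, localhost:8001.
# proxy_url is a hypothetical helper name, not a kubectl feature.
proxy_url() {
  namespace="$1"
  service="$2"
  port="$3"
  printf 'http://localhost:8001/api/v1/namespaces/%s/services/%s:%s/proxy/\n' \
    "$namespace" "$service" "$port"
}

# Example (needs a running `kubectl proxy`):
#   curl -s "$(proxy_url default ngx 8080)"
```

The port here is the service port (8080 in this example), not the container's target port.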

so this is where you notice that one of the containers is behaving strangely: sometimes it seems unable to connect to a backend, or it only works intermittently (something’s wrong, basically). You’d like to run ngrep, tcpdump or strace to figure out what’s going on, but you also don’t want to modify the container image … so what can we do?

As long as you have access to the node running the container instance, you’re in luck — in this example we’ll just use the local Docker4Mac installation, but it works with any node running Docker containers.

Find out which node runs the troublesome container with

$ kubectl describe pod ngx-5cb59c856c-cmn8p

Name: ngx-5cb59c856c-cmn8p
Namespace: default
Node: docker-for-desktop/192.168.65.3

and log in to that node. I initially tried editing the pod with

$ kubectl edit pod ngx-5cb59c856c-cmn8p

to add a debug container, but saving that config will fail as you cannot add/remove containers from a pod :(
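As an aside, the node lookup from the `kubectl describe pod` step can be scripted if you do this often. The sketch below only parses the `Node:` line shown earlier; `node_from_describe` is a hypothetical helper name, and it assumes the `name/ip` layout of that line.

```shell
# Print the node name from `kubectl describe pod` output read on stdin.
# Assumption: the output contains a line like "Node: docker-for-desktop/192.168.65.3".
# node_from_describe is a hypothetical helper name.
node_from_describe() {
  awk '$1 == "Node:" { split($2, parts, "/"); print parts[1]; exit }'
}

# Usage:
#   kubectl describe pod ngx-5cb59c856c-cmn8p | node_from_describe
# kubectl can also print the field directly:
#   kubectl get pod ngx-5cb59c856c-cmn8p -o jsonpath='{.spec.nodeName}'
```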

As we’re on the node that runs the container, however, we can create a custom debug container (locally) and run it inside the same PID and network namespaces as the existing ngx-5cb59c856c-cmn8p container.

The Dockerfile could be something like this (shamelessly copied/used from https://medium.com/@rothgar/how-to-debug-a-running-docker-container-from-a-separate-container-983f11740dc6)

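FROM alpine
# --no-cache fetches the package index on the fly, so no separate "apk update" is needed
RUN apk add --no-cache strace
# Default command: trace PID 1 of whichever PID namespace the container joins
CMD ["strace", "-p", "1"]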

run

$ docker build -t strace .

and once the image is built (in reality, you’d build one consistent debug image and push it to GCR, ECR or wherever, so you can pull it anytime), you run it with --privileged so that you’re able to do all the things and don’t have to fight permissions. Yes, we don’t care about security in this example; see Justin Garrison’s post for a more restrictive way to do this.

To attach to the PID and network namespaces, you need the container ID or name, which is easy to find with

$ docker ps | grep nginx

6b6e65ebc7c8 nginx "nginx -g 'daemon of…" About an hour ago Up About an hour k8s_nginx_ngx-5cb59c856c-cmn8p_default_402e0d53-a933-11e8-93cb-025000000001_0
e245d91ba045 nginx "nginx -g 'daemon of…" About an hour ago Up About an hour k8s_nginx_ngx-5cb59c856c-s65nh_default_36a3a1e7-a933-11e8-93cb-025000000001_0

So we’ll use the first one which is 6b6e65ebc7c8 (you should obviously use the one that's causing trouble) …
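Grepping for the image name works, but you can also pick the container by pod name, since the pod name is embedded in the container name. A rough sketch: `container_id_for_pod` is a hypothetical helper, and it assumes the `k8s_<container>_<pod>_<namespace>_...` naming visible in the output above (it also skips the `k8s_POD_...` pause containers that Kubernetes starts next to each pod).

```shell
# Print the ID of the container(s) belonging to a given pod.
# Reads "ID NAME" pairs on stdin, e.g. from:
#   docker ps --format '{{.ID}} {{.Names}}'
# Assumption: names look like k8s_<container>_<pod>_<namespace>_<uid>_<attempt>.
# container_id_for_pod is a hypothetical helper name.
container_id_for_pod() {
  pod="$1"
  awk -v pod="$pod" 'index($2, "_" pod "_") && $2 !~ /^k8s_POD_/ { print $1 }'
}

# Usage:
#   docker ps --format '{{.ID}} {{.Names}}' | container_id_for_pod ngx-5cb59c856c-cmn8p
```

If the pod runs more than one container, this prints one ID per container, so you still have to pick the one you care about.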

$ docker run -ti --pid=container:6b6e65ebc7c8 --net=container:6b6e65ebc7c8 --privileged strace /bin/ash
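If --privileged makes you uneasy, a narrower variant is to grant only the ptrace capability via `--cap-add SYS_PTRACE`. Treat this as an untested assumption rather than a recipe: whether that single capability is enough depends on the seccomp and AppArmor setup of your node, and Justin Garrison’s post covers the restrictive route properly. The sketch below only prints the command so you can inspect it first; `debug_cmd` is a made-up helper name and `strace` is the image built earlier.

```shell
# Print (do not run) a docker command that joins the target container's
# PID and network namespaces with only CAP_SYS_PTRACE added.
# Assumption: SYS_PTRACE alone is sufficient on your node; it may not be.
# debug_cmd is a hypothetical helper name; "strace" is the locally built image.
debug_cmd() {
  target="$1"
  printf 'docker run -ti --pid=container:%s --net=container:%s --cap-add SYS_PTRACE strace /bin/ash\n' \
    "$target" "$target"
}

# Usage: print the command, check it, then paste it into your shell.
#   debug_cmd 6b6e65ebc7c8
```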

Once that’s running, you need the PID that is actually doing the work. PID 1 is the Nginx master process, but it doesn’t process any requests; it just manages the worker processes

/ # ps -ef
PID USER TIME COMMAND
 1 root 0:00 nginx: master process nginx -g daemon off;
 6 101 0:00 nginx: worker process
 44 root 0:00 /bin/ash
 50 root 0:00 ps -ef
/ #
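With more than one worker, picking PIDs by eye gets tedious. Here is a small sketch that pulls the worker PIDs out of the `ps -ef` output; `nginx_worker_pids` is a hypothetical helper name, and it assumes the BusyBox column layout shown above (PID USER TIME COMMAND).

```shell
# Print the PIDs of nginx worker processes from `ps -ef` output on stdin.
# Assumption: BusyBox ps columns (PID USER TIME COMMAND), as in the listing above.
# nginx_worker_pids is a hypothetical helper name.
nginx_worker_pids() {
  awk '$4 == "nginx:" && $5 == "worker" { print $1 }'
}

# Usage inside the debug container:
#   ps -ef | nginx_worker_pids
```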

okay, so let’s do an strace on PID 6 as that is actually doing work …

/ # strace -fp 6
strace: Process 6 attached
gettimeofday({tv_sec=1535294355, tv_usec=751153}, NULL) = 0
epoll_wait(8, [{EPOLLIN, {u32=2097902081, u64=139627189727745}}], 512, 61010) = 1
gettimeofday({tv_sec=1535294359, tv_usec=908313}, NULL) = 0
recvfrom(3, "GET / HTTP/1.1\r\nHost: localhost:"..., 1024, 0, NULL, NULL) = 213
stat("/usr/share/nginx/html/index.html", {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
open("/usr/share/nginx/html/index.html", O_RDONLY|O_NONBLOCK) = 11
fstat(11, {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
writev(3, [{iov_base="HTTP/1.1 200 OK\r\nServer: nginx/1"..., iov_len=238}], 1) = 238
sendfile(3, 11, [0] => [612], 612) = 612
write(5, "10.1.0.1 - - [26/Aug/2018:14:39:"..., 111) = 111
close(11) = 0
epoll_wait(8, [{EPOLLIN, {u32=2097902081, u64=139627189727745}}], 512, 65000) = 1
gettimeofday({tv_sec=1535294361, tv_usec=971440}, NULL) = 0
recvfrom(3, "GET / HTTP/1.1\r\nHost: localhost:"..., 1024, 0, NULL, NULL) = 213
stat("/usr/share/nginx/html/index.html", {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
open("/usr/share/nginx/html/index.html", O_RDONLY|O_NONBLOCK) = 11
fstat(11, {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
writev(3, [{iov_base="HTTP/1.1 200 OK\r\nServer: nginx/1"..., iov_len=238}], 1) = 238
sendfile(3, 11, [0] => [612], 612) = 612
write(5, "10.1.0.1 - - [26/Aug/2018:14:39:"..., 111) = 111
close(11) = 0
epoll_wait(8,
^Cstrace: Process 6 detached
 <detached ...>
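On a busy worker the raw trace scrolls by quickly, so a per-syscall count can help you see what dominates. The following is a sketch, not a polished tool: `syscall_counts` is a hypothetical helper that reads strace lines like the ones above and tolerates an optional PID prefix, and the trace file path in the usage note is just an example.

```shell
# Count syscalls by name in strace output read on stdin.
# Assumption: one syscall per line, formatted like the trace above;
# unfinished/resumed lines and strace's own messages are ignored.
# syscall_counts is a hypothetical helper name.
syscall_counts() {
  awk -F'(' '
    { sub(/^(\[pid +[0-9]+\] +|[0-9]+ +)/, "") }   # drop an optional PID prefix
    /^[a-z_0-9]+\(/ { count[$1]++ }                 # syscall name is the text before "("
    END { for (name in count) print count[name], name }
  ' | sort -k1,1nr -k2
}

# Usage: write the trace to a file first, then summarize it.
#   strace -p 6 -o /tmp/trace.txt     # stop with Ctrl-C
#   syscall_counts < /tmp/trace.txt
```

strace also has a built-in summary mode (`strace -c -p 6`), which is worth trying if you only need totals.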

And there it is: 2 requests to this particular Nginx instance, traced without having to

redeploy the Pod with an additional debug container (you could argue that this would be better, but you may not be able to reproduce the issue straight away and may need to run it for a long time, which costs resources)
modify the Dockerfile in any way (e.g. to install debug tools)
change privileges on the running container; it keeps running in its more secure environment, and only the debug container gets additional capabilities

the nice thing about this pattern is that you can build yourself one debug container and reuse it to debug applications on any node that runs Docker (ECS, on-prem K8S, EKS, AKS, GKE).

Happy debugging!
