Discussion on: Lessons learned from one year with Kubernetes and Istio

Thanks for the article and for sharing your experience. My teammates and I have been on our Kubernetes (on AKS) and Istio journey for two months, and we have already hit some challenges: when things break, we need to drill into many different layers (issues on the ingress/egress gateways, the Envoy sidecar, policies, etc.) to troubleshoot. Would you mind sharing any tips or a monitoring strategy from your experience with production environments? In particular, which logs do you monitor most closely? Do you have any recommendations or best practices for Istio settings? And are there any commands or logs you check first to troubleshoot Istio issues more efficiently?

Marcos Brizeno • Jan 18 '20

Hi, yeah I think everyone has been in a situation where things just don't work and you can't reproduce the problem.
I think it's really important to understand all the pieces between someone making a request and the application inside the pod responding to it; then you will be able to look in the right place faster. This talk, "the life of a packet through istio" (youtube.com/watch?v=cB611FtjHcQ), is really good and goes into a lot of detail.
For logs I usually look at Ingress, Mixer and the application sidecar. Doing a port-forward and setting the istio-proxy log level to debug gives a lot of information, and then you can read through everything to try to find what could be wrong.
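To make the port-forward step a bit more concrete, here is a rough sketch of how it could be scripted. It assumes the sidecar exposes the Envoy admin interface on port 15000 and uses Envoy's `/logging?level=...` admin endpoint; the pod name, namespace and the two-second wait are placeholders of mine, not something from our actual setup.

```python
import subprocess
import time
import urllib.request

# Assumption: the Istio sidecar exposes the Envoy admin interface on this port.
ENVOY_ADMIN_PORT = 15000


def port_forward_cmd(pod, namespace, local_port=ENVOY_ADMIN_PORT):
    """Build the kubectl command that forwards a local port to the sidecar's admin port."""
    return [
        "kubectl", "port-forward", "-n", namespace,
        f"pod/{pod}", f"{local_port}:{ENVOY_ADMIN_PORT}",
    ]


def logging_url(level="debug", local_port=ENVOY_ADMIN_PORT):
    """URL of Envoy's admin endpoint that changes the log level of all loggers."""
    return f"http://localhost:{local_port}/logging?level={level}"


def set_proxy_log_level(pod, namespace, level="debug"):
    """Open a temporary port-forward, POST the new log level, then close the forward."""
    forward = subprocess.Popen(port_forward_cmd(pod, namespace))
    try:
        time.sleep(2)  # crude wait for the forward to come up; a retry loop would be more robust
        request = urllib.request.Request(logging_url(level), data=b"", method="POST")
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.read().decode()
    finally:
        forward.terminate()
        forward.wait()
```

After calling something like `set_proxy_log_level("my-app-pod", "my-namespace")` (hypothetical names), the debug output should show up in `kubectl logs my-app-pod -c istio-proxy -n my-namespace`. Debug logging is very noisy, so it is probably worth calling it again with `level="info"` once you are done.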
As for best practices, I think this is where Istio in particular struggles the most, simply because you can do so much with it that it is hard to establish any sort of convention or capture best practices. One thing that helped us was running istioctl validate (istio.io/docs/reference/commands/i...) on resources being deployed to avoid potential issues - we made it part of our pipelines and also an admission controller validation.
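As an illustration of the pipeline part, a validation step could look roughly like the sketch below. This is not our actual pipeline code: the directory layout and function names are made up, and it only assumes that `istioctl validate -f <file>` exits with a non-zero status when a manifest is invalid.

```python
import subprocess
import sys
from pathlib import Path


def validate_cmd(manifest):
    """Build the istioctl command that validates a single manifest file."""
    return ["istioctl", "validate", "-f", str(manifest)]


def find_manifests(root):
    """Return all .yaml/.yml files under root, sorted for stable output."""
    return sorted(p for p in Path(root).rglob("*") if p.suffix in {".yaml", ".yml"})


def validate_all(root, run=subprocess.run):
    """Run istioctl validate on every manifest and return the ones that failed.

    `run` is injectable so the step can be exercised without istioctl installed.
    """
    failed = []
    for manifest in find_manifests(root):
        result = run(validate_cmd(manifest))
        if result.returncode != 0:
            failed.append(manifest)
    return failed


def main(root):
    """Pipeline entry point: returns 1 if any manifest fails validation, else 0."""
    failed = validate_all(root)
    for manifest in failed:
        print(f"validation failed: {manifest}", file=sys.stderr)
    return 1 if failed else 0
```

In a pipeline you would call something like `sys.exit(main("k8s/"))` before the deploy step, where `k8s/` is a hypothetical directory holding the manifests, so that an invalid resource fails the build instead of reaching the cluster.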
Good luck on your journey!