After going through the monitoring and tracing solutions from Prometheus, Datadog, New Relic, and other players like Lightstep, Honeycomb, Instana, etc., I still don't see a product that is simple and easy to use for people who don't need heavyweight root cause analysis (RCA).
Datadog remains the only option for companies that spend < 2000 USD per month on APM solutions, but it seems very complex to me. Another option is shifting to OSS tools such as Prometheus, OpenTracing, and OpenTelemetry. But then you need to spend a lot of time learning PromQL, setting up HA and storage, and building Grafana dashboards.
None of the tracing vendors seem to sample data: they keep every trace so that they can derive metrics from traces and support RCA, which comes at a huge storage cost (the pricing plans of these vendors can make a small company sweat). Sending data while my application is running fine seems to add cost without adding much value.
I see a gap for a product that addresses the low-ticket-size users (< $2000 spend per month on APM) of all APM players with the plans below, built on OSS tools like Prometheus/OpenTracing/OpenTelemetry:
Plan 1 - 40% of the cost charged by other vendors (metrics only) - Converting OpenTracing instrumentation into useful Prometheus metrics, as in chapter 11 of Mastering Distributed Tracing. This gives fairly detailed metrics from an APM perspective, such as RPS, latencies, and the slowest queries to Redis, MongoDB, MySQL, etc., plus metrics aggregated by application endpoint.
Plan 2 - 60% of the cost charged by other vendors (metrics + sampled traces) - Tail-based sampling triggered by anomalies found in the metrics gathered in Plan 1. This sends only the traces needed to debug the anomaly and is thus a huge cost saver.
Plan 3 - 100% of the cost charged by other vendors (100% of traces) - Full-fledged enterprise plan that sends all traces for RCA and better debugging.
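To make Plans 1 and 2 a bit more concrete, here is a rough Python sketch of the idea, not a real implementation. It uses only the standard library; `Span`, `EndpointMetrics`, `TailSampler`, `slow_factor`, and `min_samples` are hypothetical names I made up for illustration and are not part of OpenTracing, OpenTelemetry, or Prometheus. A real version would read spans from the tracer and export the aggregates through a Prometheus client instead of keeping them in memory.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class Span:
    """Hypothetical, simplified span: one timed operation inside a trace."""
    trace_id: str
    endpoint: str
    duration_ms: float
    error: bool = False


class EndpointMetrics:
    """Plan 1: aggregate spans into per-endpoint metrics (in memory only)."""

    def __init__(self):
        self.durations = defaultdict(list)  # endpoint -> list of latencies
        self.errors = defaultdict(int)      # endpoint -> error count

    def observe(self, span):
        self.durations[span.endpoint].append(span.duration_ms)
        if span.error:
            self.errors[span.endpoint] += 1

    def request_count(self, endpoint):
        return len(self.durations.get(endpoint, []))

    def median_latency(self, endpoint):
        values = self.durations.get(endpoint, [])
        return median(values) if values else None

    def p95_latency(self, endpoint):
        # Nearest-rank percentile over all latencies seen so far.
        values = sorted(self.durations.get(endpoint, []))
        if not values:
            return None
        rank = math.ceil(0.95 * len(values))
        return values[rank - 1]


class TailSampler:
    """Plan 2: buffer spans per trace, keep the trace only if it looks anomalous."""

    def __init__(self, metrics, slow_factor=3.0, min_samples=5):
        self.metrics = metrics
        self.slow_factor = slow_factor  # assumed threshold: N x median latency
        self.min_samples = min_samples  # wait for a baseline before judging
        self.pending = defaultdict(list)

    def add_span(self, span):
        self.pending[span.trace_id].append(span)

    def _is_anomalous(self, span):
        if span.error:
            return True
        if self.metrics.request_count(span.endpoint) < self.min_samples:
            return False
        baseline = self.metrics.median_latency(span.endpoint)
        return span.duration_ms > self.slow_factor * baseline

    def finish_trace(self, trace_id):
        """Decide at the end of the trace; return the spans to ship (maybe none)."""
        spans = self.pending.pop(trace_id, [])
        # Judge against the baseline before this trace updates it.
        keep = any(self._is_anomalous(s) for s in spans)
        for s in spans:
            self.metrics.observe(s)  # metrics are recorded for every trace
        return spans if keep else []
```

The point of the design is that metrics are updated for every request (Plan 1), while full trace data leaves the process only when a trace contains an error or is much slower than the endpoint's usual latency (Plan 2). The "3x the median" rule is just a placeholder for whatever anomaly detection the product would actually use.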
My current understanding is that all the APM players are focusing on higher-ticket customers (Plan 3) right now.
Would a lightweight, cost-effective solution (< 2000 USD per month) built natively for Kubernetes be interesting to you? What features would you like to see in it?