Code[ish]
Demystifying the User Experience with Performance Monitoring
In this episode of Codeish, Greg Nokes, distinguished technical architect with Salesforce Heroku, talks with Innocent Bindura, a senior developer at Raygun about performance monitoring.
Raygun provides tools and utilities for developers to improve software quality through crash reporting and browser and application performance monitoring.
According to Innocent, the absence of crash reports does not mean that software is performing well. Software can work - but not be optimal. Thus, Innocent takes a holistic view:
“I look at the size of my audience, and if it's something sizable, that gets a lot of traffic, for example, a shopping cart that gets a lot of traffic on a Black Friday. I would want to be in a comfort zone when I know that during the peak periods my application is still performing, so I tend to look at the end-user, how their experience looks like during very high peak periods. And from there I start working my way back to the technology that is supporting that application.”
Raygun really shines in monitoring the time spent in different functions and helping to improve the performance of highly hit endpoints. This includes performance telemetry of browser pages, the current application running, and server-side performance application monitoring. Raygun has lightweight SDKs or lightweight providers that can be injected into code. These provide a catch-all to deal with unhandled exceptions. They also encourage best practices for developers.
Greg asks how to track a user's journey through the application in order to see the endpoints being hit, and the user experience. A RAM tool can provide opt-in user information. In the case of Javascript, an SDK is integrated with code to create a session ID that follows the user through every single page that they visit. This internal ID can also be associated with crash reports.
Over time, Raygun can provide a complete picture of how the user session performed “from the point they visited your page, logged in, visited a couple of pages, and then left your application. The crash reports and the traces relating to that particular user are also tied up with that session on the Raygun side.” Innocent highlights a sampling strategy that reduces the noise of APM data.
Raygun also provides a birds-eye application view that provides aggregated stats on application performance: “For the run product, you will have each page aggregated over time, regardless of how many users you've had in a period of time. You want to look at the individual sessions. That information is aggregated and you're able to see, for example, your median, your P90, and P99.”
Innocent focuses on the P99 figure because “whoever is in there has had a terrible time, and that forms the basis of my investigations. I want to know why there are so many sessions in that P99, and that P99 is probably a six or seven-second load time. I want to move that to a sub-three-second.” Innocent provides a definition of P99 for new customers undergoing the journey of performance optimization.
Next, Innocent asserts that decisions should be based on numbers and empirical evidence. He has found that the use of actionable data has enabled him to redesign applications and focus on the mission-critical command needed in real time.
Innocent concludes: “I think the life of a developer is an interesting one. We fit in everywhere situations permit, and we definitely take different routes to develop our careers. But ultimately what we should all be concerned about is the quality of the products that I produce. This definitely reflects on my capability as a software developer. What sets me apart from the next developer is not the number of cool techniques I can do with code, it's delivering a product that actually works and what better way of knowing what works when you actually measure things. Everybody should live by the philosophy of assuming nothing, measure everything. Everything and everything should be measured.”