This post originally appeared on the Scout blog.
If you are hosting your application with Heroku, and find yourself faced with an unexplained error in your live system. What would you do next? Perhaps you don’t have a dedicated DevOps team, so where would you start your investigation? With Scout APM of course! We are going to show you how you can you use Scout to find out exactly where the problem lies within your application code. We are going to walk through two of the most common Heroku error codes and show you how to diagnose the problem with Scout quickly and efficiently.
First of all let’s take a look at how a typical error occurs for Heroku users and then we will look at how we can use Scout to debug the problem. Any errors that occur in Heroku result in an Application error page being displayed to the user with a HTTP 503 status code indicating “service unavailable”. You have probably seen this page many times before if you are a frequent Heroku user.
As a developer, this doesn’t give us much to go on unfortunately. But we can see the specifics of the error, by digging into Heroku’s logging system. We can view these logs on the Activity tab of the Heroku Dashboard or by using the Heroku CLI application:
$ heroku logs --tail
Heroku uses the letters H, R and L in its error codes to differentiate between HTTP errors, runtime errors, and logging errors respectively. A full list of all the different types of errors that Heroku reports can be found here. As you can see from the screenshot below, the error being shown by Heroku in this case is a H12 error.
So now that we’ve checked the logs and found out what type of error occurred, we can cross reference this with the Heroku’s documentation and just fix it, right? Well, not really. You see, these error descriptions are a little vague, don’t you think? They clearly tell us what happened but they don’t really tell us why they happened. This is the job of an Application Performance Monitoring (APM) tool like Scout. When you need to know why and where, Scout should be your go-to tool.
Fortunately, the Heroku logs do tell us when the error occurred and what type of error occurred, so now let's jump into Scout APM and investigate what happened at that time of the error using two common example scenarios.
Heroku throws this error when a request takes longer than 30 seconds to complete. In Scout, any request that takes longer than 30 seconds to complete should show up very clearly on the main overview page as a spike in the chart, because this type of response time would be dramatically different from your usual traffic. So that would be the first place to start your investigation.
Once you have found the relevant spike on the chart, drag and drop a box around it, and a handy list of endpoints that occurred within that spike will be presented to you. Pick the endpoint you think is the culprit and then pick a trace on that endpoint which occurred at the same time of the error.
At this point, we can examine the trace and see a breakdown of where time or memory was spent during this trace, organized by layer. For example, we might see that a large portion of time was spent on a database query that originated from a line of code in one or our models, and if that is the case then we can see a backtrace of that query and the line of application code.
But in this specific example below, we can see that a large proportion of the time is actually being spent in the Edit action of the Articles controller. In fact, we can see that it took 40 seconds to process this controller action, and this is the reason that Heroku timed out with a H12 error code.
We can debug this particular example even further by enabling the AutoInstrument feature, which further breaks down uninstrumented controller time, line-by-line. Now we can see an exact line of code within our controller and a backtrace which shows where our problem lies. Here we can see that the reason that this application ran for 40 seconds was because of a call to sleep(40) on line 22 of the edit action. Now this line of code was obviously put there just for demonstrative purposes, but it gives you an idea of the level of detail that you can get to when given very little information to go on.
- Have you ever experienced somebody telling you that they have had a timeout error on an endpoint but after accessing the endpoint yourself, everything looks fine? Perhaps you keep seeing this error message intermittently, at certain times of the day or only from certain users, but no matter what you do, you just can’t recreate the problem yourself. Maybe Scout’s traces by context can help you diagnose here.
- The Trace index page allows you to view traces by different criteria, and even by custom context criteria that you can define yourself. For example, you could view all the traces associated with a particular user, or maybe even all users on large pricing plans. Once you have found the offending trace, you can follow similar steps to what we described earlier to get to the root cause.
Probably the most common Heroku error code you will come across is the R14 “Memory quota exceeded” error. This error (and it’s older brother R15 “Memory quota vastly exceeded”) occur when your application exceeds its server’s allocated amount of memory. The amount of memory your server is allocated depends on which Heroku plan you are on.
When your application runs out of memory and the R14 error occurs, the server will start to page to swap space, and you will start to see a performance slowdown across your entire application. If it continues to increase and reaches 2x your quota then an R15 will occur and your server will be shutdown.
This is a great example of an error message that you might come across but are unable to debug without an APM tool. But fear not because analysing memory bloat is an area where Scout really shines in comparison to its competitors.
The first place that you can look for memory anomalies is on the main overview page of Scout, you can choose to show memory as a sparkline on the graph. In our example shown below, you can see how the memory usage shot upwards rapidly at a certain point in time. Furthermore, our Memory Bloat Insights feature (towards the bottom of the screenshot) identified ArticlesController#index as a potential issue to investigate.
When we take a look at the trace that occurred at this time (shown below), we can clearly see which layers are using all this memory (the View layer). Also we can determine from the backtrace that the cause is that there is no filter or pagination on the query, and so all records are being loaded onto a single page.
- A great example of a hard-to-trackdown memory issue is when only large users are generating R14 errors. In this case, viewing by context on the Traces page will help you find this issue.
- Looking at the Web Endpoints index page, you can filter by Max Allocations to find endpoints using a lot of memory. This gives you a different approach to finding endpoints to investigate rather than using the main overview page’s timeline.
- If you want to read more about the causes of memory bloat, and how you can use Scout to fix memory bloat issues, take a look at this article.
Are you using Heroku and coming across a different sort of Heroku error that we didn’t cover here? Contact us today on support and we’ll show you how to debug the problem with Scout APM!
Heroku is a fantastic platform for hosting your applications, that’s why it is so popular amongst developers. But it is also clear to see how useful an APM tool is for debugging critical production issues when they arise and overall application performance health monitoring. So definitely sign up for a free trial today if you are not currently using Scout!