Notes on The Every Computer Performance Book by Bob Wescott.
This is an introductory book on performance work, the kind you'll wish you had read earlier. It's perhaps not so useful if you're already familiar with time series data, metrics, modeling, capacity planning, alerting, and so on.
Bob Wescott clearly explains concepts and makes it super easy to start getting into the right mindset to measure the performance of any system.
I marked some passages of the book for myself, and I'm sharing them here in the hope that they encourage you to read it too. It's a great book!
Right Tool For The Job
Performance Monitoring
“In performance monitoring you need to know three things: the incoming workload, the resource consumption and what is normal.”
Capacity Planning
“Capacity planning starts by gathering key performance meters at a peak time on a reasonably busy day. Almost any day will do, as long as the system load is high enough to clearly differentiate it from the idle system load.”
“Many key resources do not have a utilization meter, and the ones that do can lie to you. These resources take a little more work and creativity to capacity plan for, but this is completely doable.”
Load Testing
“For a load test to be really useful it must test the entire part of the transaction path that you care about. If your product is your website then you need to test from where your users are: all the way in and all the way back to where they live.”
Modeling
“How accurate does a model need to be? The flip answer is: accurate enough. A useful answer is: accurate enough to answer your question with a reasonable margin of safety.”
Useful Laws And Things I’ve Found To Be True
Finding the service time a different way
“It is easier to scale a number like 160 milliseconds of CPU than a more complex number like 4% CPU busy.”
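The heading points at the utilization law, U = X × S: a resource's utilization U equals throughput X times the service time S each transaction needs, so S = U / X. Here is a minimal sketch, assuming that law is the “different way” meant here; the 4% busy at 0.25 transactions per second are hypothetical numbers picked so the result matches the 160 ms in the quote.

```python
def service_time_s(utilization: float, throughput_per_s: float) -> float:
    """Utilization law U = X * S, rearranged to S = U / X."""
    if throughput_per_s <= 0:
        raise ValueError("throughput must be positive")
    return utilization / throughput_per_s


def projected_utilization(service_s: float, throughput_per_s: float) -> float:
    """The same law run forward: U = X * S."""
    return throughput_per_s * service_s


# Hypothetical meters: CPU 4% busy while completing 0.25 transactions per second.
s = service_time_s(0.04, 0.25)
print(f"{s * 1000:.0f} ms of CPU per transaction")
print(f"at 0.75 tps: {projected_utilization(s, 0.75):.0%} busy")
```

Once you have 160 ms per transaction, scaling is plain multiplication: tripling the load to 0.75 transactions per second projects 12% CPU busy.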
Seeing is believing
“Since the human mind is optimized to find patterns visually, not to comprehend thousands of numbers at a time, the data is often best understood in a chart.”
“Put up a slide full of numbers and you’ll lose your audience in seconds.”
Extraordinary claims
“An extraordinary claim requires extraordinary proof.” — Marcello Truzzi
“If your conclusions are controversial, disruptive, or very expensive to fix, then you will need overwhelming proof because there will be strong resistance to your findings.”
The hidden bottleneck
“When capacity planning, it is important to explain this drawing to the decision makers so they comprehend how one bottleneck can hide a downstream bottleneck.”
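The drawing isn't reproduced in these notes, but the idea fits in a few lines. A minimal sketch, assuming a serial path where every transaction visits each tier once, so the slowest tier caps throughput; the tier names and capacities (in transactions per second) are hypothetical.

```python
def pipeline_throughput(capacities: dict):
    """Return (limiting_stage, max_throughput) for a serial pipeline.

    Every transaction passes through every stage once, so the stage with
    the lowest capacity sets the ceiling for the whole system.
    """
    stage = min(capacities, key=capacities.get)
    return stage, capacities[stage]


# Hypothetical capacities in transactions per second.
stages = {"web": 500.0, "app": 200.0, "db": 300.0}
print(pipeline_throughput(stages))

# Fix the visible bottleneck and the hidden one downstream takes over.
stages["app"] = 800.0
print(pipeline_throughput(stages))
```

Quadrupling the app tier's capacity only buys a 1.5x gain (200 to 300 transactions per second), because the database was hiding behind it all along.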
“Most corporations only learn through pain.”
“If that problem only inconveniences others, and the fix required is non-trivial, or costs money, 99% of the time they would do exactly nothing.”
Performance does not scale linearly
“at some point in any application all roads lead to a place where key data has to be protected from simultaneous updates or a key resource is shared. At some point more resources don’t help, and the throughput is limited. This is the bad news buried in Amdahl’s Law - which is something you should read more about.”
https://en.wikipedia.org/wiki/Amdahl's_law
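Amdahl's Law says that if a fraction p of the work can be spread across n workers and the rest is serial, the best possible speedup is 1 / ((1 - p) + p / n). A quick sketch with a hypothetical 95% parallel fraction shows the ceiling:

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("need at least one worker")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


# Hypothetical workload: 95% of the work parallelizes, 5% is serialized.
for n in (1, 2, 4, 16, 1024):
    print(f"{n:>5} workers -> {amdahl_speedup(0.95, n):.2f}x")
```

No matter how large n gets, the speedup never exceeds 1 / (1 - p), which is 20 for p = 0.95.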
Application knowledge
“don’t panic if you don’t know every little detail of an application. You can do spectacularly useful work knowing almost nothing about how a given process transforms the bits.”
All meters are bad
“You need metering data to do any performance work but metering data never perfectly adds up, never aligns 100%, and typically contains many numbers that are meaningless to you.”
“If meters that should agree differ by more than 10%, and that 10% is significant to your work, then try to solve that mystery as that will teach you new things about the meters.”
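A tiny helper for that 10% rule. Measuring the difference relative to the larger reading is my own assumption (the book just says “more than 10%”), and the readings below are hypothetical.

```python
def meters_disagree(a: float, b: float, tolerance: float = 0.10) -> bool:
    """Return True if two meters that should agree differ by more than `tolerance`.

    The difference is taken relative to the larger reading, which keeps
    the check symmetric: meters_disagree(a, b) == meters_disagree(b, a).
    """
    larger = max(abs(a), abs(b))
    if larger == 0:
        return False  # both meters read zero, so they agree
    return abs(a - b) / larger > tolerance


# Hypothetical: OS-level CPU busy % vs. the sum of per-process CPU % for the same interval.
print(meters_disagree(62.0, 54.0))  # about 12.9% apart
print(meters_disagree(62.0, 58.0))  # about 6.5% apart
```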
The perils of serving undercooked ideas
“Reporting early results, before you’ve double-checked your work, is the road to disaster. You wouldn’t give a hungry guest raw chicken; don’t give hungry coworkers your raw, unchecked, conclusions.”
Getting an answer from a group
“The best way to get an opinion from a group is with the Delphi technique.”
- Ask multiple experts for their opinion privately.
- Consolidate those opinions removing the names.
- Feed the consolidated opinions back to the experts.
- Repeat and watch their thoughts converge.
https://en.wikipedia.org/wiki/Delphi_method
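The Delphi method is a human process, but the “consolidate without names” step is easy to sketch. This assumes each expert gives a single number and that you feed back the median and the spread; both choices, along with the expert names and estimates, are hypothetical.

```python
import random
import statistics
from typing import Optional


def consolidate_round(opinions: dict, seed: Optional[int] = None) -> dict:
    """Strip expert names from one Delphi round and summarize the estimates."""
    estimates = list(opinions.values())      # names are dropped here
    random.Random(seed).shuffle(estimates)   # order no longer hints at who said what
    return {
        "estimates": estimates,
        "median": statistics.median(estimates),
        "spread": max(estimates) - min(estimates),
    }


# Hypothetical round 1: peak transactions per second each expert expects next season.
round1 = {"alice": 900.0, "bob": 1500.0, "chen": 1100.0, "dana": 1200.0}
summary = consolidate_round(round1, seed=1)
print(summary["median"], summary["spread"])
```

Send the summary back, let the experts revise, and repeat; a shrinking spread from round to round is the convergence the list above describes.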
Performance Monitoring
What you need to know about any meter
“First and foremost, you need to know when the data was collected because no meter is an island.”
“The more precisely you understand what the meter measures, the more cool things you can do with it.”
Begin where you are and grow
“To begin, just begin. Start with whatever metering is in place. Don’t wait for perfect meters or perfect understanding of the meters. There will always be some mystery in the meters. That’s OK.”
Collect meters all the time
“if you want to solve the problems that pop up out of nowhere, you really need to have continuous monitoring.”
“Paying attention to the good and bad news in the meters results in less wasted time and a more focused approach to finding the root cause of this problem as your metering data shrinks the problem space.”
Collect meters at the right frequency
“Performance data is usually noisier than the pretty graphs we draw, and the lines on those pretty graphs can fool us into thinking we know more than we do. When someone or some tool shows you a pretty graph be sure to ask what the sample frequency and sample length are if you really want to get something out of it.”
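To see why sample length matters, here is a sketch with hypothetical per-second CPU readings that contain a five-second burst. The same minute of data looks very different depending on how it's averaged:

```python
import statistics

# Hypothetical per-second CPU busy (%) over one minute: mostly 20%,
# with a five-second burst at 100% in the middle.
per_second = [20.0] * 30 + [100.0] * 5 + [20.0] * 25

one_minute_average = statistics.mean(per_second)
worst_second = max(per_second)

# Re-sample into 10-second buckets to see what a coarser meter would report.
ten_second_buckets = [
    statistics.mean(per_second[i:i + 10]) for i in range(0, len(per_second), 10)
]

print(f"1-minute average: {one_minute_average:.1f}%")
print(f"worst second: {worst_second:.1f}%")
print("10-second averages:", ten_second_buckets)
```

The burst shows up as about 26.7% in a one-minute average, 60% in a 10-second bucket, and 100% at one-second resolution.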
Meter end-user response time
“Some errors are unavoidable. You will always see a few of them in the data. The key is to know what’s normal. When monitoring errors, notice when there are a lot more errors than usual for a given transaction rate. Investigate that.”
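Comparing raw error counts is misleading when traffic varies; the rate per transaction is what matters. A minimal sketch, assuming “normal” is an errors-per-transaction rate learned from history and that three times normal counts as “a lot more”; both numbers are hypothetical.

```python
def error_rate_is_unusual(errors: int, transactions: int,
                          normal_rate: float, factor: float = 3.0) -> bool:
    """Flag an interval whose error rate is well above the historical normal.

    normal_rate is errors per transaction from past data; factor is a
    hypothetical threshold for "a lot more errors than usual".
    """
    if transactions == 0:
        return errors > 0  # errors with no traffic are always worth a look
    return errors / transactions > normal_rate * factor


# Hypothetical history: about 2 errors per 1,000 transactions is normal.
normal = 0.002
print(error_rate_is_unusual(errors=25, transactions=10_000, normal_rate=normal))
print(error_rate_is_unusual(errors=25, transactions=2_000, normal_rate=normal))
```

The same 25 errors are unremarkable at 10,000 transactions (0.25%) and worth investigating at 2,000 (1.25%).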
Collect capacity data
“Keep an eye on what you can run out of, notice trends, and use that data to help you plan to bring new resources online at a convenient time of your choosing well before a crisis.”
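One simple way to turn a trend into a date is a straight-line fit. This sketch uses ordinary least squares on hypothetical weekly disk-usage samples; real growth is often not linear, so treat the result as a rough planning number rather than a prediction.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days, used_gb, capacity_gb):
    """Project when the fitted line crosses capacity, counted from the last sample."""
    slope, intercept = linear_fit(days, used_gb)
    if slope <= 0:
        return float("inf")  # flat or shrinking usage never fills up on this trend
    return (capacity_gb - intercept) / slope - days[-1]


# Hypothetical weekly samples of used space on a 2,000 GB volume.
days = [0, 7, 14, 21, 28]
used = [1200, 1235, 1270, 1305, 1340]
print(f"about {days_until_full(days, used, 2000):.0f} days of headroom")
```

Growth of 5 GB per day leaves about 132 days before the volume fills, which is plenty of time to order more disk at a convenient moment.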
Switching meters
“There will come a time when new meters will replace the old meters. Sometimes this is your choice, and sometimes it is a corporate decision to standardize on a given tool. In any case, the key thing to do is run both metering systems in parallel so that you can see if the old and new meters tell the same story.”
Confidence in a small sample
“Suppose you are comparing 100 samples of response time data before and after an upgrade to see if things are better or worse. Before the upgrade the average response time of 100 transactions was 4.5 seconds and after it was 4.1 seconds. To be sure a small difference is a real difference, you need to calculate the confidence interval.”
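A sketch of that calculation, using a normal approximation (z = 1.96 for 95%), which is reasonable with 100 samples on each side. The 4.5 s and 4.1 s averages come from the quote; the standard deviations are hypothetical, since the quote doesn't give them:

```python
import math


def diff_confidence_interval(mean_before, sd_before, n_before,
                             mean_after, sd_after, n_after, z=1.96):
    """Approximate confidence interval for (mean_before - mean_after).

    Uses the normal approximation with an unpooled standard error.
    """
    diff = mean_before - mean_after
    std_err = math.sqrt(sd_before**2 / n_before + sd_after**2 / n_after)
    return diff - z * std_err, diff + z * std_err


for sd in (1.5, 1.0):  # hypothetical standard deviations, same before and after
    low, high = diff_confidence_interval(4.5, sd, 100, 4.1, sd, 100)
    verdict = "real difference" if low > 0 or high < 0 else "could be noise"
    print(f"sd={sd}: 95% CI [{low:.2f}, {high:.2f}] s -> {verdict}")
```

With a 1.5 s standard deviation the interval is roughly -0.02 to 0.82 s and includes zero, so the 0.4 s improvement might just be noise; with 1.0 s it is roughly 0.12 to 0.68 s and the improvement looks real.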
How to meter a short-duration problem
“Like a physician, your primary goal should be: First, do no harm.”
“correlation does not always mean causality”
“If the problem has no known trigger and seems to happen randomly, you’ll have to intensively meter for it until it happens again.”
How to meter for load tests
“If key resources are unusually idle or busy during the load test then the load test itself is not doing the best job of replicating a user-generated load.”
“The higher the utilization, the more precise you have to be, because at high utilizations a small increase in utilization can cause a big increase in response time.”
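A common way to see this is the M/M/1 queueing formula R = S / (1 - U), where S is the service time and U is the utilization. The book doesn't prescribe a model here; M/M/1 (random arrivals, one server) is my assumption, and the 10 ms service time is hypothetical:

```python
def mm1_response_time(service_time_s: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - U)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1.0 - utilization)


SERVICE_TIME_S = 0.010  # hypothetical 10 ms of work per transaction
for u in (0.50, 0.80, 0.90, 0.95, 0.98):
    print(f"{u:.0%} busy -> {mm1_response_time(SERVICE_TIME_S, u) * 1000:.0f} ms")
```

Going from 90% to 95% busy, a five-point change, doubles the response time from 100 ms to 200 ms, which is why precision matters most near the top.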
“Since load tests typically don’t simulate all transactions, they tend to under-utilize things. If some resource is close to a limit, you still might recommend adding more just to be safe.”
How to meter for building a model
“you never want to plan for anything to be at, or near, 100% busy at peak”
Preserve your data
“Never delete old data. Compress (zip) it and leave it for future generations to explore or ignore.”
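If you want to automate that, here is a sketch using Python's gzip module in place of zip. It verifies the compressed copy before removing the uncompressed original, so the data itself is never lost; the function and file names are hypothetical:

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def archive(path: Path) -> Path:
    """Gzip a raw metering file and remove the original only after verifying the copy."""
    target = path.with_name(path.name + ".gz")
    with path.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Read the compressed copy back and compare before touching the original.
    with gzip.open(target, "rb") as check:
        if check.read() != path.read_bytes():
            raise OSError(f"verification failed for {target}")
    path.unlink()  # the data lives on, compressed, in target
    return target


# Usage: a throwaway file standing in for a day of raw meter data.
workdir = Path(tempfile.mkdtemp())
raw = workdir / "cpu-busy-day1.csv"
raw.write_text("time,cpu_busy\n00:00,12\n00:01,15\n")
archived = archive(raw)
print(archived.name)
```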
Capacity Planning
What capacity planning can do for you
“Capacity planners (i.e. you) should learn from previous failures. If you run out of something, figure out how to meter that thing, and add it to the next capacity plan. Don’t fall in the same hole twice.”
Safety margin
“Every company has a level of corporate courage and a certain aversion to pain. These attributes are shaped by their people, their culture, how much money they have to spend, and by their recent disasters.”
“Capacity planning looks only at the question of “enough”.”
How busy is too busy
“It is not that hard to write a program that keeps some device busy, but busy is not the same thing as backed-up with work. The real response time pain of a busy device comes from the line of transactions waiting to run before you do.”
How busy is too busy for a process
“CPU consumption for a process does not tell the whole story of how busy it is.”
“Processes wait for things, and when they wait, they consume no CPU.”
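You can see this directly by comparing wall-clock time with CPU time. In this sketch, time.sleep stands in for waiting on a disk, network, or lock; the two workload functions are hypothetical:

```python
import time


def measure(work):
    """Return (wall-clock seconds, CPU seconds) consumed by calling work()."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    work()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def waiting_work():
    time.sleep(0.5)  # stands in for waiting on disk, network, or a lock


def computing_work():
    sum(i * i for i in range(2_000_000))  # keeps the CPU busy


for name, work in (("waiting", waiting_work), ("computing", computing_work)):
    wall, cpu = measure(work)
    print(f"{name}: wall {wall:.2f}s, cpu {cpu:.2f}s")
```

The waiting call takes about half a second of wall time but almost no CPU, so a CPU meter alone would say that process is nearly idle.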
Doing the math of capacity planning
Utilization * Scaling Factor * Safety Margin = Projected Peak
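The formula translates directly into code. A sketch with hypothetical inputs: 40% busy at the current peak, 1.5x more load expected, and a 1.25x safety margin:

```python
def projected_peak(utilization: float, scaling_factor: float, safety_margin: float) -> float:
    """Utilization * Scaling Factor * Safety Margin = Projected Peak."""
    return utilization * scaling_factor * safety_margin


# Hypothetical inputs: 40% busy at this year's peak, 1.5x growth, 1.25x safety margin.
peak = projected_peak(0.40, 1.5, 1.25)
print(f"projected peak utilization: {peak:.0%}")
```

Whether a 75% projected peak is acceptable depends on how busy is too busy for that particular resource.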
Disk usage
“Way too much of a resource can be a problem, too. Not a performance problem, but a political one. Budgets are tight and “wasted” resources are targets.”
Monsters under the bed
“Capacity planning should not be sold as a guarantee that all will be well at the next peak. No matter how good a performance person you are, you can’t offer that guarantee. Capacity planning is more like a pre-trip checklist to ensure you have what you need, and all systems on this list are good-to-go.”
The human response
“Adjustments can be made, but do what you can to make sure the plan sticks to the truth. If your boss tries to force you to lie or profoundly fudge the numbers or “reframe the truth”, do your best to resist.”
Capacity planning limits
- “Modeling is better for evaluating the relative wonderfulness of potential design strategies, as no expensive and time consuming code/rewiring is required to run the model.”
- “Load testing can only be done on things that exist.”
- “Modeling and load testing can predict future response times under load, but most managers I’ve worked with feel much more confident in believing load test results.”
- “Modeling is often an easier way to thin out the bad ideas by quickly showing that proposals 1 and 3 will not work.”
- “Load testing is the best way to show that things that require third party content will work with reasonable response time under a given load, as you can’t model the third party.”
Load Testing
Test validation
“you are testing the load test itself to see if it emulates a moderate real user load with “good enough” fidelity. The key output of this test is confidence in the test, not the charts and graphs it creates.”
Running a load test
“First validate that the workload you are bringing to the system mimics the live users well enough that you’ll have confidence in the results when you emulate the big upcoming peak.”
“When setting goals for the load test, you should have well-defined upper limits for response time and number of errors.”
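A sketch of an automated pass/fail check against those limits. Using the 95th percentile (nearest-rank method) as the response time goal is my assumption, and the thresholds and measurements are hypothetical:

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def load_test_passed(response_times_s, error_count, max_p95_s, max_errors):
    """Return (passed, p95) for one load test run."""
    p95 = percentile(response_times_s, 95)
    return p95 <= max_p95_s and error_count <= max_errors, p95


# Hypothetical run: 20 response times and 3 errors, against a 2.0 s p95 goal and 5 errors max.
times = [0.8] * 18 + [1.9, 3.2]
passed, p95 = load_test_passed(times, error_count=3, max_p95_s=2.0, max_errors=5)
print(f"p95={p95:.1f}s passed={passed}")
```

Writing the limits down as code before the test starts makes it harder to move the goalposts after you've seen the results.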
“After you do a big load test, you should take time to review your notes, logs, and memories looking for things that worked well and things that did not. Improve your tools for next time. This is also a good time to spend time publicly thanking those who helped you.”
Modeling
Brainstorm, Refine, and Choose
“To build a model to find an answer to a question you first have to guess the answer; then you can build a model to see if it is really the answer. To guess the answer, you first brainstorm a list of possible answers and then thin that list down to the best candidates.”
Capacity models for a changing workload
“No model is perfect. Simplify the model as much as you can. As long as the model gives you an answer that you can trust and explain clearly to others, then you are good to go.”
To boldly go…
“Choose the modeling technique based on the question you are trying to answer.”
“Always, begin, and end, with the question.”
Presenting Your Results
To reveal the future…
“How clearly and convincingly you present your results determines how successful you are.”
Designing your presentation
“When writing and presenting be a minimalist.”
“Make sure that each point you make requires your audience to remember no more than two numbers at the same time.”
“People like a repeating pattern of information in a presentation. They find this comforting and an aid to overall understanding.”
Preparing for your presentation
“If you don’t deeply trust in the results, then that lack of trust will show on your face, and whatever you say won’t matter.”
“Check everything. Check it twice.”
Giving your presentation
“The bigger the problem, and/or the bigger the cost to fix it, the higher up the management chain you will present.”
“There’s an interesting book called The Paradox of Choice by Schwartz that points out that the typical human response to too many choices is to make no choice at all.”
https://www.goodreads.com/book/show/10639.The_Paradox_of_Choice
“There’s a temptation to use a dramatic style when presenting the results of your work because you naturally want to tell a story that builds in excitement and drama and finishes with thunderous applause. That is a fine thing to do, but it works much better if you tell them very early in the presentation that all will be well. Then the audience can relax and enjoy the ride.”
“It is the impression of you that lasts the longest, so try very hard to be helpful, accurate, insightful, creative, and clear.”
Bob’s rules of performance:
- “The less a company knows about the work their system did in the last five minutes, the more deeply screwed up they are.”
- “What you fail to plan for, you are condemned to endure.”
- “Most corporations only learn through pain.”
- “If they don’t trust you, your results are worthless.”
- “Always preserve and protect the raw performance data.”
- “The meters should make sense to you at all times, not just when it is convenient.”
- “To a corporation, nothing is more important than money. Follow the money.”
- “If the response time is improving under increased load, then something is broken.”
- “You’ll do this again. Always take time to make things easier for your future self.”
- “Ignore everything to the right of the decimal point.”
- “Always serve bad news with a side order of possible solutions.”
- “Never offer more than two possible solutions or discuss more than three.”
- “Always tell the truth in a kind and helpful way.”
If you liked this post, consider subscribing to my newsletter Bit Maybe Wise.