TL;DR notes from articles I read today.
- A high-quality pipeline must be fast so that it gives quick feedback. To achieve this, let your CI tool parallelize all tasks that have no mutual dependencies, and avoid bundling multiple checks into a single task.
- Define pipelines in code and have them call shell scripts that also work locally. You can then test changes before pushing, which gives a faster feedback loop.
- To ensure your pipeline is reliable and reproducible, run each task in isolation in a container, and build the containers within the pipeline, using a fresh container at each step.
- While a persistent workspace saves time, it can introduce flakiness. A good tradeoff may be to gain speed by caching dependencies instead of downloading them each time.
- Keep your pipeline highly visual and avoid over-abstraction. Visualization makes builds easy to understand and allows failed builds to be traced back quickly.
- Your system must scale across multiple pipelines. Avoid duplication, which slows pipelines down. Instead, parametrize tasks so that you configure them by passing variables, and build a library of tasks that you can reuse across pipelines. This also reduces coupling between tasks and pipelines.
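As a rough illustration of the parallelization point above, here is a minimal Python sketch that runs tasks in waves, where every task in a wave has all of its dependencies satisfied and so can run alongside the others. The task names, the `PIPELINE` graph, and `run_task` are hypothetical placeholders, not taken from the article; a real CI tool would express the same idea in its own configuration.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "lint": set(),
    "unit_tests": set(),
    "build": {"lint", "unit_tests"},
    "integration_tests": {"build"},
    "deploy": {"integration_tests"},
}


def run_task(name: str) -> str:
    # Placeholder for real work, e.g. calling a shell script that also runs locally.
    return name


def run_pipeline(graph: dict[str, set[str]]) -> list[list[str]]:
    """Run tasks in waves; all tasks within one wave run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            # Tasks whose dependencies have all finished.
            ready = sorted(sorter.get_ready())
            finished = list(pool.map(run_task, ready))
            sorter.done(*finished)
            waves.append(finished)
    return waves


if __name__ == "__main__":
    for wave in run_pipeline(PIPELINE):
        print(wave)
```

Here `lint` and `unit_tests` share the first wave because neither depends on the other. Waiting for a whole wave to finish is a simplification: a scheduler could start each task the moment its own dependencies complete.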
Full post here, 10 mins read
- Analyzing logs is as important as logging, if not more so. Only log what you intend to analyze.
- Separate production from logging (collecting, handling and archiving). This way log analysis does not put additional load on production systems, and logs are safeguarded from attackers trying to hide their trail.
- Transport logs to a centralized log server with appropriate access rights and archiving policies. Also, preserve the logs as raw as possible for later analysis and do not aggregate them in earlier phases.
- Before analyzing logs, make sure you have a clear understanding of your system’s baseline behavior. You will then know what to log and how long to retain the logs, and you can add flexible tools that help you analyze logs quickly and effectively in any format.
- Enable automated reporting of event occurrences after setting baselines and thresholds. This way, you will be sure to look at logs whenever something important transpires.
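To make the baseline-and-threshold idea above concrete, here is a minimal Python sketch. It assumes the baseline is the mean and standard deviation of hourly event counts and that a report fires when a count exceeds the mean by three standard deviations. Both choices, and the sample data, are illustrative assumptions rather than anything the article prescribes.

```python
from statistics import mean, stdev


def build_baseline(hourly_counts: list[int]) -> tuple[float, float]:
    """Summarize normal behavior as the mean and sample standard deviation."""
    return mean(hourly_counts), stdev(hourly_counts)


def should_report(count: int, baseline: tuple[float, float], sigmas: float = 3.0) -> bool:
    """Return True when a count exceeds the baseline mean by more than `sigmas` deviations."""
    avg, spread = baseline
    return count > avg + sigmas * spread


# Hypothetical data: failed-login events per hour during a normal period.
normal_hours = [4, 6, 5, 7, 5, 6, 4, 5]
baseline = build_baseline(normal_hours)

if __name__ == "__main__":
    for count in (8, 40):
        status = "report" if should_report(count, baseline) else "ok"
        print(count, status)
```

A count of 8 stays within the normal range for this sample, while 40 crosses the threshold and would trigger a report. In practice the threshold would be tuned to how noisy each event type is.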
Full post here, 6 mins read
- Common profiling measurements are CPU, memory, and frequency of function calls. There are two approaches to taking these measurements: event-based profiling and statistical profiling.
- In event-based profiling, you track all occurrences of certain events such as function calls, returns and thrown exceptions. In statistical profiling, you sample data by probing the call stack periodically. It is less accurate but faster, with lower overhead.
- Pinterest’s API gateway service is written in Python, so for memory profiling they used the tracemalloc package to track memory blocks.
- To calculate operational costs, Pinterest needed to combine resource utilization data with request metrics that show the popularity of each endpoint. This helped them identify the most costly endpoints and the engineers or teams that owned them, which encouraged ownership and proactive performance monitoring by the respective teams.
- Dead code - unused, unowned code such as old experiments, tests and files, even lines of code in a file never actually called on in practice - can clutter repositories. Pinterest used a standard Python test coverage tool to identify dead code and then got rid of it.
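For readers unfamiliar with tracemalloc, the following is a minimal sketch of how Python's standard-library module tracks memory blocks. It is not Pinterest's implementation: `build_response` is a hypothetical stand-in for an endpoint handler, and the two helper functions are illustrative names.

```python
import tracemalloc


def build_response(n: int) -> list[str]:
    # Hypothetical stand-in for an endpoint handler that allocates memory.
    return [f"item-{i}" for i in range(n)]


def measure_peak_bytes(func, *args) -> int:
    """Return the peak size in bytes of memory blocks traced while func runs."""
    tracemalloc.start()
    try:
        func(*args)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also clears all collected traces
    return peak


def top_allocations(func, *args, limit: int = 3):
    """Return the largest allocation sites, grouped by source line."""
    tracemalloc.start()
    try:
        result = func(*args)  # keep the result alive until the snapshot is taken
        snapshot = tracemalloc.take_snapshot()
        del result
    finally:
        tracemalloc.stop()
    return snapshot.statistics("lineno")[:limit]


if __name__ == "__main__":
    print("peak bytes:", measure_peak_bytes(build_response, 10_000))
    for stat in top_allocations(build_response, 10_000):
        print(stat)
```

Tracing every allocation adds overhead, so in a production service it would typically be enabled only for a sample of requests or hosts.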
Full post here, 7 mins read