This summer, I had the incredible opportunity to join the LFX Mentorship program, where I contributed to the open-source project Thanos, which extends Prometheus for large-scale system monitoring. The mentorship connects aspiring contributors with impactful open-source projects and gives them hands-on experience working alongside seasoned engineers. I owe a special thanks to my mentors, Michael and Saswata, whose guidance and expertise were instrumental in my learning journey.
Prometheus and Thanos are powerful tools used to monitor complex computer systems. While Prometheus collects real-time data from applications and systems, Thanos enhances Prometheus by enabling scalability, long-term storage, and cross-instance querying. Think of Prometheus as a “watchdog” keeping track of system performance, while Thanos ensures that data can be efficiently stored and retrieved, even in large, distributed environments.
Thanos and Its Components
Thanos was created to solve key challenges in Prometheus, such as:
- Horizontal scaling: a single Prometheus server is hard to scale out, because each server stores and queries only its own data.
- Querying across instances: It’s difficult to retrieve data from multiple Prometheus instances.
- Costly long-term storage: Storing data for extended periods in Prometheus can become inefficient and expensive.
To solve these issues, Thanos introduces several key components:
- Query Frontend: Sits in front of Query and speeds up requests by splitting long-range queries into smaller ones and caching results.
- Query: Exposes the Prometheus query API and fans requests out to multiple data sources, merging and deduplicating the results.
- Store Gateway: Serves historical metrics kept in object storage.
- Compact: Merges smaller blocks in object storage into larger ones, downsamples data, and applies retention, reducing overall data size.
- Ruler: Evaluates recording and alerting rules and fires alerts when their conditions are met.
- Receive: Accepts metrics pushed from Prometheus (via remote write) or other sources and stores them for later querying.
Each component plays a crucial role in making Prometheus more scalable and adaptable to large-scale environments.
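The core idea behind cross-instance querying is to ask every data source at once and then merge the answers. The sketch below shows that idea with goroutines. It is a simplified illustration, not Thanos code: `Store`, `Sample`, and `fanOutQuery` are hypothetical names. The "keep the first value per timestamp" rule is a toy stand-in for deduplication. Real Thanos deduplication is more involved and relies on configured replica labels.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Sample is a single (timestamp, value) data point.
type Sample struct {
	T int64
	V float64
}

// Store is a hypothetical stand-in for one data source (for example, a
// Prometheus instance or a Store Gateway). It returns series keyed by their
// label set, rendered as a string for simplicity.
type Store func(metric string) map[string][]Sample

// fanOutQuery asks every store concurrently, then merges series that share a
// label set. When two stores report the same timestamp for a series, the value
// from the store earlier in the slice is kept (a toy form of deduplication).
func fanOutQuery(stores []Store, metric string) map[string][]Sample {
	results := make([]map[string][]Sample, len(stores))
	var wg sync.WaitGroup
	for i, s := range stores {
		wg.Add(1)
		go func(i int, s Store) {
			defer wg.Done()
			results[i] = s(metric) // each goroutine writes only its own slot
		}(i, s)
	}
	wg.Wait()

	// Merge samples per series, keyed by timestamp.
	merged := map[string]map[int64]float64{}
	for _, r := range results {
		for key, samples := range r {
			if merged[key] == nil {
				merged[key] = map[int64]float64{}
			}
			for _, smp := range samples {
				if _, seen := merged[key][smp.T]; !seen {
					merged[key][smp.T] = smp.V
				}
			}
		}
	}

	// Convert back to time-ordered sample slices.
	out := make(map[string][]Sample, len(merged))
	for key, byT := range merged {
		samples := make([]Sample, 0, len(byT))
		for t, v := range byT {
			samples = append(samples, Sample{T: t, V: v})
		}
		sort.Slice(samples, func(i, j int) bool { return samples[i].T < samples[j].T })
		out[key] = samples
	}
	return out
}

func main() {
	replicaA := func(string) map[string][]Sample {
		return map[string][]Sample{`up{job="api"}`: {{T: 1, V: 1}, {T: 2, V: 1}}}
	}
	replicaB := func(string) map[string][]Sample {
		return map[string][]Sample{
			`up{job="api"}`: {{T: 2, V: 0}, {T: 3, V: 1}},
			`up{job="db"}`:  {{T: 1, V: 1}},
		}
	}
	got := fanOutQuery([]Store{replicaA, replicaB}, "up")

	// Print in a stable order, since Go map iteration is randomized.
	keys := make([]string, 0, len(got))
	for k := range got {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, got[k])
	}
}
```

Running it prints `up{job="api"} [{1 1} {2 1} {3 1}]` followed by `up{job="db"} [{1 1}]`. The overlapping sample at timestamp 2 appears only once.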
Thanos Architecture
The following diagram illustrates how Prometheus interacts with various Thanos components, such as the Query Frontend, Compact, and Ruler, showing the data flow between collection, querying, and storage:
My Contributions to Thanos
Throughout the mentorship, I had the opportunity to work on two specific areas that directly improved Thanos’ functionality.
Improving the Visibility of the Compaction Process
A key contribution I made was adding a feature to display Planned Blocks in the Thanos UI. Previously, users could see Global Blocks and Loaded Blocks, which represent the current state of stored data, but they had no insight into Planned Blocks, which represent upcoming compaction work. With this new feature, system administrators can see which blocks are scheduled for compaction. This gives them a better understanding of upcoming operations and lets them manage system performance more proactively.
Enhancing Rule Evaluation Warnings
My second contribution improved rule evaluation warnings. I added ‘file’ and ‘group’ labels to the warning metric, so developers can see exactly which rule file and group triggered a warning. A warning can now be traced straight to the rule that produced it. This makes debugging faster and makes it easier to refine alerting setups.
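Labels matter because each distinct combination of label values becomes its own time series. A single unlabeled counter tells you only that some rule produced a warning. A counter labeled by file and group tells you which one did.

Thanos records metrics with the Prometheus Go client. To stay dependency-free, the sketch below models the same idea with a plain map keyed by the two label values. The type names, method names, and the metric name `rule_evaluation_warnings_total` are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// ruleKey identifies where a warning came from, mirroring the
// 'file' and 'group' labels described above.
type ruleKey struct {
	File  string
	Group string
}

// warningCounter is a hypothetical stand-in for a labeled counter metric:
// one independent count per (file, group) combination.
type warningCounter struct {
	counts map[ruleKey]int
}

func newWarningCounter() *warningCounter {
	return &warningCounter{counts: make(map[ruleKey]int)}
}

// Inc records one evaluation warning for the given rule file and group.
func (c *warningCounter) Inc(file, group string) {
	c.counts[ruleKey{File: file, Group: group}]++
}

// Lines renders each series in a Prometheus-like text form, sorted so the
// output order is stable.
func (c *warningCounter) Lines(metric string) []string {
	lines := make([]string, 0, len(c.counts))
	for k, v := range c.counts {
		lines = append(lines, fmt.Sprintf("%s{file=%q,group=%q} %d", metric, k.File, k.Group, v))
	}
	sort.Strings(lines)
	return lines
}

func main() {
	c := newWarningCounter()
	c.Inc("rules/app.yaml", "api-alerts")
	c.Inc("rules/app.yaml", "api-alerts")
	c.Inc("rules/db.yaml", "db-recording")
	for _, l := range c.Lines("rule_evaluation_warnings_total") {
		fmt.Println(l)
	}
}
```

The output contains one line per rule file and group: `rule_evaluation_warnings_total{file="rules/app.yaml",group="api-alerts"} 2` and `rule_evaluation_warnings_total{file="rules/db.yaml",group="db-recording"} 1`. With that output, the noisy group is obvious at a glance.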
Challenges and Lessons Learned
This experience wasn’t without its challenges, and overcoming them was an essential part of my personal and professional growth:
- Learning Golang: Go differs significantly from class-based object-oriented languages, so I had to adjust to its explicit error handling (functions return error values instead of throwing exceptions) and its concurrency model built on goroutines and channels.
- Understanding Thanos’ Architecture: Thanos has many interconnected components, such as Query, Ruler, and Receive, and learning how each piece fits into the larger system required thorough research and hands-on experimentation.
- UI Embedded in Binary: Thanos UI files are embedded into the binary, so any UI change required regenerating the static files with the project’s tooling and rebuilding the binary before the change could be tested.
Final Thoughts
Contributing to Thanos through the LFX Mentorship has been an incredible experience. I learned the intricacies of large-scale system monitoring, developed my Golang skills, and played a part in improving a tool that’s used by teams all over the world.
If you’re considering contributing to open-source, I highly recommend taking the leap—there’s so much to learn, and it’s a fantastic way to make a tangible impact in the tech community.