Podcast.__init__
Gnocchi: A Scalable Time Series Database For Your Metrics with Julien Danjou
Summary
Do you know what your servers are doing? If you have a metrics system in place then the answer should be “yes”. One critical aspect of that platform is the time series database that allows you to store, aggregate, analyze, and query the various signals generated by your software and hardware. As the size and complexity of your systems scale, so does the volume of data that you need to manage, which can put a strain on your metrics stack. Julien Danjou built Gnocchi during his time on the OpenStack project to provide a time-oriented data store that would scale horizontally and still provide fast queries. In this episode he explains how the project got started, how it works, how it compares to the other options on the market, and how you can start using it today to get better visibility into your operations.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
- And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. Podcast.__init__ listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a trial.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions, I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com.
- To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
- Your host as usual is Tobias Macey, and today I’m interviewing Julien Danjou about Gnocchi, an open source time series database built to handle large volumes of system metrics.
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what Gnocchi is and how the project got started?
- What was the motivation for moving Gnocchi out of the OpenStack organization and into its own top-level project?
- The spaces of time series databases and metrics-as-a-service platforms are both fairly crowded. What are the unique features of Gnocchi that would lead someone to deploy it in place of other options?
- What are some of the tools and platforms that are popular today which hadn’t yet gained visibility when you first began working on Gnocchi?
- How is Gnocchi architected?
- How has the design changed since you first started working on it?
- What was the motivation for implementing it in Python and would you make the same choice today?
- One of the interesting features of Gnocchi is its support of resource history. Can you describe how that operates and the types of use cases that it enables?
- Does that factor into the multi-tenant architecture?
- What are some of the drawbacks of pre-aggregating metrics as they are being written into the storage layer (e.g. loss of fidelity)?
- Is it possible to maintain the raw measures after they are processed into aggregates?
- One of the challenging aspects of building a scalable metrics platform is support for high-cardinality data. What sort of labelling and tagging of metrics and measures is available in Gnocchi?
- For someone who wants to implement Gnocchi for their system metrics, what is involved in deploying, maintaining, and upgrading it?
- What are the available integration points for extending and customizing Gnocchi?
- Once metrics have been stored, aggregated, and indexed, what are the options for querying and analyzing the collected data?
- When is Gnocchi the wrong choice?
- What do you have planned for the future of Gnocchi?
Keep In Touch
- jd on GitHub
- Website
- @juldanjou on Twitter
Picks
- Tobias
- Julien
Links
- Gnocchi
- Red Hat
- OpenStack
- Object Oriented Programming
- O’Reilly
- Debian
- Ceilometer
- Prometheus
- Time Series
- MySQL
- Gerrit
- Zuul
- GitHub
- GitLab
- Graphite
- Datadog
- RabbitMQ
- InfluxDB
- Ceph
- S3
- OpenStack Swift
- Cassandra
- Honeycomb Observability Service
- AMQP
- Redis
- DSL (Domain Specific Language)
- Golang
- RBAC (Role-Based Access Control)
- CollectD
- StatsD
- Gnocchi Client
- Telegraf
- Grafana
- TimescaleDB
- OpenStack Heat
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA