Carlota Soto for CrateDB

What is CrateDB? 🤔 FAQ (1)

👉 The basics

CrateDB is a distributed SQL database, purpose-built for querying huge volumes of machine data in real time.

  • It is open-source (Apache 2)
  • It is natively distributed, with a shared-nothing architecture, automatic data rebalancing, and automatic table partitioning
  • It supports aggregations, JOINs, and sub-selects
  • Its schemas are fully dynamic: you can add columns at any time, without downtime or a performance penalty
  • It is written in Java
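To make "add columns anytime" concrete, here is a toy in-memory model in Python. It is a minimal sketch of the behaviour, not of how CrateDB stores data: the `DynamicTable` class and the table and column names are hypothetical. A record with an unseen key extends the schema, and rows written earlier simply read as `None` (NULL) for the new column.

```python
from typing import Any


class DynamicTable:
    """Toy in-memory table whose column set grows as new keys appear.

    Hypothetical illustration only: not CrateDB's implementation, just the
    behaviour that "add columns anytime" describes.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.columns: list[str] = []          # ordered schema, grows on demand
        self.rows: list[dict[str, Any]] = []

    def insert(self, record: dict[str, Any]) -> list[str]:
        """Store a record and return the names of any columns it added."""
        added = [key for key in record if key not in self.columns]
        self.columns.extend(added)            # schema change; old rows are not rewritten
        self.rows.append(dict(record))
        return added

    def select(self, *cols: str) -> list[tuple[Any, ...]]:
        """Project columns; rows written before a column existed read as None."""
        return [tuple(row.get(c) for c in cols) for row in self.rows]


readings = DynamicTable("sensor_readings")
readings.insert({"sensor_id": "s1", "temperature": 21.5})
new_cols = readings.insert({"sensor_id": "s2", "temperature": 19.0, "humidity": 0.43})
print(new_cols)                                  # ['humidity']
print(readings.columns)                          # ['sensor_id', 'temperature', 'humidity']
print(readings.select("sensor_id", "humidity"))  # [('s1', None), ('s2', 0.43)]
```

The point of the model is that the second insert needs no separate migration step: the new field is accepted as part of the write itself.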

👉 Can I use CrateDB for time-series data?

Yes. Machine data is time-series data, so CrateDB is a strong choice for time series, especially at very high volumes. It performs well on real-time queries as well as on GROUP BY and other roll-up queries over huge data sets, without losing accuracy. Its dynamic schemas are also a good fit for time series, since new fields can be stored as they appear.
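A time-series roll-up collapses raw readings into coarser time buckets. The sketch below does in plain Python what a GROUP BY over hour-truncated timestamps does in SQL. The `hourly_rollup` function and the data are hypothetical, and the query in the docstring is only an approximate analogue, assuming a table named `readings` and CrateDB's `date_trunc` function.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_rollup(readings):
    """Average readings per hour, like a GROUP BY over truncated timestamps.

    Roughly what a query such as
        SELECT date_trunc('hour', ts) AS hour, avg(value)
        FROM readings GROUP BY 1 ORDER BY 1
    computes; the table and column names are hypothetical.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in readings:
        bucket = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        sums[bucket] += value
        counts[bucket] += 1
    # Every raw value contributes to its bucket: no sampling, so no accuracy loss.
    return [(bucket, sums[bucket] / counts[bucket]) for bucket in sorted(sums)]


def utc(hour, minute):
    return datetime(2021, 3, 1, hour, minute, tzinfo=timezone.utc)


data = [(utc(10, 5), 20.0), (utc(10, 40), 22.0), (utc(11, 15), 25.0)]
for hour, avg in hourly_rollup(data):
    print(hour.isoformat(), avg)
# 2021-03-01T10:00:00+00:00 21.0
# 2021-03-01T11:00:00+00:00 25.0
```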

👉 What do we mean by "purpose-built for machine data"?

CrateDB was built around the features that matter most for demanding machine-data use cases such as industrial IoT, and its architecture reflects that. For example, rather than prioritizing strong consistency, CrateDB prioritizes data availability and partition tolerance. Efficiency is the other key pillar, alongside maximizing data availability. At the same time, SQL access makes CrateDB easy for developers to work with.

👉 What's the story?

Some years ago, our founders (in lovely Austria) were working on some of the biggest websites in Europe at the time. They soon became obsessed with data and set out to build a database that gave Elasticsearch the SQL access it lacked. The first iteration of CrateDB followed shortly afterwards and won the TechCrunch Disrupt Europe award.

👉 How far can I scale CrateDB?

CrateDB is built to make scaling out easy. Sharding, replication, and data rebalancing are automatic: to grow the database capacity, just add more nodes to the cluster. Or choose CrateDB Cloud and increase capacity at any time.
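The scaling model can be sketched as two independent mappings: rows map to a fixed set of shards by hashing a routing value, and shards map to nodes. Adding a node only changes the second mapping, so whole shards move and no row has to be re-hashed. This is a simplified illustration under stated assumptions: CRC32 stands in for CrateDB's real hash, the greedy `rebalance` function ignores replicas, disk usage, and other allocation rules, and all names are hypothetical.

```python
import zlib


def shard_for(routing_value: str, num_shards: int) -> int:
    """Route a row to a shard by hashing its routing value.

    CRC32 is a stand-in hash for illustration, not the one CrateDB uses.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def rebalance(assignment: dict[str, list[int]]) -> dict[str, list[int]]:
    """Greedily move whole shards from the busiest to the idlest node
    until shard counts differ by at most one (hypothetical, simplified)."""
    nodes = {node: list(shards) for node, shards in assignment.items()}
    while True:
        busiest = max(nodes, key=lambda n: len(nodes[n]))
        idlest = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[busiest]) - len(nodes[idlest]) <= 1:
            return nodes
        nodes[idlest].append(nodes[busiest].pop())


NUM_SHARDS = 6  # fixed at table creation, e.g. CLUSTERED INTO 6 SHARDS
cluster = {"node-1": [0, 2, 4], "node-2": [1, 3, 5]}

# Adding a node: it joins empty, then whole shards migrate to it.
cluster["node-3"] = []
cluster = rebalance(cluster)
print({node: len(shards) for node, shards in cluster.items()})
# {'node-1': 2, 'node-2': 2, 'node-3': 2}

# Routing is unchanged: a row still hashes to the same shard; only the shard's host moved.
shard = shard_for("sensor-42", NUM_SHARDS)
owner = next(node for node, shards in cluster.items() if shard in shards)
print(shard, owner)
```

Keeping the shard count fixed is what makes growth cheap in this model: moving a shard is a bulk copy, whereas changing the hash modulus would force every row to be reassigned.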

👉 Why use CrateDB over other databases?

Our own users can answer this one better than I can:

CrateDB’s unmatched concurrency capabilities and simple scaling made it the best solution for us. We tried other solutions, including MongoDB, but it was difficult and expensive to scale for our needs
(Waseem Javid Nasiri, Senior Developer – Roomonitor)

Postgres couldn’t keep up with the data we have; Datastax Enterprise had ingest scaling issues with spatial data; Cassandra didn’t have spatial query operations. CrateDB was the only database we found that could smoothly process data for our users. We fell in love with it immediately
(Kartik Venkatesh, CTO – Spatially Health)

We didn’t want to use a sharded and clustered MySQL database, since maintaining it would have been labor-intensive and use up our engineering resources. I started looking at CrateDB and was impressed by the quality of the code. Switching from MySQL to CrateDB took only a couple of days
(Jeff Nappi, Director of Engineering – ClearVoice)

CrateDB’s ability to query enormous amounts of data expands the realm of what’s possible with Clickdrive. We tried a few different SQL and NoSQL databases, and CrateDB offered the best combination of high performance, scalability, and ease-of-use
(Mark Sutheran, Founder - Clickdrive.io)


There's nothing better than trying things yourself! Download CrateDB or sign up for a CrateDB Cloud free trial, experiment, and tell us what you think.

Apart from Dev.to, you can reach the Crate.io team on:

See you around!

Top comments (2)

I am Budi

Is CrateDB suitable as an OLAP database with a lot of aggregations and column grouping?

Georg Traar

CrateDB isn't a typical OLAP database, as joins (which you might need for star schemas, etc.) are, although not impossible, rather expensive in a distributed setting.
However, it performs very well on aggregations and groupings over very large data sets, as everything is distributed ;)
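The point about distributed aggregations can be illustrated with two-phase aggregation: each shard reduces its own rows to a small partial result, and a coordinating node merges those partials. The sketch below is a plain-Python model of that general idea, not CrateDB's execution engine; the function names and data are hypothetical. For an average, shards must ship `(sum, count)` pairs rather than their local averages: for `m1`, averaging the shard averages (20.0 and 23.0) would give 21.5 instead of the correct 22.0.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Per-shard step: (sum, count) per group key, computed where the data lives."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return dict(partial)


def merge(partials):
    """Coordinator step: combine shard partials, then finish the average."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            total[key][0] += s
            total[key][1] += c
    return {key: s / c for key, (s, c) in sorted(total.items())}


# Hypothetical rows (machine_id, temperature) spread over three shards.
shards = [
    [("m1", 20.0), ("m2", 30.0)],
    [("m1", 22.0), ("m1", 24.0)],
    [("m2", 34.0)],
]
print(merge(partial_aggregate(rows) for rows in shards))  # {'m1': 22.0, 'm2': 32.0}
```

Only one small partial per group crosses the network from each shard, which is why GROUP BY scales with the number of shards while a join, which may need to move raw rows between nodes, does not scale as easily.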