Data Models and Query Languages

#ddia #distributedsystems #summary

In complex applications, APIs could be built upon APIs. Each data model hiding the complexity of the layers below it by providing a clean data model.

One of the best data models known today is SQL, based on the relational model.

Most application development today is done in object-oriented programming languages and therefore a translation layer is sometimes required between the objects in the application code and the database model of tables, rows and columns - aka impedance mismatch. This is where object-relational mapping is used (e.g. of ORM is sqlalchemy in python).

Using document stores or JSON representation can sometimes help reduce the impedance mismatch but this data model also has its disadvantages (to be explored further in later chapters).

JSON representation has better locality than multi-table relational models. e.g. to obtain a piece of data you might need to perform multi-way joins in a relational database whereas in a document model, all the relevant information is in one place.

Relational vs Document Databases

Document databases offer schema flexibility and better performance due to locality. Relational databases allow you to perform joins and support many-to-one and many-to-many relationships.

If your application needs to access "entire" datasets then the locality of document databases can be advantageous. If data is split across multiple tables, multiple index lookups and joins can require more disk seeks and therefore add time.

On the flip side, document databases typically load the entire document even if you only want to access a small portion of it.

NB: Locality is not a feature offered only by document databases. Google Spanner DB offers locality properties by allowing schema to declare that a table's rows should be interleaved within a parent table. Other such offerings also exist.

Query Languages for Data

Imperative programming languages describes the exact operations and the order of those operations (e.g. Python, JS etc). Declarative languages (e.g. SQL) on the other hand just describe the data you want, any conditions the results must meet and any transformations (e.g. group by). It is up to the database system's query optimiser to decide how to achieve the result (using joins, indexes etc).

Declarative languages have a better chance of faster parallel execution since they only specify the pattern of results and not the actual algorithm to obtain the results.

(summary of chapter 2 from Designing Data-Intensive Applications by Martin Kleppmann)

DEV Community

Data Models and Query Languages

Relational vs Document Databases

Query Languages for Data

Top comments (0)

Read next

Day 22: How the Tables have turned 🏓

I used [ Figma] to bring this concept to life and truly enjoyed every step of the process.

Language Models Boost Robot Learning with Limited Training Data

New AI Model Processes Text 4x Faster While Using 75% Less Memory