Discussion on: What differentiates schema on read from schema on write?


My first thought was this. If the extra columns are unnecessary for some consumers, such as the data analyst, traditional relational databases already offer the SELECT statement to pick only some columns, and other operators can restrict the view to parts of a table. Would that also solve the problem?
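
For reference, here is a minimal sketch of the SELECT-based approach the question describes. It uses Python's built-in sqlite3 module with an in-memory database. The table name `orders` and its columns are hypothetical, made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical, slightly wide "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL, warehouse TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "Asha", 250.0, "W1"), (2, "Ravi", 99.5, "W2")],
)

# The consumer names only the columns it needs; the others are ignored.
rows = conn.execute("SELECT order_id, amount FROM orders ORDER BY order_id").fetchall()
print(rows)  # [(1, 250.0), (2, 99.5)]
```

This works, but the consumer still has to know which columns exist and name them explicitly.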

Another impression I got along the way is that data collection is easier when the schema is not checked before writing. Traditional relational databases do perform that check, because their schema is defined up front. So we could use databases or stores that don't enforce a schema. But at read time, does that mean that if some data does not conform to the schema I expect, only that data won't be parsed or read?

Krithika • Jun 25 '20

Using a SELECT statement to pick only some columns would work. Still, in the big data world, data is stored in a denormalized format, so a table might have more than 100 columns. Each consumer then has the additional overhead of finding and selecting the columns they need. This can be overcome by defining a schema tailored to each consumer's specific needs. The consumer need not even know about the other columns.
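
A minimal sketch of the idea of a consumer-specific schema applied at read time. The raw records are stored as plain dictionaries with no schema check on write. Each consumer supplies its own schema as a mapping of column name to type. Everything here (`RAW_RECORDS`, `ANALYST_SCHEMA`, `read_with_schema`) is hypothetical and only illustrates the concept. It is not how any particular big data tool implements it.

```python
from typing import Any, Dict, List

# Hypothetical raw store: denormalized records written without any schema check.
RAW_RECORDS: List[Dict[str, Any]] = [
    {"order_id": "1", "customer": "Asha", "amount": "250.0", "warehouse": "W1", "courier": "X"},
    {"order_id": "2", "customer": "Ravi", "amount": "99.5", "warehouse": "W2", "courier": "Y"},
]

# Hypothetical analyst schema: only the columns this consumer needs, with their types.
ANALYST_SCHEMA: Dict[str, type] = {"order_id": int, "amount": float}


def read_with_schema(records: List[Dict[str, Any]], schema: Dict[str, type]) -> List[Dict[str, Any]]:
    """Apply a schema at read time: keep only the schema's columns and cast each value."""
    return [{col: typ(rec[col]) for col, typ in schema.items()} for rec in records]


if __name__ == "__main__":
    for row in read_with_schema(RAW_RECORDS, ANALYST_SCHEMA):
        print(row)
```

The analyst never sees `customer`, `warehouse`, or `courier`. Another consumer could read the same raw records through a different schema.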

When reading the data, if it does not conform to the schema, we would face issues reading it. For example, if my schema has a column that the data does not have, I cannot read the data using that schema.
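
A minimal sketch of the failure case described above. The read-time schema expects a `discount` column that the stored record never had, so reading the record against that schema fails. The names `SchemaMismatchError`, `read_record`, `SCHEMA`, and `RECORD` are hypothetical. Raising an error is one possible policy among several. Real systems may instead fill missing columns with nulls or skip the offending records.

```python
from typing import Any, Dict


class SchemaMismatchError(Exception):
    """Raised when a record lacks a column that the read-time schema requires."""


def read_record(record: Dict[str, Any], schema: Dict[str, type]) -> Dict[str, Any]:
    # Check up front that every column the schema expects exists in the stored data.
    missing = [col for col in schema if col not in record]
    if missing:
        raise SchemaMismatchError(f"record is missing schema columns: {missing}")
    # Cast each expected column to the type the schema declares.
    return {col: typ(record[col]) for col, typ in schema.items()}


# Hypothetical schema expecting a 'discount' column that the stored data never had.
SCHEMA: Dict[str, type] = {"order_id": int, "discount": float}
RECORD: Dict[str, Any] = {"order_id": "1", "amount": "250.0"}

if __name__ == "__main__":
    try:
        read_record(RECORD, SCHEMA)
    except SchemaMismatchError as err:
        print("cannot read:", err)
```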