DEV Community πŸ‘©β€πŸ’»πŸ‘¨β€πŸ’»

Discussion on: Consider SQL when writing your next processing pipeline

Collapse
 
simkimsia profile image
simkimsia • Edited on

Thanks for writing this. I’m interested to learn more.

expressing pipelines

First the term β€œexpressing pipelines” I am not sure if I understand that term completely.

Isn’t it a case of say running queries to extract data for reports? Meaning pure read only.

Does it also include the case of extracting data from a data source and then inputting in another source? I.e. read and write

difference between running sql pipelines and something like blaze

Are you familiar with github.com/blaze/blaze?

So their philosophy is that as data grows bigger it’s easy to send code to data than data to code for processing.

Conceptually are you suggesting the same thing? Except you recommend directly using sql queries

Thanks

Collapse
 
benbirt profile image
BenBirt

Thanks for reading my post!

To my mind, a processing pipeline is anything that reads data from a number of source(s), joins/transforms/filters those data, and outputs the results to some number of destination(s). (Note that it is rare, but occasionally the output destination is the same as the input source.) So I would say both of your examples would qualify.

I wasn't familiar with Blaze, but having had a quick look, it does look like I am suggesting a similar approach, but indeed just going straight to SQL instead.

Collapse
 
simkimsia profile image
simkimsia

Actually when you define processing pipeline as "anything that reads data from a number of source(s), joins/transforms/filters those data, and outputs the results to some number of destination(s)."

You're talking essentially about ETL right?

Thread Thread
 
benbirt profile image
BenBirt

More or less, yes!