DEV Community


Discussion on: Consider SQL when writing your next processing pipeline

mdpopescu profile image
Marcel Popescu

If you want to check exactly what data is being output by any given processing stage of a SQL pipeline, you can simply pull out those results into some relevant SELECT query.

Ok. You have a 1200-line stored procedure that returns garbage. What exactly do you do to figure out what the "relevant SELECT query" is?

Oh, I forgot: the stored procedure also modifies some tables, because why not? That means you can't easily re-run it without affecting data, which can create heisenbugs. Good luck! :D

benbirt profile image
BenBirt Author

Yup, this would be a tricky situation! To be honest, my advice would be:

  • only run SQL that is checked in to some source control somewhere - this rules out stored procedures, at least unless you have some release process that you trust that pushes your code to a stored procedure
  • figuring out the relevant SELECT is a matter of (a) having the original SQL be modular enough that you can easily do so, and (b) a reasonably standard debugging procedure (look at the output, if it's wrong then look one step before the output, etc)
  • it's bad practice to be running any form of processing pipeline that modifies underlying schemas, at least unless the modification is eventually consistent, for exactly the reason you mention