DEV Community

Max Myroshnychenko

Things I wish I knew before starting a relational database

Why relational DAG

  • Data-integrity checks mean you can hand your project over to maintainers
  • Lower barrier to trying out ideas on old data
  • Ability to try new features without breaking the existing workflow
  • No-code parallelization with linear scaling (up to an n-fold speedup for n channels in preprocessing)
  • Workflow graphics
  • No-extra-work unit testing (in theory)
  • No extra work for usage documentation, which frees up devs to document features and science

Engineering process pointers

  • Write two schemas at a time, one you're thinking about now and its child
  • First, populate a few recording sessions, not all datasets
  • Once the current schema and its child seem to work fine, get them populating and move on to the next node.
  • Work on the whole analysis simultaneously; there is no need to wait for every step to finish.
  • You can start on the next node as soon as a few keys have populated.
  • My data entry stage is during recording when I make the filename
  • Keep Datetime as primary key for restrictions on populate calls (Not so sure anymore)
  • If you work on children and parents simultaneously, parent entries may not show up for a while. Wait until MySQL has processed them first.
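The "few sessions first, child can start early" workflow above can be sketched with a toy model. This is a minimal illustration using plain dictionaries as stand-ins for a parent and a child table; the session ids, the `populate_parent` and `populate_child` helpers, and the computations are all hypothetical and are not the API of any database framework.

```python
# Toy stand-in for a parent/child pair of tables: plain dicts, not a real database API.
raw = {  # hypothetical recording sessions keyed by session id
    "s01": [1.0, 2.0, 3.0],
    "s02": [4.0, 5.0],
    "s03": [6.0],
    "s04": [7.0, 8.0, 9.0, 10.0],
}

parent = {}  # session id -> condensed trace
child = {}   # session id -> summary statistic


def populate_parent(keys):
    """Fill the parent table for the given session keys, skipping ones already done."""
    for key in keys:
        if key not in parent:
            parent[key] = [2 * x for x in raw[key]]


def populate_child():
    """Fill the child table for every parent key that is not yet computed."""
    for key, trace in parent.items():
        if key not in child:
            child[key] = sum(trace) / len(trace)


# Develop against a few sessions first, not the whole dataset.
populate_parent(["s01", "s02"])
populate_child()  # the child can run as soon as a few parent keys exist
first_pass = sorted(child)

# Once both steps look right, let the parent catch up and populate the child again.
populate_parent(raw)
populate_child()
```

Because each populate step only fills in missing keys, rerunning it after the parent catches up is cheap, which is what makes it practical to develop several nodes at once.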

Organizational principles

  • Separate phases of analysis: ingestion/condensing, chunking, computation, organization, plotting
  • Take baby steps at the top of the graph and cram many steps into the nodes at the bottom: bottom nodes carry high overhead because keys multiply (e.g. one key per event), whereas dropping a top node means recalculating all of its children
  • For populate calls, restrict by a list of sessions if you suspect that some are not populating due to errors
  • If not all items are showing up at the end of populate, repeat with reserve=False
  • Start with reserve=False for schemas that require little CPU
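One plausible reading of the populate tips above is sketched below as a toy model. It assumes that job reservation makes a run skip keys claimed by other workers, so a stale claim can leave items missing, and that turning reservation off ignores those claims. The `populate` function, its `restrict_to` and `reserved` parameters, and the failing session are all hypothetical; a real framework's populate call will differ.

```python
# Toy model of a populate call; the names below are illustrative, not a library API.
def populate(todo, done, make, restrict_to=None, reserved=None):
    """Compute missing keys and return a dict of key -> error message.

    restrict_to: optional list of keys to limit the run to.
    reserved: optional set of keys claimed by other workers; pass None to
              ignore reservations (the analogue of reserve=False).
    """
    errors = {}
    for key in todo:
        if key in done:
            continue
        if restrict_to is not None and key not in restrict_to:
            continue
        if reserved is not None and key in reserved:
            continue  # another worker (possibly a dead one) holds this key
        try:
            done[key] = make(key)
        except Exception as exc:
            errors[key] = str(exc)
    return errors


def make(key):
    """Hypothetical computation that fails for one session."""
    if key == "s03":
        raise ValueError("corrupt file")
    return len(key)


sessions = ["s01", "s02", "s03", "s04"]
done = {}
stale = {"s04"}  # a reservation left behind by an interrupted worker

errors = populate(sessions, done, make, reserved=stale)
after_reserved_run = sorted(done)  # s04 was skipped, s03 failed

# Restrict to the suspect session to surface its error quickly.
suspect_errors = populate(sessions, done, make, restrict_to=["s03"], reserved=stale)

# Repeat without reservations to pick up the skipped key.
populate(sessions, done, make, reserved=None)
```

For steps that need little CPU there is nothing to gain from splitting work across workers, so skipping the reservation bookkeeping from the start avoids the stale-claim problem entirely.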

Gotchas

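  • Fractional primary keys must be stored as a fixed-point decimal rather than a float. Note that MySQL's DECIMAL(M,D) takes the total number of digits first and requires M ≥ D, so (10,30) as written would be rejected; the intended type was presumably decimal(30,10) or similar.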
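The reason fractional keys want a decimal type can be shown in a few lines of Python. This is a small illustration with dictionaries standing in for keyed tables, not database code: a float key that is computed rather than typed in may fail to match the stored key, while fixed-point decimals compare exactly.

```python
from decimal import Decimal

# Float arithmetic is inexact, so a computed key may not equal the stored one.
float_table = {0.3: "row"}
float_key = 0.1 + 0.2
float_hit = float_key in float_table

# Fixed-point decimals built from strings compare exactly.
decimal_table = {Decimal("0.3"): "row"}
decimal_key = Decimal("0.1") + Decimal("0.2")
decimal_hit = decimal_key in decimal_table
```

Here `float_hit` is `False` and `decimal_hit` is `True`, which is the same mismatch that makes restrictions and joins on float primary keys silently miss rows.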
