Would be curious to know if there are any best practices for testing data science models? What would that look like?
I think writing unit tests for statement coverage, plus integration tests, is the most important thing. You want to make it really easy for yourself by writing a testing script that you can just run every time you make a new iteration of your model or of the preprocessing functions you use, so that you know you haven’t broken anything and can trust that your code works.
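To make that concrete, here is a minimal sketch of such a testing script, assuming a tiny pipeline built from plain Python. The names `min_max_scale`, `MeanModel` and `run_pipeline` are hypothetical stand-ins for your own preprocessing function and model, not any library's API. The first three tests are unit tests on the preprocessing step; the last is an integration test that wires preprocessing and the model together end to end.

```python
import math


# Hypothetical preprocessing step: rescale numbers into the range [0, 1].
def min_max_scale(values):
    if not values:
        raise ValueError("values must be non-empty")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero and map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical "model": always predicts the mean of the training targets.
class MeanModel:
    def fit(self, targets):
        self.mean_ = sum(targets) / len(targets)
        return self

    def predict(self, n):
        return [self.mean_] * n


# Hypothetical pipeline: preprocess the features, fit the model, predict.
def run_pipeline(features, targets):
    scaled = min_max_scale(features)
    model = MeanModel().fit(targets)
    return scaled, model.predict(len(scaled))


# --- Unit tests: one small, fast check per behaviour of the preprocessing step.
def test_scale_maps_to_unit_interval():
    assert min_max_scale([3, 7, 5]) == [0.0, 1.0, 0.5]


def test_scale_handles_constant_input():
    assert min_max_scale([2, 2]) == [0.0, 0.0]


def test_scale_rejects_empty_input():
    try:
        min_max_scale([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")


# --- Integration test: preprocessing and model running together.
def test_pipeline_end_to_end():
    scaled, preds = run_pipeline([1, 2, 3], [10, 20, 30])
    assert len(preds) == len(scaled)
    assert all(0.0 <= v <= 1.0 for v in scaled)
    assert all(math.isclose(p, 20.0) for p in preds)


ALL_TESTS = [
    test_scale_maps_to_unit_interval,
    test_scale_handles_constant_input,
    test_scale_rejects_empty_input,
    test_pipeline_end_to_end,
]


def run_all_tests():
    # Runs every test in order; the first failing assertion stops the script.
    for test in ALL_TESTS:
        test()
        print(f"ok  {test.__name__}")
    print(f"{len(ALL_TESTS)} tests passed")


if __name__ == "__main__":
    run_all_tests()
```

Running the file after each change to the model or preprocessing gives a quick pass/fail. If you use pytest, saving it as something like `test_preprocessing.py` lets `pytest` discover the `test_*` functions on its own, and coverage.py (`coverage run -m pytest`, then `coverage report`) shows the statement coverage those tests reach.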
I’ll start writing some documentation to provide examples and make this concept easier to digest, but for now a quick Google search might help you (: Hope this helps!
Right on - in your experience, is it something that happens with a lot of data science teams? (The writing of tests, I mean.)
I'm a weird convert to testing, if it isn't immediately obvious hehe, but I'm always surprised at just how few tests you find out there sometimes.
No, I don’t see a lot of testing in the data science world (: I agree, I think there could definitely be a lot more of it. It would make data models a lot easier to scale, instead of the usual cycle of writing code and then fixing the model whenever something breaks. But I feel like a lot of data scientists aren’t taught how to write good tests, and that’s why so many of them have gotten by without it. Are you a data scientist? Where did you learn how to test?