- When should we use one over the other?
- Which is the best approach?
- What are the criteria used to guide the decision?
Top comments (4)
Production data has some issues:
It's still important to test against production data, but that comes second. Functional coverage is more important, and you can only be sure of covering the widest range of cases if you generate fixture data. If you're working in a smaller team, testing against live data sets will likely be all manual.
Fixtures are tricky to do right, and the obvious solution of a monolithic test dataset is a dead end for reasons best explained by Jorge Luis Borges. I wrote something a while ago about a more flexible modular approach based on the post-structuralist idea of rhizomes, and published a drop-in JavaScript implementation; the PHP O/RM Doctrine does something similar as well.
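To make the contrast with a monolithic dataset concrete, here is a minimal sketch of modular fixtures in Python. It is not the implementation mentioned above, just an illustration of the general idea: each fixture declares what it depends on, and a test loads only the pieces it needs. The names `FIXTURES`, `fixture`, and `load`, and the sample tables, are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

# Hypothetical registry: fixture name -> (dependency names, builder function).
FIXTURES: Dict[str, Tuple[Tuple[str, ...], Callable[[dict], None]]] = {}


def fixture(name: str, depends_on: Iterable[str] = ()):
    """Register a builder under `name`, along with the fixtures it needs first."""
    def register(build: Callable[[dict], None]):
        FIXTURES[name] = (tuple(depends_on), build)
        return build
    return register


@fixture("users")
def build_users(db: dict) -> None:
    db["users"] = [{"id": 1, "name": "alice"}]


@fixture("orders", depends_on=["users"])
def build_orders(db: dict) -> None:
    # Orders refer to a user that the "users" fixture has already created.
    db["orders"] = [{"id": 10, "user_id": db["users"][0]["id"]}]


@fixture("invoices", depends_on=["orders"])
def build_invoices(db: dict) -> None:
    db["invoices"] = [{"id": 100, "order_id": db["orders"][0]["id"]}]


def load(names: Iterable[str], db: Optional[dict] = None,
         _seen: Optional[Set[str]] = None) -> dict:
    """Load the named fixtures, building each dependency first and only once."""
    db = {} if db is None else db
    seen = set() if _seen is None else _seen
    for name in names:
        if name in seen:
            continue
        seen.add(name)
        deps, build = FIXTURES[name]
        load(deps, db, seen)
        build(db)
    return db


# A test about orders pulls in users automatically, but not invoices.
db = load(["orders"])
```

The `db` dict stands in for a real database connection; in practice the builders would issue inserts instead of assigning lists.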
We have tremendous fun with 100+ suppliers (API vendors of various sorts), who all bring some sort of test interface, frequently synthetic data, none of them compatible with each other...what to do?
This is super hard. I do not quite understand how people can abstract away the complexity of data and state. We seem to manage this for configuration and for systems, but when it comes to "a user has this property at this time with this value", everything goes out the window. I am still not sure what the best approach is. It may be workable if the system is small enough, but once you cross system boundaries it feels like it all falls apart. An alternative approach to test data might entail capturing the state of a user at a given time and reproducing that in the staging system, or disabling changes in the production system for that user to reproduce an issue. This feels like one of the last properties software teams think about in development.
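The "capture the state of a user at a given time" idea could be sketched roughly like this. It assumes, for illustration only, that user state lives in a single dict-like store keyed by user ID and is JSON-serializable; `capture_user_state` and `restore_user_state` are hypothetical names, and a real system spread across several services would need one such capture per boundary.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def capture_user_state(store: dict, user_id: str,
                       captured_at: Optional[datetime] = None) -> str:
    """Serialize one user's state, tagged with the time it was captured."""
    captured_at = captured_at or datetime.now(timezone.utc)
    return json.dumps(
        {
            "user_id": user_id,
            "captured_at": captured_at.isoformat(),
            "state": store[user_id],
        },
        sort_keys=True,
    )


def restore_user_state(store: dict, snapshot: str) -> str:
    """Write a captured snapshot into another store; return its capture time."""
    payload = json.loads(snapshot)
    store[payload["user_id"]] = payload["state"]
    return payload["captured_at"]


# Stand-ins for the production and staging data stores (assumptions).
production = {"u42": {"plan": "trial", "flags": ["beta"], "balance": 0}}
staging: dict = {}

snapshot = capture_user_state(
    production, "u42", datetime(2020, 1, 1, tzinfo=timezone.utc)
)
captured_at = restore_user_state(staging, snapshot)
```

A snapshot like this may contain personal data, so it would need the same handling as production data.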
Synthetic data is often generated to represent the production data.
It is normally used to protect the privacy and confidentiality of production data, e.g. when testing and building many different types of systems, such as fraud detection and churn prediction systems.
There are a number of approaches to generating synthetic data, described by the folks from Synthesized (synthesized.io/) in this blog post:
blog.synthesized.io/2018/11/28/thr...
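As a rough illustration of what "generated to represent the production data" can mean, here is a deliberately naive Python sketch. It is not one of the approaches from the linked post: it simply resamples each non-identifying column independently from the values seen in production and replaces identifiers with made-up ones. The function `synthesize`, the column names, and the sample rows are all hypothetical.

```python
import random
from typing import Dict, Iterable, List


def synthesize(rows: List[Dict], n: int, id_columns: Iterable[str] = (),
               seed: int = 0) -> List[Dict]:
    """Build n synthetic rows from per-column samples of the real rows.

    Columns listed in id_columns are never copied; they get generated values.
    """
    rng = random.Random(seed)  # seeded so test runs are repeatable
    ids = set(id_columns)
    columns = [c for c in rows[0] if c not in ids]
    # One pool of observed values per column; sampling from it keeps each
    # column's frequencies roughly intact.
    pools = {c: [r[c] for r in rows] for c in columns}
    synthetic = []
    for i in range(n):
        row = {c: rng.choice(pools[c]) for c in columns}
        for c in ids:
            row[c] = f"user-{i}@example.test"
        synthetic.append(row)
    return synthetic


# Made-up stand-in for a production extract.
production_rows = [
    {"email": "a@corp.example", "country": "DE", "plan": "pro"},
    {"email": "b@corp.example", "country": "DE", "plan": "free"},
    {"email": "c@corp.example", "country": "FR", "plan": "free"},
]

fake_rows = synthesize(production_rows, n=5, id_columns=["email"])
```

Sampling columns independently loses correlations between them (for example, which countries tend to be on which plan), and rare values can still identify someone, so this alone is not a privacy guarantee.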