This is mostly a question of design and implementation, with regard to a quasi-multitenant application.
Let me try to explain the current system, and what the goals are with how the application should work.
I'm creating an application for a client, who in turn has their own clients (we'll call these sub-clients going forward). The sub-clients do not have access to the system. The core of the application is to store sub-client data in separate databases and to be able to download exported versions (filtered, cropped down, etc.) of the data in the tables. Basically, the analytical staff need a quick way to export database data for use in other applications.
The frontend of the application is the same regardless of which sub-client a user selects to see data from; the only difference between them is the data being stored.
Here is where it gets interesting, and while I have implemented a solution, I'm not sure if it is a good one.
Each sub-client has a potentially unique database schema. While this is fairly easy to work around using Node + knex.js, in seemingly any other setup (Go or Rust, say), not knowing your SQL schema ahead of time is a major pain. It becomes even harder when you try to add automated data insertion, where users can upload data files to be added to the database: without knowing the structure of the tables, it is very hard to verify that what they are trying to upload fits.
Not using an ORM makes this a bit easier: with knex I can query for each table's columns before continuing, or I can just use SELECT * in my queries and get everything, which is all that's needed for displaying the tables in the browser. Data is currently uploaded/updated semi-manually using some Python scripts with pandas.
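To make the "query for each table's columns first" approach concrete, here is a minimal sketch in stdlib Python. It uses an in-memory SQLite database as a stand-in for the real one (with Postgres or MySQL you would query `information_schema.columns` instead of `PRAGMA table_info`); the table and function names are illustrative, not from the actual application.

```python
import sqlite3

# In-memory SQLite stands in for the real per-sub-client database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER, label TEXT, value REAL)")

def table_columns(conn, table):
    """Return [(name, declared_type), ...] for a table we know nothing about.

    PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk);
    we keep only name and declared type.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(r[1], r[2]) for r in rows]

print(table_columns(conn, "measurements"))
# → [('id', 'INTEGER'), ('label', 'TEXT'), ('value', 'REAL')]
```

The same introspection result can then drive both rendering (which columns to show) and upload validation (which types to expect), without any per-table models.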
The other potentially hard part is that new sub-clients may need to get up and running quickly, and they may have a fair number of tables (10-20 at the start). Just using pandas to dump the data into the database gets you going quickly, but adding to the data afterwards becomes more of a task. On the other hand, with an ORM we would need to create new models for every table for each new sub-client before that sub-client's data would be accessible and updateable.
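The "pandas dump" style of onboarding — infer a schema from the file, create the table, bulk-insert — can be sketched with nothing but the stdlib, which shows roughly what `DataFrame.to_sql` is doing for you. This is a simplified sketch: `load_csv` and `infer_type` are hypothetical names, the type inference is deliberately crude, and identifiers are not sanitized as they would need to be for real uploads.

```python
import csv
import io
import sqlite3

def infer_type(values):
    """Crude type inference: INTEGER if every value parses as int,
    REAL if every value parses as float, otherwise TEXT."""
    for cast, sql_type in ((int, "INTEGER"), (float, "REAL")):
        try:
            for v in values:
                cast(v)
            return sql_type
        except ValueError:
            continue
    return "TEXT"

def load_csv(conn, table, fileobj):
    """Create `table` from a CSV's header + inferred column types, then insert."""
    rows = list(csv.reader(fileobj))
    header, data = rows[0], rows[1:]
    types = [infer_type(col) for col in zip(*data)]
    cols = ", ".join(f"{name} {t}" for name, t in zip(header, types))
    conn.execute(f"CREATE TABLE {table} ({cols})")
    placeholders = ", ".join("?" * len(header))
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", data)

conn = sqlite3.connect(":memory:")
load_csv(conn, "readings", io.StringIO("sensor,value\na,1.5\nb,2.0\n"))
print(conn.execute("SELECT * FROM readings").fetchall())
# → [('a', 1.5), ('b', 2.0)]
```

The upside is exactly the one described above: a new sub-client's 10-20 tables can come online without writing 10-20 models first. The downside is that the inferred schema is only as good as the first file you load.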
Taking data insertion out of my sole responsibility would be a huge asset to this project, since then I am not the bottleneck for getting new data into the application while also running the devops on it.
I have been reading up on multi-tenancy, but I'm not totally sure that is what this application is. It both is and isn't. There is only one version of the application, and it is only available to one group of people. The data that the application uses pertains to various different groups, who have different data structures.
I keep feeling like there must be a better way to handle the database. With each sub-client having a different schema, using something like an ORM is almost impossible from a maintenance perspective, but not using an ORM could let improper data be inserted into a table.
I think part of this is that when using Python for data extraction you don't have to worry about types, since pandas + SQLAlchemy do a good job of fitting your data into the tables. In more strongly typed languages, without knowing what data is coming in or its structure, everything ends up as strings, even though most of the data is numerical. While I don't really like Python, its flexibility in data management does make this easier.
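One way to get past "everything ends up as strings" without an ORM is to let the database's own catalog drive the coercion: read each column's declared type at runtime, then cast the uploaded strings to match (rejecting rows that don't fit). A minimal sketch, again using SQLite as a stand-in — `coerce_row` and the `COERCERS` mapping are hypothetical names, and a real mapping would need to cover the dialect's full type list:

```python
import sqlite3

# Hypothetical mapping from declared SQL type to a Python cast;
# extend per dialect (DATE, BOOLEAN, NUMERIC, ...).
COERCERS = {"INTEGER": int, "REAL": float, "TEXT": str}

def coerce_row(conn, table, row):
    """Cast a row of strings to the types the database declares for `table`.

    Raises ValueError when a cell doesn't parse as its column's type —
    which is exactly the upload validation we want.
    """
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    declared_types = [r[2] for r in info]  # column 2 is the declared type
    return [COERCERS.get(t, str)(v) for t, v in zip(declared_types, row)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE m (id INTEGER, label TEXT, value REAL)")
print(coerce_row(conn, "m", ["7", "a", "1.25"]))
# → [7, 'a', 1.25]
```

The same pattern translates to a typed language: instead of per-table structs generated ahead of time, you hold a small runtime description of each column and dispatch on it, so the strong typing lives in the coercion layer rather than in hand-written models.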
Does anyone have suggestions for how I might approach a nicer solution?