BigQuery has become a major player in the field of Data analytics solutions. It provides an ever-growing list of powerful features in an easy, performant and cost-effective way. However BigQuery definitely is OLAP, while the sensible option for an application backend is OLTP.
Therefore, using BigQuery as a backend may seem weird, and... it is indeed! Just to scratch the surface of many problems that would arise:
- BigQuery is not optimized for writing, but for performing complex queries.
- Single-line inserts are discouraged.
- BigQuery does not enforce keys, foreign keys nor constraints
- BigQuery does not perform well with normalized schema, on the contrary it encourages denormalization
So, if you think about BigQuery to be the storage backend of a full-blown application, you should definitely think again.
There can be some situation though, where this design has some interest...
Think of a huge analytics data platform revolving around BigQuery. Now imagine a corner case of the platform where some configuration have to be stored in a relational fashion, and this configuration data will have an impact on how data will be accessed. In order to update the configuration, an API is exposed. There are now 2 options to store the configuration values:
The normal way: set up a transactional database like PostgreSQL or MySQL, maybe through CloudSQL since we're dealing with the Google Cloud Platform. Use it as application's storage backend. Setup redundancy and backup strategy. Sort out IAM permissions. Query configuration data from BigQuery via federated queries. All of this will of course cost you some extra dollars.
The hacky way: store the configuration in a BigQuery dataset somewhere, and profit of the near-free hosting and redundancy provided by this serverless database. Use the bigquery client API to integrate with the application. Query like any other dataset.
Don't try this at home ! This stunt is being performed by professionals
All warnings having been exposed, let's proceed with the implementation. It's not really difficult because Google provide BigQuery client libraries for many languages. It is also possible to use the REST Api.
The problem is, even if you agreed on cutting corners by not provisioning a proper OLTP database, you may still need to enforce some constraints.
Imagine that your configuration table consists in the following schema:
| Id (STRING) | Attribute (STRING) | Value (STRING) |
And you want to enforce that
Id values are unique. Normally, this would be a simple
UNIQUE(Id) statement in OLTP databases. But such a statement doesn't exist in BigQuery.
Luckily, there is a new feature of BigQuery to the rescue: Transactions
Multi-statement transactions feature is covered by Pre-GA offerings at the time of writing. It enables the wrapping of standard SQL scripts into atomic transactions.
So, to compensate the absence of a traditional UNIQUE constraint, we can implement the following sequence when saving or modifying an entry into the table:
- 1) Open a transaction
- 2) Search if the Id to save already exists, raise an error if found
- 3) Insert the new entry
- 4) Commit the transaction
Here is an implementation of this script:
-- 1 BEGIN TRANSACTION; -- 2 SELECT * FROM ( SELECT COUNT(1) AS conflict FROM `configds.configtable` WHERE Id=@input_id ) WHERE IF (conflict=0, TRUE, ERROR("Id already exists)); -- 3 INSERT INTO `configds.configtable` VALUES(@input_id,@input_attribute,@input_value); -- 4 COMMIT TRANSACTION;
At step 2,
ERROR() will automatically rollback the transaction, so that step 3 will not occur. Named parameters are used here to protect against SQL injection.
Call this script from the application backend on config saving, and this will protect the table against concurrent inserts from the application's clients.
As mentioned earlier, this way of implementing constraints is not to be generalized. Here are the most prominent limitations:
- Transactions are only Pre-GA at the moment
- The unicity is enforced at the application level, but not at the database level. Nothing prevents another BigQuery client, like the BigQuery console itself, from inserting rows regardless of unicity of Id. Only the application is safe
- Performance is very poor: it takes several seconds to perform the script.
- Not supported by ORMs, you have to write plain SQL queries and be careful of SQL injections
The new BigQuery Multi-statement Transactions feature enables the usage of BigQuery as a somewhat-workable application backend, which can come in handy if used with high caution. Still, carefully consider the trade-offs vs a traditional OLTP database, and be prepared to defend your choices if you follow this path !