Author: shiyuhang
Recently, we published a new release of dbt-tidb, an adapter that enables dbt to work with TiDB.
Here, I will introduce some experience of how to upgrade a dbt adapter to support new dbt-core. For how to build a new adapter, please refer to the official doc.
Version Rules
dbt-core follows the Semantic Versioning. So I suggest you also use Semantic Versioning when you publish your own adapter.
Although dbt-core will try to avoid break changes, compatibility issues still occur. For example, dbt-core v1.2.0 has added retry_connection
method in BaseConnectionManager
to support the retry connection feature. If the adapter implements this method, it can't run on dbt-core v1.1.0.
In order to avoid compatibility issues, dbt-tidb will follow the version number of dbt-core. For example, dbt-tidb v1.2.0 will only support dbt-core v1.2.0. I suggest you do the same for your adapter.
Investigation
When we support the new dbt-core, the first step is to investigate which features need to be supported.
Here are a few investigation methods you can use in combination:
Visit the official Version migration guides, which may include suggestions for adapter maintainers.
Check out the dbt-core release note, focusing on new features for adapters. Sort out the new features that need to be implemented.
Sometimes, the dbt official will list TODO in the Github Discussions. You can upgrade your adapter according to the discussion.
Refer to other adapters, you can find all the adapters apps in Available adapters.
Not recommended: You can also do nothing but only upgrade the version of dbt-core at the setup.py. If you are lucky, it will still work for the new dbt-core, but you can't enjoy the new features brought by the new dbt-core.
According to the investigation methods, dbt-tidb sorts out the features that need to be implemented as follows:
For dbt-tidb 1.1.0:
- Deprecate Python 3.7 and support Python 3.10
- Implement the new adapter testing framework
- Support multiple unique keys in incremental
For dbt-tidb 1.2.0:
- Support Connection retry feature
- Support grants feature
- Support Cross-database macros (some macros under the dbt-util package have been migrated to dbt-core
- Added BaseDocsGenerate and BaseValidateConnection tests
Test
I would like to introduce how to test before code because I strongly recommend using Test Driven Development(TTD) to develop the dbt adapter. That is: write the test first, and then implement the corresponding feature. If the test passes, the feature is considered to be supported.
Since dbt-core v1.1.0, dbt has provided a new testing framework for adapter maintainers. The new testing framework is included in dbt-core. With it, you can write your own tests easier. For more detail, you can refer to Testing a new adapter. dbt-tidb v1.1.0 started to use the new testing framework, and introduced the basic package to test the basic features.
dbt-tidb v1.2.0 adds the following tests based on the new features
basic package: add BaseValidateConnection and BaseDocsGenerate for testing connection and document generation
grant package: add a new grant package for testing grant features
util: add a new util package for testing Cross-database macros which are migrated from dbt-util
Let's take grants
as an example, according to the newly added tests in dbt-core v1.2.0. We need to add the following tests for grants:
class TestModelGrantsTiDB(BaseModelGrants):
pass
class TestIncrementalGrantsTiDB(BaseIncrementalGrants):
pass
class TestSeedGrantsTiDB(BaseSeedGrants):
pass
class TestSnapshotGrantsTiDB(BaseSnapshotGrants):
pass
class TestInvalidGrantsTiDB(BaseInvalidGrants):
pass
Here, we use pass
without any implementation, the tests will inherit the default implementation of the test framework.
Code
Implement the feature
Now, it is time to implement the features.
Generally, dbt can be expanded by overriding the default macros or some methods. About which one needed to implement, here is a reference:
- dbt officials may explain how to implement features in the GitHub discussions.
- Refer to dbt-core's implementation of this feature.
- Refer to the implementation of the other adapters.
According to the Upgrading to dbt-core 1.2.0 in the dbt-core's discussions. We find that grant is mainly implemented by overriding the macros of dbt-core, and the following macros need to be implemented:
get_show_grant_sql
Returns auth information, and the return format needs to be grantee (username in some adapter) + privilege_type.
For TiDB, we query the mysql.tables_priv
table to obtain the auth information. Then filter the given DB and table. Next poll Select, Insert, Update, and Delete. Finally, output in the format of grantee,privilege_type
The corresponding SQL is as follows:
{% macro tidb__get_show_grant_sql(relation) %}
select case(Table_priv) when null then null else 'select' end as privilege_type, `User` as grantee from mysql.tables_priv where `DB` = '{{relation.schema}}' and `Table_name` = '{{relation.identifier}}' and Table_priv like '%Select%'
union ALL
select case(Table_priv) when null then null else 'insert' end as privilege_type, `User` as grantee from mysql.tables_priv where `DB` = '{{relation.schema}}' and `Table_name` = '{{relation.identifier}}' and Table_priv like '%Insert%'
union ALL
select case(Table_priv) when null then null else 'update' end as privilege_type, `User` as grantee from mysql.tables_priv where `DB` = '{{relation.schema}}' and `Table_name` = '{{relation.identifier}}' and Table_priv like '%Update%'
union ALL
select case(Table_priv) when null then null else 'delete' end as privilege_type, `User` as grantee from mysql.tables_priv where `DB` = '{{relation.schema}}' and `Table_name` = '{{relation.identifier}}' and Table_priv like '%Delete%'
{% endmacro %}
get_grant_sql
Use standard grant SQL to give privileges to multiple users. Note that users need to use double quotes in TiDB. The corresponding SQL is as follows:
{%- macro tidb__get_grant_sql(relation, privilege, grantees) -%}
grant {{ privilege }} on {{ relation }} to {{ '\"' + grantees|join('\", \"') + '\"' }}
{%- endmacro -%}
get_revoke_sql
Use standard revoke SQL to revoke privileges. Double quotes are also needed in TiDB. The corresponding SQL is as follows:
{%- macro tidb__get_revoke_sql(relation, privilege, grantees) -%}
revoke {{ privilege }} on {{ relation }} from {{ '\"' + grantees|join('\", \"') + '\"' }}
{%- endmacro -%}
Find the bug by tests
After implementing the feature, we need to test it with the testing framework. In most cases, you will find the tests fail. Here are some suggestions for you to fix the bugs:
Find the root cause according to the errors shown by dbt and fix it. For example, most of the SQL syntax errors can be found in this way.
-
Read the pr for the feature in dbt-core.
- Check if any macros/methods that have been overridden by your adapter changed. If so, we may need to change the adapter too.
- See if there are other macros/methods added.
Refer to the implementation of the other adapters to see if you miss anything.
When we support grants in dbt-tidb. We find that the incremental materialization and snapshot materialization are changed in the dbt-core's pr of grant feature. Unfortunately, dbt-tidb has overridden them before. Thus, we need to update them in dbt-tidb:
{% materialization incremental, adapter='tidb' %}
-- other code
{% set grant_config = config.get('grants') %}
-- other code
{% set should_revoke = should_revoke(existing_relation, full_refresh_mode) %}
{% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}
-- other code
{%- endmaterialization %}
The code first gets the grant configuration and then calls apply_grants
to apply the get_grant_sql
method which is implemented above.
We also found that the newly added call_dcl_statements
macro needs to be overwritten to convert multiple SQL into some single SQL. Because dbt-tidb does not support multiple SQL yet
{% macro tidb__call_dcl_statements(dcl_statement_list) %}
{% for dcl_statement in dcl_statement_list %}
{% call statement('grant_or_revoke') %}
{{ dcl_statement }}
{% endcall %}
{% endfor %}
{% endmacro %}
Fix the test
After you fix all the bugs, there may also occur some errors in the test. These errors are not because of your implementation, but for some compatibility issues of the tests. Some tests need to be changed to be compatible with your adapter. You can also find out how to modify the test by Testing a new adapter
We also modify the test when implementing grants in dbt-tidb.
BaseInvalidGrants
is the test for the invalid grantee and privilege_type. Obviously, different adapters may throw different errors for the invalid case. So, it is necessary to modify the test. For TiDB:
class TestInvalidGrantsTiDB(BaseInvalidGrants):
def grantee_does_not_exist_error(self):
return "You are not allowed to create a user with GRANT"
def privilege_does_not_exist_error(self):
return "Illegal privilege level specified for"
Congratulations! I think your adapter is ready for release now.
Using dbt-tidb v1.2.0
Now, let's experience the new adapter. Here we will use dbt-tidb v1.2.0.
Set up
1.install dbt-tidb v1.2.0
$ pip3 install dbt-tidb=1.2.0
2.Build TiDB. We choose to build TiDB with TiDB Cloud's free trial — developer tier
- Sign in with your account, and the page will jump to the TiDB Cloud console.
- Click Create Cluster button and you will jump to the creation page.
- Click Create on the creation page without any change to get a developer tier and you will go back to the TiDB Cloud console.
- Wait for the cluster to be ready. Then you can click Connect button to get the user and host.
3.Download jaffle_shop on Github. It is a dbt project for testing maintained by the official
git clone https://github.com/dbt-labs/jaffle_shop
4.Addprofiles.yml
in ~/.dbt
. It configures the connection information
jaffle_shop_tidb: # project name
target: dev
outputs:
dev:
type: tidb # adapter name
server: gateway01.ap-southeast-1.prod.aws.tidbcloud.com # your TiDB host
port: 4000 # your TiDB port
schema: test # db name
username: 41y7Jq2g5sBr2ia.root # your TiDB user name
password: ${fake_password} # you tidb password
5.Change the dbt_project.yml in jaffle_shop, you only need to change the profile
profile: 'jaffle_shop_tidb'
6.Execute dbt debug
in the jaffle_shop to check your configs
dbt debug
Connection Retry Feature
Run, build and test in dbt may generate hundreds of independent connections. A single connection timeout due to networking may fail the entire project. Therefore, dbt-tidb adds a retry feature to solve the temporary connection timeout problem.
Connection Retry example
1.Add retries config in profile.yml, it determines the retry times. We use an invalid user to simulate connection failure.
jaffle_shop_tidb: # project name
target: dev
outputs:
dev:
type: tidb # adapter name
server: gateway01.ap-southeast-1.prod.aws.tidbcloud.com # your TiDB host
port: 4000 # your TiDB port
schema: test # db name
username: 41y7Jq2g5sBr2ia.invaild_user # your TiDB user name
password: ${fake_password} # you tidb password
2.Execute dbt debug, you will find the Database Error
$ dbt debug
Connection:
server: gateway01.ap-southeast-1.prod.aws.tidbcloud.com
port: 4000
database: None
schema: test
user: 41y7Jq2g5sBr2ia.invaild_user
Connection test: [ERROR]
1 check failed:
dbt was unable to connect to the specified database.
The database returned the following error:
>Database Error
1045 (28000): Access denied for user '41y7Jq2g5sBr2ia.invaild_user'@'10.0.123.88' (using password: YES)
3.Check the dbt log and you will find three times of retry before throw error.
$ cat dbt.log
06:24:19.875482 [debug] [MainThread]: tidb adapter: Got a retryable error when attempting to open a tidb connection.
3 attempts remaining. Retrying in 1 seconds.
Error:
1045 (28000): Access denied for user '41y7Jq2g5sBr2ia.invaild_user'@'10.0.123.88' (using password: YES)
06:24:21.321733 [debug] [MainThread]: tidb adapter: Got a retryable error when attempting to open a tidb connection.
2 attempts remaining. Retrying in 1 seconds.
Error:
1045 (28000): Access denied for user '41y7Jq2g5sBr2ia.invaild_user'@'10.0.123.88' (using password: YES)
06:24:22.703960 [debug] [MainThread]: tidb adapter: Got a retryable error when attempting to open a tidb connection.
1 attempts remaining. Retrying in 1 seconds.
Error:
1045 (28000): Access denied for user '41y7Jq2g5sBr2ia.invaild_user'@'10.0.123.88' (using password: YES)
06:24:24.069883 [debug] [MainThread]: tidb adapter: Error running SQL: select 1 as id
Grants Feature
In dbt-tidb, you can grant table privileges to the given user with grants.
Now, you can add grants config in the model, seed, and snapshots. If you configure it under dbt_project.yml, all resources in the project will take effect. Of course, you can also configure it with SQL or YAML for a specific resource, which will override the config in dbt_project.yml. Pay attention that grants do not support creating users, you need to create the users in TiDB first.
Grants Example
1.Create users in TiDB, note that the user name in the dev tier contains a prefix
CREATE USER '41y7Jq2g5sBr2ia.user1'@'%' IDENTIFIED BY '';
CREATE USER '41y7Jq2g5sBr2ia.user2'@'%' IDENTIFIED BY '';
CREATE USER '41y7Jq2g5sBr2ia.user3'@'%' IDENTIFIED BY '';
2.Add grants config in dbt_project.yml
seeds:
+grants:
select: ['41y7Jq2g5sBr2ia.user1','41y7Jq2g5sBr2ia.user2']
insert: ['41y7Jq2g5sBr2ia.user1','41y7Jq2g5sBr2ia.user3']
3.Execute dbt seed
$ dbt seed
06:38:49 Concurrency: 1 threads (target='dev')
06:38:49
06:38:49 1 of 3 START seed file test.raw_customers ...................................... [RUN]
06:38:50 1 of 3 OK loaded seed file test.raw_customers .................................. [INSERT 100 in 1.58s]
06:38:50 2 of 3 START seed file test.raw_orders ......................................... [RUN]
06:38:52 2 of 3 OK loaded seed file test.raw_orders ..................................... [INSERT 99 in 1.52s]
06:38:52 3 of 3 START seed file test.raw_payments ....................................... [RUN]
06:38:54 3 of 3 OK loaded seed file test.raw_payments ................................... [INSERT 113 in 1.66s]
06:38:55
06:38:55 Finished running 3 seeds in 0 hours 0 minutes and 9.09 seconds (9.09s).
06:38:55
06:38:55 Completed successfully
06:38:55
06:38:55 Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
4.Query the result in TiDB
mysql> select * from mysql.tables_priv where User in('41y7Jq2g5sBr2ia.user1','41y7Jq2g5sBr2ia.user2','41y7Jq2g5sBr2ia.user3');
+------+------+-----------------------+---------------+---------+---------------------+---------------+---------------+
| Host | DB | User | Table_name | Grantor | Timestamp | Table_priv | Column_priv |
+------+------+-----------------------+---------------+---------+---------------------+---------------+---------------+
| % | test | 41y7Jq2g5sBr2ia.user1 | raw_customers | | 2022-08-19 06:46:08 | Select,Insert | Select,Insert |
| % | test | 41y7Jq2g5sBr2ia.user2 | raw_customers | | 2022-08-19 06:46:08 | Select | Select |
| % | test | 41y7Jq2g5sBr2ia.user3 | raw_customers | | 2022-08-19 06:46:08 | Insert | Insert |
| % | test | 41y7Jq2g5sBr2ia.user1 | raw_orders | | 2022-08-19 06:46:10 | Select,Insert | Select,Insert |
| % | test | 41y7Jq2g5sBr2ia.user2 | raw_orders | | 2022-08-19 06:46:10 | Select | Select |
| % | test | 41y7Jq2g5sBr2ia.user3 | raw_orders | | 2022-08-19 06:46:10 | Insert | Insert |
| % | test | 41y7Jq2g5sBr2ia.user1 | raw_payments | | 2022-08-19 06:46:12 | Select,Insert | Select,Insert |
| % | test | 41y7Jq2g5sBr2ia.user2 | raw_payments | | 2022-08-19 06:46:12 | Select | Select |
| % | test | 41y7Jq2g5sBr2ia.user3 | raw_payments | | 2022-08-19 06:46:12 | Insert | Insert |
+------+------+-----------------------+---------------+---------+---------------------+---------------+---------------+
The grants are just as what we have set:
41y7Jq2g5sBr2ia.user1 has Select and Insert priv
41y7Jq2g5sBr2ia.user2 has Select priv
41y7Jq2g5sBr2ia.user3 has Insert pri
Use Cross-database macros
One of the strengths of dbt is that it can reuse macros (like functions). dbt-util is an official tool repository. We can reuse its macros by importing dbt-util.
Cross-database macros have been migrated from dbt-util to dbt-core in v1.2.0, which means you can use them directly without importing dbt-util.
dbt-tidb also supports using Cross-database macros directly. You can use the following macros in dbt-tidb.
bool_or
cast_bool_to_text
dateadd
datediff
date_trunc
hash
safe_cast
split_part
last_day
cast_bool_to_text
concat
escape_single_quotes
except
intersect
length
position
replace
right
Read our doc to see how to use them. Here we will take datediff
as an example
1.Execute dbt seed to generate raw_orders table
dbt seed
2.Create datediff.sql in the models directory. It will calculate the days between order_date and 2018-01-01 in raw_orders
with orders as (
select * from {{ ref('raw_orders') }}
)
select * , {{datediff( "'2018-01-01'", "order_date", 'day' )}} as datediff from orders
3.Execute dbt run -s datediff
4.Query the result in TiDB, you will get the results
mysql> select * from test.datediff;
+------+---------+------------+----------------+----------+
| id | user_id | order_date | status | datediff |
+------+---------+------------+----------------+----------+
| 1 | 1 | 2018-01-01 | returned | 0 |
| 2 | 3 | 2018-01-02 | completed | 1 |
| 3 | 94 | 2018-01-04 | completed | 3 |
| 4 | 50 | 2018-01-05 | completed | 4 |
| 5 | 64 | 2018-01-05 | completed | 4 |
| 6 | 54 | 2018-01-07 | completed | 6 |
| 7 | 88 | 2018-01-09 | completed | 8 |
| 8 | 2 | 2018-01-11 | returned | 10 |
| 9 | 53 | 2018-01-12 | completed | 11 |
| 10 | 7 | 2018-01-14 | completed | 13 |
| 11 | 99 | 2018-01-14 | completed | 13 |
| 12 | 59 | 2018-01-15 | completed | 14 |
| 13 | 84 | 2018-01-17 | completed | 16 |
| 14 | 40 | 2018-01-17 | returned | 16 |
| 15 | 25 | 2018-01-17 | completed | 16 |
| 16 | 39 | 2018-01-18 | completed | 17 |
| 17 | 71 | 2018-01-18 | completed | 17 |
| 18 | 64 | 2018-01-20 | returned | 19 |
| 19 | 54 | 2018-01-22 | completed | 21 |
| 20 | 20 | 2018-01-23 | completed | 22 |
Conclusion
This article takes dbt-tidb as an example to share the process and skills of upgrading dbt adapter. Then introduce how to use the new features in dbt-tidb v1.2.0. Hope it will help you with the upgrade of your adapter.
Welcome to contribute dbt-tidb.
Top comments (0)