DEV Community

shiyuhang0 for TiDB Cloud Ecosystem

Posted on • Edited on

How to upgrade your dbt adapter

Author: shiyuhang

Recently, we published a new release of dbt-tidb, an adapter that enables dbt to work with TiDB.

Here, I will introduce some experience of how to upgrade a dbt adapter to support new dbt-core. For how to build a new adapter, please refer to the official doc.

Version Rules

dbt-core follows the Semantic Versioning. So I suggest you also use Semantic Versioning when you publish your own adapter.

Although dbt-core will try to avoid break changes, compatibility issues still occur. For example, dbt-core v1.2.0 has added retry_connection method in BaseConnectionManagerto support the retry connection feature. If the adapter implements this method, it can't run on dbt-core v1.1.0.

In order to avoid compatibility issues, dbt-tidb will follow the version number of dbt-core. For example, dbt-tidb v1.2.0 will only support dbt-core v1.2.0. I suggest you do the same for your adapter.

Investigation

When we support the new dbt-core, the first step is to investigate which features need to be supported.

Here are a few investigation methods you can use in combination:

  1. Visit the official Version migration guides, which may include suggestions for adapter maintainers.

  2. Check out the dbt-core release note, focusing on new features for adapters. Sort out the new features that need to be implemented.

  3. Sometimes, the dbt official will list TODO in the Github Discussions. You can upgrade your adapter according to the discussion.

  4. Refer to other adapters, you can find all the adapters apps in Available adapters.

  5. Not recommended: You can also do nothing but only upgrade the version of dbt-core at the setup.py. If you are lucky, it will still work for the new dbt-core, but you can't enjoy the new features brought by the new dbt-core.

According to the investigation methods, dbt-tidb sorts out the features that need to be implemented as follows:

For dbt-tidb 1.1.0:

  • Deprecate Python 3.7 and support Python 3.10
  • Implement the new adapter testing framework
  • Support multiple unique keys in incremental

For dbt-tidb 1.2.0:

  • Support Connection retry feature
  • Support grants feature
  • Support Cross-database macros (some macros under the dbt-util package have been migrated to dbt-core
  • Added BaseDocsGenerate and BaseValidateConnection tests

Test

I would like to introduce how to test before code because I strongly recommend using Test Driven Development(TTD) to develop the dbt adapter. That is: write the test first, and then implement the corresponding feature. If the test passes, the feature is considered to be supported.

Since dbt-core v1.1.0, dbt has provided a new testing framework for adapter maintainers. The new testing framework is included in dbt-core. With it, you can write your own tests easier. For more detail, you can refer to Testing a new adapter. dbt-tidb v1.1.0 started to use the new testing framework, and introduced the basic package to test the basic features.

dbt-tidb v1.2.0 adds the following tests based on the new features

  • basic package: add BaseValidateConnection and BaseDocsGenerate for testing connection and document generation

  • grant package: add a new grant package for testing grant features

  • util: add a new util package for testing Cross-database macros which are migrated from dbt-util

Let's take grants as an example, according to the newly added tests in dbt-core v1.2.0. We need to add the following tests for grants:

class TestModelGrantsTiDB(BaseModelGrants):
    pass


class TestIncrementalGrantsTiDB(BaseIncrementalGrants):
    pass


class TestSeedGrantsTiDB(BaseSeedGrants):
    pass


class TestSnapshotGrantsTiDB(BaseSnapshotGrants):
    pass


class TestInvalidGrantsTiDB(BaseInvalidGrants):
    pass
Enter fullscreen mode Exit fullscreen mode

Here, we use pass without any implementation, the tests will inherit the default implementation of the test framework.

Code

Implement the feature

Now, it is time to implement the features.

Generally, dbt can be expanded by overriding the default macros or some methods. About which one needed to implement, here is a reference:

  • dbt officials may explain how to implement features in the GitHub discussions.
  • Refer to dbt-core's implementation of this feature.
  • Refer to the implementation of the other adapters.

According to the Upgrading to dbt-core 1.2.0 in the dbt-core's discussions. We find that grant is mainly implemented by overriding the macros of dbt-core, and the following macros need to be implemented:

get_show_grant_sql

Returns auth information, and the return format needs to be grantee (username in some adapter) + privilege_type.

For TiDB, we query the mysql.tables_priv table to obtain the auth information. Then filter the given DB and table. Next poll Select, Insert, Update, and Delete. Finally, output in the format of grantee,privilege_typeThe corresponding SQL is as follows:

{% macro tidb__get_show_grant_sql(relation) %}

    select case(Table_priv) when null then null else 'select' end as privilege_type, `User` as grantee from mysql.tables_priv  where `DB` = '{{relation.schema}}' and `Table_name` = '{{relation.identifier}}' and Table_priv like '%Select%'
    union ALL
    select case(Table_priv) when null then null else 'insert' end as privilege_type, `User` as grantee from mysql.tables_priv  where `DB` = '{{relation.schema}}' and `Table_name` = '{{relation.identifier}}' and Table_priv like '%Insert%'
    union ALL
    select case(Table_priv) when null then null else 'update' end as privilege_type, `User` as grantee from mysql.tables_priv  where `DB` = '{{relation.schema}}' and `Table_name` = '{{relation.identifier}}' and Table_priv like '%Update%'
    union ALL
    select case(Table_priv) when null then null else 'delete' end as privilege_type, `User` as grantee from mysql.tables_priv  where `DB` = '{{relation.schema}}' and `Table_name` = '{{relation.identifier}}' and Table_priv like '%Delete%'

{% endmacro %}
Enter fullscreen mode Exit fullscreen mode

get_grant_sql

Use standard grant SQL to give privileges to multiple users. Note that users need to use double quotes in TiDB. The corresponding SQL is as follows:

{%- macro tidb__get_grant_sql(relation, privilege, grantees) -%}
    grant {{ privilege }} on {{ relation }} to {{ '\"' + grantees|join('\", \"') + '\"' }}
{%- endmacro -%}
Enter fullscreen mode Exit fullscreen mode

get_revoke_sql

Use standard revoke SQL to revoke privileges. Double quotes are also needed in TiDB. The corresponding SQL is as follows:

{%- macro tidb__get_revoke_sql(relation, privilege, grantees) -%}
    revoke {{ privilege }} on {{ relation }} from {{ '\"' + grantees|join('\", \"') + '\"' }}
{%- endmacro -%}
Enter fullscreen mode Exit fullscreen mode

Find the bug by tests

After implementing the feature, we need to test it with the testing framework. In most cases, you will find the tests fail. Here are some suggestions for you to fix the bugs:

  1. Find the root cause according to the errors shown by dbt and fix it. For example, most of the SQL syntax errors can be found in this way.

  2. Read the pr for the feature in dbt-core.

    • Check if any macros/methods that have been overridden by your adapter changed. If so, we may need to change the adapter too.
    • See if there are other macros/methods added.
  3. Refer to the implementation of the other adapters to see if you miss anything.

When we support grants in dbt-tidb. We find that the incremental materialization and snapshot materialization are changed in the dbt-core's pr of grant feature. Unfortunately, dbt-tidb has overridden them before. Thus, we need to update them in dbt-tidb:

{% materialization incremental, adapter='tidb' %}

   -- other code
  {% set grant_config = config.get('grants') %}

   -- other code
  {% set should_revoke = should_revoke(existing_relation, full_refresh_mode) %}
  {% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}

   -- other code

{%- endmaterialization %}
Enter fullscreen mode Exit fullscreen mode

The code first gets the grant configuration and then calls apply_grants to apply the get_grant_sql method which is implemented above.

We also found that the newly added call_dcl_statements macro needs to be overwritten to convert multiple SQL into some single SQL. Because dbt-tidb does not support multiple SQL yet

{% macro tidb__call_dcl_statements(dcl_statement_list) %}
    {% for dcl_statement in dcl_statement_list %}
        {% call statement('grant_or_revoke') %}
            {{ dcl_statement }}
        {% endcall %}
    {% endfor %}
{% endmacro %}
Enter fullscreen mode Exit fullscreen mode

Fix the test

After you fix all the bugs, there may also occur some errors in the test. These errors are not because of your implementation, but for some compatibility issues of the tests. Some tests need to be changed to be compatible with your adapter. You can also find out how to modify the test by Testing a new adapter

We also modify the test when implementing grants in dbt-tidb.

BaseInvalidGrants is the test for the invalid grantee and privilege_type. Obviously, different adapters may throw different errors for the invalid case. So, it is necessary to modify the test. For TiDB:

class TestInvalidGrantsTiDB(BaseInvalidGrants):
    def grantee_does_not_exist_error(self):
        return "You are not allowed to create a user with GRANT"

    def privilege_does_not_exist_error(self):
        return "Illegal privilege level specified for"
Enter fullscreen mode Exit fullscreen mode

Congratulations! I think your adapter is ready for release now.

Using dbt-tidb v1.2.0

Now, let's experience the new adapter. Here we will use dbt-tidb v1.2.0.

Set up

1.install dbt-tidb v1.2.0

$ pip3 install dbt-tidb=1.2.0 
Enter fullscreen mode Exit fullscreen mode

2.Build TiDB. We choose to build TiDB with TiDB Cloud's free trial — developer tier

  1. Sign in with your account, and the page will jump to the TiDB Cloud console.
  2. Click Create Cluster button and you will jump to the creation page.
  3. Click Create on the creation page without any change to get a developer tier and you will go back to the TiDB Cloud console.
  4. Wait for the cluster to be ready. Then you can click Connect button to get the user and host.

3.Download jaffle_shop on Github. It is a dbt project for testing maintained by the official

git clone https://github.com/dbt-labs/jaffle_shop
Enter fullscreen mode Exit fullscreen mode

4.Addprofiles.yml in ~/.dbt. It configures the connection information

jaffle_shop_tidb:                        # project name
  target: dev                             
  outputs:
    dev:
      type: tidb                         # adapter name
      server: gateway01.ap-southeast-1.prod.aws.tidbcloud.com # your TiDB host
      port: 4000                         # your TiDB port
      schema: test                       # db name
      username: 41y7Jq2g5sBr2ia.root     # your TiDB user name
      password: ${fake_password}         # you tidb password
Enter fullscreen mode Exit fullscreen mode

5.Change the dbt_project.yml in jaffle_shop, you only need to change the profile

profile: 'jaffle_shop_tidb'
Enter fullscreen mode Exit fullscreen mode

6.Execute dbt debug in the jaffle_shop to check your configs

dbt debug
Enter fullscreen mode Exit fullscreen mode

Connection Retry Feature

Run, build and test in dbt may generate hundreds of independent connections. A single connection timeout due to networking may fail the entire project. Therefore, dbt-tidb adds a retry feature to solve the temporary connection timeout problem.

Connection Retry example

1.Add retries config in profile.yml, it determines the retry times. We use an invalid user to simulate connection failure.

jaffle_shop_tidb:                        # project name
  target: dev                             
  outputs:
    dev:
      type: tidb                         # adapter name
      server: gateway01.ap-southeast-1.prod.aws.tidbcloud.com # your TiDB host
      port: 4000                         # your TiDB port
      schema: test                       # db name
      username: 41y7Jq2g5sBr2ia.invaild_user     # your TiDB user name
      password: ${fake_password}         # you tidb password
Enter fullscreen mode Exit fullscreen mode

2.Execute dbt debug, you will find the Database Error

$ dbt debug
Connection:
  server: gateway01.ap-southeast-1.prod.aws.tidbcloud.com
  port: 4000
  database: None
  schema: test
  user: 41y7Jq2g5sBr2ia.invaild_user
  Connection test: [ERROR]

1 check failed:
dbt was unable to connect to the specified database.
The database returned the following error:

  >Database Error
  1045 (28000): Access denied for user '41y7Jq2g5sBr2ia.invaild_user'@'10.0.123.88' (using password: YES)
Enter fullscreen mode Exit fullscreen mode

3.Check the dbt log and you will find three times of retry before throw error.

$ cat dbt.log
06:24:19.875482 [debug] [MainThread]: tidb adapter: Got a retryable error when attempting to open a tidb connection.
3 attempts remaining. Retrying in 1 seconds.
Error:
1045 (28000): Access denied for user '41y7Jq2g5sBr2ia.invaild_user'@'10.0.123.88' (using password: YES)
06:24:21.321733 [debug] [MainThread]: tidb adapter: Got a retryable error when attempting to open a tidb connection.
2 attempts remaining. Retrying in 1 seconds.
Error:
1045 (28000): Access denied for user '41y7Jq2g5sBr2ia.invaild_user'@'10.0.123.88' (using password: YES)
06:24:22.703960 [debug] [MainThread]: tidb adapter: Got a retryable error when attempting to open a tidb connection.
1 attempts remaining. Retrying in 1 seconds.
Error:
1045 (28000): Access denied for user '41y7Jq2g5sBr2ia.invaild_user'@'10.0.123.88' (using password: YES)
06:24:24.069883 [debug] [MainThread]: tidb adapter: Error running SQL: select 1 as id
Enter fullscreen mode Exit fullscreen mode

Grants Feature

In dbt-tidb, you can grant table privileges to the given user with grants.

Now, you can add grants config in the model, seed, and snapshots. If you configure it under dbt_project.yml, all resources in the project will take effect. Of course, you can also configure it with SQL or YAML for a specific resource, which will override the config in dbt_project.yml. Pay attention that grants do not support creating users, you need to create the users in TiDB first.

Grants Example

1.Create users in TiDB, note that the user name in the dev tier contains a prefix

CREATE USER '41y7Jq2g5sBr2ia.user1'@'%' IDENTIFIED BY '';
CREATE USER '41y7Jq2g5sBr2ia.user2'@'%' IDENTIFIED BY '';
CREATE USER '41y7Jq2g5sBr2ia.user3'@'%' IDENTIFIED BY '';
Enter fullscreen mode Exit fullscreen mode

2.Add grants config in dbt_project.yml

seeds:
  +grants:
     select: ['41y7Jq2g5sBr2ia.user1','41y7Jq2g5sBr2ia.user2']
     insert: ['41y7Jq2g5sBr2ia.user1','41y7Jq2g5sBr2ia.user3']
Enter fullscreen mode Exit fullscreen mode

3.Execute dbt seed

$ dbt seed
06:38:49  Concurrency: 1 threads (target='dev')
06:38:49
06:38:49  1 of 3 START seed file test.raw_customers ...................................... [RUN]
06:38:50  1 of 3 OK loaded seed file test.raw_customers .................................. [INSERT 100 in 1.58s]
06:38:50  2 of 3 START seed file test.raw_orders ......................................... [RUN]
06:38:52  2 of 3 OK loaded seed file test.raw_orders ..................................... [INSERT 99 in 1.52s]
06:38:52  3 of 3 START seed file test.raw_payments ....................................... [RUN]
06:38:54  3 of 3 OK loaded seed file test.raw_payments ................................... [INSERT 113 in 1.66s]
06:38:55
06:38:55  Finished running 3 seeds in 0 hours 0 minutes and 9.09 seconds (9.09s).
06:38:55
06:38:55  Completed successfully
06:38:55
06:38:55  Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
Enter fullscreen mode Exit fullscreen mode

4.Query the result in TiDB

mysql> select * from mysql.tables_priv where User in('41y7Jq2g5sBr2ia.user1','41y7Jq2g5sBr2ia.user2','41y7Jq2g5sBr2ia.user3');
+------+------+-----------------------+---------------+---------+---------------------+---------------+---------------+
| Host | DB   | User                  | Table_name    | Grantor | Timestamp           | Table_priv    | Column_priv   |
+------+------+-----------------------+---------------+---------+---------------------+---------------+---------------+
| %    | test | 41y7Jq2g5sBr2ia.user1 | raw_customers |         | 2022-08-19 06:46:08 | Select,Insert | Select,Insert |
| %    | test | 41y7Jq2g5sBr2ia.user2 | raw_customers |         | 2022-08-19 06:46:08 | Select        | Select        |
| %    | test | 41y7Jq2g5sBr2ia.user3 | raw_customers |         | 2022-08-19 06:46:08 | Insert        | Insert        |
| %    | test | 41y7Jq2g5sBr2ia.user1 | raw_orders    |         | 2022-08-19 06:46:10 | Select,Insert | Select,Insert |
| %    | test | 41y7Jq2g5sBr2ia.user2 | raw_orders    |         | 2022-08-19 06:46:10 | Select        | Select        |
| %    | test | 41y7Jq2g5sBr2ia.user3 | raw_orders    |         | 2022-08-19 06:46:10 | Insert        | Insert        |
| %    | test | 41y7Jq2g5sBr2ia.user1 | raw_payments  |         | 2022-08-19 06:46:12 | Select,Insert | Select,Insert |
| %    | test | 41y7Jq2g5sBr2ia.user2 | raw_payments  |         | 2022-08-19 06:46:12 | Select        | Select        |
| %    | test | 41y7Jq2g5sBr2ia.user3 | raw_payments  |         | 2022-08-19 06:46:12 | Insert        | Insert        |
+------+------+-----------------------+---------------+---------+---------------------+---------------+---------------+
Enter fullscreen mode Exit fullscreen mode

The grants are just as what we have set:

  • 41y7Jq2g5sBr2ia.user1 has Select and Insert priv

  • 41y7Jq2g5sBr2ia.user2 has Select priv

  • 41y7Jq2g5sBr2ia.user3 has Insert pri

Use Cross-database macros

One of the strengths of dbt is that it can reuse macros (like functions). dbt-util is an official tool repository. We can reuse its macros by importing dbt-util.

Cross-database macros have been migrated from dbt-util to dbt-core in v1.2.0, which means you can use them directly without importing dbt-util.

dbt-tidb also supports using Cross-database macros directly. You can use the following macros in dbt-tidb.

  • bool_or

  • cast_bool_to_text

  • dateadd

  • datediff

  • date_trunc

  • hash

  • safe_cast

  • split_part

  • last_day

  • cast_bool_to_text

  • concat

  • escape_single_quotes

  • except

  • intersect

  • length

  • position

  • replace

  • right

Read our doc to see how to use them. Here we will take datediff as an example

1.Execute dbt seed to generate raw_orders table

dbt seed
Enter fullscreen mode Exit fullscreen mode

2.Create datediff.sql in the models directory. It will calculate the days between order_date and 2018-01-01 in raw_orders

with orders as (

    select * from {{ ref('raw_orders') }}

)

select * , {{datediff( "'2018-01-01'", "order_date", 'day' )}} as datediff from orders
Enter fullscreen mode Exit fullscreen mode

3.Execute dbt run -s datediff

4.Query the result in TiDB, you will get the results

mysql> select * from test.datediff;
+------+---------+------------+----------------+----------+
| id   | user_id | order_date | status         | datediff |
+------+---------+------------+----------------+----------+
|    1 |       1 | 2018-01-01 | returned       |        0 |
|    2 |       3 | 2018-01-02 | completed      |        1 |
|    3 |      94 | 2018-01-04 | completed      |        3 |
|    4 |      50 | 2018-01-05 | completed      |        4 |
|    5 |      64 | 2018-01-05 | completed      |        4 |
|    6 |      54 | 2018-01-07 | completed      |        6 |
|    7 |      88 | 2018-01-09 | completed      |        8 |
|    8 |       2 | 2018-01-11 | returned       |       10 |
|    9 |      53 | 2018-01-12 | completed      |       11 |
|   10 |       7 | 2018-01-14 | completed      |       13 |
|   11 |      99 | 2018-01-14 | completed      |       13 |
|   12 |      59 | 2018-01-15 | completed      |       14 |
|   13 |      84 | 2018-01-17 | completed      |       16 |
|   14 |      40 | 2018-01-17 | returned       |       16 |
|   15 |      25 | 2018-01-17 | completed      |       16 |
|   16 |      39 | 2018-01-18 | completed      |       17 |
|   17 |      71 | 2018-01-18 | completed      |       17 |
|   18 |      64 | 2018-01-20 | returned       |       19 |
|   19 |      54 | 2018-01-22 | completed      |       21 |
|   20 |      20 | 2018-01-23 | completed      |       22 |
Enter fullscreen mode Exit fullscreen mode

Conclusion

This article takes dbt-tidb as an example to share the process and skills of upgrading dbt adapter. Then introduce how to use the new features in dbt-tidb v1.2.0. Hope it will help you with the upgrade of your adapter.

Welcome to contribute dbt-tidb.

Top comments (0)