Fauna for Fauna, Inc.

Posted on • Originally published at fauna.com

Transfer data in Fauna to your analytics tool using Airbyte

We are excited to introduce Fauna’s new Airbyte open source connector. This connector lets you replicate Fauna data into your data warehouses, lakes, and analytical databases, such as Snowflake, Redshift, S3, and more.

Why Airbyte

With the proliferation of applications and data sources, companies are often required to build custom connectors for data transfer across their architectures. Most of these ETL (extract, transform, and load) tools require maintaining and updating the connectors as requirements change over time. Airbyte is an open source data pipeline platform that eliminates this burden by offering a robust connector ecosystem that scales without requiring you to maintain the connectors yourself.

Why Fauna

Fauna is a distributed document-relational database delivered as a cloud API. Developers choose Fauna’s document-relational model because it combines the flexibility of NoSQL databases with the relational querying and ACID capabilities of SQL databases. This model is delivered as an API so you can focus on building features without worrying about operations or infrastructure management.

Why Fauna + Airbyte

Fauna and Airbyte both improve productivity and developer experience. Together, the connector lets you port and migrate transactional data in Fauna to your choice of analytical tools to drive business insights.
Continue reading for a guide on how to configure the Fauna source connector to transfer your database to one of the data analytics or warehousing destination connectors supported by Airbyte.
The Fauna source supports the following ways to export your data:

  • Full refresh append sync mode copies all of your data to the destination, without deleting existing data.
  • Full refresh overwrite sync mode copies the whole stream and replaces data in the destination by overwriting it.
  • Incremental append sync mode periodically transfers new, changed, or deleted data to the destination.
  • Incremental deduped history sync mode copies new records from the stream and appends data in the destination, while providing a de-duplicated view mirroring the state of the stream in the source.
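The differences between these modes can be illustrated with a toy in-memory destination (plain Python; the record shapes and function names are illustrative, not Airbyte's internal types):

```python
# Toy illustration of Airbyte sync-mode semantics against an
# in-memory "destination table" (a list of record dicts).

def full_refresh_append(destination, source):
    # Copies all source data without deleting existing destination rows.
    return destination + list(source)

def full_refresh_overwrite(destination, source):
    # Replaces the destination contents with the full source stream.
    return list(source)

def incremental_append(destination, source, last_ts):
    # Transfers only records changed since the last sync cursor.
    return destination + [r for r in source if r["ts"] > last_ts]

source = [{"ref": "101", "ts": 1}, {"ref": "102", "ts": 5}]
dest = [{"ref": "101", "ts": 1}]

print(full_refresh_overwrite(dest, source))  # full copy, prior rows replaced
print(incremental_append(dest, source, last_ts=1))  # only ts > 1 appended
```

Deduped history layers a de-duplicated view (keyed on the document identifier) on top of the incremental append behavior.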

Prerequisites

You need a destination database account and the Data Build Tool (dbtā„¢) set up to transform fields in your documents into columns in your destination. You also need to install Docker.

Create a destination database account

If you do not already have an account for the database associated with your destination connector, create an account and save the authentication credentials for setting up the destination connector to populate the destination database.

Set up dbt

To access the fields in your Fauna source using SQL-style statements, create a dbt account and set up dbt as described in the Airbyte Transformations with dbt setup guide. The guide steps you through the setup for transforming the data between the source and destination, and connects you to the destination database.

Install Docker

The Fauna connector is an Airbyte Open Source integration, deployed as a Docker image. If you do not already have Docker installed, follow the Install Docker Engine guide.

Step 1: Set up the Fauna source

Depending on your use case, set up one of the following sync modes for your collection.

Full refresh sync mode

Follow these steps to fully sync the source and destination database.

1- Use the Fauna Dashboard or fauna-shell to create a role that can read the collection to be exported. The Fauna source needs access to the Collections resource so that it can discover which collections are readable. This does not give it access to the contents of all collections, just their names. For example:

CreateRole({
  name: "airbyte-readonly",
  privileges: [{
    resource: Collection("COLLECTION_NAME"),
    actions: { read: true }
  }],
})

Replace COLLECTION_NAME with the collection name for this connector.

2- Create a secret that has the permissions associated with the role, using the name of the role you created. For example:

CreateKey({
  name: "airbyte-readonly",
  role: Role("airbyte-readonly"),
})
This returns a result like:
{
  ref: Key("341909050919747665"),
  ts: 1662328730450000,
  role: Role("airbyte-readonly"),
  secret: "fnAEjXudojkeRWaz5lxL2wWuqHd8k690edbKNYZz",
  hashed_secret: "$2a$05$TGr5F3JzriWbRUXlKMlykerq1nnYzEUr4euwrbrLUcWgLhvWmnW6S"
}

Save the returned secret; if you lose it, you must create a new key.

Incremental append sync mode

Use incremental sync mode to periodically sync the source and destination, updating only new and changed data.

Follow these steps to set up incremental sync.

1- Use the Fauna Dashboard or fauna-shell to create an index, which lets the connector do incremental syncs. For example:

CreateIndex({
  name: "INDEX_NAME",
  source: Collection("COLLECTION_NAME"),
  terms: [],
  values: [
    { "field": "ts" },
    { "field": "ref" }
  ]
})

Replace INDEX_NAME with the name you configured for the Incremental Sync Index. Replace COLLECTION_NAME with the name of the collection configured for this connector.

|Index values|Description|
| --- | ----------- |
|`ts`| Last modified timestamp.|
|`ref`|Unique document identifier.|
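Conceptually, this index shape is what enables incremental reads: entries sort by (ts, ref), so a saved (ts, ref) pair marks exactly where the previous sync stopped. A minimal Python sketch of the idea (illustrative only, not the connector's actual code):

```python
# Sketch of cursor-based incremental reads over an index with
# values [ts, ref]. Entries are (ts, ref) tuples, which Python
# compares lexicographically, matching the index sort order.

def read_after(index_entries, cursor):
    # cursor: the last (ts, ref) pair seen by the previous sync, or None
    # for a first sync, which reads everything.
    entries = sorted(index_entries)
    if cursor is None:
        return entries
    return [e for e in entries if e > cursor]

entries = [(100, "ref1"), (100, "ref2"), (105, "ref1")]
print(read_after(entries, None))           # first sync: all entries
print(read_after(entries, (100, "ref2")))  # next sync: only newer entries
```

Because `ref` breaks ties between documents modified at the same `ts`, no entry is skipped or read twice when a sync resumes mid-timestamp.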

2- Create a role that can read the collection and index, and can access index metadata to validate the index settings. For example:

CreateRole({
  name: "airbyte-readonly",
  privileges: [
    {
      resource: Collection("COLLECTION_NAME"),
      actions: { read: true }
    },
    {
      resource: Index("INDEX_NAME"),
      actions: { read: true }
    },
    {
      resource: Indexes(),
      actions: { read: true }
    }
  ],
})

Replace COLLECTION_NAME with the name of the collection configured for this connector. Replace INDEX_NAME with the name that you configured for the Incremental Sync Index.

3- Create a secret key that has the permissions associated with the role, using the name of the role you created. For example:

CreateKey({
  name: "airbyte-readonly",
  role: Role("airbyte-readonly"),
})
This returns a result like:
{
  ref: Key("341909050919747665"),
  ts: 1662328730450000,
  role: Role("airbyte-readonly"),
  secret: "fnAEjXudojkeRWaz5lxL2wWuqHd8k690edbKNYZz",
  hashed_secret: "$2a$05$TGr5F3JzriWbRUXlKMlykerq1nnYzEUr4euwrbrLUcWgLhvWmnW6S"
}

Save the returned secret. You enter it in the Fauna Secret field when you set up the Fauna source connector. If you lose the secret, you must create a new key.

The Fauna source iterates through all indexes on the database. For each index it finds, the following conditions must be met for incremental sync:

  1. The source must be able to Get() the index, which means it needs read access to this index.

  2. The source of the index must be a reference to the collection you are trying to sync.

  3. The number of values must be two.

  4. The number of terms must be zero.

  5. The values must be equal to:

    {"field": "ts"}
    {"field": "ref"}
    

All of the above conditions are checked in the order listed. If any check fails, the connector skips that index.

If no indexes are found in the initial setup, incremental sync isn't available for the given collection. No error is emitted because it can't be determined whether or not you are expecting an index for that collection.

If you find that the collection doesn't have incremental sync available, make sure that you followed all the setup steps, and that the source, terms, and values all match for your index.
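The checks above can be sketched as a small validation function. This is a plain-Python illustration of the listed conditions, with the index represented as a simplified dict rather than Fauna's actual metadata format:

```python
# Sketch of the index eligibility checks described above, applied to
# a simplified dict standing in for the result of Get(Index(...)).
# Illustrative only; not the connector's source code.

REQUIRED_VALUES = [{"field": "ts"}, {"field": "ref"}]

def supports_incremental_sync(index, collection_name):
    # 2. The index source must reference the collection being synced.
    source_ok = index.get("source") == {"collection": collection_name}
    # 3-4. Exactly two values and zero terms.
    values = index.get("values", [])
    terms = index.get("terms", [])
    # 5. The values must be ts followed by ref, in that order.
    return source_ok and len(values) == 2 and len(terms) == 0 \
        and values == REQUIRED_VALUES

index = {
    "source": {"collection": "users"},
    "terms": [],
    "values": [{"field": "ts"}, {"field": "ref"}],
}
print(supports_incremental_sync(index, "users"))  # True
```

Note that the order of the values matters: an index with `ref` before `ts` fails the last check.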

Step 2: Deploy and launch Airbyte

  1. Refer to the Deploy Airbyte instructions to install and deploy Airbyte. Enter the following commands to deploy the Airbyte server:

    git clone https://github.com/airbytehq/airbyte.git
    cd airbyte
    docker-compose up
    
  2. When the Airbyte banner displays, launch the Airbyte dashboard at http://localhost:8000.

  3. Choose the Connections menu item to start setting up your data source.

Step 3: Set up the Fauna source

  1. In the Airbyte dashboard, click the + New connection button. If you previously set up a source, click the Use existing source button to choose that source.

  2. In the Source type dropdown, choose Fauna and click the Set up source button. This lists the configurable Fauna connector parameters. An in-app Setup Guide in the right-side panel also gives detailed setup instructions.

  3. Set the following required parameters:

    |Parameter|Description|
    | --- | ----------- |
    |Name|A descriptive name for this connection. The name is displayed in the Connections window connections list.|
    |Domain|The domain of the collection you want to export. See Region Groups for region domains.|
    |Port|The default port number: 443.|
    |Scheme|The scheme used to connect to Fauna: https.|
    |Fauna Secret|The saved Fauna secret that you use to authenticate with the database.|
    |Page Size|Controls the memory size, which affects connector performance.|
    |Deletion Mode|Specifies whether to ignore document deletions or flag documents as deleted, depending on your use case. The Ignore option ignores document deletions. The Deleted Field option adds a date column with the date when you deleted the document, maintaining document history while letting the destination reconstruct deletion events.|
  4. After setting up the source, click the Set up source button.

    The "All connection tests passed!" message confirms successful connection to the Fauna source. This minimally confirms:

      • The secret is valid.

      • The connector can list collections and indexes.
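The two deletion modes in the source parameters can be illustrated with a small sketch (plain Python; the record shape is an illustrative assumption, while the deleted_at column name follows the field mentioned later in this guide):

```python
# Sketch of how the two Deletion Mode options treat a deleted document.
# Illustrative only; the real connector defines its own record schema.

def emit_deletion(record, mode, deleted_at):
    if mode == "ignore":
        # Ignore: document deletions produce no row in the destination.
        return None
    if mode == "deleted_field":
        # Deleted Field: keep the row, flagged with the deletion date,
        # so the destination can reconstruct deletion events.
        return {**record, "deleted_at": deleted_at}
    raise ValueError(f"unknown deletion mode: {mode}")

row = {"ref": "101", "data": None}
print(emit_deletion(row, "ignore", "2022-09-04"))
print(emit_deletion(row, "deleted_field", "2022-09-04"))
```

Choose Deleted Field when downstream analytics need to distinguish "never existed" from "existed and was deleted".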

Step 4: Set up the destination

  1. In the New connection window, choose a Destination type and click the Set up destination button. If you previously set up a destination, click the Use existing destination button to select and use that destination. Otherwise, continue to set up a new destination.

  2. Destination connector configuration parameters are unique to the destination. Populate the Set up the destination fields according to the connector requirements, including authentication information if needed. A Setup Guide is provided in the right-side panel with detailed setup instructions.

  3. When you are done, click the Set up destination button.

Step 5: Set up the connection

Set up the connection to sync the source and destination.

  1. Enter a descriptive name for the connection in the Name field.

  2. Choose a Transfer > Replication frequency, which is the data sync interval.

    You can choose the Manual option to manually sync the data.

  3. In the Streams > Destination Namespace field, choose a destination namespace where the data is stored. Options include:

    |Option|Description|
    | --- | ----------- |
    |Mirror source structure|Sets the name in the destination database to the name used for the Fauna source.|
    |Other|Uses another naming option, such as prefixing the database name with a string.|
  4. Optionally, enter a stream name prefix in the Streams > Destination Stream Prefix field.

  5. In the Activate the streams you want to sync section, click the > arrow to expand the available fields:

    |Field|Description|
    | --- | ----------- |
    |data|Collection data.|
    |ref|Unique document identifier.|
    |ts|Data timestamp.|
    |ttl|Time-to-live interval.|

    The document is deleted if it is not modified within the ttl time interval. The default value, null, means ttl is not used. After document deletion, the document is not displayed in temporal queries and the connector does not emit a deleted_at row.

  6. Select ref as the Primary key. This uniquely identifies the document in the collection.

  7. Choose a Sync mode as the source sync behavior, full refresh or incremental.

    A new incremental sync gets the full database, the same as a full sync.

    |Sync mode|Description|
    | --- | ----------- |
    |Incremental - Deduped history|Syncs new records from the stream and appends data in the destination, while providing a de-duplicated view mirroring the state of the stream in the source.|
    |Full refresh - Overwrite|Syncs the whole stream and replaces data in the destination by overwriting it.|
    |Incremental - Append|Syncs new records from the stream and appends data in the destination.|
    |Full refresh - Append|Syncs the whole stream and appends data in the destination.|

    If fewer than four options are displayed, the index is incorrectly set up. See Step 1: Set up the Fauna source.

  8. Choose the Normalization and Transformation data format:

    |Data format|Description|
    | --- | ----------- |
    |Raw data (JSON)|Puts all the source data in a single column.|
    |Normalized tabular data|Puts the ref, ts, ttl, and data fields in separate columns.|
  9. Click the + Add transformation button to add the dbt transform.

    To extract the fields in the source data column, you need to configure dbt to map source data to destination database columns. For example, the following SQL-based query extracts the name, account_balance, and credit_card/expires fields from the source data column to populate three separate columns of the destination data:

    with output as (
      select
        data:name as name,
        data:account_balance as balance,
        data:credit_card:expires as cc_expires
      from airbyte_schema.users
    )

    select * from output
    
  10. Click the Set up connection button.

Step 6: Sync the data

On the Connection page for the connection you created, click the Sync now button.

The time to run a sync varies; the status is displayed in Sync History. When the sync completes, the status changes from Running to Succeeded and shows:

  • The number of bytes transferred.

  • The number of records emitted and committed.

  • The sync duration.

Step 7: Verify the integration

To expand the sync log, click the > arrow to the right of the displayed time. This gives you a detailed view of the sync events.

Finally, verify successful database transfer by opening and viewing the destination database.

Conclusion

Integrating Fauna with the Airbyte open source solution arms developers building on Fauna with a powerful tool for gaining insights into the operational data living on Fauna. We’re excited to build on our partnership with Airbyte by working towards introducing an Airbyte Cloud connector. If you have any interest in a Fauna + Airbyte Cloud integration or questions about the open source connector, feel free to reach out to us and ask questions in our forum or on our Discord.
