Carlos A. Martinez

Azure Data Factory Lab: Copy CSV to CSV

Azure Data Factory (ADF) is a cloud-based ETL and data integration service for building data-driven workflows that orchestrate data movement and transform data at scale. With it, you can create and schedule these workflows (called pipelines) to ingest data from disparate data stores. You can also build complex ETL processes that transform data visually with data flows, or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.

Lab ADF Copy Data from CSV to CSV

1 STEP

Create a resource group: LAB01_ADF_COPYING_DATA_FROM_CSV_TO_CSV

2 STEP

Create a storage account (blob storage)

Redundancy: Locally-redundant storage (LRS)

3 STEP

Create a storage account (data lake)

Enable hierarchical namespace

4 STEP

Inside the blob storage account (moviesblobst), add a container named 'bankmovies'.

Then upload the CSV file to that container.

5 STEP

Configure the data lake

Linked services

Linked services are much like connection strings, which define the connection information needed for the service to connect to external resources.

It is time to create a Data Factory. Here we go.

6 STEP

Create the Data Factory.

Name: LAB01ADF01

Launch studio

7 STEP

Create a new linked service for Blob Storage.

Name: ls_blob_moviesblobst
Storage account name: moviesblobst

Then test the connection.

8 STEP

Create a new linked service for the Data Lake.

Name: ls_dl_moviesdatalakee
Storage account name: moviesdatalakee

Then test the connection.
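
For reference, the two linked services from steps 7 and 8 can be sketched as the JSON definitions that ADF Studio shows in its code view. This is a minimal sketch, not the exact output of the lab: it assumes account-key authentication, and `<account-key>` is a placeholder rather than a real secret. The helper function names are hypothetical; the linked service and storage account names come from the steps above.

```python
import json


def blob_linked_service(name: str, account: str) -> dict:
    """Build a JSON-style definition for an Azure Blob Storage linked service."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {
                # Assumption: account-key authentication. Placeholder only,
                # never hard-code a real key.
                "connectionString": (
                    "DefaultEndpointsProtocol=https;"
                    f"AccountName={account};"
                    "AccountKey=<account-key>;"
                    "EndpointSuffix=core.windows.net"
                )
            },
        },
    }


def data_lake_linked_service(name: str, account: str) -> dict:
    """Build a JSON-style definition for an ADLS Gen2 linked service."""
    return {
        "name": name,
        "properties": {
            # ADLS Gen2 linked services use the "AzureBlobFS" type.
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": f"https://{account}.dfs.core.windows.net",
                # Assumption: account-key authentication, placeholder value.
                "accountKey": {"type": "SecureString", "value": "<account-key>"},
            },
        },
    }


linked_services = [
    blob_linked_service("ls_blob_moviesblobst", "moviesblobst"),
    data_lake_linked_service("ls_dl_moviesdatalakee", "moviesdatalakee"),
]
print(json.dumps(linked_services, indent=2))
```

Like a connection string, each definition only says where the store is and how to authenticate; it says nothing yet about which file to read or write.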

9 STEP

Create two datasets, a source and a sink.

  1. Dataset Blob Storage

Dataset > New dataset > Azure Blob Storage > DelimitedText (CSV)

Name: ds_movies_bank_row_bs
Linked service: ls_blob_moviesblobst

Set the file path, then click Preview data to check the file contents.

  2. Dataset Data Lake

Dataset > New dataset > Azure Data Lake Storage Gen2 > DelimitedText (CSV)

Name: ds_movies_bank_raw_dl
Linked service: ls_dl_moviesdatalakee
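
The two datasets can be sketched the same way, as JSON-style definitions built in Python. The dataset and linked service names and the 'bankmovies' container come from the lab. The file name `movies.csv`, the data lake file system name `raw`, and the `firstRowAsHeader` setting are assumptions for illustration, since the post does not state them; the helper function is hypothetical.

```python
import json


def delimited_text_dataset(name: str, linked_service: str, location: dict) -> dict:
    """Build a JSON-style definition for a DelimitedText (CSV) dataset."""
    return {
        "name": name,
        "properties": {
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "type": "DelimitedText",
            "typeProperties": {
                "location": location,
                "columnDelimiter": ",",
                # Assumption: the CSV has a header row.
                "firstRowAsHeader": True,
            },
        },
    }


FILE_NAME = "movies.csv"  # hypothetical: the post does not name the file

# Source: the CSV uploaded to the 'bankmovies' container in Blob Storage.
source_dataset = delimited_text_dataset(
    "ds_movies_bank_row_bs",
    "ls_blob_moviesblobst",
    {
        "type": "AzureBlobStorageLocation",
        "container": "bankmovies",
        "fileName": FILE_NAME,
    },
)

# Sink: a file system (container) in the data lake.
sink_dataset = delimited_text_dataset(
    "ds_movies_bank_raw_dl",
    "ls_dl_moviesdatalakee",
    {
        "type": "AzureBlobFSLocation",
        "fileSystem": "raw",  # hypothetical file system name
        "fileName": FILE_NAME,
    },
)

print(json.dumps([source_dataset, sink_dataset], indent=2))
```

Each dataset points at its linked service by name, which is why the linked services have to exist before the datasets are created.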

Validate all and publish all

10 STEP

Create a new pipeline named pl_ingestion_movies_data.

Activities: Move & transform > Copy data

Copy data movies

Source dataset: ds_movies_bank_row_bs
Sink dataset: ds_movies_bank_raw_dl

Validate the pipeline, then run Debug.
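
The pipeline itself can be sketched as a JSON-style definition with a single Copy activity. The pipeline and dataset names come from the lab; using "Copy data movies" as the activity name is an assumption based on the step above. The `missing_references` helper is hypothetical and only a local sanity check, not what the Validate button in ADF Studio does.

```python
import json


def dataset_ref(name: str) -> dict:
    """Reference a dataset by name, as pipeline activities do."""
    return {"referenceName": name, "type": "DatasetReference"}


pipeline = {
    "name": "pl_ingestion_movies_data",
    "properties": {
        "activities": [
            {
                "name": "Copy data movies",  # assumed activity name
                "type": "Copy",
                "inputs": [dataset_ref("ds_movies_bank_row_bs")],
                "outputs": [dataset_ref("ds_movies_bank_raw_dl")],
                "typeProperties": {
                    # Read delimited text from Blob Storage.
                    "source": {
                        "type": "DelimitedTextSource",
                        "storeSettings": {"type": "AzureBlobStorageReadSettings"},
                    },
                    # Write delimited text to ADLS Gen2.
                    "sink": {
                        "type": "DelimitedTextSink",
                        "storeSettings": {"type": "AzureBlobFSWriteSettings"},
                    },
                },
            }
        ]
    },
}


def missing_references(pipeline_def: dict) -> list:
    """Return names of Copy activities that lack an input or output dataset."""
    return [
        activity["name"]
        for activity in pipeline_def["properties"]["activities"]
        if activity["type"] == "Copy"
        and not (activity.get("inputs") and activity.get("outputs"))
    ]


problems = missing_references(pipeline)
print(json.dumps(pipeline, indent=2))
print("Activities missing a dataset reference:", problems)
```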

Go to the data lake and check that the CSV file is there with its data.
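
Besides looking at the file in the portal, one way to confirm the copy is to download both files and compare them row by row. The sketch below assumes both CSVs have already been saved locally; the function names are hypothetical, and the sample rows in the demo are made up so the example can run on its own.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(path: Path) -> list:
    """Read every row of a CSV file into a list of lists."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


def same_csv_content(source_path: Path, sink_path: Path) -> bool:
    """Return True when both files contain exactly the same rows."""
    return read_rows(source_path) == read_rows(sink_path)


# Demo with made-up sample rows standing in for the downloaded files.
with tempfile.TemporaryDirectory() as tmp:
    source_file = Path(tmp) / "source.csv"
    sink_file = Path(tmp) / "sink.csv"
    sample = "title,year\nSample Movie,2000\n"
    source_file.write_text(sample, encoding="utf-8")
    sink_file.write_text(sample, encoding="utf-8")
    copies_match = same_csv_content(source_file, sink_file)

print("Source and sink match:", copies_match)
```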

Source ADF

Thanks for taking your time to read this post.
