DEV Community

# dataengineering

Creating a Soft Delete Archive Table with PostgreSQL

Reactions 3 Comments
2 min read
📼 ksqlDB HOWTO - A mini video series 📼

Reactions 7 Comments
4 min read
Running a self-managed Kafka Connect worker for Confluent Cloud

Reactions 7 Comments
11 min read
Kafka Connect - Deep Dive into Single Message Transforms

Reactions 4 Comments
3 min read
Apache Spark Ecosystem, Jan 2021 Highlights

Reactions 11 Comments
4 min read
ETL with Apache Airflow, Web Scraping, AWS S3, Apache Spark and Redshift | Part 1

Reactions 10 Comments
7 min read
🎄 Twelve Days of SMT 🎄 - Day 1: InsertField (timestamp)

Reactions 5 Comments
3 min read
First Look: AWS Glue DataBrew

Reactions 9 Comments
7 min read
My favourite re:Invent data announcements

Reactions 8 Comments
5 min read
🎄 Twelve Days of SMT 🎄 - Day 6: InsertField II

Reactions 6 Comments
3 min read
New Features in Amazon DynamoDB - PartiQL, Export to S3, Integration with Kinesis Data Streams

Reactions 9 Comments
12 min read
Datetimes Are Hard: Part 1 - Incoming data and formats

Reactions 3 Comments 1
4 min read
Tidying up Pipelines with DataClasses

Reactions 3 Comments
5 min read
Cut data warehouse costs with run caching

Reactions 5 Comments
3 min read
Dagster with User Code Deployments (gRPC)

Reactions 8 Comments 2
6 min read
11 Ways of Applying a Function to Python Pandas DataFrame

Reactions 4 Comments
1 min read
Some of my favourite public data sets

Reactions 8 Comments 2
2 min read
5 Essential skills for becoming a Data Engineer

Reactions 7 Comments
6 min read
The Most Popular Data Science Newsletters

Reactions 8 Comments
9 min read
Build a monitored code-based pipeline to move data from Postgres to Snowflake

Reactions 6 Comments
9 min read
Handling upstream data changes via Change Data Capture

Reactions 5 Comments
8 min read
Introduction to Apache Spark

Reactions 9 Comments
6 min read
Kafka Connect in 60 seconds 01:00

Reactions 3 Comments
2 min read
Deploying data pipelines to AWS Fargate - with monitoring and alerts built-in

Reactions 3 Comments
3 min read
Large-Scale Data Quality Verification in .NET PT.1

Reactions 2 Comments
9 min read
Data Lake - 5 Major Principles

Reactions 2 Comments
2 min read
Data Warehouse - The Minimal Architectural Approach

Reactions 3 Comments
2 min read
How To Run Airflow on Windows (with Docker)

Reactions 13 Comments 1
8 min read
Loading CSV data into Kafka - video walkthrough

Reactions 5 Comments
10 min read
What differentiates schema on read from schema on write?

Reactions 2 Comments 2
3 min read
Scraping Data on the Web with BeautifulSoup

Reactions 29 Comments
12 min read
Data Engineering Project for Beginners - Batch edition

Reactions 15 Comments
19 min read
10 Key skills, to help you become a data engineer

Reactions 9 Comments
3 min read
Airflow UI with Role-Based Access Control

Reactions 5 Comments
1 min read
Apache Airflow Installation - mysql+celery

Reactions 4 Comments
1 min read
Extract Nested Data From Complex JSON

Reactions 9 Comments
6 min read
🛢Create New Kedro Pipeline (kedro new)

Reactions 5 Comments
4 min read
🤷‍♀️ What is Kedro (The Parts)

Reactions 15 Comments 3
3 min read
Data engineering portfolio projects?

Reactions 24 Comments 1
1 min read
Apache Airflow Core Concepts

Reactions 25 Comments
4 min read
5 Considerations to have when using Airflow

Reactions 10 Comments
6 min read
I am a junior data engineer without a senior engineer. What should I do?

Reactions 7 Comments 1
1 min read
Why we chose Apache Spark for ETL (Extract-Transform-Load)

Reactions 23 Comments
6 min read
Data Engineering Skills 00:31

Reactions 14 Comments
1 min read
Data Engineering — Complete Reference Guide From A-Z [2019]

Reactions 21 Comments
16 min read
On the evolution of Data Engineering

Reactions 15 Comments
4 min read
Manage Data Pipelines with Apache Airflow

Reactions 69 Comments
13 min read
How to Run Parallel Data Analysis in Python using Dask Dataframes

Reactions 6 Comments
6 min read
Intro to Python Database Management with SQLAlchemy

Reactions 15 Comments
7 min read
Scrape Structured Data with Python and Extruct

Reactions 7 Comments
16 min read
CI/CD for ETL/ELT pipelines

Reactions 18 Comments
3 min read
Terraform in Anger Part 1: AWS S3 Access

Reactions 6 Comments
9 min read
5 Challenges in Building a Production-Grade Data Pipeline

Reactions 26 Comments 5
1 min read
Choosing Your Data Warehouse

Reactions 10 Comments 4
1 min read
Psycopg2: PostgreSQL & Python (the Old Fashioned Way)

Reactions 16 Comments
6 min read
Azure Message Brokers patterns for Data Applications

Reactions 6 Comments
6 min read
Coding MapReduce in C from Scratch using Threads: Map

Reactions 7 Comments
9 min read
How to collect the data you need to bootstrap your digital marketing analytics

Reactions 12 Comments
12 min read
Structured Streaming in PySpark

Reactions 10 Comments
9 min read
Becoming Familiar with Apache Kafka and Message Queues

Reactions 16 Comments
6 min read