DEV Community

# dataengineering

Posts

How to Migrate Massive Data in Record Time—Without a Single Minute of Downtime 🕑

Comments
4 min read
Data Engineer as a Real-Time Algo Trader – Turning Pipelines into Profit (or at Least Trying)!

Comments
13 min read
Choosing the right, real-time, Postgres CDC platform

Comments
8 min read
Optimizing Large-Scale Data Processing in Python: A Guide to Parallelizing CSV Operations

Comments
3 min read
Seaborn Cheat Sheet

Comments
2 min read
Should I add Data Science or Analytics to my skills?

Comments
1 min read
Innowise is open for internships for Data Engineers and Data Analytics

Comments
1 min read
10 Future Apache Iceberg Developments to Look forward to in 2025

Comments
13 min read
📊 AI Dashboard Builder: Create Insightful Dashboards just Dropping your Data

Comments
2 min read
Setting up memory for Flink - Configuration

Comments
3 min read
Mastering Dynamic Allocation in Apache Spark: A Practical Guide with Real-World Insights

Comments
3 min read
Talend vs. Apache Kafka: Which Data Tool Drives Better Business Insights?

Comments
6 min read
LightningChart Python 1.0

Comments
1 min read
Introduction to Data lakes: The future of big data storage

10
Comments
2 min read
Exploring the 360Learning API: From the Agility of Power Query to the Robustness of the Modern Data Stack

6
Comments
12 min read
Data Pipeline Filters 101: Choosing Between Static and Dynamic Approaches

Comments
1 min read
The Apache Iceberg™ Small File Problem

5
Comments
3 min read
Ensuring Data Quality: Best Practices and Automation

Comments
6 min read
Data Science Simplified: Tips for Aspiring Data Scientists in 2025

1
Comments
4 min read
2025 Guide to Architecting an Iceberg Lakehouse

13
Comments
14 min read
Dremio, Apache Iceberg and their role in AI-Ready Data

Comments
7 min read
Leveraging Python's Pattern Matching and Comprehensions for Data Analytics

Comments
12 min read
One Off to One Data Platform: Design with Intent [Part 2]

9
Comments
5 min read
Case Study: Creating an ETL Data Pipeline using AWS Services - Real-World Problem

Comments
2 min read
Understanding Star Schema vs. Snowflake Schema

Comments
1 min read
ChatGPT Launches Pro: What's it Mean for Data Professionals?

2
Comments
4 min read
Introduction to Apache Kafka

5
Comments 1
3 min read
Mastering Workflow Automation with Apache Airflow for Data Engineering

Comments
6 min read
Mastering Twitter Data Collection: A Comprehensive Guide to Efficient Scraping Solutions

Comments
3 min read
Jupyter Notebooks in Docker

4
Comments 1
3 min read
🚀 Beyond Data Ingestion: Advanced Strategies for Optimizing API Data Pipelines

3
Comments 1
3 min read
SQL "SELECT INTO" vs "INSERT INTO SELECT" statements.

Comments
1 min read
ACID Properties in Databases: What Happens Without Them?

5
Comments
6 min read
🕵️ OSINT: link company acronyms to Standard Occupation Classification w. Open Source LLMs

1
Comments 8
6 min read
Data Architecture Best Practices

1
Comments
6 min read
My Journey into Data AI and Machine Learning

Comments
1 min read
🚀 Unlock the Power of ORC File Format 📊

5
Comments
1 min read
The Ultimate Data Engineering Roadmap: From Beginner to Pro

6
Comments 1
8 min read
Designing robust and scalable relational databases: A series of best practices.

10
Comments 5
17 min read
From Data to Decisions: How Machine Learning Works in 2025

2
Comments
3 min read
Why Data Security is Broken and How to Fix it?

1
Comments
5 min read
From ETL and ELT to Reverse ETL

Comments
4 min read
*Mastering Informatica Intelligent Cloud Services (IICS) for Cloud Data Integration*

1
Comments
3 min read
OLAP (Online Analytical Processing)

5
Comments
3 min read
The Future of Agentic Systems Podcast 1:42:26

6
Comments 1
1 min read
What is Data Engineering?

Comments
1 min read
Deep Dive into Dremio's File-based Auto Ingestion into Apache Iceberg Tables

1
Comments
13 min read
One Off to One Data Platform: The Unscalable Data Platform [Part 1]

2
Comments
3 min read
What are the major advantages of a cloud warehouse solution over an on-premises data warehouse solution?

Comments 1
5 min read
Databricks vs. Hadoop: Which Platform is Best for Predictive Analytics?

2
Comments 1
7 min read
End-to-End ETL and Sales Dashboard on WWI dataset in Microsoft Fabric

Comments
7 min read
Hands-on with Apache Iceberg & Dremio on Your Laptop within 10 Minutes

Comments
19 min read
All About Parquet Part 09 - Parquet in Data Lake Architectures

Comments
5 min read
All About Parquet Part 02 - Parquet's Columnar Storage Model

Comments
4 min read
Data Analysis: The Unsung Hero of Modern Business

Comments
2 min read
Analyzing Airbnb Listings in Chicago: A Power BI Dashboard Project

1
Comments
4 min read
Intro to SQL using Apache Iceberg and Dremio

4
Comments
22 min read
5 Best ETL Tools: A Comprehensive Comparison Guide

1
Comments
3 min read
Data Engineering with Scala: Mastering Real-Time Data Processing with Apache Flink and Google Pub/Sub

1
Comments
15 min read
SAP S/4HANA Cloud

Comments 1
2 min read