DEV Community

# dataengineering

Posts

“Data has a Dream” — A Short comic about data mesh and how it can transform your company

Comments
2 min read
How to Transpose Columns in Each Group to a Single Row

7
Comments
2 min read
"Day 44 of My Learning Journey: Setting Sail into Data Excellence! Today's Focus: Mathematics for Data Analysis (Stats Day -22)

1
Comments
2 min read
Apache Doris 2.1.0: TPC-DS, Parallel Adaptive Scan, Local Shuffle, Arrow Flight-based HTTP Data API

Comments
29 min read
AI and Data Sets – Maximizing the Power of Data

1
Comments
3 min read
"Day 43 of My Learning Journey: Setting Sail into Data Excellence! Today's Focus: Mathematics for Data Analysis (Stats Day -22)

1
Comments
2 min read
"Day 42 of My Learning Journey: Setting Sail into Data Excellence! Today's Focus: Mathematics for Data Analysis (Stats Day -21)

1
Comments
1 min read
The Apache Iceberg Lakehouse: The Great Data Equalizer (disrupting the Snowflake/Databricks status quo)

1
Comments
7 min read
Face Detection using AI: Use Cases, Benefits and Implementation

1
Comments 1
8 min read
A deep dive into the concept and world of Apache Iceberg Catalogs

Comments
8 min read
📢 About job offers, innovation & data strategy 🔭

Comments 3
3 min read
RisingWave workshop

1
Comments
5 min read
Visualization in dbt

1
Comments
3 min read
Xavier's Insight: Overcoming Data Hoarding Disorder

5
Comments
3 min read
Production and CI/CD in dbt

1
Comments
3 min read
The Pains of Data Ingestion

16
Comments 3
6 min read
Testing and documenting DBT models

Comments
3 min read
Building a project in DBT

Comments
5 min read
The Role of Ontologies in Data Management

Comments
6 min read
My Experience with Apache Airflow

2
Comments
3 min read
Different file formats, a benchmark doing basic operations

8
Comments 2
9 min read
When Metrics Go Awry: Analyzing KPIs using machine learning, regression analysis, and Shapley values

Comments
5 min read
XGBoost Training Speed: A Comparative Analysis

Comments
2 min read
Embarking on the Data Odyssey: A Deep Dive into Data Engineering for Tech Enthusiasts

Comments
3 min read
Big Data is dead & other stories

Comments
2 min read
How moving from Pandas to Polars made me write better code without writing better code

10
Comments 2
14 min read
GroupBy and Join in Spark

1
Comments
2 min read
10 Reasons to Make Apache Iceberg and Dremio Part of your Data Lakehouse Strategy

Comments
9 min read
Exploring Feature Stores: Personal Insights and Notes on Hopsworks pt.2

1
Comments
1 min read
Benchmarking Python Processing Engines: Who’s the Fastest?

3
Comments
4 min read
AWS Kinesis - Stream Storage Layer

2
Comments
3 min read
Extracting data with dlt

Comments
7 min read
SPL - a database language featuring easy writing and fast running

16
Comments
15 min read
Hands-on Guide to Enable Compute Nodes for Data Lake Analytics in Apache Doris

Comments
4 min read
Incremental loading in dlt

1
Comments
2 min read
Since When Did APIs Become Databases?

Comments
4 min read
Amazon Quicksight vs Microsoft PowerBI

Comments
3 min read
🦿🛴Smarcity garbage reporting automation w/ ollama

3
Comments 4
3 min read
What to think about when designing, building, managing and operating data systems.

1
Comments
8 min read
BigQuery best practices

1
Comments
2 min read
The Mythical Data Team

3
Comments
6 min read
Using data for predictive analytics

Comments
6 min read
Glue Data Brew- Data Profiling & Data Quality

Comments
3 min read
Transform your R Dataframes: Styles, 🎨 Colors, and 😎 Emojis

2
Comments
9 min read
Modern Data Engineering RoadMap - 2024

31
Comments 3
3 min read
Introducción a los Data Lakes (Introduction to Data Lakes)

3
Comments
3 min read
Data Engineering Saga part 2

2
Comments
3 min read
Exploring Feature Stores: Personal Insights and Notes on Hopsworks

1
Comments
1 min read
Data Evolution - Databases to Data Lakehouse

4
Comments
4 min read
How to build an Anomaly Detector using BigQuery

4
Comments
12 min read
How proficient is generated AI in transforming text or natural language into SQL?

Comments
4 min read
How NASCAR delivers realtime racing data to millions of fans around the world

16
Comments
2 min read
VS Code Extensions for Data Engineering - Part 1

2
Comments
2 min read
Solving Pandas .to_sql Double Quotes Issue When Writing to Database

Comments
1 min read
Saving Dataframes into Oracle Database with Python

Comments
1 min read
Generating Avro Schemas from Go types

Comments
5 min read
Build a federated query solution with Apache Doris, Apache Flink, and Apache Hudi

Comments
5 min read
Data Warehouse Concepts, focusing on the Kimball vs. Inmon methodologies

2
Comments
9 min read
How to Use Pyinstaller to Generate an EXE File

Comments
3 min read
Beginner's guide to Apache Flink

1
Comments
3 min read