DEV Community

# bigdata

Posts

Data Engineering Terminology: Understanding Upstream and Downstream in Data Pipelines

Comments
1 min read
Business Intelligence Data Analyst vs. BI Developer

2
Comments
3 min read
🏆How to master 📊 Big Data pipelines with Taipy and PySpark 🐍

197
Comments 8
9 min read
BigData Journey from Hadoop and MapReduce to AWS EMR

Comments
9 min read
From Hadoop to Cloud: Why and How to Decouple Storage and Compute in Big Data Platforms

Comments
13 min read
S3 Multi-Part Upload: Part 2 Conclusion

5
Comments
11 min read
Big data models 📊 vs. Computer memory 💾

186
Comments 3
11 min read
Working with Parquet files in Java using Avro

1
Comments
10 min read
Which Scenarios Does ClickHouse Apply To?

5
Comments
9 min read
Most common errors when setting up Amazon EMR

8
Comments
2 min read
HyperLogLog | One Algorithm to Count Them All (Approximately)

2
Comments
6 min read
Data-Powered Accessibility: How to Build Inclusive Product for Any User Need

48
Comments
7 min read
Install Hadoop on Ubuntu

1
Comments
6 min read
Connecting Multiple Kafka Clusters in ClickHouse Using Named Collections

4
Comments
3 min read
SPL computing performance test series: in-group accumulation

1
Comments
12 min read
Log Analysis: Elasticsearch VS Apache Doris

Comments
11 min read
SPL computing performance test series: funnel analysis

1
Comments
16 min read
SPL computing performance test series: associate tables and wide table

Comments
6 min read
SPL computing performance test series: position association

1
Comments
12 min read
SPL computing performance test series: multi-index aggregating

1
Comments
6 min read
What is '_spark_metadata' Directory in Spark Structured Streaming ?

Comments
3 min read
SQL is consuming the lives of data scientists

6
Comments 3
20 min read
⛏ Get Mining into Data with These Top 5 Resources

5
Comments 2
6 min read
Apache Doris 2.0 Beta Now Available: Faster, Stabler, and More Versatile

Comments
15 min read
Is Your Latest Data Really the Latest? Check the Data Update Mechanism of Your Database

4
Comments 1
6 min read
Introduction to Big-data

2
Comments 2
3 min read
The performance problems of data warehouse and solutions

Comments
14 min read
Snowflake: Revolutionizing data warehousing

3
Comments
6 min read
5 Common Mistakes with Apache Flink and How to Avoid Them

2
Comments
3 min read
Next Big Data System

Comments
1 min read
Open-source SPL: The Breaker of Closed Database Computing System

Comments 1
8 min read
3 Data Observability Tools

Comments
3 min read
Why Are There So Many Snapshot Tables in BI Systems?

5
Comments
9 min read
Why does wide table prevail?

5
Comments
13 min read
Bulk load to Elastic Search with PySpark

2
Comments
2 min read
Routable computing engine implements front-end database

Comments
5 min read
Processing EventHub Captured Messages in Avro Files Using Databricks

Comments
2 min read
How does the in-memory database bring memory's advantage into play?

Comments
12 min read
How to clone tables in BigQuery

2
Comments
1 min read
Why ETL Becomes ELT or Even LET?

Comments
8 min read
HTAP: Learning from Xiaohongshu

1
Comments
5 min read
HTAP database cannot handle HTAP requirements

Comments
13 min read
Integrating Apache Age with Other Big Data Tools and Frameworks

2
Comments 1
2 min read
The current Lakehouse is like a false proposition

Comments
11 min read
How to make the columnar storage data warehouse more efficient

Comments
11 min read
Exploration of Spark Executor Memory

Comments
9 min read
Simplest pyspark tutorial

2
Comments
7 min read
Making Debezium 2.x Support Confluent Schema Registry

1
Comments 3
3 min read
Performance Enhancement: Conversion Funnel Analysis

Comments
9 min read
Boost Your Testing Strategy: The Coolest Methods to Prioritize A/B Tests Like a Pro! 🎲📊😎

3
Comments
4 min read
A Comprehensive Comparison of JuiceFS and HDFS for Cloud-Based Big Data Storage

1
Comments
11 min read
How to use docker to compile Apache Doris

2
Comments
3 min read
Apache Doris be common problem positioning and processing

1
Comments
3 min read
The Secret to Rapid Scaling: How Scraping Helped These Startups Go From Zero to $1.2+ Trillion

6
Comments 1
6 min read
Mastering Large-Scale Data Processing: Building a Data Pipeline with ApacheAGE for Efficient Ingestion, Processing, and Analysis

2
Comments
2 min read
How we mastered dbt: A true story

7
Comments
14 min read
GETTING STARTED WITH SENTIMENT ANALYSIS.

2
Comments
4 min read
Lightweight HTTP API for Big Data on S3

3
Comments
3 min read
How to cope with high-concurrency account query?

Comments
6 min read
Don't Break the Bank on SQL Queries: BigQuery On-Demand vs Flat-Rate prices. Which Saves You More? 💰😎

5
Comments 3
5 min read