DEV Community

benzsevern

Building open-source data quality tools in Python. Creator of the Golden Suite: GoldenMatch, GoldenFlow, GoldenCheck, GoldenPipe, and InferMap. 2,400+ tests, 10K+ monthly downloads on PyPI.

Location: Pennsylvania, United States of America
Email address: benzsevern@gmail.com
Personal website: https://bensevern.dev

Education

West Chester University of Pennsylvania

Work

Creator of the Golden Suite

GoldenMatch vs. Splink vs. Dedupe vs. RecordLinkage: A Practical Comparison

8 min read

GoldenMatch vs. BPID: Testing Against an EMNLP Benchmark

7 min read
Deduplicating 401,000 Equipment Auction Records with LLM Calibration

6 min read
AI-Powered Deduplication: How LLMs Supercharge the Golden Suite

8 min read
Getting Started with GoldenPipe: Clean Data in Your Python Backend

6 min read
Entity Resolution on 208,000 Real Records with the Golden Suite

7 min read
10 Data Problems Every Pipeline Hits (and the One-Liner Fixes)

4 min read
Two Hospitals Matched Patient Records Without Sharing a Single Name

4 min read
I Deduplicated 100K Records in 12 Seconds With One Command

5 min read
How to Deduplicate 100,000 Records in 13 Seconds with Python

3 min read