
Chris Churilo


First 15 Open Source Advent projects

Just 10 days to go!

We launched Open Source Advent at the beginning of this month to celebrate 25 different open source projects. It has been fun sharing these projects, so I thought I would reshare the first 15. Take a look at each repo, try the tutorial, and let us know what you build!

Naturally, everyone who worked on these open source projects would love a little Christmas 🎄💕 love in the form of a GitHub star for their projects.

1. Milvus by Zilliz | GitHub

Milvus is an open-source vector database that powers embedding similarity search and AI applications and strives to make vector search accessible to every organization. Milvus can store, index, and manage a billion+ embedding vectors generated by deep neural networks and other machine learning (ML) models. It is the project we all work on here at Zilliz, so of course it is on the list. 😇
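To make "embedding similarity search" concrete, here is a minimal brute-force sketch in plain Python. It is not the Milvus API; the `store` dictionary, the toy three-dimensional vectors, and the `top_k` helper are all hypothetical stand-ins. A vector database does the same kind of lookup over far larger collections, using indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), vec_id) for vec_id, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [vec_id for _, vec_id in scored[:k]]


# Hypothetical toy "embeddings"; real ones would come from an ML model.
store = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

print(top_k([1.0, 0.0, 0.0], store))  # ['cat', 'kitten']
```

The full scan here costs one similarity computation per stored vector, which is why this approach stops being practical long before a billion vectors.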

2. FiftyOne by Voxel51 | GitHub | tutorial

FiftyOne is the open source toolkit for building high-quality datasets and computer vision models. With FiftyOne you can visualize, curate, manage, and QA data, and automate the workflows that make enterprise machine learning possible. They spoke at the last Unstructured Meetup, and you can check out the recording here (29:10 - Speaker Jacob Marks, Vector search with computer vision data using Voxel51).

3. Quivr | GitHub | tutorial

Quivr is a personal productivity assistant that lets you chat with your files (PDF, CSV) and apps using GPT 3.5 / 4 Turbo, private, Anthropic, and VertexAI LLMs, and that you can share with users. It is an alternative to OpenAI GPTs.

4. Haystack by Deepset | GitHub | tutorial

Haystack is an end-to-end NLP framework that enables you to build applications powered by LLMs, Transformer models, vector search, and more. Whether you want to perform question answering, answer generation, semantic document search, or build tools capable of complex decision-making and query resolution, you can use state-of-the-art NLP models with Haystack to build end-to-end NLP applications to solve your use case. We have a video on some examples of retrieval augmentation in Haystack.

5. Proton by Timeplus | GitHub | tutorial

Proton is a streaming analytics database, based on ClickHouse and written in C++. Fast, powerful, and easy.

6. Ydata-synthetic and Ydata-profiling by YData | GitHub | tutorial

Ydata-profiling is a Python package for automated Data Quality profiling reports in a single line of code. Ydata-synthetic is a package to generate synthetic tabular and time-series data with state-of-the-art generative models.

7. Apache Flink | GitHub | tutorial

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
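As a rough illustration of what "stateful computation over a stream" means, the sketch below keeps a per-key running count while consuming events one at a time. It is plain Python, not Flink's API, and the `running_counts` name and the sample events are made up for this example. Because it is a generator, the same function works on a bounded list or on an unbounded source that never ends.

```python
from collections import defaultdict


def running_counts(events):
    """Yield (key, count so far) for each event, keeping per-key state."""
    state = defaultdict(int)  # the "state": one counter per key
    for key in events:
        state[key] += 1
        yield key, state[key]


# A bounded stream: a plain list of hypothetical event keys.
clicks = ["home", "cart", "home", "home"]
print(list(running_counts(clicks)))
# [('home', 1), ('cart', 1), ('home', 2), ('home', 3)]
```

In this sketch the state lives in a single in-memory dictionary, so it says nothing about how a distributed engine partitions or recovers that state.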

8. LangChain RB | GitHub | tutorial

LangChain RB is an original, LangChain-inspired Ruby framework. The goal is to abstract away complexity and difficult concepts to make building AI/ML-supercharged applications approachable for traditional Ruby software engineers. If you are a Ruby fan, we have a video to show you how to build a GenAI app end-to-end with Ruby.

9. Flyte by Union AI | GitHub | tutorial

Flyte is an open-source orchestrator that facilitates building production-grade data and ML pipelines. It is built for scalability and reproducibility, leveraging Kubernetes as its underlying platform. With Flyte, user teams can construct pipelines using the Python SDK and seamlessly deploy them in both cloud and on-premises environments, enabling distributed processing and efficient resource utilization.

10. DVC by Iterative | GitHub | tutorial

DVC is a command line tool to help you develop reproducible machine learning projects.

But wait, there's more!

11. Phoenix by Arize AI | GitHub | tutorial

Phoenix is Arize AI's open-source observability library designed for experimentation, fine-tuning, and troubleshooting your LLM, CV, and NLP models in a notebook.

12. TruLens by TruEra | GitHub | tutorial

TruLens provides observability for LLM and multimodal apps, with deep instrumentation and comprehensive evals.

13. OpenLLM by BentoML | GitHub | tutorial

OpenLLM is an open-source platform designed to facilitate the deployment and operation of large language models (LLMs) in real-world applications. With OpenLLM, you can run inference on any open-source LLM, deploy them on the cloud or on-premises, and build powerful AI applications.

14. Label Studio by HumanSignal | GitHub | tutorial

Label Studio is a flexible data labeling tool for all data types. Use it to prepare training data for computer vision, natural language processing, speech, voice, and video models.

15. LlamaIndex | GitHub | tutorial

LlamaIndex is a data framework for LLM-based applications to ingest, structure, and access private or domain-specific data.

Top comments (7)

Deb

Saw Proton which was open sourced a few weeks ago. Have you seen Fluvio?

Chris Churilo

No. but I will check it out!

Nevo David

So many I don't know!
Thank you for sharing!

gupta

awesome

Chief Technical Officer

LlamaIndex steals private domain data? You literally put it in the description

Chris Churilo

steal?

Pushpendra Singh

Checkout postgresml: postgresml.org
