Akshay Ballal

Introducing EmbedAnything

Motivation

Transformer models have become increasingly popular in recent times. One crucial requirement for all transformer models is a latent representation of the input data as embeddings. Word embeddings are used in language models, while vision models rely on patch embeddings. However, there are currently few ready-made solutions for extracting embeddings and custom metadata from various file formats. LangChain offers some, but it is a bulky package, and extracting only the embedding data from it is not easy. Moreover, LangChain is not well suited to vision-related tasks. Embeddings are useful not only for language models but also for models trained on other tasks, such as semantic segmentation and object detection.

This is where EmbedAnything comes in. It is a lightweight library that lets you generate embeddings from different file formats and modalities. Currently, EmbedAnything supports PDFs and images, with many more formats in the pipeline. The idea is to provide an end-to-end solution: you pass in a file and get back the embeddings with the appropriate metadata.

Development of EmbedAnything started with these goals in mind:

  1. Compatibility with Local and Cloud Models: Seamless integration with local and cloud-based embedding models.
  2. High-Speed Performance: Fast processing to meet demanding application requirements.
  3. Multimodal Capability: Flexibility to handle various modalities.
  4. CPU and GPU Compatibility: Performance optimization for both CPU and GPU environments.
  5. Lightweight Design: Minimized footprint for efficient resource utilization.

In this post, we will look at how we achieve these goals and what still needs to be done to improve EmbedAnything. We will also see why EmbedAnything is packaged the way it is, with a Rust backend and a Python interface.

Keeping it Local

While cloud-based embedding services like OpenAI, Jina, and Mistral offer convenience, many users require the flexibility and control of local embedding models. Here's why local models are crucial for some use cases:

  • Cost-Effectiveness: Cloud services often charge per API call or model usage. Running embeddings locally on your own hardware can significantly reduce costs, especially for projects with frequent or high-volume embedding needs.
  • Data Privacy: Certain data, like medical records or financial documents, might be too sensitive to upload to the cloud. Local embedding keeps your data confidential and under your control.
  • Offline Functionality: An internet connection isn't always guaranteed. Local models ensure your embedding tasks keep running even when you are offline.

Performance

EmbedAnything is built with Rust. This makes it faster than pure-Python alternatives and provides type safety and a much better development experience. But why is speed so crucial in this process?

The Need for Speed

Creating embeddings from files involves two steps that demand significant computational power:

  1. Extracting Text from Files, Especially PDFs: Text can arrive in different formats such as Markdown, PDFs, and Word documents. Extracting text from PDFs, however, can be challenging and often causes slowdowns. It is especially difficult to extract text in manageable batches, as embedding models have a context limit. Breaking the text into paragraphs containing focused information helps; a rough sketch of this chunking idea follows this list.
  2. Running Inference on the Transformer Embedding Model: The transformer model is usually at the core of the embedding process, and it is known for being computationally expensive. To address this, EmbedAnything uses the Candle framework by Hugging Face, a machine-learning framework built entirely in Rust for optimized performance.
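As a rough illustration of the chunking idea from step 1, here is a minimal Python sketch. It is not EmbedAnything's actual implementation: it simply groups paragraphs into chunks under a character budget, with max_chars standing in for an embedding model's context limit.

def chunk_by_paragraphs(text, max_chars=1000):
    # Treat blank-line-separated blocks as paragraphs.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

    chunks, current = [], ""
    for para in paragraphs:
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and len(current) + len(para) + 1 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

Each chunk then stays within the model's context limit while keeping related sentences together.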

The Benefit of Rust for Speed

By using Rust for its core functionalities, EmbedAnything offers significant speed advantages:

  • Rust is Compiled: Unlike Python, Rust compiles directly to machine code, resulting in faster execution.
  • Memory Safety: Rust's ownership model enforces memory safety at compile time, preventing the memory errors and crashes that can plague other languages.
  • True Multithreading: Rust supports safe, true multithreading, unlike Python, where the global interpreter lock limits parallelism.

What does Candle bring to the table?

Running language models or embedding models locally can be difficult, especially when you want to deploy a product that uses them. If you use the transformers library from Hugging Face in Python, you depend on PyTorch for tensor operations. This, in turn, depends on Libtorch, which means you need to ship the entire Libtorch library with your product. Candle, being written entirely in Rust, avoids this dependency and also allows inference on CUDA-enabled GPUs right out of the box. We will soon post about how we use Candle to increase performance and decrease the memory usage of EmbedAnything.

Multimodality

Finally, let's see how EmbedAnything handles multimodality. When a directory is passed to EmbedAnything for embedding, each file's extension is checked to determine whether it is text or an image, and a suitable embedding model is used to generate the embeddings.
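As an illustration only, here is a minimal Python sketch of that extension-based dispatch idea. It is not the library's actual code; embed_text and embed_image are hypothetical placeholders for a text embedder and an image embedder.

from pathlib import Path

TEXT_EXTENSIONS = {".pdf", ".md", ".txt"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}

def embed_directory(directory, embed_text, embed_image):
    # embed_text and embed_image stand in for, e.g., a BERT-style text
    # embedder and a CLIP-style image embedder.
    results = []
    for path in Path(directory).rglob("*"):
        if not path.is_file():
            continue
        ext = path.suffix.lower()
        if ext in TEXT_EXTENSIONS:
            results.append(embed_text(path))
        elif ext in IMAGE_EXTENSIONS:
            results.append(embed_image(path))
    return results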

Check out an example of an image search in this Google Colab notebook.

You can start using EmbedAnything with:

pip install embed_anything
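Once installed, a minimal usage sketch looks roughly like the following. The embed_file function, the "Bert" embedder name, and the result fields shown here are assumptions based on an early version of the API, so check the project README for the exact interface.

import embed_anything

# Embed a single PDF with a local text embedding model.
# embed_file, the "Bert" embedder name, and the text/embedding/metadata
# fields are assumptions; the installed version's API may differ.
data = embed_anything.embed_file("test.pdf", embeder="Bert")

for item in data:
    print(item.text, item.embedding[:5], item.metadata)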

You can view and contribute to the project on GitHub at:
EmbedAnything


My website: http://www.akshaymakes.com/

LinkedIn: https://www.linkedin.com/in/akshay-ballal/

Twitter: https://twitter.com/akshayballal95/

Top comments (3)

Jeffrey Erlich

When you extract data from a PDF, do you only use text? Or do you also extract images in the PDF?

Akshay Ballal

Hey, in the current version only text is being extracted.

Ranjan Dailata

It would be great if you could publish it under the MIT license.