irafrog


Vectorize your code repositories and PDFs with AI

Introduction


In this tutorial, we'll walk through using the @polyfact/vectorizer package to vectorize code repositories and PDFs. The package converts your textual data into vector representations (embeddings) that can then be used for machine learning and data analysis tasks.


Table of Contents

  • Stack
  • Step 1: Installation
  • Step 2: Usage as a Library
  • Step 2.1: Usage via Command Line Interface
  • Conclusion
  • Other resources

Stack

To use the PolyFact vectorizer, you only need a terminal with Node.js and npm available, plus your preferred work environment. Choose a document or a repository you want to vectorize.


Step 1: Installation

To get started, you need to install the @polyfact/vectorizer package. You can do this using the Node Package Manager (npm):

npm install @polyfact/vectorizer


If you want to use the CLI globally, you can install it like this:

npm install -g @polyfact/vectorizer


Step 2: Usage as a Library

Importing the Library

First, let's import the @polyfact/vectorizer library and set up the vectorizer:

import Vectorizer, { SourceType } from "@polyfact/vectorizer";

const token = "your-api-token";
const maxTokens = 1000; // Adjust as needed
const sourceType = SourceType.DIRECTORY;

const vectorizer = new Vectorizer(token, maxTokens, sourceType);

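Hardcoding the token as in the snippet above is fine for a quick test, but it is easy to commit by accident. Here is a minimal sketch of reading it from the environment instead. The variable name POLYFACT_TOKEN and the helper requireEnv are illustrative choices for this example, not something the package requires:

```typescript
// Hypothetical helper: read a required setting from an environment-like object.
// The variable name POLYFACT_TOKEN used below is an assumption for this example.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    // Fail early with a clear message instead of sending an empty token.
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// Usage in a Node.js script:
//   const token = requireEnv(process.env, "POLYFACT_TOKEN");
```

Passing the environment in as an argument keeps the helper easy to test and independent of where the value comes from.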

Vectorizing Code Repositories

Now, let's see how you can use the vectorizer to process code repositories:

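// Illustrative callback: the arguments the library passes are not documented
// in this post, so this one simply logs whatever it receives.
const progressCallback = (...args: unknown[]) => console.log("progress:", ...args);

const filePaths = ["path/to/your/repository"];
const files = await vectorizer.readFiles(filePaths);
await vectorizer.vectorize(files, progressCallback);

// The memory ID identifies the stored embeddings for later use.
const memoryId = vectorizer.getMemoryId();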


In this snippet, the vectorizer reads the folders, PDFs, or audio files at the given path and converts them into vectors. When it finishes, getMemoryId() returns a unique memory ID. You can pass this ID to the generate function's memoryId option, so that when you send a task related to your files, PDFs, or audio, the model draws directly on your embeddings.
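
The pairing of a memory ID with the generate function could look roughly like the sketch below. This article does not show the exact signature of the SDK's generate function, so the GenerateFn type is an assumption (a prompt plus an options object carrying memoryId), and askAboutFiles is a hypothetical wrapper. Check the PolyFact SDK documentation for the real call.

```typescript
// ASSUMED shape of the SDK's generate function: prompt in, text out,
// with the memory ID passed through an options object.
type GenerateFn = (
  prompt: string,
  options: { memoryId: string },
) => Promise<string>;

// Hypothetical wrapper: forwards a question together with the memory ID
// so the model can draw on the embeddings created by the vectorizer.
async function askAboutFiles(
  generate: GenerateFn,
  memoryId: string,
  question: string,
): Promise<string> {
  if (memoryId.trim() === "") {
    throw new Error("memoryId is empty; run the vectorizer first");
  }
  return generate(question, { memoryId });
}
```

Taking generate as a parameter keeps the sketch independent of the real SDK import, which you would supply yourself.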

The PolyFact SDK can do the same thing, except for PDFs; see its documentation for details.


Step 2.1: Usage via Command Line Interface

Vectorize a Code Repository

To vectorize an entire code repository, use the following CLI command:

@polyfact/vectorizer repo path/to/your/repository --token your-api-token --max-token 1000


Vectorize PDF Files

To vectorize PDF files, use the following CLI command:

@polyfact/vectorizer pdf file1.pdf file2.pdf --token your-api-token

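If you call the CLI from a script, it can help to assemble the arguments in one place. Here is a minimal sketch that uses only the subcommands and flags shown above (repo, pdf, --token, --max-token). The function names buildRepoArgs and buildPdfArgs are hypothetical, and how you launch the resulting command is left to your own setup:

```typescript
// Hypothetical helpers that build argument lists for the two CLI commands
// shown above. They only assemble strings; they do not run anything.
function buildRepoArgs(repoPath: string, token: string, maxToken: number): string[] {
  if (!Number.isInteger(maxToken) || maxToken <= 0) {
    throw new Error("maxToken must be a positive integer");
  }
  return ["repo", repoPath, "--token", token, "--max-token", String(maxToken)];
}

function buildPdfArgs(pdfPaths: string[], token: string): string[] {
  if (pdfPaths.length === 0) {
    throw new Error("at least one PDF path is required");
  }
  return ["pdf", ...pdfPaths, "--token", token];
}
```

Keeping each argument as a separate array element avoids quoting problems when a path contains spaces.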

Conclusion

Congratulations! You've learned how to use the @polyfact/vectorizer package to vectorize code repositories and PDFs with PolyFact. These vector representations can be useful for machine learning and data analysis tasks, and the memory ID lets you reuse them later. Explore the PolyFact SDK documentation to learn more about using the generated memory ID in your projects.

For more information and more packages, refer to the official documentation.

Other resources:
