Over the past year, there’s been an explosion of AI-assisted coding tools: GitHub Copilot, Codeium, Cursor, and more. Large language models (LLMs) have been applied to a variety of dev-tool use cases, including debugging, code generation, and data analysis. One application I’ve been particularly interested in, though, is code search, especially over large, enterprise-scale codebases. For me personally, the most frustrating part of coding has always been onboarding onto a new codebase and keeping a model of the whole system in my head so I can figure out where to make my desired changes. I’ve seen plenty of articles comparing the various code assistants, but I was curious whether any of the new AI-first code search tools were actually an improvement over plain old GitHub search.
In this blog post, I’ll compare three AI-first code search tools I recently came across: Cody (developed by the late-stage startup Sourcegraph), SeaGOAT (an open-source project that was trending on Hacker News last week), and Bloop (an early-stage YC startup). I’ll evaluate each along two dimensions: user-friendliness and accuracy.
Before I delve into the comparison, though, let me quickly touch on why code search is different from code generation.
Code search requires retrieving all of the relevant files or snippets, rather than generating one of many correct answers. If you asked ChatGPT to generate a Tic-Tac-Toe game, many different versions of the code would produce a functioning game. On the other hand, if you search a codebase for “all of the files that interface with the database,” you expect the tool to exhaustively retrieve every relevant file and snippet. In other words, a search tool is judged on recall as much as on precision.
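To make that distinction concrete, here’s a toy sketch of how you might score a search tool on exhaustiveness. The file paths and the ground-truth “relevant” set are hypothetical, purely for illustration:

```python
# Toy example: scoring a code-search tool's output against a hand-labeled
# ground truth. Both sets below are made up for illustration.
relevant = {"db/models.py", "db/session.py", "api/queries.py"}   # what we wanted
retrieved = {"db/models.py", "api/queries.py", "utils/logging.py"}  # what the tool returned

true_positives = relevant & retrieved
precision = len(true_positives) / len(retrieved)  # how much of the output is relevant
recall = len(true_positives) / len(relevant)      # how much of the relevant code was found

print(f"precision={precision:.2f} recall={recall:.2f}")  # prints "precision=0.67 recall=0.67"
```

A generation task only needs one acceptable answer; a search tool that misses `db/session.py` has failed at exactly the thing it exists to do, no matter how good its other results are.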
In addition, code search tools need to index the entire codebase before they can retrieve relevant snippets. If you type “implement Supabase authentication” into ChatGPT, it draws on all of the code it saw during training to generate an implementation. For search, by contrast, the tool has to index your specific project in order to extract the correct snippets from it. Including the entire codebase in the prompt is not an option for most non-hobby projects: at the time of writing, the longest context window for an LLM is approximately 100,000 tokens (courtesy of Anthropic), while companies like Square and Google have millions of lines of code. To get around this, most code search tools pre-index the codebase into a vector DB so they can quickly find the relevant code snippets.
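The shape of that pipeline (embed each chunk once at index time, embed the query, rank by similarity) can be sketched in a few lines. This is a deliberately minimal illustration: it uses a bag-of-words counter as a stand-in for a real embedding model, and the file paths and snippets are made up:

```python
import math
from collections import Counter

def embed(text):
    # Stand-in "embedding": bag-of-words token counts. A real tool would
    # call an embedding model (local or via a provider API) here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Index each code chunk once, ahead of query time (hypothetical files).
chunks = {
    "auth/login.py": "def login(user): verify password and create session",
    "billing/stripe.py": "def charge(card): call stripe payment api",
}
index = {path: embed(code) for path, code in chunks.items()}

def search(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda p: cosine(q, index[p]), reverse=True)
    return ranked[:k]

print(search("where is the stripe payment integration"))  # prints "['billing/stripe.py']"
```

A production tool swaps in a learned embedding model and an approximate-nearest-neighbor index (the “vector DB”), but the structure is the same: the expensive indexing work happens once, and each query only pays for one embedding plus a similarity lookup.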
SeaGOAT is an open-source repo I saw trending on Hacker News. It seemed promising: a local-first semantic search engine. While the other code search tools proxy requests to OpenAI’s or other LLM providers’ servers, I liked that this one ran the model fully locally on my machine. However, I found that the semantic search did a poor job of pulling relevant snippets. For example, for the query “Where do I implement authentication?”, it pulled everything from a correct payment-wrapper file to random library files. Ultimately, while I appreciated that it was local-first and open-source, its precision was too poor to be useful: each query returned pages of irrelevant content where 3 files/snippets would have been more helpful.
Cody is a new product from the late-stage code search startup Sourcegraph. I found the setup a little confusing: I had to download a VSCode extension and a separate desktop app, as well as sign up for a Sourcegraph account. In the desktop app, I could then select which GitHub repos (public or private) to index.
In terms of accuracy, it did a good job of both understanding my queries and extracting relevant snippets (e.g. correctly identifying the relevant components when asked “Where is the Stripe integration for payments?”).
However, it falls prey to the classic LLM hallucination issue: when asked where the code checks whether a user has signed in for checkout, it responds with a snippet that does not exist. Similarly, when asked for all of the files that interface with a database, it hallucinates several files that are not in the project.
Bloop was my favorite tool from a UX perspective: it had a simple onboarding experience (OAuth through GitHub) and linked the relevant files in its responses, so I could easily expand beyond the response and delve into the code. While it did not have a VSCode extension, I found the desktop app easy enough to use. It also did a much better job of not hallucinating, correctly identifying the list of files that interface with the database, as well as the fact that there were no user sign-in checks during the checkout process.
Overall, my favorite was Bloop: it did a better job of not hallucinating, and I preferred the UX. As an aside, I found that using these tools alongside GitHub Copilot or Perplexity was a little frustrating, since I was constantly switching between different tools for the same project. I keep wishing for one tool that wraps all of this functionality and context in a single entry point. I've been exploring some of these ideas in Lightrail, though it doesn't yet include code search (which, as discussed above, is a harder problem than code generation). Nevertheless, I'm excited about the possibilities in the field. Thanks for reading this far; I'd love to hear your thoughts and ideas!