Software engineers, programmers, and researchers use and rework code daily.
A quick scan through published code packages on npm, Packagist and GitHub shows similar implementations of the same concepts, both within a single language and across different languages and frameworks.
As an experienced engineer who has worked on several products & built curriculums for boot camps, I have used code search extensively in my career to find code resources, API documentation and optimal solutions to hairy problems.
But how do we find code easily, quickly and seamlessly?
GitHub Code Search
GitHub is a vast fountain of code knowledge. It is the largest and most used code host in the world.
In addition, many developers view GitHub as the home for their everyday private and open source projects because of the many features it provides for collaboration, version control and deployment.
GitHub Advanced Search searches all open source repositories on GitHub to fetch you code results. But many developers find GitHub’s search feature limited in functionality and usability.
Example: I want to search for how the common $this->generate_response() method is used in Laravel projects via GitHub code search.
Search Query: https://github.com/search?q=%24this-%3Egenerate_response%28%29+language%3APHP&type=code
The results shown above are excellent. However, the results I want are occurrences of the $this->generate_response() method call with no arguments passed to it, whereas GitHub returns every match instead of only the exact ones.
Well, there’s an alternative to fetch better results. Let’s try this example with Sourcegraph!
Sourcegraph Code Search
Sourcegraph is a code search engine that enables developers to explore and better understand all code, faster, with contextual code intelligence.
Example 1: I want to search for how the $this->generate_response() method is used in Laravel projects.
Search Query: https://sourcegraph.com/search?q=context:global+%24this-%3Egenerate_response%28%29+lang:PHP&patternType=literal
The results returned are accurate usages of $this->generate_response().
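The Sourcegraph URL above follows a simple pattern: each search term is percent-encoded, the terms are joined with +, and the result is passed in the q parameter next to patternType. Here is a minimal Python sketch of that encoding, inferred only from the URLs in this post. The helper name sourcegraph_search_url is made up for illustration and is not part of any Sourcegraph client.

```python
from urllib.parse import quote

BASE_URL = "https://sourcegraph.com/search"


def sourcegraph_search_url(query: str, pattern_type: str = "literal") -> str:
    """Build a search URL shaped like the ones shown in this post (hypothetical helper)."""
    # Encode each term separately so spaces become "+" while ":" and "/"
    # (used by filters such as lang:PHP and by repository paths) stay readable.
    terms = [quote(term, safe=":/") for term in query.split()]
    return f"{BASE_URL}?q={'+'.join(terms)}&patternType={pattern_type}"


url = sourcegraph_search_url("context:global $this->generate_response() lang:PHP")
print(url)
```

Changing the second argument to "regexp" produces the kind of URL used for regular expression searches.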
I love GitHub so much and use it daily for many things other than code search. I’m probably biased, but I’ll show you how Sourcegraph’s code search is a better goldmine for developers than GitHub code search or any tool out there!
Note: As of this writing, Sourcegraph has indexed over 2.1M of the most starred repositories on GitHub (every repository with more than 5 stars). It aims to index every open source repository with more than 1 star on GitHub and GitLab by the end of the year.
Please grab a cup of coffee, stay calm, and let’s walk through these code search challenges!
10 Practical Code Search Challenges & Solutions
Problem 1: I want to search for a literal like “foo(” or “.foo”. Find exact string matches including coding punctuation.
“foo(” results:
GitHub: https://github.com/search?q=foo%28&type=code
Sourcegraph: https://sourcegraph.com/search?q=context:global+foo%28&patternType=literal
“.foo” results:
GitHub: https://github.com/search?q=.foo&type=code
Sourcegraph: https://sourcegraph.com/search?q=context:global+.foo&patternType=literal
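To make "exact string match including punctuation" concrete, here is a small local Python sketch. It is only an illustration of literal matching, not how either search engine is implemented, and the sample text and function name are made up.

```python
import re


def literal_matches(source: str, needle: str) -> list[str]:
    """Return the lines of source that contain needle exactly as typed."""
    return [line for line in source.splitlines() if needle in line]


sample = "a = foo(1)\nb = foo\nc.foo = 2"

print(literal_matches(sample, "foo("))
print(literal_matches(sample, ".foo"))

# Typed as a regular expression, "foo(" is not even valid until it is escaped,
# which is why a literal search mode matters for code punctuation.
try:
    re.compile("foo(")
    compiled = True
except re.error:
    compiled = False

escaped = re.escape("foo(")
print(compiled, escaped)
```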
Problem 2: I want to search or find code projects that use specific libraries/dependencies.
GitHub: I haven’t found a way to search for this yet.
Sourcegraph: https://sourcegraph.com/search?q=context:global+file:go.mod+cockroachdb/errors+select:repo+&patternType=literal
Note: I used go.mod because this is present in all go projects. For Node.js & PHP projects, you can use package.json and composer.json, respectively. For other projects, you can use specific files that are mandatory in these projects.
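The note above boils down to one query template with the manifest file swapped per ecosystem. A hedged Python sketch of that template follows. The mapping and the dependency_query helper are illustrative names, and the query shape is copied from the Sourcegraph URL in this section.

```python
# Manifest file per ecosystem, as listed in the note above.
MANIFEST_FILES = {
    "go": "go.mod",
    "node": "package.json",
    "php": "composer.json",
}


def dependency_query(ecosystem: str, dependency: str) -> str:
    """Compose a query that lists repositories whose manifest mentions a dependency."""
    manifest = MANIFEST_FILES[ecosystem]
    # select:repo collapses the file matches into one result per repository.
    return f"context:global file:{manifest} {dependency} select:repo"


print(dependency_query("go", "cockroachdb/errors"))
```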
Problem 3: Regular expression matching for code and file names.
GitHub: I haven’t found a way to do this yet.
Sourcegraph: https://sourcegraph.com/search?q=context:global+foo%28.*%3F%29bar&patternType=regexp
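Decoded, the pattern in that URL is foo(.*?)bar: the text foo, then as few characters as possible, then bar. The sketch below runs the same pattern locally with Python's re module to show which lines it would match. Python's regex dialect is assumed to behave the same as the search engine's for this simple pattern, and the sample lines are made up.

```python
import re

# Same pattern as in the search URL; the parentheses form a capture group.
PATTERN = re.compile(r"foo(.*?)bar")


def matching_lines(source: str) -> list[str]:
    """Return the lines of source in which the pattern occurs."""
    return [line for line in source.splitlines() if PATTERN.search(line)]


sample = "x = foobar\ny = foo(1, 2)bar\nz = foo\nw = bar(foo)"
print(matching_lines(sample))
```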
Problem 4: Finding repositories that contain files with specific filenames and sorting the results according to the repositories with the most stars or activity.
For instance, I want to find all the repositories in my organization that contain a bundlewatch.config.js file.
GitHub: I haven’t found a way to do this yet.
Problem 5: When searching for function or class names in the codebase, return only the function or class definitions, not the files or parts of the codebase that call them.
GitHub: I haven’t found a way to do this yet.
Sourcegraph: https://sourcegraph.com/search?q=context:global+type:symbol+newrouter&patternType=literal
Problem 6: Exclude test files from my code search results. I don’t want to deal with test files when searching for function names and classes.
GitHub: I haven’t found a way to do this yet.
Sourcegraph: https://sourcegraph.com/search?q=context:global+type:symbol+newrouter+-file:_test%5C.go%24+&patternType=literal
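The -file: filter in that query takes a regular expression, here _test\.go$, and drops every path that matches it. A local Python sketch of the same exclusion, using made-up file paths, might look like this:

```python
import re

# Same expression as the -file: filter: paths ending in "_test.go".
TEST_FILE = re.compile(r"_test\.go$")


def without_test_files(paths: list[str]) -> list[str]:
    """Keep only the paths that do not match the test-file pattern."""
    return [path for path in paths if not TEST_FILE.search(path)]


paths = ["router.go", "router_test.go", "pkg/mux_test.go", "docs/_test.go.md"]
print(without_test_files(paths))
```

The $ anchor matters: docs/_test.go.md survives because the pattern only matches at the end of the path.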
Problem 7: Ability to search only forked repositories.
GitHub: https://github.com/search?l=&q=foobar+fork%3Aonly&type=code
I tried to search forked repositories but I got no result.
Sourcegraph: https://sourcegraph.com/search?q=context:global+github+fork:only+&patternType=literal
Problem 8: I want to be able to specify a branch while searching for code.
GitHub: I haven’t found a way to do this yet.
Sourcegraph: Once you have specified the repository in the search bar, add an @ and specify the branch just after the @ symbol. That’s all you need to search within specific branches of a code repo.
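As a rough illustration of the repository@branch form described above, the Python sketch below composes such a query string. The repository pattern, branch name, and helper name are hypothetical placeholders, not values taken from this post.

```python
def branch_scoped_query(repo_pattern: str, branch: str, terms: str) -> str:
    """Compose a query that searches one branch of the given repository."""
    # The branch name goes directly after the repository, separated by "@".
    return f"repo:{repo_pattern}@{branch} {terms}"


query = branch_scoped_query(
    r"^github\.com/example-org/example-repo$",  # hypothetical repository
    "develop",                                  # hypothetical branch
    "newrouter",
)
print(query)
```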
Problem 9: I want to exclude archived repositories from code search results.
GitHub: I haven’t found a way to do this yet. I found this GitHub forum thread and went down a rabbit hole trying to figure out the latest about the feature request.
Sourcegraph: Archived repositories are excluded from code search results by default.
However, if you want to include archived repositories in your Sourcegraph search results, you can make a search query like:
archived:only
https://sourcegraph.com/search?q=context:global+repo:%5Egithub.com/spatie/+archived:only+media&patternType=literal
archived:yes
https://sourcegraph.com/search?q=context:global+repo:%5Egithub.com/spatie/+archived:yes+media&patternType=literal
Problem 10: De-duplication of search results.
GitHub: I haven’t found a way to do this yet.
Sourcegraph: https://sourcegraph.com/search?q=context:global+bou.ke/monkey+-file:go%5C.mod%24+select:repo+&patternType=literal
Without the select:repo keyword, the same query returns similar results from multiple files in the same repo: https://sourcegraph.com/search?q=context:global+bou.ke/monkey+-file:go%5C.mod%24&patternType=literal
Adding select:repo to the search query ensures that each repository is returned only once.
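Conceptually, this kind of de-duplication collapses file-level matches into one entry per repository. The Python sketch below shows that idea on made-up (repository, file) pairs. It only illustrates the effect and says nothing about how Sourcegraph implements it.

```python
def unique_repos(matches: list[tuple[str, str]]) -> list[str]:
    """Collapse (repository, file) matches into repositories, keeping first-seen order."""
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(repo for repo, _file in matches))


# Hypothetical file-level matches: two hits in the same repository, one in another.
matches = [
    ("example-org/service-a", "main.go"),
    ("example-org/service-a", "patch/patch.go"),
    ("example-org/service-b", "internal/mock.go"),
]
print(unique_repos(matches))
```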
Indexing / Syncing Rate of Code Search Results
Just as GitHub shows when each code search result was last indexed, Sourcegraph does the same, as shown in the image below.
Sourcegraph indexes faster than GitHub, so you never need to worry about stale results. But you don’t have to take my word for it. Try it yourself on sourcegraph.com.
Conclusion
In addition to searching publicly available code, Sourcegraph makes it possible to add and search through your private code on sourcegraph.com.
I believe that every programmer should be code search literate. Investing in finding the right resources faster increases a developer’s productivity geometrically & ultimately accelerates their career.