SemanticDiff is a programming-language-aware diff for Visual Studio Code. The extension helps you understand code changes faster by hiding style-only changes and by detecting moved code blocks and refactorings.
If you want to see the features in action, try out the online demo. Simply select one of the existing examples or enter your own code.
Most diff tools like git diff detect changes between two versions of a file by comparing each line character-by-character. While this works well in many cases, it can produce a lot of noise if you reformat your code or perform other kinds of refactoring.
For example, splitting the parameters of a complex function call across multiple lines produces a diff that isn't very useful. It looks like you have completely replaced the code:
    - verify_token = generate_token(user, models.TokenType.EmailVerification, datetime.timedelta(days=2), email=user.email)
    + verify_token = generate_token(
    +     user,
    +     models.TokenType.EmailVerification,
    +     datetime.timedelta(days=3),
    +     email=user.email,
    + )
You would now need to manually compare all the parameters to spot that I extended the duration from 2 to 3 days. Wouldn't it be great if your diff tool could filter out all the irrelevant changes, so that you can immediately see the changed parameter?
That is what SemanticDiff does. It makes reviewing code less tedious and more secure. By hiding style-only changes and highlighting modifications within moved code blocks, you are less likely to overlook anything important and you have to review less overall.
SemanticDiff implements a pipeline consisting of three stages:
The old and the new code are each parsed into an abstract syntax tree (AST). These trees contain all the information that a compiler or interpreter would need to compile or interpret your code. This approach gives us two advantages over the original text representation:
- We have additional information about the meaning of individual characters (e.g. that generate_token is the name of a function that is going to be called).
- All characters that don't have an effect on the program flow, like white-space or line breaks outside of strings, are automatically filtered out.
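Both advantages can be tried out with Python's built-in ast module. This is only a minimal sketch that uses the standard library as a stand-in; it is not the parser SemanticDiff uses, and the names OLD, REFORMATTED and fingerprint are made up for this illustration.

```python
import ast

# The call from the example above, on a single line.
OLD = (
    "verify_token = generate_token(user, models.TokenType.EmailVerification, "
    "datetime.timedelta(days=2), email=user.email)\n"
)

# The same call split across several lines (still days=2).
REFORMATTED = """\
verify_token = generate_token(
    user,
    models.TokenType.EmailVerification,
    datetime.timedelta(days=2),
    email=user.email,
)
"""


def fingerprint(source: str) -> str:
    """Dump the AST as text; line and column numbers are left out by default."""
    return ast.dump(ast.parse(source))


# Advantage 1: the tree tells us that generate_token is the function being called.
call = ast.parse(OLD).body[0].value
callee = call.func.id

# Advantage 2: white-space, line breaks and the trailing comma are not part of the tree.
same_after_reformat = fingerprint(OLD) == fingerprint(REFORMATTED)

# A real change (2 days -> 3 days) does show up in the tree.
changed_after_edit = fingerprint(OLD) != fingerprint(
    REFORMATTED.replace("days=2", "days=3")
)

print(callee, same_after_reformat, changed_after_edit)
```

Comparing whole dumps only answers whether anything changed. Finding out what changed requires matching individual nodes, which is the job of the next stage.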
The nodes of the old and new tree are matched to identify which parts of the code have changed and which are still the same. This involves comparing all the nodes of the old and new tree to find those that are identical.
Since we know the structure of the code, we can implement a more advanced comparison than a normal text diff. For example, the following three Python statements look quite different when compared character-by-character, but it is easy to verify that they are equivalent using the AST.
    a = "Hello\nWorld"

    a = 'Hello\n' \
        'World'

    a = """Hello
    World"""
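You can check this claim yourself with Python's built-in ast module. The sketch below is just a quick experiment, not how SemanticDiff compares nodes internally; the variants list is a name chosen for this example.

```python
import ast

# The three statements from above, written as source text.
variants = [
    r'a = "Hello\nWorld"',              # escape sequence inside one literal
    "a = 'Hello\\n' \\\n    'World'",   # two adjacent literals joined by a line continuation
    'a = """Hello\nWorld"""',           # triple-quoted string containing a real line break
]

# ast.dump() omits line and column information by default, so only the
# structure and the evaluated string value remain.
dumps = {ast.dump(ast.parse(source)) for source in variants}

# The string constant that each variant assigns to `a`.
values = {ast.parse(source).body[0].value.value for source in variants}

print(len(dumps), values)
```

All three variants collapse into a single tree because Python's parser already resolves the quoting style, the escape sequence and the implicit concatenation of adjacent literals.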
The last step is to create a side-by-side diff so that a developer has an easy way to understand what has changed. This involves aligning the old and new source code using the generated mapping. All old nodes that cannot be found in the new tree are marked as deleted in the diff, and all new nodes that cannot be found in the old tree are marked as inserted.
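To make the idea of a mapping more concrete, here is a heavily simplified sketch. It assumes that only whole top-level statements are matched and that two statements match only if their trees are identical; the real matching works on individual nodes and is more involved. The function names statement_keys and classify are hypothetical.

```python
import ast
from collections import defaultdict


def statement_keys(source):
    """One structural key per top-level statement (formatting is ignored)."""
    return [ast.dump(statement) for statement in ast.parse(source).body]


def classify(old_source, new_source):
    """Pair identical statements greedily and label everything else."""
    old_keys = statement_keys(old_source)
    new_keys = statement_keys(new_source)

    # Positions of each statement in the new version that are still unmatched.
    unmatched_new = defaultdict(list)
    for new_index, key in enumerate(new_keys):
        unmatched_new[key].append(new_index)

    mapping = {}  # old statement index -> new statement index
    for old_index, key in enumerate(old_keys):
        if unmatched_new[key]:
            mapping[old_index] = unmatched_new[key].pop(0)

    # Old statements without a partner are deleted, new ones are inserted.
    deleted = [i for i in range(len(old_keys)) if i not in mapping]
    inserted = sorted(set(range(len(new_keys))) - set(mapping.values()))
    return mapping, deleted, inserted


OLD = "x = 1\ny = 2\nprint(x)\n"
NEW = "x = 1\nprint( x )\nz = 3\n"

mapping, deleted, inserted = classify(OLD, NEW)
print(mapping, deleted, inserted)
```

In this example `y = 2` ends up as deleted and `z = 3` as inserted, while `print( x )` is still matched to `print(x)` because the extra spaces never reach the tree.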
Since we allow mapping any pair of nodes across the two trees, some parts of the code cannot be aligned properly. This occurs, for example, if a block of code has been moved. Special handling for these cases is required.
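One common way to spot such cases is to look for the longest run of matched pairs that keep their relative order and to treat every other pair as moved. The sketch below illustrates that general technique (a longest increasing subsequence); the post does not say which approach SemanticDiff takes, and moved_pairs as well as the example mapping are invented for this illustration.

```python
from bisect import bisect_left


def moved_pairs(mapping):
    """Return the (old, new) pairs that are not part of the longest in-order run.

    `mapping` is a one-to-one dict from old positions to new positions.
    """
    pairs = sorted(mapping.items())           # ordered by old position
    new_positions = [new for _, new in pairs]

    tails = []                # tails[k]: index of the smallest end of a run of length k + 1
    previous = [-1] * len(pairs)
    for i, value in enumerate(new_positions):
        k = bisect_left([new_positions[t] for t in tails], value)
        if k > 0:
            previous[i] = tails[k - 1]
        if k == len(tails):
            tails.append(i)
        else:
            tails[k] = i

    # Walk back from the end of the longest run to collect its members.
    in_order = set()
    i = tails[-1] if tails else -1
    while i != -1:
        in_order.add(i)
        i = previous[i]

    return [pair for index, pair in enumerate(pairs) if index not in in_order]


# The first block of the old file now sits at the end of the new file.
example_mapping = {0: 2, 1: 0, 2: 1}
moved = moved_pairs(example_mapping)
print(moved)
```

The pairs inside the in-order run can be shown next to each other in the side-by-side view, and only the remaining pairs need the special treatment mentioned above.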
SemanticDiff will soon enter closed beta. If you are interested in better diffs integrated directly into VS Code, join the waitlist. You get a notification email as soon as the beta starts.
In the meantime you can play around with the online demo.
Also let me know if you want to know more about the inner workings of SemanticDiff 😃️.