Impressions of GitHub Copilot in PhpStorm - March 2023

#githubcopilot #ai #php #phpstorm

After the initial promo year of free access, GitHub released a commercial version of Copilot a while ago. The price tag isn't anything special ($10 a month or $100 a year), but when you consider how many tools, services and IDE plugins are priced at about $10 to $25 a month, the costs quickly pile up. Add the subscriptions which are not work related and it can become a bit overwhelming. So, the question for each subscription service is - is it worth it? Here are my impressions, which you may find useful.

Good, not great

First of all, let's set something straight - whatever Copilot can do, it can't be classified as great. It does many things well (or well enough), but not great. You can simply feel the architecture behind the AI in every single code or comment suggestion it makes.

What Copilot does well

The first thing Copilot does well is writing comments for already existing code. I am in the process of revisiting and refactoring some old PHP 7.4 code to PHP 8.2, along with an upgrade of Symfony and PHPUnit. There are parts of the code which I remember being exceptionally complex to write and optimize, and the result is difficult to understand now, 2 years after it was written. Copilot is handy in this situation. I only need to open a comment on top of the code, or start a new docblock comment line, and a suggested description is generated. Copilot's suggestions in this case are OK and mostly true, but I do need to read and understand the exact meaning of the generated comment, then run through the code and see whether the suggestion was right. About 80%-90% of the time it is, but there are cases where it generates a misleading comment which would do more harm than good if left unchecked. It's important to note that the accuracy of the generated comment does not depend on the size or complexity of the code. It can be 100% accurate for complex code but fail on simpler code (and vice versa). Generated comments are best treated as good and useful guidelines.

Refactoring and rewriting existing code is one of the stronger points of Copilot. If you take a look at the terms of service, you might notice that you're giving Copilot the right to read and analyze the code of your project and any open tab in your IDE. I can simply feel this being used extensively. For example, if I have two projects open and I'm trying to rewrite some old code, it will start offering code suggestions based on the currently opened file (tab) of the other project. However, this is a double-edged sword, since it will force suggestions based on the old code. That's fine if I just want to rewrite something and make minimal adjustments (like renaming properties, variables or methods), but if I really want to refactor and improve the code, Copilot simply gets in my way with suggestions based on the old code - and it keeps doing it so aggressively that I simply want to disable it until I'm done refactoring. In addition to this, it will almost certainly skip suggesting empty string or array initialization, even if the old code has it and even if it makes perfect sense in the code. The worst part is that it gradually tries to mix the new code with the old one, which creates a fair amount of mess. But, sometimes - it just works perfectly.
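To make the initialization point concrete, here is a minimal sketch of the kind of code I have in mind. The function and its name are hypothetical and not taken from the project discussed here; the only line that matters is the empty string assignment before the loop.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, used only to illustrate the missing initialization.
 *
 * @param string[] $tags
 */
function joinTags(array $tags): string
{
    // The kind of line that tends to be missing from suggestions.
    // Without it, an empty $tags would leave $result undefined at the return.
    $result = '';

    foreach ($tags as $tag) {
        $result .= '#' . $tag . ' ';
    }

    // Drop the trailing space added by the last iteration.
    return rtrim($result);
}

echo joinTags(['php', 'ai']), PHP_EOL;
```

The loop still appears to work without the first assignment when there is at least one tag, which is probably why the omission is easy to miss in a suggestion.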

If your code follows a certain architecture and uses design patterns extensively, Copilot really shines. After all, it's made to follow patterns. Following a standard for naming classes and methods and using descriptive names can also help significantly, since it's based on a natural language model and can (in some form) "understand" the context and purpose of the code based on names. For example, include "DataProvider" in the name of a class or trait and it will start suggesting code ideal for use in PHPUnit's data providers (especially if your other data providers follow the same architecture and naming patterns). The same goes for value objects, factories, adapters and similar patterns.
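As a rough illustration of the naming hint described above, this is the shape of a data provider trait. Everything in it (the trait, the `slugify()` function, the host class) is hypothetical and made up for this example. To keep the sketch runnable without PHPUnit installed, the provider is consumed by a plain loop; in a real test class it would be referenced from a test method with PHPUnit's `@dataProvider` annotation.

```php
<?php

declare(strict_types=1);

// Hypothetical trait; the "DataProvider" suffix is the naming hint.
trait SlugDataProvider
{
    /**
     * @return iterable<string, array{string, string}>
     */
    public static function provideTitlesAndSlugs(): iterable
    {
        yield 'simple title' => ['Hello World', 'hello-world'];
        yield 'extra whitespace' => ['  Hello   World  ', 'hello-world'];
        yield 'punctuation' => ['Hello, World!', 'hello-world'];
    }
}

// Hypothetical function under test.
function slugify(string $title): string
{
    $slug = strtolower(trim($title));
    // Collapse every run of non-alphanumeric characters into one dash.
    $slug = (string) preg_replace('/[^a-z0-9]+/', '-', $slug);

    return trim($slug, '-');
}

// Stand-in for a test class that would use the trait.
final class SlugDataProviderHost
{
    use SlugDataProvider;
}

foreach (SlugDataProviderHost::provideTitlesAndSlugs() as $name => [$title, $expected]) {
    printf("%s: %s\n", $name, slugify($title) === $expected ? 'ok' : 'FAIL');
}
```

Once a few providers in a project look like this, the named cases and the input/expected pairs give a completion tool a very regular pattern to continue.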

Where Copilot fails

The most obvious point of failure comes from the architecture of any ML application today - it's trained on existing data and will suggest the solutions which are most common for a particular situation. In my case, that's refactoring old value objects to use constructor property promotion and eliminating getters. The neural network behind Copilot is trained on countless lines of code using getters - and it will almost exclusively suggest getters over direct property access. It does read and remember the code written previously in a class or a method, and will (eventually) adapt and follow my lead, but only for the particular file I'm editing at the moment. When I open a new file and start writing fresh code, it will most likely forget the suggestion rules from the previous file. It will simply fall back to the model it was trained on, until I start rejecting its suggestions again and typing my own solutions.
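For readers who haven't done this kind of refactoring, here is a small sketch of the two styles side by side. The `Money` classes are hypothetical and not from my project. The second version uses constructor property promotion (PHP 8.0) together with readonly properties (PHP 8.1), which is one way of dropping getters while keeping the object immutable; I'm assuming that combination here for the sake of the example.

```php
<?php

declare(strict_types=1);

// Hypothetical value object in the older PHP 7.4 style:
// private typed properties, manual assignment, one getter per property.
final class LegacyMoney
{
    private int $amount;
    private string $currency;

    public function __construct(int $amount, string $currency)
    {
        $this->amount = $amount;
        $this->currency = $currency;
    }

    public function getAmount(): int
    {
        return $this->amount;
    }

    public function getCurrency(): string
    {
        return $this->currency;
    }
}

// The same value object with promoted, readonly constructor properties.
// No getters: the properties are read directly and cannot be reassigned.
final class Money
{
    public function __construct(
        public readonly int $amount,
        public readonly string $currency,
    ) {
    }
}

$legacy = new LegacyMoney(1000, 'EUR');
$money = new Money(1000, 'EUR');

// Getter access, the style the suggestions keep falling back to.
echo $legacy->getAmount(), ' ', $legacy->getCurrency(), PHP_EOL;

// Direct property access, the style I'm refactoring towards.
echo $money->amount, ' ', $money->currency, PHP_EOL;
```

In a file full of classes like `Money`, a suggestion such as `$money->getAmount()` looks perfectly reasonable but calls a method that no longer exists.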

Code generated by Copilot is far from reliable. While it does offer some good suggestions, especially if you first write what you want to do in a comment and let it generate the code, it does not have real knowledge of your application or of the types and objects you're using. Although the accuracy of the suggested code sometimes really manages to surprise me, there are many more cases where it just spews out garbage which I need to clean up. The garbage does make some sense when you take into account that Copilot's neural network has its roots in a natural language model - it predicts what reads plausibly, not what exists in my source code. The most obvious example is that it fairly often suggests using properties which do not exist in my objects, but which seem logical given the context. Again, it's the same situation as with generating comments - it does manage to provide good guidelines, but the code needs to be checked manually.

While we're on the subject of reliability, even if the generated code's syntax is correct and it uses a proper set of variables, properties and methods which do exist in my code, the suggested result may not do exactly what I'm expecting. I have tested this while rewriting old code which has unit test coverage. About 90% of the time, the unit tests written for the manually written code fail for the Copilot generated code if I let Copilot derive the logic itself instead of just letting it copy the old code. The dangerous side of this is that the generated code, when read as natural language, will almost certainly make at least some sense, but the implemented code will simply not work. I need to carefully read through each generated suggestion and confirm the logic behind it, which might waste more time than simply writing the code myself.

And, lastly, the integration of Copilot alongside JetBrains' code completion is a bit clumsy. There are many situations where the two overlap, and that can generate some mess. It also doesn't help that Copilot is activated and tries to generate code when I type a space, a tab or a new line (Enter). It really gets in the way of proper code formatting, which is annoying. I'm also having issues with Copilot's suggestions being accepted when I don't want them, simply because of muscle memory from typing well formatted code. However, that's not a problem of Copilot itself, it's a problem of the IDE plugin. Having a dedicated keyboard shortcut for activating Copilot, and the ability to disable suggestions until that shortcut is pressed, would be great.

Conclusion

Even though it has some bright moments, after using it for a while on a real project, I am not entirely sure whether it speeds development up or slows it down. I cannot shake the feeling that I need to double check everything it generates, and that can be a bit tiresome. Especially if you take into account the existence of JetBrains' own code completion, which I can use out of the box without giving it much thought. Sure, JetBrains' code completion is narrower in scope and it can't generate whole blocks of code, but at least it has a proven track record and I simply know that I don't need to check the code it generates at all. I am certain of one thing, though - the workflow when using Copilot, compared to not using it, is completely different.

Maybe the best way to describe working with Copilot, compared to classic code completion, is to paraphrase Kandi from "Two and a Half Men":

"With JetBrains code completion it's kind of like going on Space Mountain - it's a good ride, but there is never any real danger. With Copilot, it's like being in the back seat of a car driven by a really smart kangaroo - it may go up on the curb a couple of times, but it will get you there."

I will give it one more month and test it on other languages, primarily TypeScript, but in its current state - it's not worth the money.

Buying me a coffee, on the other hand, is well worth the money :)