I tried to get ChatGPT 3.5, ChatGPT 4, and Bard to increase my productivity. I wasted 6 hours of my life proving what I already knew from just reading the docs about where they got their training on "writing code": StackOverflow.
Unless you 100% plagiarize StackOverflow in your daily job, or you exclusively write "ToDo" apps or other strawman tutorial apps, you have nothing to worry about.
I recently wrote a Go implementation of CUID2 because I could not find an existing one. It is not hello-world, but it is not Duff's Device either, which, by the way, neither system could explain from just the raw code in isolation.
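For context, here is a hedged Go sketch of the general shape such an ID generator takes (a random leading letter, then a hash of time, a counter, and entropy encoded in base36). This is an illustration only, not the actual CUID2 algorithm, whose spec differs in the details:

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"fmt"
	"math/big"
	"sync/atomic"
	"time"
)

// counter adds a monotonic component so two IDs generated in the same
// nanosecond still differ.
var counter uint64

// newID is a simplified sketch, NOT the real CUID2 algorithm: random
// lowercase first letter, then a truncated base36-encoded hash of
// timestamp + counter + random entropy.
func newID(length int) (string, error) {
	b := make([]byte, 1)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	first := 'a' + rune(b[0]%26)

	entropy := make([]byte, 32)
	if _, err := rand.Read(entropy); err != nil {
		return "", err
	}
	n := atomic.AddUint64(&counter, 1)
	payload := fmt.Sprintf("%d%d%x", time.Now().UnixNano(), n, entropy)

	sum := sha256.Sum256([]byte(payload))
	encoded := new(big.Int).SetBytes(sum[:]).Text(36)
	if len(encoded) < length-1 {
		return "", fmt.Errorf("hash too short for length %d", length)
	}
	return string(first) + encoded[:length-1], nil
}

func main() {
	id, err := newID(24)
	if err != nil {
		panic(err)
	}
	fmt.Println(id, len(id)) // a 24-character id
}
```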
So, after seeing the 1000th article on every tech and non-technical blogging site about how every programmer is now just counting the days until they are out of a job, I decided to prove what I already knew: they were all written by people with absolutely zero comprehension of what they were writing about, who were probably using ChatGPT to create the drivel themselves by asking it to write their articles for them. Otherwise, why would they all sound like paraphrased copies of each other?
First you need to understand why nothing based on this technology is replacing anyone who creates anything original: because it cannot create anything original. The creators of these systems say this themselves.
This is how they work: at best, they regurgitate what they have consumed, in the same context but worded differently based on how you ask them to word it; at worst, they vomit up out-of-context nonsense that sounds like it makes sense because the human reading it has zero critical thinking or reading comprehension skills, or has instructed it to phrase the response as a confirmation-bias fallacy.
And even in the best of cases, what they are "paraphrasing" has at best a less than 50% chance of being 100% factually correct, unless the prompt is engineered to the point that it is just flat out plagiarizing source material that happens to be factually correct.
That is because a Large Language Model (LLM) is trained on the entirety of the internet. At least 99% of the internet is just opinion and misinformation, and that last 1%? Well, it is mostly opinion and misinformation too, because 100% of the internet was written by humans. An LLM is also, by design, unable to produce deterministic output.
How can you rely on information to be factually correct if, every time you ask the exact same question, you get different answers? Sometimes wildly different and/or contradictory ones.
Now consider that ChatGPT and Bard were both trained primarily on StackOverflow, and you can understand why my experiments were 100% a waste of time.
StackOverflow is the last place to go to find accurate code for anything unless you already know how to do it yourself. It is a bastion of misinformation from almost the last 20 years. The only information that is remotely reliable is the upvoted, accepted answers from the first 2 or 3 years of its existence. After that, the signal-to-noise ratio of actual experts to Dunning-Kruger "experts" became inverted, to the point that objectively incorrect answers were sometimes upvoted by orders of magnitude more, because the voters did not understand why what they were upvoting was incorrect or why the correct answer was correct. It is the largest repository of evidence that the Dunning-Kruger effect is real that you can find anywhere on the internet.
Neither ChatGPT nor Bard mentions anything about vetting the information they were trained on, other than some guardrails about not generating "abusive" content, which are less guardrails and more gossamer strands that exist only as plausible-deniability legal protection.
Both claim to be able to perform the following tasks:
- Code completion
- Code linting
- Code refactoring
- Code generation
- Code translation
So I tried each one of those tasks, and both systems failed miserably at every one of them.
Code completion was useless without prompting with exactly what all the previous code was, and even then it still insisted on offering names of variables, functions, methods, classes, etc. that did not exist in the code I was asking it to complete. This is because it is just matching against other code it has read; if I have to feed it all my previous code and tell it to only consider that code, I have already spent more time than just typing it out myself. Granted, this would likely perform better in the context of an IDE, but guess what: JetBrains IDEs did this for years before "AI" was all the rage, and I never needed to doubt them.
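For contrast, here is a rough Go sketch of what deterministic, non-"AI" completion does: it offers only names that actually exist in the project's symbol table. The symbols below are invented for illustration; a real IDE would build the table by parsing the project, e.g. with go/ast.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// complete returns every known symbol starting with prefix. It can never
// invent a name that does not exist, because it only consults the list
// of symbols actually declared in the code.
func complete(symbols []string, prefix string) []string {
	var matches []string
	for _, s := range symbols {
		if strings.HasPrefix(s, prefix) {
			matches = append(matches, s)
		}
	}
	sort.Strings(matches) // same input, same output, every time
	return matches
}

func main() {
	// In a real IDE these would come from parsing the current project.
	symbols := []string{"userID", "userName", "updateUser", "deleteUser"}
	fmt.Println(complete(symbols, "user")) // [userID userName]
}
```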
Linting falls under the same category as completion; it has been done for decades without "AI", and deterministically. I found that it failed, producing more false positives in edge cases than it could possibly save time on any edge cases a non-"AI" linter might have trouble with.
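Here is a minimal Go sketch of the kind of deterministic lint check that has existed for decades, using the standard go/parser and go/ast packages; the rule chosen (flag exported functions without doc comments) is just an example. Same source in, same findings out, no hallucinated false positives:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// lint parses Go source and reports exported functions that have no
// doc comment. The check is purely syntactic and fully deterministic.
func lint(src string) []string {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return []string{err.Error()}
	}
	var findings []string
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if ok && fn.Name.IsExported() && fn.Doc == nil {
			findings = append(findings, "exported function "+fn.Name.Name+" has no doc comment")
		}
	}
	return findings
}

func main() {
	src := "package demo\n\nfunc Public() {}\n\nfunc private() {}\n"
	fmt.Println(lint(src)) // [exported function Public has no doc comment]
}
```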
Refactoring failed spectacularly. In every case I asked it to refactor things, at best it spit out code that would not compile, and at worst, the code compiled and ran but had very subtle bugs introduced because it was using some incorrect source code it had been trained on, verbatim. This is the problem with no vetting. I was able to use non-"AI" Google search to find some of the incorrect code it offered up, just by searching for it in quotes. In every case where I got an exact match, it was a StackOverflow question, answer, or both that contained the code snippet. You can tell something was verbatim because both systems keep the comments when they spit out the code.
Code generation is the one task where they got more correct than incorrect, if you ask them to generate code that a non-"AI" code generator would produce. But in every case where I could get it to work, a non-"AI" tool already exists that does exactly the same thing. For example: given some JSON, generate a struct in Go that represents the JSON, and generate functions to Marshal and Unmarshal it. This is pretty much just a self-trained "structural search and replace" like JetBrains IDEs have had for a while.
Even so, I still struggled to get it to produce code that compiled reliably, to the point that by the time I had a prompt that would reliably generate compilable code, I could have just written it myself, much less used an online JSON-to-struct generator.
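For reference, this is roughly what a deterministic JSON-to-struct generator emits; the JSON shape and field names below are invented for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User mirrors the sample JSON document; the json tags map Go field
// names to the JSON keys.
type User struct {
	ID    int    `json:"id"`
	Name  string `json:"name"`
	Email string `json:"email"`
}

// MarshalUser serializes a User to JSON bytes.
func MarshalUser(u User) ([]byte, error) {
	return json.Marshal(u)
}

// UnmarshalUser parses JSON bytes into a User.
func UnmarshalUser(data []byte) (User, error) {
	var u User
	err := json.Unmarshal(data, &u)
	return u, err
}

func main() {
	data := []byte(`{"id": 1, "name": "Ada", "email": "ada@example.com"}`)
	u, err := UnmarshalUser(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(u.Name) // Ada
	out, _ := MarshalUser(u)
	fmt.Println(string(out)) // {"id":1,"name":"Ada","email":"ada@example.com"}
}
```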
Both systems failed when asked to generate code that they had not been trained on generating, i.e., something that was not a Gang of Four pattern or some other well-documented, already-written generator. For novel things that did not exist, the output was just meaningless code salad.
And that brings us to translation.
Then I spent way too much time engineering an extremely complicated prompt to "force" it to translate the source code to Go, and the results were, well, failure.
It would include packages that had nothing to do with the code it had translated; it would leave out packages that it did use; it would just leave out entire blocks of logic without mentioning that it had thrown anything away.
I tried feeding it one block of the source code at a time, in the order it was needed, like in old school "structured C": declarations first, then functions in reverse order of use, with the functions that had no dependencies first and the ones that relied on other functions last. It still silently left out the same chunks of logic from the individual functions.
I spent more time on trying to get this to work than anything else because this had the promise of providing the biggest productivity gains.
The only things I could get it to translate from an arbitrary source language to an arbitrary destination language were extremely basic things: loops, iterators, list reversals, struct definitions, and the like, which it probably found on some Rosetta Stone-type site or repo in its training set.
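This is the sort of "Rosetta stone" triviality it could reliably translate, shown here in Go: an in-place slice reversal, which exists in countless languages across countless public snippet collections.

```go
package main

import "fmt"

// reverse swaps elements in place from both ends toward the middle.
func reverse(s []int) {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

func main() {
	s := []int{1, 2, 3, 4, 5}
	reverse(s)
	fmt.Println(s) // [5 4 3 2 1]
}
```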
Sometimes it generated comments in the output without being asked, and they had typos and other human-looking errors, which tells me it was just plagiarizing. Sometimes, when I asked it to generate comments, it would mix in comments that were correct but obviously from another language, because the idiomatic terminology they used was native to some other language. I saw comments in translated Rust code that were obviously from Python and Java, even though the source I asked it to translate was neither.
This was the biggest waste of time out of the entire exercise.
Anyone who tries to say that this technology has made programmers obsolete, or will replace them, or will revolutionize programmers' lives and productivity, is the one most likely to be replaced by it, because they are generating the same out-of-context misinformation drivel that it creates.