DEV Community

Jarrod Roberson

Posted on • Updated on

I "did my own research" and "AI" is not taking my job any time soon.

TL;DR - But I already knew that ...

I tried to get ChatGPT 3.5 and 4 as well as Bard to increase my productivity. I wasted six hours of my life proving what I already knew just from reading the docs about where they got their training on "writing code": StackOverflow.

Yeah, all three failed every test of the tasks they claim they can do:

Unless you 100% plagiarize Stack Overflow in your daily job, or you exclusively write "ToDo" apps or other strawman tutorial apps, you have nothing to worry about.

I recently wrote a Go implementation of CUID2 because I could not find an existing one. It is not hello-world, but it is not Duff's Device either (which, by the way, none of them could explain from just the raw code in isolation).

CUID2 is non-trivial, but there is nothing in it that someone with the most basic ability to read JavaScript could not port just by reading the code of the original implementation.

There is a Java version as well. And as a former Java main, that was useful to confirm I was interpreting the JavaScript code correctly.
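For the curious, the overall shape of a CUID2-style id can be sketched in Go roughly like this. This is a simplified illustration, not my actual port and not the real spec: the real CUID2 uses SHA3-512 and a randomly seeded session counter, while this sketch substitutes stdlib SHA-256 and a hard-coded fingerprint string for brevity.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"fmt"
	"math/big"
)

// sessionCounter stands in for CUID2's per-session counter; the real
// implementation seeds it randomly and guards it for concurrency.
var sessionCounter uint64

// randBelow returns a uniform random int64 in [0, n).
func randBelow(n int64) int64 {
	v, _ := rand.Int(rand.Reader, big.NewInt(n))
	return v.Int64()
}

// newID sketches the shape of a CUID2-style id: a random lowercase
// first letter, then base36 digits derived from hashing entropy, a
// counter, and a host "fingerprint". NOTE: the real spec hashes with
// SHA3-512; SHA-256 keeps this sketch stdlib-only.
func newID(length int) string {
	sessionCounter++
	entropy := make([]byte, 32)
	rand.Read(entropy)
	material := fmt.Sprintf("%x|%d|%s", entropy, sessionCounter, "host-fingerprint")
	sum := sha256.Sum256([]byte(material))
	body := new(big.Int).SetBytes(sum[:]).Text(36) // ~49 base36 digits
	first := string(rune('a' + randBelow(26)))
	return first + body[:length-1]
}

func main() {
	fmt.Println(newID(24))
	fmt.Println(newID(24))
}
```

That is the whole trick: letter, hash, truncate. Nothing a working programmer needs a chatbot for.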

So after seeing the 1000th article on every tech and non-technical blogging site about how every programmer is now just counting the days until they are out of a job, I decided to prove what I already knew: they were all written by people who had absolutely zero comprehension of what they were writing about, and who were probably using ChatGPT to create the drivel themselves by asking it to write their articles for them. Otherwise, why would they all sound like paraphrased copies of each other?

First you need to understand why nothing based on the technology these things are built on is going to replace anyone who creates anything original: because it can not create anything original. The creators of these systems say this themselves.

This is how they work: at best they regurgitate back what they have consumed, with the same context but worded differently based on how you ask them to word it; at worst they just vomit up nonsense out of context that sounds like it makes sense, because the human reading it has zero critical thinking or reading comprehension skills, or has instructed it to phrase the response as a confirmation bias fallacy.

And even in the best of cases, what they are "paraphrasing" has at best a less than 50% chance of being 100% factually correct, unless the prompt is engineered to the point that it is just flat out plagiarizing source material that happens to be factually correct.

That is because the Large Language Model (LLM) is trained on the entirety of the internet. At least 99% of the internet is just opinion and misinformation, and that last 1%, well, it is mostly opinion and misinformation too, because 100% of the internet was written by humans. It is also, by design, unable to produce deterministic output.

How can you rely on information to be factually correct if every time you ask the exact same question you get different answers? Sometimes wildly different and/or contradictory ones.

Now consider that ChatGPT and Bard were both trained primarily on StackOverflow, and you can understand why my experiments were 100% a waste of time.

StackOverflow is the last place you go to find accurate code to do anything, unless you already know how to do it yourself. It has been a bastion of misinformation for almost 20 years. The only information that is remotely reliable is the upvoted, accepted answers from the first 2 or 3 years of its existence. After that, the signal to noise ratio of actual experts to Dunning-Kruger "experts" became inverted, to the point that objectively incorrect answers were sometimes upvoted by orders of magnitude more, because the voters did not understand why what they were upvoting was incorrect and why the correct answer was correct. It is the largest repository of evidence that the Dunning-Kruger effect is real that you can find anywhere on the internet.

Neither ChatGPT nor Bard mentions anything about vetting the information they were trained on, other than some guardrails about not generating "abusive" content, which are less guardrails and more like gossamer strands that exist only for plausible-deniability legal protection.

They claim to be able to perform the following tasks:

  • Code completion
  • Code linting
  • Code refactoring
  • Code generation
  • Code translation

So I tried each one of those tasks, and the systems failed miserably at every one of them.

Code completion

Useless without prompting with exactly what all the previous code was, and even then it still insisted on offering names of variables, functions, methods, classes, etc. that did not exist in the code I was asking it to complete. This is because it is just matching against other code it has read; if I have to feed it all my previous code and tell it to only consider that code, I have already spent more time than just typing it out myself. Granted, this would likely perform better in the context of an IDE, but guess what: JetBrains IDEs did this for years before "AI" was all the rage, and I never needed to doubt them.

Code linting

This falls under the same category as completion; it has been done for decades without "AI", and deterministically. I found that it produced more false positives in edge cases than it could possibly save time on edge cases that a non-"AI" linter might have trouble with.

Refactoring

Failed spectacularly. In every case I asked it to refactor things, at best it spit out code that would not compile, or at worst, the code compiled and ran but had very subtle bugs introduced because it was reusing some incorrect code it was trained on, verbatim. This is the problem with no vetting. I was able to use non-"AI" Google search and find some of the incorrect code it offered up, just by searching for it in quotes. In all of the cases where I got an exact match, it was a StackOverflow question or answer (or both) that contained the code snippet. This is because both systems keep the comments when they spit out the code; that is how you can tell something was verbatim.

Generation

This is the one task where they got more correct than incorrect, if you ask them to generate code that a non-"AI" code generator would produce. But in every case where I could get it to work, a non-"AI" tool already exists that does exactly the same thing.

For example: given some JSON, generate a struct in Go that represents the JSON, and generate functions to Marshal and Unmarshal it. This is pretty much just a self-trained "structural search and replace" like JetBrains IDEs have had for a while.

I still struggled to get it to produce code that compiled reliably, to the point that by the time I had a prompt that would reliably generate compilable code, I could have just written it myself, much less used an online JSON-to-struct generator.

Both systems failed when asked to generate code that they had not been trained on generating, i.e. something that was not a Gang of Four pattern or some other well-documented, already-written generator. Anything novel that did not already exist came out as meaningless code salad.

They would just give up, or they would generate gibberish. I would ask for Go, Rust, or Erlang and would get Python or JavaScript in some cases, because nothing existed for them to plagiarize/paraphrase in the requested language.

And that brings us to translation.

Translation

Wow, these were epic fails. I used the CUID2 JavaScript code and it just flat out refused to translate the entire source code file in one go to anything but TypeScript. Really? It could only translate "correctly" to a language that was a super-set of the original language.

Then I spent way too much time engineering an extremely complicated prompt to "force" it to translate it to Go, and the results were, well, failure.

It would include packages that had nothing to do with the code it had translated, it would leave out packages that it did use, and it would just leave out entire blocks of logic without mentioning that it threw anything away.

I tried feeding it one block of the source code at a time in the order it needed, like in old school "structured C". Declarations, then functions based on the reverse order of use. Functions that relied on other functions last and the ones with no dependencies first. It still silently left out the same chunks of logic from the individual functions.

I spent more time on trying to get this to work than anything else because this had the promise of providing the biggest productivity gains.

The only things I could get it to translate from an arbitrary source language to an arbitrary destination language were extremely basic things; loops, iterators, list reversals, struct definitions, etc. that it probably found in some Rosetta stone type site or repo in its training set.

Sometimes it generated comments in the output without being asked, and they had typos and other human-looking errors, which tells me it was just plagiarizing. Sometimes, when you asked it to generate comments, it would mix in comments that were correct but obviously from another language, because the idiomatic terminology it used was native to some other language. I saw comments in translated Rust code that were obviously from Python and Java, even though the source I asked it to translate was neither.

**This was the biggest waste of time out of the entire exercise.**

Anyone who tries to say that this technology has made programmers obsolete, or will replace them, or will revolutionize programmers' lives and productivity, is the one more likely to be replaced by it. Because they are generating the same out-of-context misinformation drivel that it creates.

Top comments (31)

Jean-Michel (agent double)

That was a great rant.

Oh no I'm supposed to reply

But you don't understand it's the future, you are so unfair by judging an emerging technology by its current limitation. Do you think that internet v0.1 was perfect? It didn't even support videos of cute kittens. Look how it is today. Just imagine how incredible it will be once AI is ported to the blockchain.

Those guys are right, I am not able to foresee the future.
What I don't understand is how they are able to do it?

ivorator

It will actually get worse, once it starts drinking its own Kool-Aid. Once it starts learning from the high frequency BS content people generate and publish with it.

Jarrod Roberson • Edited

i would argue “internet v0.1” was pretty perfect. i co-sysoped multiple BBSs across the world, with an Amiga 3000 and a Courier modem. i was there i would know. ;-p

ivorator

If you remember the dotcom era, you probably know a lot of people claimed the internet would do stuff it still struggles to do 30 years later.

Even surviving titans such as Cisco and Amazon crashed hard. Half the IT companies died, and most of the remaining were eaten by the bigger ones.

You can pretty much bet half the companies trying to leverage AI will miserably die, because they act as if Generative AI is some sort of General AI capable of understanding.

There are limitations inherent to this type of technology which will never be overcome.

Others could be overcome but won't be, because they are contrary to the business model. Before AI is unshackled and can be further trained in a specific context (i.e. your organization), it will be of limited use. But since OpenAI is charging by the token, they don't want you to train it further.

There is far higher probability a dev would lose their job because clueless management tanked the company playing with ai, than being replaced by AI

Jarrod Roberson

"There is far higher probability a dev would lose their job because clueless management tanked the company playing with ai, than being replaced by AI"

^^^^ this

Jean-Michel (agent double)

There is far higher probability a dev would lose their job because clueless management tanked the company playing with ai, than being replaced by AI

That's such a great quote, I'm totally gonna steal it 👋🏻

ivorator

GitHub copilot (with the chat integration), is quite useful. It does save lots of typing and time, as long as you do trivial, path well traveled stuff.

That being said, it isn't replacing anyone anytime soon. Even with indexing and having the context of your project, it's still clueless about solving most issues.

Not to mention being behind. If you use a new version of some library, it will keep suggesting old deprecated stuff. It is also oblivious to internal company libs, services you integrate with, etc. tRPC is, for example, far more revolutionary.

Until you can further train AI on your specific domain, it will not replace anything. It will just keep suggesting and hallucinating generic stuff.

Let's face it, most issues in software development are business-domain related, architecture related, and even conflicting-business-logic related.

Jarrod Roberson

JetBrains has been killing it at auto-completing the BS boilerplate for over a decade, no hallucinating "AI" required.

Matt Ellen

"as a former Java main" steals your metaphor

CoSJay • Edited

"... not replacing anyone that creates anything original, because it can not create anything original"

I would guess 99% of anything original that one creates is built on chunks of code that are NOT original. Most lines of code in an app are generic in nature and that's what I've found AI very good at generating. It can create code that has to be there but is a waste of my time to write. (In general, my experience with AI and coding has been dissimilar to yours, but I appreciate your research and perspective.)

Jarrod Roberson • Edited

you miss the point, the "whole" app is original composition. just like music, there are only 12 notes, but machines make music that is just paraphrased versions of what someone has already done. humans create new compositions that no one else has conceived of. And your "experience with AI" is none, because "AI" does not exist.

and Raymond Scott did not need "AI" to generate the phrases he turned into actual songs and music. He used mechanical "computers" to do it, and then analog electronics. History is recorded for a reason.

Anders Persson

I have the same experience, but it is nice for correcting documents, e.g. checking grammar and spelling 😊

JMJ GREFALDO

AI won't replace people (at least not yet), but people who know how to utilize AI will. I agree with the author 100%.

AI, such as GPT and BARD, can enhance productivity and speed up tasks but cannot replace human intelligence. While AI may replace certain jobs, it also creates new opportunities, necessitating upskilling. AI complements human capabilities, and the collaboration between humans and AI is crucial for leveraging its benefits effectively. The integration of AI requires adapting to the changing job market and focusing on higher-level tasks that require human judgment and problem-solving skills.

AI is based on our data; without us it won't evolve. Even if AI reaches superintelligence, I don't think it will think the way humans do. Since its comprehension is way better than ours and it is not governed by emotion or desire, it will not act like a human, who is most of the time motivated by lust for power and wealth.

Jarrod Roberson • Edited

if you think something that produces non-deterministic, randomized output, without considering factual correctness when deciding what to include or exclude, is something that can be used to generate code, then your comprehension of the requirements of writing correct programs is no better than these "AI".

these things have no "comprehension", they just pattern match the input against what they have been "trained on" and regurgitate/vomit back up the training data almost randomly.

the gish gallop of the entire thing is the parsing of the prompts and the illusion that the output is somehow "intelligently" generated. it isn't; this is, in concept, a huge Markov chain state machine mashup. granted, one with billions and billions of nodes and trillions of states, but it is not "intelligent".

garbage in = garbage out is now "no matter what in = garbage out" for these "AI" "tools".

Anthony Fung

Hi JMJ.

Good points. I completely agree: AI is a tool that can help us if we learn to leverage it.

If I understand correctly, an analogy might be that it's very much like an electric screwdriver: it won't automatically tighten up every loose screw that's around it. However, our DIY projects will go along much more smoothly and quickly if we learn to incorporate it into our workflow.

Jarrod Roberson • Edited

if your takeaway from the post is that "our DIY projects will go along much more smoothly and quickly if we learn to incorporate it into our workflow", then in all sincerity you need to study up on reading comprehension.

or your NLP "agree, then add" strategy failed.

first you "agree" with me, then you state your "understanding" of what I said as the exact opposite of what you just agreed with. NLP gaslighting 101.

here it is explained very clearly:

so you can stop trying to frame me as agreeing with your opinion thru projection.

I said the exact opposite, to use your analogy

it is like an electric screwdriver that is unreliable to the point that you spend more time trying to figure out what is wrong with it than actually using it. but you have such an emotional investment in the idea of the screwdriver that you lie to yourself and others about how smart you are for buying it, spending so much time on trying to fix it and "understand" how to use it ... etc. Get it now? NO. it was and is an utter waste of time trying to trick a non-deterministic function designed to return random answers to produce anything useful.

Anthony Fung

Hi Jarrod.

Thanks for summarising your post so succinctly. However, I was not attempting to gaslight you. Nor was I agreeing with you, unless you use both the Jarrod Roberson and JMJ GREFALDO accounts, which I assume isn't the case given that you replied to JMJ GREFALDO.

Please consider that the first line in my reply was Hi JMJ.

My comment about "incorporate it into our workflow" was my interpretation of a section of a reply to your post:

AI complements human capabilities, and the collaboration between humans and AI is crucial for leveraging its benefits effectively. The integration of AI requires adapting to the changing job market and focusing on higher-level tasks that require human judgment and problem-solving skills.

rather than the actual post itself.

Thank you for sharing and explaining your point of view.

Jarrod Roberson

the gui on my device showed me your reply directly under my post, not his; the commenting on this site is atrocious. that said, "hi jmj" in the 1px-thin font they use is easy to miss on a mobile device, esp. when it is scrolled off the screen when you open the gui.

Hayley Grace

Did you use AI to write your second paragraph?

Nagailic Sergiu (Nikro) • Edited

Problem is, you only spent 6 hours, as you claim. It's not perfect, and no one claimed it was. I'm a PHP developer, I do Drupal mainly; here are some examples:

  1. I know some basic python, but now I know I can do almost anything in it. Either convert some data into CSV tables, or web-scrape a specific page / site or glue-code some binaries around the system
  2. I know basic JS but I have no idea how Gnome3 works, and yet, in a single evening, I could craft a visual widget that talks to my Arduino and issues commands over Bluetooth, even though I have almost no idea how it works or where I should even start. You can claim that I could do this by reading tons of blogs and references and diving into the ecosystem, but in reality I don't care about all that; I want to code it and forget it :)
  3. I have no experience in Arm-devices and Armbian and know nothing about this eco-system, yet, I could create a MagicMirror2 module (in nodejs, which again, isn't my thing) and slap together PicoVoice + self-hosted Whisper + ChatGPT + Mimic3 and create a personal privacy-focused assistant, in under 1 week.

Sure, when it comes to super niche areas, where there might not be enough documentation, not enough experimentation, and not enough data online, it will be quite dumb. But if you understand the language, and you know what compilation, cmake, make, various libraries, and coding patterns are, you can just fix those gaps yourself, removing 1-2 lines or adjusting things. But MAN, it unlocks a ton of possibilities.

Iskandar Reza

Well I've spent almost a month playing with this thing, even porting a python version of an AutoGPT implementation into js, and I have to agree with most of the main points of this post.

What it's very good at is summarizing text, so long as you can fit it in the 4096 token context window.

I mean it can do some code completion for basic stuff like array methods or object transformations but even then it's not super consistent and can produce code that is very different from one iteration to the next. And it's pretty crap at math.

It's a language model... Makes sense that it's good at language stuff.

Jarrod Roberson • Edited

if it actually understood what code was doing instead of what it "looks like", it could explain Duff's Device, and it can not. I would argue it is not even that good at the "language stuff"; it mixes sources and includes debunked misinformation right alongside undisputed facts like they are equal.

dev.to/jarrodhroberson/i-asked-ai-...

Jarrod Roberson

so your argument is that I did not waste enough time trying to learn how to program prompts for a system that by design produces non-deterministic, randomly generated output that is unvetted for factual correctness, rather than just using the tools that work the first time and the same every time ... got it ... wow

ivorator

This is the "mount stupid" effect. It "knows" just enough to do some things better than a domain-ignorant person. Just enough to be dangerous. I have been using Copilot + integrated chat for quite a few months. So yes, it will absolutely help with common things and patterns we can be oblivious to.

Thing is, often it will give suboptimal, outdated, out of context, or plain wrong "advice".

For example, if you ask it how to send email using Google REST api and python, without google library - it will give you "working" solution. Then when you ask it to generate some html email texts for testing purposes - you'll find out some of them get mangled.

Ironically, the correct solution is to simply use a module from the standard Python library, which not only complies with RFC 2822 and covers a lot more content cases, but is far more concise and readable than what the chat suggests.

It's an amateur level programmer, which can cover a huge amount of our " field ignorance". So it is actually very useful in many aspects.

However if you try using it for something you have at least intermediate knowledge - you will find it severely lacking.

Also, just as with most people starting out, it won't add error handling, for example.

Point being, it is a useful tool, but people seem to overestimate its abilities by orders of magnitude.

Jarrod Roberson

that is because it is trained on the absolute worst code base on the internet, StackOverflow. Where the most naive solution gets the most upvotes because there are more people that only understand the naive solution and Dunning/Kruger keeps them from understanding why they are incorrect.

That makes it less than useful as a net effect, for "experienced" programmers who can see the garbage for what it is, and even more so for those that can not.

CoSJay

Just because you had a poor experience with ChatGPT doesn't mean that's the case for everyone. Here's a bunch of developers who have experienced productivity gains using Chatty: news.ycombinator.com/item?id=36037559

Jarrod Roberson • Edited

your comment is cherry picking. If you read the sub-comments and the sub-comments of the sub-comments, probably about 75% of them are about how it doesn't actually generate working code, how it mixes dialects together and strips out comments or includes comments that are wrong, or all the other things that it fails at.

and every one of the people claiming productivity gains prefixes it with some form of "I am crap as SQL". Then they complain that it generates broken, incorrect stuff they have to fix anyway, so it is more crap than they are.

The argument that everyone makes is how much more productive it is making programmers, NOT crap programmers that are barely able to write code. Confirmation Bias is a hell of a drug.

CoSJay • Edited

everyone of the people claiming productivity gains prefix it with some form of "I am crap as SQL".

Absolutely untrue. See your comment about confirmation bias above.

You're sitting there with a straight face telling the world that no good developer could possibly be more productive when using ChatGPT than without it -- do I have that correct?

stainlessray

I don't know what you were asking it, but I get boilerplate Java generated, working without change, day one. Python too.

I finished a week of experiments and concluded that it made a great supplement to Google search, replacing it for most simple questions, due to the fact that the first answer, which I still need to validate (just like a Google search result), comes much faster.

It is increasingly useful for analyzing the code you wrote, homing in on best practices, and exploring features I've never used.

Tbh, if your experience was this bad for real, maybe it's you.

Jarrod Roberson

my IDE generates boilerplate for me with a single CTRL+ALT+ENTER, and has for almost a decade. Read for comprehension if you want to know what I was asking it to do. Generating boilerplate is not one of the things I listed in my article.

mikeyGlitz

I think the largest potential for AI in its present state is to eliminate repetitive jobs that don’t require much thought. Programming requires more than just analytical skills, but also a degree of creativity which can’t be accurately replicated by these large language models. People think it does more than it does. Best case, you replace mediocre talent that copy-pastes most of their code.
