Code that does not work is just text.
Ai code is specifically annoying because it looks like it would work, but its just plausible bullshit.
And that’s what happens when you spend a trillion dollars on an autocomplete: amazing at making things look like whatever it’s imitating, but with zero understanding of why the original looked that way.
I mean, there’s about a billion ways it’s been shown to have actual coherent originality at this point, and so it must have understanding of some kind. That’s how I know I and other humans have understanding, after all.
What it’s not is aligned to care about anything other than making plausible-looking text.
Coherent originality does not point to the machine’s understanding; the human is the one capable of finding a result coherent and weighting their program to produce more results in that vein.
Your brain does not function in the same way as an artificial neural network, nor are they even in the same neighborhood of capability. John Carmack estimates the brain to be four orders of magnitude more efficient in its thinking; Andrej Karpathy says six.
And none of these tech companies even pretend that they’ve invented a caring machine that they just haven’t inspired yet. Don’t ascribe further moral and intellectual capabilities to server racks than do the people who advertise them.
Coherent originality does not point to the machine’s understanding; the human is the one capable of finding a result coherent and weighting their program to produce more results in that vein.
You got the “originality” part there, right? I’m talking about tasks that never came close to being in the training data. Would you like me to link some of the research?
Your brain does not function in the same way as an artificial neural network, nor are they even in the same neighborhood of capability. John Carmack estimates the brain to be four orders of magnitude more efficient in its thinking; Andrej Karpathy says six.
Given that both biological and computer neural nets very by orders of magnitude in size, that means pretty little. It’s true that one is based on continuous floats and the other is dynamic peaks, but the end result is often remarkably similar in function and behavior.
If you would like to link some abstracts you find in a DuckDuckGo search that’s fine.
I actually was going to link the same one I always do, which I think I heard about through a blog or talk. If that’s not good enough, it’s easy to devise your own test and put it to an LLM. The way you phrased that makes it sound like you’re more interested in ignoring any empirical evidence, though.
That’s unreal. No, you cannot come up with your own scientific test to determine a language model’s capacity for understanding. You don’t even have access to the “thinking” side of the LLM.
The image is taken from Zhihu, a Chinese Quora-like site.
The prompt is talking about give a design of a certain app, and the response seems to talk about some suggested pages. So it doesn’t seem to reflect the text.
But this in general aligns with my experience coding with llm. I was trying to upgrade my eslint from 8 to 9, and ask chatgpt to convert my eslint file, and it proceed to spit out complete garbage.
I thought this would be a good task for llm because eslint config is very common and well-documented, and the transformation is very mechanical, but it just cannot do it. So I proceed to read the documents and finished the migration in a couple hour…
I asked ChatGPT with help about bare metal 32-bit ARM (For the Pi Zero W) C/ASM, emulated in QEMU for testing, and after the third iteration of “use printf for output” -> “there’s no printf with bare metal as target” -> “use solution X” -> “doesn’t work” -> “ude printf for output” … I had enough.
Sounds like it’s perfectly replicated the help forums it was trained on.
I wouldn’t say it’s accurate that this was a “mechanical” upgrade, having done it a few times. They even have a migration tool which you’d think could fully do the upgrade but out of the probably 4-5 projects I’ve upgraded, the migration tool always produced a config that errored and needed several obscure manual changes to get working. All that to say it seems like a particularly bad candidate for llms
Write tests and run them, reiterate until all tests pass.
Bogosort with extra steps
That doesn’t sound viby to me, though. You expect people to actually code? /s
You can vibe code the tests too y’know
You know, I’d be interested to know what the critical size you can get to with that approach is before it becomes useless.
It can become pretty bad quickly, with just a small project with only 15-20 files. I’ve been using cursor IDE, building out flow charts & tests manually, and just seeing where it goes.
And while incredibly impressive how it’s creating all the steps, it then goes into chaos mode where it will start ignoring all the rules. It’ll start changing tests, start pulling in random libraries, not at all thinking holistically about how everything fits together.
Then you try to reel it in, and it continues to go rampant. And for me, that’s when I either take the wheel or roll back.
I highly recommend every programmer watch it in action.
Is there a chance that’s right around the time the code no longer fits into the LLMs input window of tokens? The basic technology doesn’t actually have a long term memory of any kind (at least outside of the training phase).
Return “works”;
Am I doikg this correctly?
Try to get one of these LLMs to update a package.json.
Watching the serious people trying to use AI to code gives me the same feeling as the cybertruck people exploring the limits of their car. XD
“It’s terrible and I should hate it, but gosh it it isn’t just so cool”
I wish i could get so excited over disappointing garbage
On Error Resume Next