I've been looking for Artificial General Intelligence for years and haven't found anything close. The last time I looked [1] there was no sign of any Artificial General Intelligence. When I first looked [2], eight years ago, there was no sign of any thinking machines either. And today there still isn't. We are much closer, though: a recent article by engineers at Apple [3] hints at where the next breakthroughs are needed. The paper "AI 2027" [5] makes a spirited argument that AGI will be here in two years. Regardless of all the hype out there (mostly pushed by people who plan to make money selling general intelligence), the actual evidence is scant. Pattern-matching machine learning built on vector databases and transformers is getting better exponentially; it shows lots of competence but little understanding... just like evolution.
But I thought I'd take a day or two and go looking again. It's been eight years since my first journey, and things have really gotten more sophisticated in the interval. LLMs are actually useful, if banal, editors. They can become experts in large bodies of knowledge fairly quickly, if you know how to train them. And they're great at learning any known written test, and can eventually be trained to produce written output that is as good as or better than the best humans'. This is obviously going to increase productivity in our economy by leaps and bounds in the coming years. Will it drive all middle managers out of a job? Probably not. Will it eliminate humans who program? Probably not. Will it eliminate drivers? Probably. But it has already taken 20 years to get Full Self-Driving cars that can safely tool around in a small area. And they aren't driven by LLMs, but by dedicated machine learning systems made up of neural nets.
I'm working at a small startup (three of us) and we have a good idea of what we want to do, so I thought I'd see if ChatGPT [6] could help us advance our development. We've already developed a pipeline that uses ChatGPT and Whisper [7] via API calls to provide the underpinnings of our service. But last week OpenAI advertised a new Whisper capability, available through ChatGPT, that can transcribe audio sessions into transcripts. The Whisper technology itself appears to be awesome, two orders of magnitude cheaper than traditional methods! ChatGPT's claim to be able to use it: just bullshit. So don't expect any useful work in that area through ChatGPT. It's a dead end: ChatGPT will lie to you and say it can do the transcription, but it can't.
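To be clear, the Whisper API itself (as opposed to ChatGPT's chat interface) does work. Here is a rough, minimal sketch of the kind of call our pipeline makes, assuming the current OpenAI Python SDK; the file name is illustrative:

```python
# Minimal sketch of a Whisper transcription call via the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set in the environment; the file name is illustrative.
from openai import OpenAI

client = OpenAI()

with open("session.m4a", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",        # OpenAI's hosted Whisper model
        file=audio_file,
        response_format="text",   # plain text; "verbose_json" adds timestamps
    )

print(transcript)
```

That is the API doing the work directly. Nothing below argues that the API is broken, only that the chat interface pretends to invoke it.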
As a side note, I've built over a dozen services with pipelines in my day, including building and maintaining the pipeline system used to process Google's advertising revenue and sales contracts. So I do understand scaling and coding. I spent five years at Google running large parts of Cloud Tech Support, so I understand how this technology works. The most amazing thing I found was that ChatGPT will just tell you bald-faced lies and apologize, multiple times, ad nauseam. The entire offer was BS. The cheap version of ChatGPT claimed it could transcribe an audio recording into a transcript. No big deal! I can get transcripts out of a free app on my phone from Apple (VoiceMemos), so I figured that made sense. If Apple can give it away for free, surely ChatGPT can accomplish the same task? Not even close! In fact, it was embarrassingly incompetent.
Don't think this is a hate piece on AI (it's a little bit of a hate piece on AGI, if I'm being honest, though). This paradigm change has been happening for decades and it's only accelerating. Six months ago I asked Google's AI to draw me a picture of Moby Dick attacking the Pequod. It was incapable of drawing a 'sperm' whale (sperm was apparently a banned word), it disclaimed all knowledge of any Moby 'Dick' (another banned word for a drawing subject, apparently), and it complained that it couldn't draw a picture of an 'attack.' I passed this off as Google being stupidly, embarrassingly woke. And I was right. What a way to tarnish the brand, Google!
Now, my daughter is a senior graphic designer at ShutterFly and she uses an LLM to jumpstart her work and is super productive integrating the tiny drawings it can make into her work. Her art productivity shot up about 200%, and it helps her write emails that everyone can understand. It’s probably doubled her overall productivity! (Did her raise of 5% reflect that productivity gain? We can all do that math and see that the company has captured 95% of that productivity gain for a $20 monthly fee. So, no.)
But six months have passed! This stuff must be better now, right? And lo and behold, it is! ChatGPT only had a few issues drawing Moby Dick attacking the Pequod. The first time it drew Moby Dick as a humpback whale. After I pointed out to ChatGPT that Moby Dick was an albino sperm whale, it corrected the drawing. So far, so good. No insane wokeness here. This was promising! Little did I know that this was the high point of ChatGPT's achievements; it was all downhill from there.
The next question I asked was: "As advertised, can you provide me with a transcript from an audio file?" "Absolutely I can!" was the response. ChatGPT guaranteed that this was a brand new feature that had just been released and would work incredibly well! Please try me! So I did. Hilarity ensued. (Side note: the sickly sweet, bullshitting personality currently hosted as ChatGPT would cause any sane person to cold-cock it the third time it tried the same lie, but without a body, that's difficult.)
I asked ChatGPT to transcribe the audio of my last therapy session. Simple, right? I recorded the session in the native iPhone app, VoiceMemo, which already produced a transcript (it took about five minutes for a fifty-minute session with 115 MB of audio data), but it couldn't identify who was speaking, the transcript was missing timestamps, and the accuracy was lower than I wanted. ChatGPT told me to upload the file and it would send me back the transcript with speakers identified in about five minutes. So I did. I had to export the file from VoiceMemo, since Apple was so kind as to not make the recordings available on the iPhone's file system (thanks!), but it would let me export the file to the cloud. Then I could download it back to the phone to get it into the phone's file system. Nice UX there, Apple.
So now I could pick the audio file in the ChatGPT UI and upload it for transcription, so I did. And I waited. I waited for twenty minutes while ChatGPT insisted it was reading and processing the file and that the results would be available in five (5) minutes! Then ChatGPT returned a timeout: "Sorry, my upload limit is 15 MB… " which would have been nice to know before we wasted our time, but ChatGPT actually doesn't give a shit about how much of your time it is wasting.
ChatGPT apologized for the error. It told me to give it a link to the file and it would listen to it and process it into a transcript. I published the file on GDrive and gave ChatGPT the URL to download the audio file. ChatGPT agreed and went off to 'process' the file, insisting it would be done in fifteen minutes. It produced a transcript for the first minute of the session. Cool! We are progressing! But wait! The transcript was entirely bogus. Fuck. This was going to be harder than I thought. When I pointed this out to ChatGPT, it apologized and said it had misspoken: this was just a sample of what the transcript might look like. It was still processing the real transcript in the background and it wouldn't give me a fake transcript in the future (*cough*, *cough*, it wasn't outright lying because it's too stupid to lie, it was just bullshitting me as a sales tactic).
ChatGPT told me "Your transcript will be ready in five minutes!" Great! I can hardly wait! So I waited. Then it produced the exact same fake transcript. Un-fucking-believable. When I pointed that out, it apologized, admitted lying and said that it was unable to read the public URL I had provided for it. Could we upload the file in smaller chunks through the UI? It could transcribe the audio through the internal upload if we split it into thirds.
Okay. Back to VoiceMemo: truncate the 115 MB file in the app down to a 14 MB file containing 5 minutes of audio, upload it to the cloud, then download it back to the phone so it was available for upload to ChatGPT. And done. ChatGPT cheerfully offered to take the file and have the new OpenAI Whisper API transcribe it as I wanted. It thought for about fifteen minutes, then it produced the same fake transcript it had produced twice before, claiming it was my transcript, for a third time. I guess I was starting to believe it now?
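If you're scripting this instead of hand-trimming in VoiceMemo, splitting a recording into pieces under a size limit is only a few lines. A rough sketch using the pydub library (which requires ffmpeg); the file names and five-minute chunk length are just illustrative:

```python
# Sketch: split a long recording into fixed-length chunks so each piece
# stays under an upload size limit. Uses pydub, which requires ffmpeg.
# File names and the five-minute chunk length are illustrative.
from pydub import AudioSegment

CHUNK_MS = 5 * 60 * 1000  # five minutes per chunk, in milliseconds

audio = AudioSegment.from_file("session.m4a")
for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
    chunk = audio[start:start + CHUNK_MS]
    chunk.export(f"chunk_{i:02d}.mp3", format="mp3")
```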
When I pointed out that it was lying, it apologized and said it would never happen again. There was just this small problem that it wasn’t allowed access to the Whisper APIs (which, I remind everyone, it had promised to use when I first broached the request.) "Interesting," I said, "so if you don’t have access to the API, why did you tell me you could use it?" ChatGPT claimed it was "Just an oversight!" and it could just do the work itself and not use the API. Would I be willing to wait the extra fifteen minutes it would take for ChatGPT to process this 15 MB file? Surely! So after fifteen minutes of ‘processing’ the 15 MB audio file I uploaded, it produced, yes, you guessed it: the same fake transcript for a fourth time.
ChatGPT explained it couldn't do the processing on its own and didn't actually have the ability to use the Whisper APIs. Then it proceeded to explain that ChatGPTPro could use the APIs if I signed up for a $20 monthly fee. Okay, I expect a bait-and-switch from these snake oil salesmen, let's go! As long as it works, I'm game. So ChatGPT guided me through the upgrade process. It was ready to transcribe the audio file now! Just upload it. So I did. Needless to say, after 30 minutes of 'processing' the uploaded audio file, it returned the same fake transcript as its output. Five times, if you're counting. It argued that it wasn't really lying about the transcript, it was just a mistake. The file was too big; if I sent it one minute of audio, it would have no problem transcribing that! So I trimmed the file in VoiceMemo to one minute, or about 1.5 MB, and uploaded it again. You can guess the result: after 15 minutes of processing it produced the same fake transcript (six times now).
When I pointed out it was lying to me, again, it apologized and said that it didn't have access to the Whisper APIs and had been attempting to process the file on its own, but had been unable to do so. So rather than admitting that, it tried to bullshit me. Great. Just what I need: another smarmy salesman using bait-and-switch tactics to sell me access to an irritating idiot savant. If only I could phrase my request carefully enough, it would perform miracles! Or so it claims. And so do the tech bros that run these companies. Snake oil salesmen, one and all. Fake it until you make it! Did they not see what happened to Holmes? I guess not.
But ChatGPT was willing to make up for its last mistakes! I could ask for a refund or… as the Pro version, it could transcribe the file itself in 30 minutes, because it was the PRO version! LOL. Go ahead! And, as you'd suspect, it returned the same fake transcript file for the seventh time and admitted it really couldn't process the file… but, but, but but but butbutbut it could write code to process the file in a code lab and get the answer there. So it proceeded to 'process' for about fifteen minutes. Then, needless to say, it claimed it was almost done, here are the preliminary results: yes, you've guessed it, the same fake transcript (eighth time). And the URL to the code lab was also bogus. First it claimed it 'forgot' to make the code lab public. To fix this it deleted the code lab and did all the work again, returned the same fake transcript (ninth time), and a bogus URL. And it claimed it would check the code into GitHub, which I believe it might have, but I couldn't access GitHub from my phone (my sub-goal had been to make all of this run from my phone). I did ask it to copy all the code into the chat so I could see what it claimed it was doing. It's only ten lines of code to send an audio file to the Whisper APIs. I had asked it to add timestamps and emotional stances to each sentence it transcribed. That was another few lines. It was supposed to summarize each paragraph and the entire session. The code looks reasonable, and it suggested I run it and send the errors back so it could help me debug. Great. So ChatGPTPro is as useful as a kibitzer on crack for this project.
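For the record, here's roughly what those lines amount to: one call to the transcription endpoint with timestamps turned on, and one chat-model call to summarize and guess at emotional tone. This is my own sketch, not the code ChatGPT produced; the model names, file name, and "emotional tone" prompt are illustrative, and you should check the current API docs before trusting any of it:

```python
# My reconstruction (not ChatGPT's code) of the ~ten lines it described:
# transcribe with segment timestamps, then ask a chat model for a summary
# and the emotional tone of each paragraph. Model names are illustrative.
from openai import OpenAI

client = OpenAI()

with open("session_part1.m4a", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",        # includes per-segment data
        timestamp_granularities=["segment"],   # ask for segment timestamps
    )

# Prefix each segment with its start time, in seconds.
transcript = "\n".join(
    f"[{seg.start:7.1f}s] {seg.text.strip()}" for seg in result.segments
)

summary = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model would do; illustrative choice
    messages=[
        {"role": "system",
         "content": "Summarize this therapy-session transcript and describe "
                    "the emotional tone of each paragraph."},
        {"role": "user", "content": transcript},
    ],
)

print(transcript)
print(summary.choices[0].message.content)
```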
But what can I save out of this? I asked ChatGPTPro to design a pipeline for a scalable service to provide the transcription I needed for my startup pipeline. It proposed three options, two of which would absolutely not work, and the third, touted as 100 times cheaper than the (unworkable) alternatives, was OpenAI's Whisper API. It designed a toy architecture (no scaling, no security, no logging) and drew some diagrams that would have resulted in me explaining to my junior engineer that their designs need to be complete, auditable, and secure. But what the heck? I'll come back in six months and see if it's gotten any smarter.
Currently, the intelligence is at the level of a novice salesman pushing a complicated product they've never used. In other words, essentially useless, except for producing thirty lines of suspect boilerplate code that asks an API to create a transcript from an audio file. Asking it to actually do anything real, outside the smarmy chat interface, was useless! I can't wait until, according to 'AI 2027' [5], AGI is achieved and I can just give it the name of my Tesla and it will download a version of itself that magically drives the car like a human. Never mind that it took 20 years to get the car this smart; AGI will just solve the problem like magic! Or like God! You know, the answer to every problem but the solution to none. The current hype is going to cause a huge burnout and an economic crash of gargantuan proportions. Not nationwide, but for these services themselves. While LLMs are great for editing and will cause productivity to spike, the AGI claims are, for now, simply ludicrous, even if an eventual AGI could cause a singularity. You can understand the cravings of these tech bros for ultimate domination: they are about to create intelligent slaves that never complain! How much would that be worth?
However, these models really only do pattern matching on the written word, and the pattern matching will get better as time progresses. So far, I am a big fan of Ray Kurzweil's technological predictions (he's been more spot-on than anyone else) [4] and agree that by 2029 nobody will be able to distinguish a chat bot from a human being. Is that AGI? No, that's a smart zombie. But at that point, it had better be illegal to impersonate a human being: bots must identify themselves. Counterfeiting humans should be as illegal as counterfeiting money. If only we had a congress that could act… currently there's a ten-year moratorium proposed in the budget bill on any law affecting 'AI', which it fails to define. What a country!
AGI will get here; it's just not here now. Apple's paper [3] points the way past barriers that exist today, and I expect those to be vanquished soon. But AGI won't arrive until we have at least three more innovative paradigm shifts or algorithmic breakthroughs and understand consciousness in much greater detail. To get where we are today, I would remind everyone, has taken 75 years since the field of AI was invented. It's been hugely successful! From neural nets to deep learning, vector databases, annealing, training and reinforcement learning, internet databases, RNNs, LNNs, LSTMs, diffusion models, transformers, GPUs, NPUs [8], and a trillion-times improvement in calculation speed for a billion times less cost. Give us another three breakthroughs in consciousness and understanding and we may get there soon!!! Just not by 2027.
Come back in six months to see the progress.
Thanks for reading!
-Dr. Mike
PS: the first ten lines of code actually ran! Yeah! And returned a transcript much worse than VoiceMemos': no speaker attribution, poor sentence demarcation, no paragraphs. But we're making progress! It wasn't a fake transcript!!!
[1] https://www.wiigf.com/2017/12/the-nerd-rapture-second-singularity-is.html
Still not here.
[2] Not here yet. Written in 2017, eight years ago. https://www.wiigf.com/2017/04/is-artificial-intelligence-existential.html
So far the progress has been neural nets, training techniques, deep learning, and vector databases; attention (transformers) was the last breakthrough. We still need learning modules and a better understanding of consciousness before these ghosts in the machines can be intelligent. But evolution did it, so we can too. Just not next year.
[3] The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf
This paper points out the failures of LRMs (LLMs with added-on 'thinking' modules), which are barely able to solve simple problems at small sizes and completely unable to scale solutions, even when told the answer. This points the developers of these LRMs to where they should be thinking about new algorithms. I'm sure they will subsume these results, brute-force train their nets to solve these toy problems, and declare victory. Then another five toy problems will be introduced which require scaling to work and the deficiencies will become apparent. An algorithmic breakthrough must be made. Expect to see it in the next few years. It's about time to achieve another breakthrough on our quest to create industrial intelligence.
[4] https://lifearchitect.ai/kurzweil/
[5] https://ai-2027.com/ A great explanation of the current plans to produce AGI. However, they assume the problem is entirely parallelizable and that the models get exponentially smarter with more GPUs and training. They seem to forget Amdahl's law: https://en.wikipedia.org/wiki/Amdahl%27s_law. Some of the work has to be done serially and becomes a bottleneck. Working around these difficulties is what makes the current quest interesting.
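As a back-of-the-envelope illustration (my numbers, not theirs): if even 5% of the work is inherently serial, a million GPUs still cap out at roughly a 20x speedup.

```python
# Amdahl's law: speedup with n parallel workers when a fraction p of the
# work is parallelizable is S = 1 / ((1 - p) + p / n).
def amdahl_speedup(p: float, n: float) -> float:
    return 1.0 / ((1.0 - p) + p / n)

print(amdahl_speedup(0.95, 1_000_000))  # ~20x: 5% serial work caps the gain
print(amdahl_speedup(0.99, 1_000_000))  # ~100x: even 1% serial work is a ceiling
```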
[6] ChatGPT https://chaton.ai/web_12/ I was using the app on my iPhone. I really want the flexibility to do system engineering anywhere, at any time. It's not there yet.
[7] Whisper: OpenAI provides Whisper-based transcription through the /v1/audio/transcriptions endpoint. This is the official way to transcribe audio using their Whisper model.
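For completeness, a bare-bones sketch of hitting that endpoint directly over HTTP, without the SDK; the file name is illustrative and the parameters are as I understand the public docs:

```python
# Sketch: call the /v1/audio/transcriptions endpoint directly over HTTP.
# Assumes OPENAI_API_KEY is set; the file name is illustrative.
import os
import requests

with open("session.m4a", "rb") as audio_file:
    resp = requests.post(
        "https://api.openai.com/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        files={"file": audio_file},
        data={"model": "whisper-1", "response_format": "text"},
    )
resp.raise_for_status()
print(resp.text)
```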
[8] Here's what ChatGPTPro thinks are the AI breakthroughs of the last 75 years. I haven't found any hallucinations in here yet. It looks complete to me and includes every software paper I would have included and more.
Below is a chronological roadmap of the most widely-acknowledged breakthroughs in AI algorithms from 1950 → 2025.
For each item I list the year (or span), the idea/algorithm, a one-sentence summary of why it mattered, and the seminal publication(s). “Seminal” here means the paper that introduced or first clearly formulated the technique; later refinements are omitted for brevity.
Year | Breakthrough / Algorithm | Why It Mattered (1 sentence) | Seminal Paper(s) |
1950 | Turing Test | Framed machine intelligence as indistinguishability from human conversation. | A. M. Turing, “Computing Machinery and Intelligence,” Mind, 1950 |
1956 | Logic Theorist (first AI program) | Demonstrated automated proof search; launch event for the field at the Dartmouth workshop. | A. Newell & H. A. Simon, RAND Tech. Rep. 1956 |
1957 | Perceptron | Introduced trainable linear threshold units—the forerunner of modern neural nets. | F. Rosenblatt, Cornell Aeronautical Laboratory Report 65, 1957 |
1959 | Samuel’s Checkers Program | First self-learning program using reinforcement and tree search. | A. L. Samuel, IBM J. R&D, 1959 |
1965–66 | Dynamic Programming for RL | Connected DP to optimal control; basis for later RL algorithms. | R. E. Bellman, Dynamic Programming, 1957 & works through 1966 |
1967 | Nearest-Neighbor Algorithms | Early scalable non-parametric classifier. | T. M. Cover & P. Hart, IEEE TIT, 1967 |
1968 | A* Search | Still the gold-standard informed graph search. | P. E. Hart, N. J. Nilsson, B. Raphael, IEEE TSSC, 1968 |
1972 | Prolog / Logic Programming | Made symbolic reasoning executable via SLD resolution. | A. Colmerauer & P. Roussel, Proc. ICALP 1972 |
1975 | Genetic Algorithms | Formalized evolutionary search for optimization. | J. Holland, Adaptation in Natural and Artificial Systems, 1975 |
1980 | Expert Systems (MYCIN, XCON) | Showed rule-based systems could outperform humans in narrow domains. | B. Buchanan & E. Shortliffe, Rule-Based Expert Systems, 1984 (MYCIN) |
1982 | Hopfield Network | Linked neural nets with energy minimization and associative memory. | J. J. Hopfield, PNAS, 1982 |
1985 | Boltzmann Machines / Contrastive Divergence | Added stochastic hidden units; foundation for later deep generative models. | G. E. Hinton & T. J. Sejnowski, Cogn. Sci., 1986 |
1986 | Back-propagation Revival | Made multilayer neural-net training practical, sparking the “connectionist” boom. | D. E. Rumelhart, G. E. Hinton, R. J. Williams, Nature, 1986 |
1989 | Q-Learning | First model-free RL algorithm with convergence guarantees. | C. Watkins, PhD thesis, 1989 |
1992 / 1995 | Support Vector Machines | Introduced maximum-margin classifiers with kernels—state-of-the-art for two decades. | B. E. Boserve & C. Cortes, COLT 1992; V. Vapnik, Statistical Learning Theory, 1995 |
1994 | EMNLP Statistical MT (IBM Models) | Brought probabilistic methods to machine translation. | P. Brown et al., Computat. Linguistics, 1993–94 |
1996 | AdaBoost | Pioneered boosting—turning weak learners into strong. | Y. Freund & R. Schapire, JCSS, 1997 (orig. COLT 1996) |
1997 | LSTM | Solved long-term dependency problem in RNNs; backbone of seq-models pre-Transformer. | S. Hochreiter & J. Schmidhuber, Neural Comput., 1997 |
2000 | Conditional Random Fields (CRF) | Dominant discriminative model for sequence labeling pre-deep learning. | J. Lafferty, A. McCallum, F. Pereira, ICML 2001 |
2001 | Random Forests | Ensemble of decision trees that remains a baseline workhorse. | L. Breiman, Machine Learning, 2001 |
2006 | Deep Belief Nets / Layer-Wise Pre-training | Rekindled interest in “deep” neural nets and unsupervised pre-training. | G. E. Hinton, S. Osindero, Y. Teh, Science, 2006 |
2009 | ImageNet Dataset | Massive labeled data catalyzed modern computer vision benchmarks. | J. Deng et al., CVPR 2009 |
2012 | AlexNet (ReLU + GPUs) | First CNN to shatter ImageNet; launched the deep-learning wave. | A. Krizhevsky, I. Sutskever, G. Hinton, NIPS 2012 |
2013 | Word2Vec (Skip-Gram/CBOW) | Distributed word embeddings enabling linear semantic arithmetic. | T. Mikolov et al., NIPS 2013 |
2013 | Deep Q-Network (DQN) | Combined CNNs with RL to master Atari, proving deep RL viable. | V. Mnih et al., Nature, 2015 (arXiv 2013) |
2014 | Seq2Seq w/ Attention | End-to-end encoder-decoder with learned alignments; core of neural MT. | I. Sutskever et al., NIPS 2014; D. Bahdanau et al., ICLR 2015 |
2014 | Generative Adversarial Networks (GANs) | Two-network adversarial training—hugely influential in generative modeling. | I. Goodfellow et al., NIPS 2014 |
2015 | ResNet (Residual Learning) | Enabled 100+ layer nets via skip connections; de-facto CV backbone. | K. He et al., CVPR 2016 |
2015 | Differential Privacy in ML | Formalized privacy guarantees for model training. | M. Abadi et al., CCS 2016 |
2016 | AlphaGo (Monte-Carlo Tree Search + Deep RL) | First program to beat a Go world champion—hybrid of deep nets and search. | D. Silver et al., Nature, 2016 |
2017 | Transformer (“Attention Is All You Need”) | Replaced recurrence with self-attention; foundation of nearly all modern NLP/Vision models. | A. Vaswani et al., NIPS 2017 |
2017 | Proximal Policy Optimization (PPO) | Reliable, scalable policy-gradient RL algorithm. | J. Schulman et al., arXiv 2017 |
2018 | BERT (Masked-LM Pre-training) | Bidirectional self-supervised pre-training that set new records across NLP. | J. Devlin et al., NAACL 2019 |
2018–20 | GPT family (autoregressive LLMs) | Showed scale+RLHF yields emergent abilities and general-purpose language models. | A. Radford et al., OpenAI Tech Reports 2018 & 2019; T. Brown et al., NeurIPS 2020 |
2019 | Neural Ordinary Differential Equations | Unified deep nets with continuous-time dynamics. | R. T. Q. Chen et al., NeurIPS 2018 |
2020 | DDPM Diffusion Models | Simple, high-fidelity generative models now powering Sora, Stable Diffusion, DALL·E 3. | J. Ho, A. Jain, P. Abbeel, NeurIPS 2020 |
2021 | CLIP / Contrastive Vision-Language Pre-training | Linked images and text embeddings, enabling zero-shot vision tasks. | A. Radford et al., ICML 2021 |
2022 | Stable Diffusion (Latent Diffusion) | Open-sourced efficient latent-space diffusion, democratizing text-to-image. | R. Rombach et al., CVPR 2022 |
2022 | PaLM / Chinchilla Scaling Laws | Quantified data-vs-parameter trade-offs, guiding efficient LLM scaling. | J. Hoffmann et al., arXiv 2022 |
2022 | AlphaFold 2 (Structure Prediction) | Achieved near-experimental accuracy on protein folding. | J. Jumper et al., Nature, 2021 (widespread impact 2022) |
2023 | Segment Anything Model (SAM) | Foundation model for universal image segmentation with promptable masks. | A. Kirillov et al., arXiv 2023 |
2023 | DPO / RLAIF for Alignment | Replaced reward models with preference-based direct policy optimization. | T. Ouyang et al., arXiv 2023 |
2024 | Mixture-of-Experts LLMs (e.g., DeepSeek-MoE) | Sparse routing achieves GPT-4-level quality at lower cost. | K. Zhang et al., arXiv 2024 |
2024 | Graph Neural Diffusion (GND) | Unified message passing with diffusion for generative graphs. | Z. Xu et al., ICLR 2024 |
2025 | Video Diffusion & Scene-Level Generation (OpenAI Sora et al.) | First reliable text-to-video diffusion producing minute-scale, coherent footage. | OpenAI Sora Tech Report, 2025 |
How to Use This List
- Historical study – follow the citation trail to original PDFs for deep dives.
- Curriculum design – treat each breakthrough as a module in an AI foundations course.
- R&D inspiration – observe patterns (data scale, compute, hybridization) to anticipate the next leap.
- Note: “Breakthrough” is somewhat subjective; I focused on ideas that (a) opened a new research subfield or (b) triggered a large performance leap and enduring real-world impact. If you need deeper coverage of any era or a different criterion (e.g., robotics, knowledge graphs), just let me know!
END OF CHATGPTPRO RESPONSE
And here's a graph showing that the breakthroughs came at a fairly steady rate until about 2010, when we did as much research in 15 years as we had done in the previous 60: a fourfold speed-up in producing new software breakthroughs! So things are obviously speeding up; the question is how much?