- For example, I asked GPT-5 to “create a svg with code of an otter using a laptop on a plane” (asking for an .svg file requires the AI to blindly draw an image using basic shapes and math, a very hard challenge). Around 2/3 of the time, GPT-5 decides this is an easy problem, and responds instantly, presumably using its weakest model and lowest reasoning time. I get an image like this:
Not an otter, not a plane, not a laptop, no usage shown, not in this or any other dimension...
- But premium subscribers can directly select the more powerful models, such as the one called (at least for me) GPT-5 Thinking. This removes some of the issues with being at the mercy of GPT-5’s model selector. I found that if I encouraged the model to think hard about the otter, it would spend a good 30 seconds before giving me an image like the one below - notice the little animations, the steaming coffee cup, and the clouds going by outside, none of which I asked for. How do you ensure the model puts in the most effort? It is really unclear - GPT-5 just does things for you.
I've tried to get several different AIs to write OpenOffice Calc code to roll initiative for a D&D game: you enter all the players' names and their bonus to initiative, and it puts them in order. We are rolling every round. This seems like it shouldn't be that tough, but ChatGPT, Gemini, and whatever Microsoft is calling their Bing AI have all fucked it up every time, and when asked to fix it they just make the code 10% longer while breaking things in new ways. All the AIs are cocksure of themselves going into it, and by the end they all say something along the lines of “well... OpenOffice is just weird.” Not that AI can't be cool and fun, but come on... write a damn initiative program.
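For what it’s worth, the underlying logic being asked for is tiny: roll a d20 plus each combatant’s bonus, sort the totals from high to low, and repeat every round. Here is a minimal sketch of that logic in Python rather than the OpenOffice Basic the commenter wanted; the function name and party names are invented for illustration, not anyone’s actual macro:

```python
import random

def roll_initiative(party: dict[str, int]) -> list[tuple[str, int]]:
    """Roll d20 + bonus for each combatant, return (name, total) sorted high to low."""
    totals = {name: random.randint(1, 20) + bonus for name, bonus in party.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    # name -> initiative bonus (illustrative values)
    party = {"Astrid": 3, "Borin": 1, "Cael": -1}
    for round_number in range(1, 4):  # re-roll every round
        order = roll_initiative(party)
        print(f"Round {round_number}: " + ", ".join(f"{n} ({t})" for n, t in order))
```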
One good thing about it is that it seems to be much improved on sycophancy and on its safety boundaries compared to 4o especially, which is still the default “chat” for most people. So I’m hoping that means AI psychosis is down too, before that gets any more outta hand, because 4o loooves SCP. OpenAI claims they’ve made significant improvements to their training data processes, which I’m reading as “maybe it wasn’t such a good idea to blindly scrape all text we could find after all”. I do feel like I need some more testing and benchmarks with it to judge how reliable it is. Maybe make my own benchmark. But better instruction following is very welcome.
I have yet to find an application for AI in a live business environment that is useful. For something that’s now a couple percent of US GDP, you would think there would be some sort of meaningful business use, but nope. I think crypto was more useful than AI, and that still turned out to not be very useful.