
    What Comes After ChatGPT? Inside the Rise of AI World Models


    We’ve all seen what ChatGPT can do. It writes essays, answers questions, even flirts a little if you let it. But here’s the thing nobody’s telling you — it doesn’t actually understand anything.

    That’s not a bug. It’s the limit of language. And now, some of the world’s top AI scientists say it’s time to move on. Not from AI, but from the kind of AI we’ve been building.

    If you’re wondering what’s next for AI, well, it’s already here. It’s called world models.

    Fei-Fei Li, the Stanford professor known for creating ImageNet, is one of the leading voices behind this next leap after large language models (LLMs). She’s not alone. Meta’s chief AI scientist, Yann LeCun, is also pushing for a future that looks beyond the kind of AI tools we’ve gotten used to, like ChatGPT and Gemini.

    According to both Li and LeCun, language is limiting. That’s a big deal, considering LLMs like ChatGPT run purely on language. They work by identifying statistical patterns in text and predicting what words should come next based on what they’ve seen during training.
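To make "predicting what words should come next" concrete, here is a toy sketch of the idea using simple bigram counts. Real LLMs use neural networks over billions of parameters, not frequency tables, so this is only an illustration of the statistical-pattern principle, with a made-up miniature corpus:

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, which words follow it in the corpus."""
    following = defaultdict(Counter)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(following, word):
    """Return the statistically most common next word, or None."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

model = train_bigrams("the cat sat on the mat the cat ran")
print(predict_next(model, "the"))  # prints "cat" ("cat" follows "the" most often)
```

Note that the model never learns what a cat *is*; it only learns which word tends to follow which. That gap is exactly the limitation Li and LeCun are pointing at.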

    But world models aim higher. They’re not about language. They’re about understanding the world the way humans do, through mental constructs.

    What Does That Mean?

    Fei-Fei Li explained it beautifully during an episode of the a16z podcast. She said language actually plays a very small role in how humans truly perceive the world around them. We think in visuals, experiences, space, intuition — not just words.

    She’s not the only one who believes this. The late MIT professor Jay Wright Forrester once said:

    “The mental images in one’s head about one’s surroundings are models. One’s head does not contain real families, businesses, cities, governments, or countries… A mental image is a model. All decisions are taken on the basis of models.”

    In other words, we all use world models in our own lives, whether we realize it or not.

    So if AI is ever going to reach or exceed human intelligence, it has to start doing that too. It needs to form internal representations of reality, not just spit out text predictions.


    Li’s World Labs and Meta’s Vision

    In 2024, Li launched World Labs, an ambitious effort to build AI systems that understand and simulate the real world.

    Their mission is clear:

    “We aim to lift AI models from the 2D plane of pixels to full 3D worlds, both virtual and real, endowing them with spatial intelligence as rich as our own.”

    The world we live in is three-dimensional. So any AI that hopes to function with human-like intelligence has to be able to perceive space, interact with it, reason within it, and even create within it.

Right now, LLMs like ChatGPT aren’t built for that. They can summarize, answer questions, and write stories. But they can’t simulate possible real-life scenarios, because they lack a grounded understanding of space, physics, and causality.

    World models, on the other hand, are designed to think more like humans. They can imagine. They can run mental simulations. That opens the door for huge advancements in areas like robotics, creative work, and even military applications.

    Companies like Meta and Anduril are already exploring these possibilities. Think about AI that can help soldiers better perceive a battlefield or anticipate enemy movements. These aren’t just concepts. They’re becoming real.

    Here’s a quick side-by-side look at how LLMs compare to world models:

    LLM vs. World Model

    Feature      | LLM (ChatGPT)            | World Model
    Input        | Text only                | 3D spatial, video, real-world data
    Output       | Words & sentences        | Simulations, planning, predictions
    Limitation   | Can’t imagine or reason  | Designed to think like humans
    Good For     | Conversations, content   | Robotics, creativity, decision-making

    But There’s a Catch…

    Building these models isn’t easy.

    Unlike LLMs, which are trained on centuries of written text, world models need rich, complex, spatial data. And right now, there just isn’t enough of it.

    “If I ask you to close your eyes and mentally recreate the room you’re sitting in, that’s not easy,” Li said on the No Priors podcast. “It’s extremely complicated to create trained models without enough high-quality data.”

    That’s why her team is focused not just on building models, but on building the data infrastructure needed to support them. That includes advanced data engineering, acquisition, processing, and synthesis. It’s a massive task, but one that could reshape AI entirely.

    At Meta, LeCun is taking a slightly different approach. His team is using video data to train AI systems, not to recreate every pixel, but to build abstract representations of what’s happening.

    Speaking at the AI Action Summit in Paris, LeCun explained:

    “You don’t predict at the pixel level. You train a system to run an abstract version of the video so it can make predictions within that abstract world. Ideally, that removes all the unnecessary detail and keeps only what matters.”

    In simple terms, it’s like giving AI a brain that doesn’t obsess over every leaf on a tree but instead understands what a tree is, what it’s for, and what might happen next.
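The "predict in an abstract world, not at the pixel level" idea can be sketched in a few lines. The snippet below is a toy illustration, not Meta's actual method: it uses a fixed random projection as a stand-in for a learned encoder, and plain least squares as a stand-in for a learned predictor. The point is the shapes: the predictor works on 8 abstract numbers per frame instead of 4,096 pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": 5 frames of 64x64 pixels, each flattened to a 4096-dim vector.
frames = rng.normal(size=(5, 64 * 64))

# Stand-in encoder: a fixed linear projection down to an 8-dim abstract
# representation (real systems learn this encoder from data).
projection = rng.normal(size=(64 * 64, 8)) / np.sqrt(64 * 64)
latents = frames @ projection  # shape (5, 8)

# Fit a linear predictor that maps each latent to the next one,
# instead of trying to predict every pixel of the next frame.
X, Y = latents[:-1], latents[1:]
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

predicted_next = latents[-1] @ W  # 8 numbers, not 4096
print(predicted_next.shape)  # prints (8,)
```

Dropping from thousands of pixels to a handful of abstract features is what lets the system discard "every leaf on the tree" and keep only what matters for prediction.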

This approach allows the system to map out future possibilities. That’s a core requirement for anything that aspires to be “intelligent.”

    So Why Does This Matter?

    Because if AI is ever going to be more than a fancy autocomplete machine, it has to do more than just process text.

    As LeCun put it recently at the National University of Singapore:

    “We need AI systems that can learn new tasks really quickly. They need to understand the physical world, not just text and language, and have some level of common sense, reasoning, planning, persistent memory… all the stuff we expect from intelligent entities.”

    Imagine AI systems that help architects simulate entire buildings, or assist doctors in planning surgeries using 3D reasoning, not just text prompts. That’s the kind of shift world models could bring.

    We’re at the beginning of something bigger than LLMs. We’re stepping into the age of world models, where machines might actually start to understand.
