AI Takes a Leap in Learning: Super Mario Becomes Its Playground

Super Mario Bros. (1985)

Move over, Pokémon! Forget traditional benchmarks! Researchers at Hao AI Lab, a research group at the University of California San Diego, argue that Super Mario Bros. is an even tougher test for artificial intelligence. Last Friday, they tossed various AI models into the chaos of live Mario games, and the results may surprise you.

Leading the charge was Anthropic’s Claude 3.7, rocking that Mario joystick with finesse, while Claude 3.5 wasn’t far behind. But what’s this? Titans like Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o floundered like they just ate a bad mushroom.

Let’s be clear: this wasn’t quite the cartridge you remember from 1985. The game ran in an emulator wired to a framework called GamingAgent, which, let’s be honest, sounds like something out of a cyberpunk novel. GamingAgent gave the AIs control over Mario and a crash course in Mario school: in-game screenshots plus basic directives along the lines of jumping to dodge a nearby obstacle or enemy. The models then produced their controller inputs as Python code, because why not?
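For the curious, here is roughly what that loop could look like. This is a minimal sketch, not GamingAgent’s actual code: `FakeController`, `fake_model`, `run_step`, and the instruction text are all hypothetical stand-ins, and the "model" just returns a canned script instead of calling a real AI.

```python
from dataclasses import dataclass, field

# Hypothetical instruction text, loosely paraphrasing the article.
# This is NOT the real GamingAgent prompt.
INSTRUCTIONS = (
    "If an enemy is near, jump to dodge it. "
    "Reply with Python that calls press(button, frames)."
)

VALID_BUTTONS = {"left", "right", "up", "down", "a", "b"}


@dataclass
class FakeController:
    """Stand-in for an emulator input layer; records presses instead of sending them."""
    log: list = field(default_factory=list)

    def press(self, button: str, frames: int = 1) -> None:
        if button not in VALID_BUTTONS:
            raise ValueError(f"unknown button: {button}")
        self.log.append((button, frames))


def fake_model(instructions: str, screenshot: bytes) -> str:
    """Placeholder for a real model call; always returns the same tiny script."""
    return "press('right', 10)\npress('a', 15)"


def run_step(controller: FakeController, model, screenshot: bytes) -> None:
    """One turn of the loop: show the model the screen, then run the code it wrote."""
    code = model(INSTRUCTIONS, screenshot)
    # Only press() is exposed to the generated code.
    # Assumption for illustration: this is not a real sandbox.
    exec(code, {"__builtins__": {}}, {"press": controller.press})


controller = FakeController()
run_step(controller, fake_model, screenshot=b"")
print(controller.log)
```

Swap `fake_model` for a real model call and `FakeController` for a real emulator hook and you have the general shape: screenshot in, Python out, buttons pressed.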

But it gets spicy. According to the lab, the game forced each model to plan complex maneuvers and work out gameplay strategies, not just mash buttons. One intriguing finding? Reasoning models, despite their reputation for being the smart ones, struggled to keep up with the fast-paced antics of our favorite plumber. Can you blame them? They typically take seconds to mull over each move, and Mario’s enemies don’t wait for second-guessing.
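To put that thinking time in Mario terms, here is a back-of-the-envelope sketch. It assumes the game runs at roughly 60 frames per second (the approximate NTSC NES rate); the latencies are illustrative, not figures measured by the lab, and `frames_missed` is a made-up helper name.

```python
FPS = 60  # assumption: approximate NTSC NES frame rate


def frames_missed(latency_seconds: float, fps: int = FPS) -> int:
    """Frames that tick by while a model is still deciding what to do."""
    return round(latency_seconds * fps)


# Illustrative decision latencies in seconds, not measured values.
for latency in (0.1, 1.0, 3.0):
    print(f"{latency:.1f}s of thinking -> {frames_missed(latency)} frames gone")
```

At that rate, every second spent pondering is about 60 frames in which a Goomba keeps walking.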

For decades, video games have been the testing grounds for AI, but not without controversy. Some experts question whether AI’s prowess in gaming says much about real-world progress. After all, games are typically abstract and relatively simple compared with the real world, and they serve up a practically endless buffet of training data for AIs to munch on.

With all this talk, it’s clear we’re entering what AI researcher Andrej Karpathy calls an “evaluation crisis”. The guy’s so baffled by today’s metrics that he’s throwing his hands in the air, admitting he no longer knows which ones to look at.

But hey, at least we now have a front-row seat to watch AI navigate the treacherous pipes and tricky jumps of Mario. Who said tech research can’t be entertaining? 🍄🎮

AUTHOR: tgc

SOURCE: TechCrunch