Taught machines to read minds and referee a game!

By blending artificial intelligence with both playful and competitive gaming, I’ve found that AI doesn’t just join the game; it changes the rules. From an AI foosball referee with a personality to a poker expert that can spot a lie, these projects explore the fun and challenging frontier of computer vision.

Picture this: a system that can detect the ball and players, track every move, and display the stats - prepare to be shamed for your 52 shots off target! And just when you think it couldn’t get any better - or worse - there’s even a cheeky live commentator powered by AI, ready to roast your every miss and celebrate your every win.

Disclaimer: I can’t guarantee your friendships will survive the banter!

But the fun doesn’t stop here. What if AI could also sit at a poker table, analyzing players’ behavior to sniff out a bluff before anyone else can? Suddenly, the game shifts from entertainment to strategy - it’s about nerves, skill, and outsmarting not just the opponents, but the smartest technology in the room.


Foosball chaos: fast shots, stats, and sassiness

Why Foosball?

Foosball is an obvious choice if you need a testbed for fast, fun, and interactive AI. The goal of this project was to turn something that is already entertaining by itself - foosball - into an even more immersive experience. And the speed, unpredictability, and fast-paced action - the ball can reach velocities of up to 15 m/s - create the perfect test environment for computer vision: tracking a tiny ball across a confined space while human players frantically spin rods and block shots. If the algorithms can handle this chaos, they can handle almost any sports-tracking challenge.

Setup

The system uses a vertically mounted camera to capture and analyze game data such as goals, ball possession, shot accuracy, and ball speed. It provides instant insights by tracking the ball and players and displaying the statistics on a dashboard for participants and spectators. Meanwhile, an LLM generates commentary, a text-to-speech model verbalizes the comments, and every goal is captured to be replayed after the game is over - the highlights!
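To make the ball-speed stat a little more concrete, here’s a minimal sketch of how a speed estimate can fall out of two tracked positions - the frame rate, the pixel-to-meter calibration, and the example positions are hypothetical placeholders, not our exact values.

```python
import math

PX_PER_METER = 620.0  # hypothetical calibration: pixels per meter on the table surface
FPS = 15              # processing frame rate

def ball_speed(p1, p2, frames_apart=1):
    """Estimate ball speed (m/s) from two tracked pixel positions."""
    distance_m = math.hypot(p2[0] - p1[0], p2[1] - p1[1]) / PX_PER_METER
    dt = frames_apart / FPS
    return distance_m / dt

# e.g. the ball moved 350 px between two consecutive frames
print(f"{ball_speed((100, 240), (450, 240)):.1f} m/s")  # ~8.5 m/s
```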

Tech stack

Technology-wise, we developed a computer vision algorithm to detect the ball and the players based on their colors, and… that’s the easy part! Then some more complex heuristics come into play to compute all the statistics you can see in the dashboard above. For the commentary, we used Gemini 2.5 Flash - not the smartest model, but quite fast at generating the commentary in text form. To give these comments a voice, we trained a text-to-speech model from ElevenLabs, so the commentator speaks with the emotion you would expect from someone narrating high-tension foosball games!
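For a taste of what the color-based detection looks like, here’s a minimal OpenCV sketch in the same spirit - the HSV thresholds and the minimum radius are hypothetical and would need tuning to the actual ball and lighting.

```python
import cv2
import numpy as np

# Hypothetical HSV range for an orange ball; the real thresholds depend on the
# ball and the lighting above the table.
BALL_HSV_LOW = np.array([5, 120, 120])
BALL_HSV_HIGH = np.array([20, 255, 255])

def detect_ball(frame_bgr):
    """Return the (x, y) center of the largest ball-colored blob, or None."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, BALL_HSV_LOW, BALL_HSV_HIGH)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    (x, y), radius = cv2.minEnclosingCircle(max(contours, key=cv2.contourArea))
    return (int(x), int(y)) if radius > 3 else None
```

The same position, tracked frame after frame, is what feeds the speed, possession, and shot-accuracy heuristics.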

Challenges - the final boss: Latency!

Imagine you’re spinning your players, the ball’s flying, and the crowd - okay, your friends - are on the edge of their seats - except our AI referee is still catching up, huffing and puffing like it just sprinted across the pitch.

Here’s the play-by-play: everything runs in the cloud, which is not so great for instant reactions. First, the camera grabs the action and sends the footage up to Google Cloud. But before it gets there, the video has to squeeze through a few tight corners - converted to RTSP, streamed to a video broker (shout out to mediamtx!), and only then does our AI get to work its magic inside a Kubernetes cluster.

Now, our computer vision is fast, but it’s not lightning fast - so we cap the action at 15 frames per second to keep things smooth. And just when you think the AI commentator is ready to unleash a banger of a comment, it pauses for a quick chat with Gemini and ElevenLabs - but it’s worth the wait!
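For the curious, here’s a rough sketch of what pulling frames from the broker and capping the processing rate can look like - the stream URL is a hypothetical placeholder, and the real pipeline obviously does a lot more per frame.

```python
import time
import cv2

RTSP_URL = "rtsp://mediamtx.example.internal:8554/foosball"  # hypothetical stream path
TARGET_FPS = 15
FRAME_INTERVAL = 1.0 / TARGET_FPS

cap = cv2.VideoCapture(RTSP_URL)
last_processed = 0.0

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    now = time.monotonic()
    if now - last_processed < FRAME_INTERVAL:
        continue  # drop frames to keep processing at ~15 FPS
    last_processed = now
    # ... detect the ball, update the stats, feed the commentator ...

cap.release()
```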

Bottom line: We’ve thrown every trick in the playbook at it to keep the action as real-time as possible. Is it perfect? Not quite. Is it fun? Absolutely.


Poker face: AI and the art of the bluff

Why Poker?

As someone fascinated by human behavior, I find poker a truly unique use case to study. Professional players are masters of misdirection: their full-time job is to hide their tells, mask their intentions, and keep opponents guessing. Trying to spot patterns in a room full of people whose profession is hiding patterns is no easy task.

Confession time: I’m one of those people who can watch hours of poker tournaments - seriously, since I was a kid! My mom used to freak out about it… and all of this while rarely playing a hand myself! I guess you could say I’m a professional spectator.

So when the opportunity came to blend my love for computer vision with my fascination with the poker table, I was all in - no hesitation. Honestly, I was even more hyped for this project than I ever was for the foosball one - but let’s keep that little secret between us, okay?

Setup

What if I told you that… we don’t have a setup? That’s right - no fancy cameras, no custom lighting, not even a say in where the cameras point. Our laboratory was the wild west of poker tournament footage available online, where we had exactly zero control over production, camera angles, lighting, or zoom. Nothing. Nada. Rien. Niente.

But hey, who doesn’t love a good challenge where the environment is totally (not) in your hands? Nervous sweating. If you can build something that works here, you know it’s ready for just about anything.

Tech stack

We didn’t just dive in blind - this project is built on quite a lot of psychological research into what gives away a player’s hand. After a deep dive into the science of deception, we decided to start by analyzing two key signals: gaze and decision time.

Where is the player looking? How often are they blinking? Are they fixating on the chips, the cards, or maybe sneaking glances at their opponents? And what about the speed - are they slamming down decisions, or taking their sweet time to think?

We played around with our sidekick: MediaPipe. Using its face detection capabilities and face mesh magic, we could map out every twitch and blink in real time with our in-house heuristics. Add a little projection wizardry, and we could even estimate the pose of the eyes - so we could track exactly where the player was looking, frame by frame.
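To give a feel for the blink side of it, here’s a minimal sketch with MediaPipe’s FaceMesh - the landmark indices are the ones commonly used for the left eye, and the blink threshold is a hypothetical value you would tune on real footage, so treat this as an illustration rather than our exact heuristic.

```python
import cv2
import mediapipe as mp

# Commonly used FaceMesh landmark indices for the left eye:
# 33 / 133 are the eye corners, 159 / 145 the upper and lower lids.
LEFT_EYE = {"outer": 33, "inner": 133, "top": 159, "bottom": 145}
EAR_BLINK_THRESHOLD = 0.2  # hypothetical threshold; tune on real footage

face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1, refine_landmarks=True)

def eye_aspect_ratio(landmarks):
    """Ratio of lid opening to eye width; it drops sharply during a blink."""
    top, bottom = landmarks[LEFT_EYE["top"]], landmarks[LEFT_EYE["bottom"]]
    outer, inner = landmarks[LEFT_EYE["outer"]], landmarks[LEFT_EYE["inner"]]
    return abs(top.y - bottom.y) / abs(outer.x - inner.x)

def is_blinking(frame_bgr):
    """Return True/False for the first detected face, or None if no face is found."""
    results = face_mesh.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return None
    landmarks = results.multi_face_landmarks[0].landmark
    return eye_aspect_ratio(landmarks) < EAR_BLINK_THRESHOLD
```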

Unmasking the bluff

As you can see, we can successfully track a ton of features that map beautifully into a dataset. With this data, we can build detailed player profiles under specific conditions - bluffing, value betting, holding a monster hand, you name it. By analyzing these trends, we unlock insights into player behavior, train a model to classify whether the player is bluffing… and really help anyone looking to up their poker game (or just geek out, like me).
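Just to illustrate the idea, here’s a tiny sketch of what training such a classifier could look like - the features, values, and labels below are made-up placeholders, not our actual dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-hand features extracted from the tracking pipeline:
# [blink_rate, avg_decision_time_s, gaze_on_chips_ratio, gaze_on_opponent_ratio]
X = np.array([
    [0.35, 4.2, 0.50, 0.10],
    [0.20, 9.8, 0.15, 0.40],
    # ... one row per labeled hand ...
])
y = np.array([1, 0])  # 1 = bluffing, 0 = honest bet (labels from reviewed footage)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)

# Score a new hand: high blink rate, snap decision, eyes glued to the chips
new_hand = np.array([[0.40, 3.1, 0.55, 0.05]])
print("bluff probability:", clf.predict_proba(new_hand)[0][1])
```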

Challenges

I already complained about having zero control over the environment - and trust me, that’s 100% the biggest challenge here. Sometimes the camera frames a whole crowd when we just want one player. Sometimes the camera operator gets creative, and our target vanishes from view. And then there are those moments when we’re blessed with the weirdest angles, with no access to any camera settings whatsoever. It’s chaos, pure and simple.

So, for now, we had to manually pick the moments and hands we wanted to analyze. Not ideal, but hey, you play the cards you’re dealt!

Besides that, we also have to address some ethical considerations. Running this live at a poker tournament? That’s a hard no - definitely against all possible rules and fair-play standards. Even running it offline raises some important privacy questions. We’re committed to respecting player privacy and ensuring our research stays on the right side of ethics.

The good news? In the next blog post on this topic, I’ll have some exciting updates on how we’re automating and optimizing this whole process. Stay tuned… things are about to get a lot smoother… and with a lot of new features! Don’t tell anyone, but I’ll spoil a few for you: overall posture, hand pose, and macro-expressions.

Besides a new obsession with blinking eyes, what else did I learn?

Both projects pushed me to rethink what’s possible with computer vision. With foosball, controlling the environment was helpful, but the speed of the game and the need to run the models in real time were a true test of efficiency.

Poker was a different beast - no control over the setup, but a huge focus on subtle details like eye movements and decision timing. It taught me to adapt quickly and get creative with whatever footage I had.

In the end, blending tech with human behavior is just as challenging as it is fun. And yes, now I can’t help but notice every single blink!

What’s next? Level up!

Are we about to dive headfirst into another sport or game? Maybe! I’m always open to wild suggestions - so if you’ve got a favorite, let me know.

Compared with foosball, the possibilities in poker are endless. There’s so much more to explore: automating the entire pipeline, detecting even more features, or maybe even keeping a closer eye on the cards themselves to advise a player on what to do. Does this mean we’re on the verge of cracking poker? Who knows - but it’s going to be a fun ride finding out!

Stay tuned - because the next level is just getting started.