Google's new Gemini model - on moats and the future of AI

Is this the end for OpenAI?

Before we get started, AI Academy’s January edition of the Master in Prompt Engineering is almost sold out. If you’re considering joining you should do it now (click here to read more).

Google yesterday released Gemini, its new AI model. It's the most powerful AI technology we've ever seen, finally dethroning GPT-4.

In this newsletter, I want to explain a bit what it is, what it can do, and share some reflections on the future of AI and competition.

What is Gemini?

First of all, Gemini is not a single model, but a family of models:

  • Gemini Nano, a small model for on-device computation (this is going to run on Google’s Pixel phones)

  • Gemini Ultra, a super powerful giant LLM made for the most complex tasks

  • Gemini Pro, somewhere in between Nano and Ultra, will power most of Google’s experiences (search, Bard, etc.). Think of this as a model that is a bit less smart than Ultra but more efficient so it can work well for most cases without being slow and costing Google a fortune.

The most important attribute of the Gemini models is that they’re developed for multimodality from the ground up. “Multimodality” means being able to process and output not just text, but also images, audio, and video.

Basically, Gemini is not an incremental improvement to Google's old models - Google went back to the drawing board and completely rethought how to use all these heterogeneous datasets (this will be important in the analysis later).

What can Gemini do?

Google mostly demoed the Ultra model, so let’s talk about that.

First of all, it’s “smarter” than GPT-4. How do we define “intelligence,” though? Today, AI companies use a benchmark called MMLU: basically a giant list of questions on topics ranging from politics to law, biology, math, and anything else you can think of. Researchers pose these questions to AI models and measure their accuracy.

GPT-4 has an MMLU accuracy of 86.4%; Gemini Ultra scores 90.0%.
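To make the benchmark concrete, here's a minimal sketch of how MMLU-style accuracy is computed: the model answers multiple-choice questions, and we measure the fraction it gets right. Note that `ask_model`, the toy dataset, and the field names are all hypothetical stand-ins, not the real benchmark harness.

```python
def ask_model(question: str, choices: list[str]) -> str:
    """Hypothetical stand-in: a real benchmark would call the model's API here.
    This placeholder just always picks the first choice."""
    return choices[0]

def mmlu_accuracy(dataset: list[dict]) -> float:
    """Fraction of questions the model answers correctly."""
    correct = sum(
        1
        for item in dataset
        if ask_model(item["question"], item["choices"]) == item["answer"]
    )
    return correct / len(dataset)

# Toy illustration only; real MMLU has ~14,000 questions across 57 subjects.
toy_dataset = [
    {"question": "2 + 2 = ?", "choices": ["4", "5"], "answer": "4"},
    {"question": "Capital of France?", "choices": ["Rome", "Paris"], "answer": "Paris"},
]

print(mmlu_accuracy(toy_dataset))  # placeholder model gets 1 of 2 right → 0.5
```

The reported 86.4% and 90.0% figures come from exactly this kind of correct-over-total calculation, just at benchmark scale.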

But the most exciting thing is the new capabilities unlocked by this super-advanced multimodality. There are quite a lot of videos in the Google blog post, so here I’ll highlight the ones that surprised me most.

In the GIF above, Gemini is shown two yarns of different colors and asked (via voice) to generate some ideas on what to do with them. Notice how Gemini can generate images of realistic, fun ideas that you could actually knit using the yarn in the picture. This is a seamless experience going from images + voice → text + image, showing pretty good contextual understanding and creativity while staying grounded in the input data.

In this other GIF, you can see Gemini “reasoning” about which car would be faster, starting from two car sketches on post-its. Gemini correctly answers that the one on the right is more aerodynamic, so it would be the faster one. This is pretty mind-blowing because it seems to show some real “understanding” of the two items in the image, linked to an internal model of physics.

The last example: the user hides a ball of paper under a cup and shuffles it around with two other cups. Gemini correctly identifies where the ball is. This was the most shocking example to me because it shows an element that no previous AI model handled: time. To complete this task, understanding the content of a single image isn’t enough. Gemini had to “watch” each frame of the video, link what it knew about one frame to the next, and extrapolate information from the sequence. Wild.

On moats and the future of AI

A company's moat refers to its ability to maintain the competitive advantages that are expected to help it fend off competition and maintain profitability in the future. A moat can be anything hard to replicate for competitors: data, network effects, tech, etc.

Since GPT-4 was introduced and remained the most powerful model out there for a while, people started wondering whether anyone could catch up with OpenAI, and whether its technology was a strong enough moat to crown it the winner of the AI race.

Then there was a piece of news that not many people cared about: OpenAI started a data partnership initiative to collect more high-quality data from partners.

Did that mean they ran out of data to build GPT-5?

Potentially, but now Gemini is showing that Google may never have that problem. Gemini is a testament to the wildly heterogeneous trove of data Google has collected over the years, and to its value as a moat. Think about video data from YouTube, audio data recorded through Android, and all the partnerships for Google Books, news, etc. If “more data” is the solution to more powerful AI models, Google has just flexed its muscles and shown everyone who would win that race. (By the way, I’m not sure that “more data” is the answer, but it’s paying off for now, so let’s roll with that assumption.)

What does that mean for us? I want to take my point of view as an entrepreneur and reflect on how that can impact you too.

As you know, I’m building a generative AI product that’s powered by GPT models for now. As soon as I get access to Gemini, I could run a test and check whether my product's performance improves (this would take between a few hours and a few days, depending on the complexity of my product).

Let’s assume now that Gemini does improve the performance of my system. What would I need to do to cut ties with OpenAI completely and power my entire company with Google’s technology? I’d probably have to change a single line of code.
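To see why a vendor swap can really be one line, here's a minimal sketch of the pattern most AI products use: all product code talks to the model through a thin wrapper, so only the wrapper's configuration changes. Both client classes below are hypothetical stand-ins, not the real OpenAI or Google SDKs.

```python
class OpenAIClient:
    """Hypothetical stand-in for an OpenAI SDK wrapper."""
    def complete(self, prompt: str) -> str:
        return f"[gpt-4] {prompt}"

class GeminiClient:
    """Hypothetical stand-in for a Gemini SDK wrapper."""
    def complete(self, prompt: str) -> str:
        return f"[gemini-ultra] {prompt}"

# The "one line" you'd edit to switch vendors:
MODEL_CLIENT = OpenAIClient()  # swap to GeminiClient() and you're done

def answer_user(prompt: str) -> str:
    # Product code depends only on the wrapper interface,
    # never on a specific vendor's SDK.
    return MODEL_CLIENT.complete(prompt)

print(answer_user("Summarize this document"))
```

In practice the two SDKs differ in authentication, parameters, and response shapes, so the wrapper absorbs those differences; but the point stands that the switching cost for the product itself is close to zero.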

Switching from one model to another is incredibly easy. This means that:

  1. For tech companies there will probably be one true winner for “general” tasks, and others may have to specialize in niche applications

  2. For entrepreneurs, this is great news as competition will drive prices down

  3. For consumers this is even better news: if a new, better AI model is developed and adopted by other companies, literally every AI-powered experience you have improves

So that’s it, we have a new king in the AI race. Just a regular Thursday in this crazy world of AI.

If you’ve read this far you must be really keen to take part in this crazy AI revolution. I think you’d love the Master in Prompt Engineering, I hope I’ll see you in class in the new year.