THE ONE THING YOU NEED TO KNOW ABOUT DATA THIS WEEK… is why Lina Khan’s FTC is investigating Big Tech and why a little French startup, Mistral, is being hailed as an AI champion. The reason: AI is an oligopoly.
Definition, please? Oligopoly is “a state of limited competition, in which a market is shared by a small number of producers or sellers.”
Let’s start with the FTC inquiry.
A few weeks ago, the FTC ordered five companies--Alphabet, Amazon, Anthropic, Microsoft, and OpenAI--to “provide information regarding generative AI.”
The FTC’s mission is to “promote competition.” This implies they suspect something anti-competitive is happening with generative AI.
WE TOLD YOU SO
The FTC is taking action now, but this phenomenon was flagged back in 2022 by a scrappy band of researchers, led by Abeba Birhane of the Mozilla Foundation, whose paper “The Values Encoded in Machine Learning Research” documented the takeover of AI research by Big Tech.
Almost half of all research papers on the topics of machine learning and AI, they said, are written or funded by Big Tech. The proportion tripled from 13% in 2009 to 47% in 2019.
What was special about 2009-2019? Alan Turing was talking about “intelligent machines” as far back as 1948. Why did Big Tech come in so fast and hard in this period?
The short answer: ImageNet.
MUD TURTLES AND HOCKEY PUCKS
If you’re paying attention at bookstores lately… or you’re a proud grad of Princeton or Caltech… you’ve heard of “The Worlds I See,” the memoir of the 47-year-old AI legend Fei-Fei Li.
Li’s tale is an underdog story. Immigrant family. Parents needing care. A world convinced AI was a sideshow.
Li believed that the way to make AI more widely adopted was to do everybody a favor and gather the data together to train models. Then everyone would get it.
Around 2008 she put together a database called ImageNet of images in thousands of categories. She recruited grad students to laboriously tag images of, like, dogs.
No one cared.
So in 2010 she launched a contest. The ImageNet Large Scale Visual Recognition Challenge. Finally people paid attention. ImageNet grew. Today the database contains 14 million images in more than 20,000 categories ranging from “mud turtle” to “hockey puck.” Teams compete to see how accurate their models are at identifying images.
In 2012, a competing team blew the doors off. You might recognize the names of the winners.
Geoffrey Hinton, former Googler and current AI fear-monger (“If you want hyper-intelligent robots to be good at killing, you don’t want to micromanage them”).
And Ilya Sutskever, co-founder of OpenAI. (“The board no longer has confidence in Sam Altman’s ability to continue leading OpenAI.”)
According to researchers Nur Ahmed (MIT) and Muntasir Wahed (Virginia Tech), Hinton and Sutskever rocked everybody’s idea of AI performance by introducing two tools that were new to the big stage.
Deep learning, and GPUs.
These new tactics were so effective that Big Tech got FOMO. And the number of machine learning papers written by Big Tech employees started climbing.
PRICEY PARAMETERS
But paying clever researchers to publish is not the hard part. The hard part is buying technology for them to futz with.
To understand just how expensive this is, we need to understand the economics of a large language model, or LLM.
As we know, LLMs are designed to receive a prompt, from you or me, and provide an answer. That operation--an LLM fulfilling a single task--is commonly referred to as “inference.” As in, you gave me a prompt and I will infer from it how I should respond.
But to complete that task--that prompt, or inference--the LLM needs to be trained. First and foremost, to speak your language.
To train an LLM you need all the text you can find. LLM programmers turn all that text into “tokens,” or numbers, almost like a simple A = 1 word code. Only more complex. A token’s value reflects not only the word, but other stuff too. Like whether it’s plural, capitalized, or sitting at the start of a sentence.
For instance, GPT might turn the word “sailboat” into 38865, but if it’s “sailboats,” it’s 95458, and if it’s the first word in a sentence, it’s 50, 607, 38865.
Once the words are changed to tokens, the model can do calculations.
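If you want to poke at tokenization yourself, here’s a minimal sketch using OpenAI’s open-source tiktoken library. The exact token IDs depend on which tokenizer you load, so don’t expect them to match the sailboat numbers above.

```python
# A minimal tokenization sketch using OpenAI's open-source tiktoken library.
# Token IDs vary by tokenizer; these won't match the sailboat example exactly.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the tokenizer used by GPT-3.5/GPT-4-era models

for text in ["sailboat", " sailboat", "sailboats", "Sailboats float."]:
    print(f"{text!r:20} -> {enc.encode(text)}")

# Notice the same word gets different token IDs depending on plurals,
# capitalization, and whether it follows a space. Context matters.
```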
Another key concept in an LLM is parameters. These are the weights that reflect how language fits together: how likely a word is to connect with another, and in what context. Which is crazy, when you think about it. It’s teaching a computer model to understand all of language by understanding it statistically. Those statistical weights are the parameters.
And if learning English through a whole bunch of statistical predictions sounds insanely complex, it is. GPT-3 has 175 billion parameters.
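To make “weights that predict what comes next” a little more concrete, here’s a toy sketch. This is nothing like a real transformer: just a made-up weight matrix over a five-word vocabulary, 25 parameters instead of 175 billion, scoring which word is likely to follow which.

```python
# A toy illustration of "parameters": a made-up weight matrix scoring which
# word is likely to follow which. 25 weights here vs. GPT-3's 175 billion.
import numpy as np

vocab = ["the", "boat", "sails", "sinks", "."]
rng = np.random.default_rng(0)
weights = rng.normal(size=(len(vocab), len(vocab)))  # one weight per (word, next word) pair

def next_word_probabilities(word: str) -> dict:
    scores = weights[vocab.index(word)]
    probs = np.exp(scores) / np.exp(scores).sum()  # softmax turns scores into probabilities
    return {w: round(float(p), 3) for w, p in zip(vocab, probs)}

print(next_word_probabilities("boat"))  # e.g. {'the': 0.08, 'sails': 0.31, ...}
```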
Okay, so the model has massive amounts of text (now in token form)... and it has weights for how text is likely to work together (parameters)... now it has to develop rules for reading and writing. Given prompt x, what’s my inferred response, y.
Developing a rule becomes a function of how many tokens are in the chunk of text you are developing a rule around… and the number of parameters required to interpret it.
So if you have all the text you can possibly access in token form… times 175 billion weights… and each rule you develop requires a calculation… and those calculations run through deep learning (thanks to Hinton and Sutskever), which is iterative, requiring multiple--call it six--computational roundtrips to develop each rule… which adds up to…
I think you’re getting the point. It’s a lot of calculations. And that many calculations is extremely expensive in terms of data storage and compute. According to Andreessen Horowitz, a single iteration of an LLM--as a word lover, I like to think of it as a “draft”--costs around $5 million for GPT.
And of course you don’t publish your first draft. You do many drafts. The WSJ estimates the cost of training a GPT model at much more than $100 million. Which is obviously a vast amount of money.
Or maybe it’s not, if it’s the cost of a technical miracle like teaching a machine to speak like a human.
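For a rough sense of where those numbers come from, here’s a back-of-envelope sketch. It uses the common rule of thumb that training compute is about six floating-point operations per parameter per training token; the token count and the price of compute below are illustrative assumptions, not figures from the sources cited in this post.

```python
# Back-of-envelope training math. The "6 FLOPs per parameter per token" rule
# of thumb is standard; the token count and compute price are assumptions.
parameters = 175e9        # GPT-3-scale model
training_tokens = 300e9   # roughly the reported size of GPT-3's training corpus
total_flops = 6 * parameters * training_tokens
print(f"Training compute: ~{total_flops:.1e} FLOPs")  # ~3.2e+23

# Assume an effective price of ~$1.50 per 10^17 FLOPs of rented GPU time.
assumed_price = 1.50 / 1e17
print(f"One 'draft': ~${total_flops * assumed_price / 1e6:.1f} million")  # ~$4.7 million
```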
THE FIX IS IN
Remember we’re talking about the cost of training a model. The upfront commitment. The expensive part. Kind of like going to college. And what happens after college, if all goes well?
You start making money.
The cost of answering a single prompt--the inference function of an LLM--is a tiny fraction of a penny. According to that same Andreessen Horowitz blog, it’s about $0.0008.
This, my friends, is a classic fixed cost business model.
For those of you who didn’t have the pleasure of going to graduate business school, and for whom this snippet of jargon is not on the tip of your tongue, I will illustrate with a professional flashback…
… to my days at Nielsen. The Nielsen Company, or the Nielsen Ratings, was long criticized for holding a monopoly over the TV ratings business. When I was there, execs were trained--literally trained--never, ever to use the word “monopoly.” Do not use the word monopoly, or the word dominant, as in ‘our monopoly position,’ or ‘our dominant market position.’
What were we supposed to say then?
Say, ‘our unique market position.’ Or better, ‘our special market position.’
I am totally not kidding, these conversations actually happened.
And I remember sitting at the boardroom table with the CFO of a division of Nielsen. He was making a point. He did some calculations in the margins of our annual strategic plan document with a ballpoint pen.
“Let’s look at the marginal profits,” he’d say. Deftly, he demonstrated that after we re-couped our costs, the profit on each additional dollar of revenue was like, 60%. Even higher.
He got this gleam in his eye, the one CFOs get when they gaze down at numbers that please them. “It’s really a fixed cost business,” he said. He might as well have been saying butterscotch sundae or trip to Hawaii. He glowed with greed, pleasure, satisfaction.
Because that’s the beauty of a fixed cost model. You pay a fortune to do something incredibly hard. (Launch a massive TV panel. Teach machines to talk.) You pay so much, in fact, that almost nobody can afford to get into the business.
But once you do, the marginal profits are insane. Think about your use of ChatGPT. If you use it 10x per week, for a year, that’s 520 prompts. If the numbers above are correct, you’re paying OpenAI $240 a year for your monthly subscription…
…and providing you with answers to those prompts costs them 42 cents.
Training is expensive. Inference is cheap.
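To see the fixed-cost math in one place, here’s a small sketch using the figures quoted above. The subscriber arithmetic is just that, arithmetic, not a claim about OpenAI’s actual books.

```python
# The fixed-cost business model in miniature, using the figures quoted above.
training_cost = 100_000_000      # WSJ's "much more than $100 million" floor
cost_per_prompt = 0.0008         # a16z's per-inference estimate
subscription_per_year = 240      # a $20/month plan

prompts_per_year = 10 * 52       # ten prompts a week
serving_cost = prompts_per_year * cost_per_prompt
margin = subscription_per_year - serving_cost

print(f"Yearly serving cost per subscriber: ${serving_cost:.2f}")         # $0.42
print(f"Yearly marginal profit per subscriber: ${margin:.2f}")            # $239.58
print(f"Subscribers needed to recoup training: ~{training_cost / margin:,.0f}")  # ~417,000
```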
So who can afford to get into the LLM business? Companies who can afford to train models. Who can afford to train models?
Big Tech… the companies sitting on top of mountains of cash from their other incredibly scalable fixed-cost businesses: tech-driven internet ads.
BILLION EURO HERO
If all this is waking your inner cynic, I can’t blame you. We, the little people, the citizens, the proles--we need a hero to save us.
Maybe we’ll settle for a mensch.
Definition? “A mensch is someone of noble character. The key to being ‘a real mensch’ is nothing less than a sense of what is right, responsible, decorous.” (Leo Rosten, The Joys of Yiddish, 1968.)
Or maybe, le Mensch: Arthur Mensch, 30 years old, of Paris, France, founder of Mistral. On February 27, 2024, Mistral… in an A round, if you can believe it, and in only its ninth month of business… raised $500 million, at a $2 billion valuation.
It did so based on the claim that it can develop LLMs for $22 million a pop (vs. $100 million). Its goal is to make much of its code open source. A disruptor in the oligopoly game.
What would it mean if Mistral is successful? Or for that matter, if the FTC learns something useful in its generative AI inquiry?
Alternatively, what if they both fail, and the LLM oligopoly continues?
WHAT IT MEANS
I’ll take those questions out of order.
You can borrow someone else’s model. Oligopolies can do you good. That low cost of a single inference is fantastic value for the average business. You set up APIs and run calls against the Big Tech LLM. It’s like using electricity. Who cares if you don’t own it? Or if the owner has a monopoly? You’re getting this incredible resource for pennies per unit.
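For the curious, “borrowing someone else’s model” looks roughly like this: a minimal sketch of a hosted-LLM API call. The endpoint and response shape follow OpenAI’s chat completions API; the model name and prompt are placeholders, and other providers’ APIs differ in the details.

```python
# A minimal sketch of renting inference instead of training your own model.
# Follows OpenAI's chat completions API; other providers differ in detail.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",  # placeholder model name
        "messages": [{"role": "user", "content": "Summarize this week's sales notes in three bullets."}],
    },
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])
```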
Competition is healthy. If Mistral, and companies like it, succeed, they will create competition. They will force Big Tech to be cheaper; offer better and more varied services; be more convenient. This will only make “borrowing someone else’s model” (above) a better and better proposition. (Note: based on my own experience at a monopoly, market power is the real real. Someone close to our business said at the time, “The road to defeating Nielsen is littered with corpses.”)
Transparency is even healthier. But the real issue with oligopolies in general, and Big Tech in particular, is trying to force them to be ethical. And to do that, arguably, they must be transparent. So far everyone has failed to make Big Tech transparent.

I believe this is the case because a) so much corporate activity takes place in little rooms far away from the eyes of the Investor Relations people writing up the 10-Ks. Look at the trouble Google is in with the DOJ over anticompetitive ad practices. It’s about real-time bidding and auction manipulation. Nerd stuff that normal people don’t know about because they don’t have to. Which leads us to the second cause of poor transparency: b) regulators generally don’t know jack about the technology and data they attempt to oversee. (“Senator, we sell ads.”)

In internet-era tech, this has already had potentially tragic consequences for civilization. Could we have foreseen that social media algos would make societies around the world hateful and divided? Or that the leaders of these companies would have such low character they wouldn’t fix it? Perhaps in the AI era, we are on the precipice of similar unintended consequences. Maybe at existential scale.
Transparency is for you. It is up to all of you--aided by your friends at The DataStory--to remember that we, all of us, are smart enough to understand what happens behind the scenes. Emboldened by this confidence, we should be able to ask all the questions we need answered… to understand how these critical technologies and data are developing. Have FTC scrutiny, good tech reporting, and Senate shaming helped us to date? Maybe barely; maybe a lot. Clearly we have not found the magic formula.
But it doesn’t hurt, as a first principle, to be a mensch.
Footnote: Speaking of right, responsible, and decorous, I’d like to thank my son, Nicholas Evans, a soon-to-be graduate in Computer Science and Linguistics at Trinity College Dublin, for introducing me to the important paper “The Values Encoded in Machine Learning Research” referenced in this post.