The Age of Open Foundation Models
"Software 3.0" For Everyone
Someone asked me a question a few weeks ago in anonymous Q&A: [what is a] "puzzle in the technology world you're thinking about?"
It's a question that has nagged me.
I have been deeply interested in modern machine learning since ~2015 (sincere thanks to a now longtime friend and AI pioneer, Andrew Ng). I'm projecting, but I think it's hard to be a technologist today and not be an excited believer. The leaders of the large cloud and internet companies have an insatiable appetite for researchers, and on the weekend, nearly every engineer is at least lightly keeping up with the newest papers.
And yet...until now there have been few tinkerers, people testing the bounds of what products you can build, of startups and side projects and cool demos. Why? It's been an exciting week (really, decade) in ML and I have a hunch the game is finally changing.
The age of modern machine learning is very young. We can ruthlessly abbreviate it into three broad eras:
Epoch 1: The Unreasonable Effectiveness of Deep Learning
Many of the core ideas in ML, such as stochastic gradient descent and backpropagation, existed in obscurity for decades. AlexNet in 2012 was a watershed moment that exposed the unreasonable effectiveness of deep learning: this landmark paper described an ML model that won a key computer vision benchmark (ImageNet) by a massive margin. Amongst internet giants with core business use cases for ML (say, search relevance, news feed recommendation or voice-recognition interfaces), interest had been building. Google Brain and Meta AI, research labs established in 2011 and 2013, fruitfully combined longtime academic researchers in deep learning with their key fuels: cloud computing infrastructure and large datasets.
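For readers who haven't seen these decades-old ideas up close, here is a minimal, illustrative sketch of stochastic gradient descent fitting a one-parameter model (a toy I'm constructing for this post, not any historical implementation; the gradient of a single linear neuron is simple enough that "backpropagation" reduces to one chain-rule step):

```python
import random

# Toy problem: recover w in y = w * x (ground truth w = 2) from examples,
# nudging w against the squared-error gradient, one random sample at a time.
random.seed(0)
data = [(x, 2.0 * x) for x in range(1, 11)]

w = 0.0      # initial parameter guess
lr = 0.005   # learning rate

for step in range(500):
    x, y = random.choice(data)    # "stochastic": one sample per step
    y_hat = w * x                 # forward pass
    grad = 2 * (y_hat - y) * x    # d(loss)/dw via the chain rule
    w -= lr * grad                # gradient step

# w ends up close to the true value of 2.0
```

The same loop, scaled up to millions of parameters and run on GPUs over huge datasets, is essentially what made AlexNet possible.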
These tech companies invested in building deeper and larger neural networks, exploring model architectures, and buying innovators such as DeepMind. Vertical application efforts also began to bear fruit as the field increasingly mastered classification tasks where models look at unstructured raw data and attach a class label (e.g. Tesla Autopilot recognizing a road bike in the next lane).
Epoch 2: Giant Transformers Eat Machine Learning
In 2017, a team at Google led by Ashish Vaswani, Noam Shazeer and Niki Parmar published "Attention Is All You Need," detailing the intuitions behind the Transformer model. This general-purpose architecture, whose performance seems to improve with training data (tokens), model scale (parameters), and compute, seemingly without limit, has since achieved breakthrough results across many downstream tasks.
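The core operation behind that paper, scaled dot-product attention, is remarkably compact. A stripped-down sketch (plain Python lists standing in for tensors; real implementations batch this and add learned projections, multiple heads, and masking):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query takes a weighted
    average of the values, weighted by query-key similarity."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)  # weights sum to 1
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# A query aligned with the first key attends mostly to the first value.
out = attention([[1.0, 0.0]],
                [[1.0, 0.0], [0.0, 1.0]],
                [[10.0, 0.0], [0.0, 10.0]])
```

Because every query can attend to every key in parallel, the operation maps beautifully onto accelerators, which is part of why scaling it has worked so well.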
The majority of the machine learning research field has coalesced around the idea that there will be pre-trained, Transformer-shaped "foundation models" that generalize well to many tasks, either out of the box or fine-tuned with dramatically fewer training samples. These general models began to outperform more specialized models for object detection, image classification and speech recognition.
The engineering vanguard slowly congregated at the altar of "Software 2.0," elevating these algorithms from "another tool in the toolbox" to a revolution in programming. Large labs with their own compute began racing toward this future with "universal giant model" efforts that used much of the known internet as an unlabeled training dataset. A hope that until a few years ago was only whispered became more common: Transformers, with enough compute and enough data, could be a path to AGI.
Epoch 3: Generation, not just Classification
In parallel with this rise of Transformers, researchers began to look at generation tasks ("create an example of a road bike"). Prediction and generation are core parts of intelligence, and these tasks are a significant expansion of the AI field's scope: for example language generation, code generation, voice generation, image generation, and even math solving.
Starting with the 2018 release of GPT, OpenAI inserted itself into this fight as a legitimate independent. Its subsequent releases of papers, playgrounds and APIs for generating language (GPT-2/GPT-3), code (Codex) and images (DALL-E/DALL-E 2) splashed through engineering spheres and then the court of public opinion. Technologists began to recognize that AI was likely to impact a much broader swath of work traditionally considered higher-level and creative.
But despite a backdrop of consulting firms declaring a $1T transfer of business value, and despite these unbelievable research advances, impactful generative applications have been limited mostly to large internet companies and a few well-funded independent labs, largely, but not all, descended from Google. The vast majority of companies struggle to leverage open-source libraries such as XGBoost or PyTorch for even basic classification. Given these new transformative (hah hah), generative superpowers, there has been a curious dearth of AI tinkerers and AI-native startups in the third epoch. Why?
AI Inhibitors (to Date):
- Insufficient Model Performance. Historically, model quality fell short of the bar real products require. State-of-the-art models are getting much, much better, and continually clearing thresholds of usefulness.
- Low Accessibility. The best-performing models have to date been owned by closed labs, often citing AI ethics and safety concerns. Those labs have generally offered only APIs that are neither state of the art nor sufficiently flexible. Training models from scratch requires huge datasets and GPU/AI-accelerator compute, both of which are historically expensive and out of reach for many would-be tinkerers and entrepreneurs. For a period of time, experimenting with models required actually buying(!), '90s-style, specialized servers that sat under your desk. The major cloud players have since built more on-demand capacity, tooling has massively improved, and costs are decreasing.
- New Product Challenges. No bank of best practices for handling the eccentricities of ML apps (for example, gracefully managing the confusion matrix) exists today, just as there was once no set of known paths to building great consumer mobile apps. It will take creative people and new cleverness to build successfully around ML, and the overlap between great product people and those who understand ML is today small.
- Talent Capture. A field as young as deep learning necessarily has a thin bench of practitioners. The growth in supply of ML talent cannot keep up with demand, and ML has been so good (so profitable) for a few companies that they've completely drained the ecosystem. In addition, because these internet companies are large enough to be ecosystems in and of themselves, the people inside them have less natural exposure to outside business problems. I am repeatedly introduced to brilliant researchers who are exercising entirely new muscles when thinking of interesting "downstream tasks" beyond ads, spam & abuse, recommender systems, or benchmarks. Finally, many product leaders assume that shipping ML products will require a large team beyond the data scientists (data engineering, full-stack, MLOps). However, AI now dominates concentrations in college computer science programs, and tools are improving.
- The ML Status Game. Every human is somewhat affected by their local status game, and the research-centric modern machine learning ecosystem rewards citations and benchmarks over user or business value. Less cynically: now that there is a possible path to AGI, that is the one and only goal worth pursuing, and working on products that do not serve to produce training data for that aim is only a distraction.
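To make the confusion-matrix eccentricity above concrete, here's a toy sketch (my own illustrative example, not any production system): a spam filter can only trade false positives against false negatives by moving its decision threshold, and deciding where to sit on that trade-off is a product decision, not a modeling one.

```python
def confusion(scores_and_labels, threshold):
    """Count (TP, FP, TN, FN) for a score threshold."""
    tp = fp = tn = fn = 0
    for score, is_spam in scores_and_labels:
        predicted_spam = score >= threshold
        if predicted_spam and is_spam:
            tp += 1
        elif predicted_spam:
            fp += 1
        elif is_spam:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

# Hypothetical model scores paired with ground-truth labels.
data = [(0.95, True), (0.80, True), (0.60, False),
        (0.40, True), (0.20, False), (0.05, False)]

strict = confusion(data, 0.9)   # few false alarms, but misses more spam
lenient = confusion(data, 0.3)  # catches more spam, but flags a good message
```

A great ML product hides this trade-off gracefully (undo buttons, "not spam" feedback, human review queues) rather than pretending the errors don't exist.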
Epoch 3 and the Glorious Future: Towards a More Open AI?
Over the past few years, the ecosystem has slowly become more open, with the community-driven EleutherAI leading the charge on training data with The Pile, and other research labs releasing more pre-trained models with weights, e.g. Google's BERT and T5 natural language models, now broadly accessible through libraries like Hugging Face Transformers.
Earlier this year, the world took a collective gasp at OpenAI's DALL-E 2, Google's Imagen and Midjourney's image generators, to the tune of one million people participating in Midjourney's Discord community. Over the past few weeks, Stable Diffusion, an impressive image-generation model in the same vein, designed and efficiently trained by new startup Stability AI and its collaborators, has taken the internet by storm. It is small enough to run locally on a laptop with a decent GPU, and it is fully open source. Excitingly, Stability has committed to open models for more modalities.
Until now, the ability to leverage these foundation models was largely limited to a small handful of compute, data, and researcher-rich organizations. The chessboard is changing, and we should expect many more companies to harness intelligence in their products. A couple of initial predictions (really, startup requests!):
- We will see the social and content landscapes completely reformed: why shouldn't we play video games with characters of our imagining, go through our day with personalized soundtracks, read books designed to titillate us, and have imagery automatically accompany every story we tell our friends? Our creative tools will use models to both bootstrap and "scale" assets.
- We will have ambient assistants for every function: selling to customers, writing PRDs, writing code, running a Shopify operation, helping you grasp a new subject, and coaching you to avoid sticking your foot in your mouth when you speak with your boss. We will progress from "autocomplete" and utilities to remote digital employees.
- New markets for software will open up if the cost of building software decreases by an order of magnitude. Is the future "no-code" or is it generated code?
- Companies will be formed that become factories for harvesting the new knowledge these models create. If we extrapolate from solving protein folding to having reliable, virtual models of human cells, how does that reshape what a biotech startup looks like?
Community is a powerful force, and the engineering and research communities clearly want cutting-edge AI to be accessible. While AGI remains the goal of a privileged few organizations, in the meantime, let a million intelligent applications bloom. It's about to become a free-for-all in terms of harnessing AI.