No Priors🎙Episode 103: Emad Mostaque, Founder/CEO Stability AI (TRANSCRIPT)

February 16, 2023

🎙 No Priors, Episode 103: Emad Mostaque (TRANSCRIPT)

EPISODE TITLE: How can we make sure that everyone has access to AI? Can small models outperform large models? With Stability AI’s Emad Mostaque

EPISODE DESCRIPTION: AI-generated images have been everywhere over the past year, but one company has fueled an explosive developer ecosystem around large image models: Stability AI. Stability builds open AI tools with a mission to improve humanity. Stability AI is most known for Stable Diffusion, the AI model where a user puts in a natural language prompt and the AI generates images. But they're also engaged in progressing models in natural language, voice, video, and biology.

This week on the podcast, Emad Mostaque joins Sarah Guo and Elad Gil to talk about how this barely one-year-old, London-based company has changed the AI landscape, scaling laws, progress in different modalities, frameworks for AI safety and why the future of AI is open.

Full Transcript Below (not edited):

SARAH:

AI-generated images have been everywhere over the past year, but one company has fueled an explosive developer ecosystem around large image models and that's Stability AI. This week on the podcast, we'll talk to Emad Mostaque, the founder and CEO. Stability builds open AI tools. They're most known for Stable Diffusion, the unreasonably effective AI model where a user puts in a natural language prompt and the AI generates images.

But they're also engaged in progressing models in natural language, voice, video, and biology. We'll talk about how this barely one-year-old, London-based company has changed the AI landscape, scaling laws, progress in different modalities, safety and why he thinks the future of AI is open. Emad, welcome to No Priors.

EMAD:

Thank you for having me on, Sarah, Elad.

SARAH:

Let's start with the personal story. You have a background in computer science and you were working in the hedge fund world.

That's a hard left turn or it looks like it, from that world to being a driving force in the AI State of the Art. How did you end up working in this field?

EMAD:

Yeah. I've always been interested in AI and technology. On the hedge fund, I was one of the largest investors in video games and artificial intelligence. But then my real interest came when my son was diagnosed with autism and I was told there was no cure or treatment. I was like, "Well, let's try and see what we can do." I built up a team and did AI-based literature review. This was about 12 years ago, of the existing treatments and papers to try and figure out commonalities.

Then did some biomolecular pathway analysis of neurotransmitters for drug repurposing, and came down to a few different things that could be causing it. Worked with doctors to treat him and he went to mainstream school, and that was fantastic. Went back to running a hedge fund, won some awards. Then I was like, "Let's try and make the world better." The first one was a non-AI enhanced education tablets for refugees and others, and that's Imagine Worldwide, my co-founder's charity.

Then in 2020, COVID came and I saw something like autism, a multisystemic condition, the existing mechanisms that extrapolated the future from the past, wouldn't be able to keep up with it. And thought, "Could we use AI to make this understandable?" I set up an AI initiative with the World Bank, UNESCO and others to try and understand what caused COVID, and try and make that available to everyone.

Then I hit the institutional wall in a variety of places, and realized that the models and technologies that had evolved were far beyond anything that happened before. There were some interesting arbitrage opportunities from a business perspective. More than that, a bit of a moral imperative to make this technology available to everyone, because we are now going to very narrow superhuman performance and everyone should have access to that.

SARAH:

It's an amazing journey and congratulations on all the impact you've already had. As you say or as you imply, the AI field in recent years has been increasingly driven by labs and private companies. One of the most obvious paths to performance progress is to just make models bigger, scaling data parameters, GPUs, which is very expensive.

Then in reaction, just to set the stage a little bit, there's been some efforts over the previous years to be more community-driven and open, and build alternatives like Eleuther. How did you start engaging in that, and how did Stability change the game here?

EMAD:

Yeah. When I was doing the COVID work, we tried to get access to various models. In some cases, the companies blew up. Other cases, we weren't given access despite it being a high profile project. I started supporting EleutherAI as part of the second wave. Stella and Connor and others led it on the language model side.

But really one of my main interests was the image model side. I have aphantasia, so I can't visualize anything in my brain, which is more common than people would think. In fact, a lot of the developers in this space have that. We've got nothing in our brain.

SARAH:

You just see words? What's in there?

EMAD:

Just feelings.

SARAH:

Okay.

EMAD:

Again, I thought it was a metaphor, imagine yourself on a beach. I was like, "Okay, I feel a beach." No, apparently you guys have pictures in your heads, it must be like just disconcerting. But then with the arrival of CLIP released by OpenAI a couple of years ago, you could suddenly take generative models and guide them to text prompts. It was VQGAN, which is the slightly mushy, more abstract version first. But I built a model for my daughter while I was recovering ironically from COVID.

Then she took the output and sold it as an NFT for $3,500 and donated it to India COVID Relief. I was like, "Wow, that's crazy." I started supporting the whole space at Eleuther and beyond, giving jobs to the developers, compute for the model creators. Funding the various notebooks from Disco Diffusion to these other things. Giving grants to people like Midjourney that were kicking this off.

SARAH:

Just personally?

EMAD:

Just personally. They were doing all the hard work and I was like, "Can I catalyze this because this is good for society?" Then about 15 months ago, I was like, "While these communities are growing, it'd be great if we could create this as a common good." Originally, I thought you got communities, you got to make them coordinated, could a dial work or a dial of dials? That's how Stability started. After about a week, I realized that was not going to work and it was incredibly difficult.

Then I figured out commercial, open-source software could be the way to create a line technology, not just an image but beyond, that would potentially change the game by making this stuff accessible. Because as you said, one of the key things, this is in the State of the AI Report. This is in AI Index as well, is that most research has been subject to scaling laws and other things. Transformers seem to work for everything. It was moving more and more towards private companies, but the power of this technology is double-edged.

One is that there are fears about what could go wrong so it's not released. The other one is why not keep it for excess returns? You've had this massive brain drain occurring and no real option. You work in an academic lab, you have a couple of GPUs. Or you go and work at big tech/OpenAI, or you set up your own startup, which is very, very difficult, as you guys know. I wanted to create another option, and that's what we did with Eleuther and Stability, and the other communities that we have grown and incubated.

ELAD:

Could you talk more broadly about why you think it's important for there to be open-source efforts in AI, and what your view of the world is? Because I think Stability has really helped create this alternative to a lot of the closed ecosystems, particularly around image gen and protein folding, a variety of different areas. Those are incredibly important efforts.

I'd just love to hear more about your thoughts on why is this important, how you all view the participation of the industry over time? Also, what you think the world looks like in five years, 10 years, et cetera, in terms of closed versus open systems?

EMAD:

I think there's a fundamental misunderstanding about this technology because it's a very new thing. Classical open source has lots of people working together with a bit of direction that's a bit chaotic, but then you've seen Red Hat and other things emerge from this. There aren't many people that train these models. We don't invite the whole community and you have 100 people training a model. It's usually five to 10, plus a supercomputer and a data team, and things like that.

The models when they come out, are a new type of programming primitive infrastructure. Because you can have a Stable Diffusion that's two gigabytes, that deterministically converts a string into an image. That's a bit insane and that's what's led to the adoption here. On GitHub Stars, we've overtaken Ethereum and Bitcoin. Cumulatively, it took them 10 years, we got there in three, four months. If you look at the whole ecosystem, it's the most popular open-source software ever, not just AI.

Why? Because again, it is this new translation file and you do the pre-compute as it were on these big supercomputers. Which means the inference required to create an image is very low, and that's not what people would've expected five years ago, or to create a ChatGPT output. As infrastructure, I think that's how it should be viewed. My take was that what would happen is everyone would be closed because you needed talent, data and super compute, and those would be lacking as it were.

It'd be the big companies only. They would go four or five years, and then someone would defect and go open source. It would collapse the market as they would commoditize everyone else's compliment. Similar to Google offering free Gmail and all sorts of stuff around their core business. But more than that, I realized that governments and others would need this infrastructure, because if a company has it privately, they will sell to business-to-business.

Maybe a bit of B2C, but we've seen the Cambrian explosion of people building around this technology, but who's building the Japan model, or the India model or others? Well, we are. Then that means that you can tap into infrastructure spending, which is very important because it needs billions. But the reality is that's actually a small drop in the ocean. Self-driving cars got $100 billion of investment. Web3, hundreds of billions. 5G, trillions, and for me, this is 5G level.

From an ethical, moral perspective, I was like, "We've got to make this as equitably available as possible." From a business model perspective, I thought it was a good idea as well, but I thought we were held here inevitably. I decided to create Stability to help coordinate and drive this forward, in what's hopefully a moral and reasonable way. The decisions that we make have a lot of input and they're not easy, but we are trying to be Switzerland in the middle of all of this, and provide infrastructure that will uplift everyone here.

ELAD:

What do you think this world looks like in five years or 10 years? Do you think that there's a mix of closed and open source? Do you think the most cutting edge models, the giant language models are going to be both?

Or do you think capital will eventually become such a large obstacle, that it'll make the private world more likely to drive progress forward? I know you have plans in terms of how to offset that, but I'd just love to hear about those.

EMAD:

The reality is we have more compute available to us than Microsoft or Google. I have access to national supercomputers, and I'm helping multiple nations build access scale computers. To give you an example, we just got a seven million hour grant on Summit, one of the fastest supercomputers in the US. Like I said, we're building access scale computers that are literally the fastest in the world. Private companies don't have access to that infrastructure because governments, thanks to us, are realizing that this is infrastructure of the future.

We have more compute access, we have more cooperation from the whole of academia than all of them do because their agreements tend to be commercial. There's no way that private enterprise can keep up with us, and our costs are zero as well when you actually consider that. Whereas they have to ramp up tens of billions of dollars of compute. My take is that foundation models will all be open source for the deep learning phase, because we've actually got multiple phases now.

The first stage is deep learning, that's creating of these large models. We'll be the coordinator of the open source. The next stage is the reinforcement learning, the Instruct models, Flan-PaLM or InstructGPT or others. That requires very specified annotation and that's something that private companies can excel in. The next stage beyond that is fine-tuning. Actually, let's give a practical example. PaLM is a 540 billion parameter model. It achieves about 50% on medical answers.

Flan-PaLM is the instructed version of that, and that achieves 70%. Med-PaLM, they took medical information, they fed it in, this is a recent paper from a few weeks ago, achieved 92%, which is human level on the answers. Then the final stage for that is you take this Med-PaLM and you put it into clinical practice with human-in-the-loop. For me, the private sector will be focused on the Instruct to human-in-the-loop area. The base models will be infrastructure available to everyone on an international generalized and national basis.

Particularly because when you combine models together, I think that's superior to creating multilingual models. That's quite a bit there and I'm sure you want to unpack that.

ELAD:

Yeah, that's very exciting. Yeah. Could you actually talk about the range of things or efforts that are going on at Stability right now? I know that you've done everything from these foundation models on the language side, protein folding, image gen, et cetera.

If you could just explain what is a spectrum of stuff that Stability does and supports, and works with? Then what are the areas that you're putting the most emphasis behind going forward?

EMAD:

Yeah. I think we are the only independent, multimodal AI company in the world. You have amazing research labs like FAIR at Meta and others, and DeepMind doing everything from protein folding, to language, to image. There are cross-learnings from all of these. Basically, we do everything from audio to language, coding models. Any kind of almost private model, we are looking at what the open equivalent looks like and that's not always a replication. With Stable Diffusion, for example, we optimized it for a 24 gigabyte view around GPU.

Now, as of the release of Distilled Stable Diffusion, it will run in a couple of seconds on an iPhone. We have neural engine access because our view of the future is creating models that aren't necessarily bigger, but that are customizable and editable. This is a bit of a different emphasis, and we think that's a superior thing than scaling. I think things like the Chinchilla paper, that's the 67 billion parameter model, that's as performant as GPT-3 at 175 billion.

Are important in that because it said that training more is important. Actually, when you dig into it, it actually said data quality is important. Because now we are seeing that the first stage or the DL stage, where the deep learning stage is, let's use all the tokens on the internet, but maybe we can use better tokens. That's what we see when we instruct and use reinforcement learning with human feedback. We've also been releasing technology around that. Our Carpa Lab representative learning, we released our Instruct framework that allows you to instruct these big models to be more human.

The way I put it is that our focus is thinking what are the foundation models that will advance humanity, be it commercial or not? What needs to be there, and what's rate susceptible to this transformer-based architecture that takes about 80% of all research in the space? Making that compute and knowledge, and understanding of how to build these models available to academia, independent research and our own researchers. Then from a business perspective, really focusing on where our edge is. Our edge is in two areas, one is media.

This is why image models, video models and audio models have been a focus, 3D soon as well. The other area is private and regulated data, because what's the probability that a GPT-3 model weight or a PaLM model weight will be put on-prem? It's very low. Versus an open model, it's very high and there's a lot more valuable private data than there is public data. It is a bit of everything, but like I said, there are certain focuses on a business side on media. Then I think on a breakthrough side, computational biology will be the biggest one.

ELAD:

That's really cool. On the computational biology side, I guess there's a few different areas. There's things like protein folding and then to your point, there's things like Med-PaLM.

Are you thinking of playing a role in both of those types of models, in terms of both the medical information?

EMAD:

Yes.

ELAD:

Yeah.

EMAD:

We will release an open Med-PaLM model, well, med-stable GPT. Then protein folding, we are one of the key drivers of OpenFold right now. We just released a paper on that, much faster ablations than AlphaFold. We're doing as well DNA diffusion for predicting the outcome of DNA sequences. We have biolab around taking language models for chemical reactions. That's an area that we will aggressively build, because there's a lot of demand from the computational biology side for some level of standardization there.

There have been initiatives like MELLODDY and others looking at federated learning, but there is a misalignment of incentives in that space that I think we could come in and fix. I think that's where we really view ourselves. How can you really align incentive structures and create a foundational element that brings people together? I think that's where we are most valuable, because private sector can't do it that well. Public sector can't do it that well. A mission-oriented, private company that has this broad base and all these areas could potentially.

ELAD:

Yeah. I think also the global nature of your focus is really exciting. Because when I look at things like medical information or medical models, ultimately the big vision there, which a number of people have talked about for decades at this point. Is that you'd have a machine that would allow you to have very high access to care and medical information no matter where you are in the world. Especially since you can take images with your phone, and then interpret them with different types of models and then have an output.

If you have a cardiac issue, you should have care equivalent to the world's best cardiologist from Stanford, or you name the center of excellence available to anybody in the world, whether they're rich, poor, a developing country, not, et cetera. It's very compelling to see this big wave of technology and the things that it may be able to enable, including some of the things that you mentioned around AI and medicine.

EMAD:

Yeah.

ELAD:

It's very exciting stuff.

EMAD:

I think it's very interesting as well, because this technology is being adopted so fast. Let's face it, Microsoft and Google, $2 trillion companies have made it core of their strategy, which is crazy insane. For technology, that's basically five years old or let's say two years old, really breaking through because it can adapt to existing infrastructure. It sits there and it absorbs knowledge when you fine-tune it.

But then my thing is I look to the future and I'm like that best doctor, which bits of that should be infrastructure for everyone, and which bits of that should be private? That's how I oriented my business. I look to the future, I come back and I think, "What should be public infrastructure, and how can I help build that and coordinate that, and that's valuable?" Then everything else, other people can build around.

ELAD:

How do you think about the traditional pushback that's existed in the medical world around some of these technologies? For example, the first time an expert system or a computer could actually outperform Stanford University physicians at predicting infectious disease, was in the 1970s with this micing project where they literally trained an expert system, or designed an expert system to be able to predict an infectious disease.

But here we are almost 50 years later, with none of that technology adopted. Do you think it's just we have to do a lot of human-in-the-loop things and it's a doctor's assistant, and that'll be good enough? Do you think it's just a sea change, there aren't enough physicians? What do you think is the driver for the technological adoption in something so important today?

EMAD:

I think the infrastructural barriers are huge for adoption of technologies, particularly when private sector. I think there is a new space of open-source technology adoption that could be very interesting, and a willingness now that people understand this, which wasn't there even 10 years ago. The nature of open source, now it runs the world's servers and databases. I think there's another level of open source, which is open-source, complex systems as it were. Previously, in other discussions, I've talked about our education work.

Right now, we're deploying four million tablets to every child in Malawi. By next year, we'll have hundreds of millions of kids hopefully that we deploy to. It's not just education, it's healthcare and it's working with the government. It's working with multilaterals to say, "Can we build a healthcare system from the bottom up, that can do all of these things without an existing infrastructure?" Because they don't have an existing infrastructure. It's one doctor per 1,000 kids, 10,000 kids, one teacher per 400 kids.

I am certain that system will outperform anything in the West within five years, which is crazy to say. But then our Western systems can then take bits of that and adapt to it, because I think this competitive pressure is required because Western systems are very hard to change. In the UK, we've done that with HDR UK, the genomic banks and others. That was a massive, uphill battle, as you know, to get these technologies adopted. Because there should be barriers to adoption of this technology when it comes to things as important as healthcare. But at the same time, I think now is the time to open that up.

SARAH:

Yeah. I think there is an interesting loose analogy to different pace of adoption of different technologies, in different geos in the past. One that comes to mind is today, I think it's a very commonplace amongst consumer internet investors to look at what's happened with mobile in East Asia, as a precursor to interactions that might happen here.

Mobile technology advanced much more rapidly in China, Korea, many other places. One because of private-public partnership. Two, because there was more, I guess, greenfield in terms of access to information and different infrastructure that supported mobile as the primary communication medium. I could certainly see that happening with some AI native products in terms of it.

EMAD:

I think that's an excellent point. I agree 100%. I think just as they leapfrog to mobile, a lot of the emerging markets, Asia in particular, will leapfrog to generative AI or personalized AI. I can see this because I'm having discussions with the governments right now. What is the reaction? Over the Christmas holiday, I was getting a few hours of sleep, finally. I got six calls from headmasters of UK schools saying, "Emad, what is our generative AI strategy?"

I was like, "Your what?" They were like, "All our kids are using ChatGPT to do their homework." It's one of the first global moments, an amazing interface that opened the AI belt, it's going mainstream. I was like, "Well, good. Stop assigning essays." Now in some of the top private schools in the UK, they actually have to write the essays during the lessons without computers, which I think is wrong. Because my discussions in an Asian context, for example, with certain leading governments that are about to put tens of billions into this space.

They're embracing the technology and they're like, "How can we have our own versions of this? How can we implement this to help our students get even better?" Because also, even though there might be bureaucracy in some of these nations, if they want to get something done, they get it done. This technology is very different, in that the costs are not continuous like a 5G network. The CapEx profile and other things are very different. You can say it costs $10 million to train a GPT, it doesn't cost that much anymore.

That's really valuable if you can have a ChatGPT for everyone. The ROIs are huge. Yeah. I do think that a lot of these nations, the African context is one that we are driving forward with education as a core piece. Right now, we're teaching kids with the most basic AI in the world, literacy and numeracy in 13 months on one hour a day in refugee camps. That's insane, that's already better. It's going to get even better. But I think Asia in particular, they're going to go directly to this technology and embrace it fully.

Then we have to have a question. If you're not embracing this in the West, in America, in the UK, you're going to fall behind, because ultimately, this can translate between structured and unstructured data quicker than anything.

SARAH:

I'd like to see what pace of adoption we can have in the United States of this technology as well, but I can see the prediction coming true. If we just go back to the core, I guess not the core necessarily, but the most advanced, mature use case within Stability and as you said, media as an advantage. What does the future of media look like?

Actually, even if we go back be before that, you're involved in early ecosystem efforts with Eleuther and such. How did you even identify that this was an area of interest for you versus everything else going on across modalities?

EMAD:

I've always been interested in meaning. Symantec is even part of my email address, and that's my religious studies as well around epistemology and ethics, ironically. The way that I viewed it is that the easiest way for us to communicate is what we're doing right now via words. That's held constant, but now we can communicate via phones and podcasts, or whatever and it's nice. Writing was more difficult and the Gutenberg Press made it easier, but visual communication is incredibly difficult via a PowerPoint, which is visual communication or art, which is visual communication.

Then you have video and things like that, which is just impossible. Now you have TikToks and others making it easier. I saw this technology and I was like, "If the pace of acceleration continues, visual communication becomes easy for everyone." Like my mom sending me memes every day, telling me to call more or whatever, and I'm like, "That's amazing because that creation will make humanity happier." You see art therapy, that's visual communication and it's the most effective form of therapy. What if you could give that to everyone? There was that aspect to it.

But then I saw movie creation and things like that. My first job was actually organizing the British Independent Film Awards, and being a reviewer for the Raindance Film Festival. Every year, I put a movie on for my birthday and we give the proceeds to charity. I get to see my favorite movie with my friends, it's pretty cool. Then I was the biggest video game investor in the world at one point. These types of communication and interaction are really interesting. I thought that people really misunderstood the metaverse UGC, and the nature of what could happen if anyone could create anything instantly.

It's not going to be a world for everyone or a world that everyone visits. It's going to be everyone sharing their own worlds and seeing the richness of humanity. Again, I thought that was an amazing ethical/moral imperative for making humanity better, but also an amazing business opportunity. Because the nature and way that we create media, will transform as a result of this technology. We're seeing it right now. We have amazing apps like Descript, where you could take this podcast and you can edit it with your words live.

You have amazing gaming things come out where you create assets and instances. Or some of this new 3D nerve technology where you can reshoot stuff. We are working with multiple movie studios at the moment, who are saving millions of dollars just implementing Stable Diffusion by itself, let alone these other technologies. That was for me, tremendously exciting to allow anyone not to be creative because people are creative, but to access creativity. Then allow the creatives to be even more creative, and tell even better stories.

SARAH:

I believe Sam from OpenAI said they don't think image generation is core on the path to AGI. It's obviously really important to you personally and to Stability. Tell us about your stance on AGI and if that's part of the Stability mission?

EMAD:

Yeah. I don't care about AGI except for it not killing us. They can care about it. My thing, what I care about is intelligence augmentation. This is the classic Memex type of thing. How can we make humans better? Our mission is to build the foundation to activate humanity's potential. Look, AGI is fine. Again, we have to have some things around that. I do believe that they are incorrect around multimodality being or images being a core component of that. But I think there are two paradigms here.

One is stack more layers and I'm sure GPT-4 and PALM 18, and all these things will be amazing, stacking more layers and having better data as well. But one of the things we saw, for example, Stable Diffusion, we put it out together and then people trained hundreds of different models. When you combine those models, it learns all sorts of features like perfect faces and perfect fingers, and other things. This is related to the work that DeepMind did with Gato and others, that show the auto regression of these models and the latent spaces becomes really, really interesting.

What if the route to AGI is not one big model to rule them all, trained on the whole internet and then narrowed down to human preferences, but instead, millions of models that reflect the diversity of humanity that are then brought together? I think that is an interesting way to look at it, because that will also more likely to be a human aligned AGI, rather than trying to make this massive elder god of weirdness bow to your will, which is what it feels like at the moment.

SARAH:

Yes. We're going to have a hive of elder gods instead. You've mentioned that Stability is still working on language. The application of diffusion models to image is a really unique breakthrough, and it's not as computationally intensive as the known approaches to language so far.

I think you've said that the core training run for the original Stable Diffusion was 150,000 A100 hours, which is not that huge in the grander scheme of things. What can you tell us about your approach to language?

EMAD:

Yeah. We have the EleutherAI side of things and our team there. We released GPT/Neo-J and X, which have been downloaded 20 million times. They're the most popular language models in the world. You basically either use GPT-3 or you use those. We're up to about 20 billion parameters and like I said, we've released our TRLX from the Carpa Lab, which is the Instruct framework. We're training multiple models in the up to 100 billion parameters now.

I don't think you need more. Chinchilla optimal to enable an open ChatGPT equivalent, enable an Open Claw equivalent. I think that will be an amazing foundation from which to train sector specific and other models, that then again can be autoregressed. There will be very interesting things around that. Language requires more, not necessarily because of the approach and diffusion breakthroughs. Recently, Google had their Musepaper, where they showed a transformer actually can replace the VA.

You don't necessarily need diffusion for great images, it's more because language is semantically dense, I think, versus images. There's a lot more accuracy that's required for these things. That, I think, there are various breakthroughs that can occur. We have an attention-free transformer model basis in RWKV that we've been funding. We've got a 14 billion parameter version of that coming out, that is showing amazing progress. But I think that the way to look at this is we haven't gone through the optimization cycle of language yet.

OpenAI again, amazing work they do. They announced InstructGPT, their 1.3 billion parameter version, outperformed 175 billion parameter GPT-3. You look at FLAN-T5, the Instruct version of the T5-XXL model from Google, the three billion parameter version outperforms GPT at 175 billion parameters in certain cases. These are very interesting results and it's one of those things that as these things get released, it gets optimized.

With Stable Diffusion, leave aside the architecture, day one, 5.6 seconds for an image generation using an A100. Now, not quite nine seconds. With the additional breakthroughs that are coming through, it will be 25 images a second. That's 100 times speed up, over 100 times, just from getting it out there and people interested in doing that. I think language models will be similar. I don't think that you need to have ridiculous scale when you can understand how humans interact with the models, and when you can learn from the collective of humanity.

Like I said, a very different approach of small language models or medium ones versus let's train a trillion parameters or whatever. I think there will be room for both. I think it will be use these amazingly packaged services from Microsoft and Google if you just want something out the box. Or if you need something trained on your own data with privacy, and things like that, that may not be as good but maybe better for you. Use an open-source space and work with our partners at SageMaker or whoever else.

ELAD:

Can you talk more about that in the context of your business model and your approach? You mentioned that you think that some of the areas that Stability will be focused on is media, and then proprietary and regulated data sets.

If there's things you can share right now in that area? If not, no worries, but if you can, it'd be interesting to learn more about how you view the business evolving.

EMAD:

Sure. Now we're training on hundreds and soon thousands of Bollywood movies, to create Bollywood video models with our partnership with Eros. That is exclusively licensed. We'll have audio models coming as well, a command model or whatever. We're talking to various other entities as well. This is why we have the partnership with Amazon and SageMaker. There'll be additional services that can train models for your behalf of most people.

Our focus is on the big models for nations, the big models for the largest corporates who will need to train their own models one day. That's really difficult. There's only 100 people who can train models in the world. It's not really a science, it's more an art. Losses explode all over the place when you try to do something, so we're going to make it easy for them. We're going to be inside the data centers training their own models that they control.

Our open-source models then become the benchmark models for everyone. Again, we have access to the neural engine, dedicated teams at Intel and others working on optimizing these. That is the model of the framework, and the open model is optimized. Then we take and create private models. Again, I think that's complimentary to the APIs and other things you'll see from Microsoft, Google, et cetera, because you would want both.

ELAD:

Yeah. Some of the other areas that you've focused on is or at least talked about, I think, in interesting ways is about how AI can be used to make our democracy more direct and digital. A little bit more about broader, global impact. Could you extrapolate a bit more there?

EMAD:

Yeah. I think you have to look at intelligence augmentation like information theory in classical channel ways, information is valuable in as much as it changes the state. We've obviously seen political information become more and more influenced by manipulation of stories and things like that, so the divide has been grown. What if we could create an AI that could translate between various things, make things easier to understand and make people more informed?

I think that would be ideal with some of these national and public models and interfaces being provided to people. Then that could be very positive for democracy and allowing people to really understand the needs. You can already with ChatGPT, when you train it on nature of yourself, it can summarize for your perspective. That's an amazing thing. You can tell it to talk like a five-year-old or a six-year-old, or an eight-year-old or a 10-year-old. Once it starts understanding, Sarah and Elad, that'll be even better.

Again, you don't need open source to do that. The OpenAI embeddings API is fantastic, but I think there'll be more and more of these services that allow there to be that filter layer between us and this mass of information on the internet. That will be amazing. I think if we build the education systems and other things correctly as well. This young lady's illustrated primer, that we're going to give to all the kids in Africa and beyond.

Again, let's really blue sky think, "How can we get people engaged with their communities and societies?" Because it will be a full open-source stack, not only education and healthcare and beyond. That's super exciting. I think again, that's the future of how we come together because you want to come together to form a human colossus. Like in the [inaudible 00:36:19], where you get shit done, pardon of my language. I think this is one of the best ways for us to do that leveraging these technologies.

SARAH:

It's okay, we don't have commercial sponsors.

ELAD:

There's actually a book called Lady of Mazes that's an AGI-centric book from 10 years ago. Basically, the idea is what you mentioned, whereas different AGI's gain models of how a subset of the population thinks about certain issues.

It substantiates into a virtual person, who's basically representing them in some House of Representatives equivalent. You don't actually have to vote. The AGI just synthesizes group opinions and then turns it into representatives.

EMAD:

Yeah. You have to think about with the advances like Meta's amazing work on CICERO, for example, beating humans on diplomacy. They used eight different language models combined. Again, I think this is the future, not just zero shock, multiple models interacting with each other is the way, full stop. The issue and mechanism design perspective of the game theory of our current economy, is that there is no central organizing factor that we trust. What is the trust in Congress?

I think they trust Congress less than cockroaches. No offense to Congress, please don't ring me up. It's just a poll. People will err towards trusting machines as it were. Machines are capable of low balancing. Now, they're capable of low balancing facts and things. We have to be super careful, as we integrate these things, what that looks like, because they will make more and more decisions for us. That could be for our benefit.

Like I said, as you said, having AI that speaks on our behalf and amalgamates, but then we need to make sure that these aren't too frail and fragile as we cede more and more of our own personal authority to them because they're optimizing. This is also one of the dangers on the alignment side. As we introduce RLHF into some of these large models, there are very weird instances of mode collapse and out of sample things. I do say these large models as well should be viewed as fiction creative models, not fact models.

Because otherwise, we've created the most efficient compression in the world. Does it make sense you can take terabytes of data and compress it down to a few gigabytes with no loss? No, of course, you lose something, you lose the factualness of them, but you keep the principle-based analysis of them. We have to be very clear about what these models can and can't do, because I think we will cede more and more of our authority individually as a society to the coordinators of these models.

ELAD:

Could you talk more about that in the context of safety? Because ultimately, one of the concerns that increased in the AI community is AI safety, and there's three or four components of that. There's alignment, will bots kill humans or whatever form you want to put it in?

SARAH:

They'll farm us, not kill us.

ELAD:

Farm us, that's a good point. They'll just build a giant RLHF farm on top of us or something. There's the concern around certain types of content, pedophilia, et cetera, that people don't want to have exist in society for all sorts of positive reasons.

There's politics. There's concerns, for example, that AI may become the next big battleground after social media, in terms of political viewpoints being represented in these models with the claims that they're not political viewpoints.

I'm curious how you think about AI safety more broadly, particularly when you talk about trust of models? To your point, part of it is fact versus fiction, but part of it may also be, "Well, it looks like it's political and so therefore, maybe I can't trust it at all."

EMAD:

Yeah. I don't think technology is neutral. I'm not one of the people that adheres to that, especially with the way we build it. It does reflect the biases and other things that we have in there. I did follow the open-source thing because I think we can adjust that. On the alignment side, it was interesting, Eleuther basically split into two. Part of it is Stability and the people who work here on capabilities.

The other part is conjecture that does specific work on alignment, and they're also based here in London. I think it's not easy. I think that everyone is ramping up at the same time and we don't really understand how this technology works, but we're doing our best. You have people Riley Goodside and other prompt whisperers who are like, "Wait, on earth it can do all these things."

I think that there needs to be more formalized work. I actually think there needs to be regulation around this because we are dealing with an unknown, unknown. I don't think we're doing good enough tying things together, particularly as we stack more layers, and we get bigger and bigger and bigger. I think small models are less dangerous, but then the combination of them may not be safe. But again, we don't know this yet.

SARAH:

You've mentioned before this support for the idea of regulation of large models. What would be a productive outcome of that regulation that you can imagine?

EMAD:

I think that a productive outcome of that regulation is anything above a certain level of FLOPS needs to be registered, similar to well, bio weapons and things that have the potential for dual use. I think there needs to be a dedicated international team of people who can actually understand and put in place some regulations on how do we test these models for things, like the amazing work Anthropic recently did with constitutional models and other things like that?

We just start pulling this knowledge as opposed to keeping it secret. But there is this game theoretic thing of one of the optimal ways to stop AGI happening is to build your own AGI first. I'm not sure if that will ever happen, but we're in a bit of a bind right now, which means that everyone's having their own arms race. When governments decide and I don't believe they decided yet, that having an AGI is the number one thing. Tens of billions, hundreds of billions will go into building bigger models.

Again, this is very dangerous, I think, from a variety of different perspectives. I prefer multilateral action right now as opposed to in the future. I've put that out there. I can't really drive it. I'm really dying from all the overwork as it is, but I do believe that should be the case. I think going on to the next one, as you said, the political biases and things like that, we can use this as filters in various ways. I think one of the interesting things, and the other thing I've called a regulation of.

Maybe I should do it a bit more loudly, is you have a lot of companies that have ads as one of their key elements, and ads are basically manipulation. These models are really convincing. They can write really great prose. My sister-in-law creates a comedy semantic that can do human realistic, emotional voices. She did like Val Kilmer's voice for his documentary and stuff like that before going to Spotify. It's going to be crazy the types of ads that you see.

We need to have regulation about those soon, because you're going to see Meta and Google and others trying to optimize for engagement and manipulation fundamentally. I think that those can then be co-opted by various other parties as well on the political spectrum. We need to start building some protections around that. What was the final one? Sorry, Elad.

ELAD:

I was just asking about, and I think Sarah asked the question around where do you think regulation should be applied? What would be positive outcomes of that versus negative outcomes?

EMAD:

Yeah. I think there should be these elements around identification of AI, especially on advertising. I think that there should be regulation on very large models in particular. European Union introducing a CE mark/generative AI restrictions where the creators are responsible for the outputs, I think is the wrong way. But there are other ones as well. I would call for opt-out mechanisms. I think we're the only ones building those for data sets because we're also building some of these data sets.

Trying to figure out attribution mechanisms for opt-in as well on the other side. Right now, the only thing that is really checked is robots.txt, which is the kind of thing I'm scraping. But I think again, it's evolving so fast that people might be okay with scraping, but they may not be okay with this. Legally it's fine, but then I think we should make this more and more inclusive as things go forward.

ELAD:

That's, for example, if an artist doesn't want their work represented in the corpus that a machine is trained on, for example?

EMAD:

Yes. It's difficult. It isn't just a case of don't look at deviant art on my website. What if your picture is on the BBC or CNN with a label? It will pick that up. It's a lot harder. This is why we trained our own open CLIP model.

We have the new CLIP landing this week. That's even better on zero, I think 80% because we need to know what data was on the generative and the guidance side, so that we could start offering opt-out and opt-in, It's not as easy as some people note.

ELAD:

Yeah. Then I guess one other area that people often talk about safety is more around defense applications, and the ethics of using some of these models in the context of defense or offense from a national perspective. What's your view on that?

EMAD

I think the "bad guys," I'm going to put that in quotes, have access to these models already in thousands and thousands of A100s. I think you have to stop building defense, but it's a very difficult one. We were going to do a $200,000 deep fake detector prize. But then it was pointed out quite reasonably that if you create a prize for a detector. Then creates, well, a bouncing effect, where you have a generator and a detector, and they bounce off each other, and you just get better, and better and better.

Now we're trying to rethink that. Maybe we'll offer a price for the best suggestion of how to do this. Similar to ChatGPT is detectable but not really. I think the defense implications of this, it's largely around misinformation, disinformation. This is an area that I have advised multiple governments on with my work on counter extremism and others, is a very difficult one to unpick. But I think one of the key things here is having attribution-based mechanisms and other things for curation, because our networks are curated.

This is where we've teamed up with Adobe on content, authenticity.org and others. I think that metadata element is probably the winning one here, but we have to standardize as quickly as possible around trusted sources. I think people already don't believe what they see though, which is a good thing and a bad thing. We want to have those trusted coordinators around this. Beyond that, in some of the more severe things around drones and slaughterbots, and things like that, I don't know how to stop that, unfortunately.

I think that's a very complicated thing, but we need an international compact on that because again, this technology is incredibly dangerous when used in those areas. I don't think there's been enough discussion at the highest levels on this, given the pace of adoption right now.

SARAH:

I think that's all we have time for today. One last important question for you. What controversial prediction, you seem like an optimist, but good or bad about AI do you have over the next five years?

EMAD:

I think that small models will outperform large models massively. Like you said, the hive model aspect. You will see ChatGPT level models running on the edge on smartphones in five years, which will be crazy.

ELAD:

Great. Thanks so much for joining us. Amazing conversation as usual.

EMAD:

It's my pleasure.

(Theme music fades in)

SARAH:

Thank you for listening to this week's episode of No Priors

ELAD:

Follow, No Priors for new guest each week and let us know online what you think and who in AI you want to hear from.

SARAH:

You can keep in touch with me and conviction by following at @saranormous

ELAD:

You can follow me on Twitter @EladGil. Thanks for listening.

SARAH:

No Priors is produced by Conviction in partnership with Pod People. Special thanks to our team: Cynthia Gildea and Pranav Reddy; and the production team at Pod People: Alex Vikmanis, Matt Sav, Aimee Machado], Ashton Carter, Danielle Roth, Carter Wogahn, and Billy Libby. Also our parents are children, the academy and open Google Soft AI, the future employer of all of Mankind.

Thanks!

🎙 No Priors, Episode 103: Emad Mostaque (TRANSCRIPT)