People have always looked for patterns to explain the universe and to predict the future. “Red sky at night, sailor’s delight. Red sky in morning, sailor’s warning” is an adage predicting the weather.
AI is very good at seeing patterns and making predictions. Now, Microsoft researchers are working to apply “foundation models” – large-scale models that take advantage of recent AI advances – to scientific disciplines. These models are trained on a wide variety of data and can excel at many tasks, in contrast to more specialized models. They have the potential to generate answers in a fraction of the time traditionally required and help solve more sophisticated problems.
Among the wildly different scientific disciplines that stand to advance through AI are materials science, climate science, and healthcare and life sciences. Experts say foundation models tailored to these disciplines will speed up scientific discovery, allowing scientists not only to more quickly create practical things like medications, new materials or more accurate weather forecasts, but also to better understand atoms, the human body or the Earth. Many of these models are still under development at Microsoft Research, but the first, a weather model called Aurora, is already available.
“AI is a tool in your arsenal that can support you,” said Bonnie Kruft, partner and deputy director at Microsoft Research who helps oversee its AI for Science lab. “The idea is that we’re working on very science-specific models rather than language-specific models. We’re seeing this amazing opportunity to move beyond traditional human language-based large models into a new paradigm that employs mathematics and molecular simulations to create an even more powerful model for scientific discovery.”
The recent AI advances that let people plan a party, generate a graphics-rich presentation or get an instant summary of a missed meeting with a few conversational prompts were initially powered by a new class of AI models known as large language models (LLMs). This type of foundation model is trained on huge amounts of text to perform a wide variety of language-related tasks. Now, Microsoft researchers are discovering how some of these same AI architectures and approaches can fuel advances in scientific discovery.
“Large language models have two remarkable properties that are very useful. The first one is, of course, they can generate and can understand human language, so they provide a wonderful human interface to very sophisticated technologies. But the other property of large language models – and I think this came as a big surprise to many of us – is that they can function as effective reasoning engines. And, of course, that’s going to be very useful in scientific discovery,” said Chris Bishop, technical fellow and director of Microsoft Research AI for Science, at a keynote to the Microsoft Research Forum earlier this year.
At first, AI researchers thought that very specific models trained to perform a narrow task – like the ones that could win at chess or backgammon (but not both), or those that could translate languages or transcribe recordings (but not both) – would outperform larger generalized models like LLMs. But the opposite turned out to be true: there was no need to train one model to answer questions or summarize research about law, another about physics and another about Shakespeare, because one large, generalized model outperformed the specialists across subjects and tasks. Now, researchers are investigating whether foundation models can do the same for science.
https://youtube.com/watch?v=CJejmZ5Luo4
Traditionally, scientific discovery involved developing a hypothesis, testing it, tweaking it over many iterations until finding a solution or starting over, a process of weeding out what doesn’t work. By contrast, some foundation models flip that script by building rather than eliminating. Scientists can give foundation models parameters, such as which qualities they want, and the models can predict, say, the combinations of molecules that could work. Rather than find a needle in a haystack, the models suggest how to make needles directly.
In some cases, these foundation models are also designed to understand natural language, making it easy for scientists to write prompts. To look for a new material, for example, scientists might specify that they want a molecule that is stable (it won’t fall apart), that isn’t magnetic, that doesn’t conduct electricity and that isn’t rare or expensive.
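To make that concrete, here is a minimal, hypothetical sketch of what such a property-constrained request could look like in code. The class and method names below are invented for illustration – they are not MatterGen’s actual interface – and the “generator” is a toy that merely filters a hand-written pool, whereas a real generative model would construct matching structures directly.

```python
# Hypothetical sketch: asking a generative materials model for
# candidates that satisfy a wish list of properties. All names here
# are invented for illustration; this is not MatterGen's real API.
import random
from dataclasses import dataclass

@dataclass
class Candidate:
    formula: str
    band_gap_eV: float            # a wide band gap means it won't conduct
    magnetic: bool
    contains_rare_elements: bool  # proxy for "rare or expensive"

class ToyGenerator:
    """Toy stand-in: filters a tiny hand-written pool. A real
    generative model would build matching structures directly."""
    POOL = [
        Candidate("SiO2", 8.9, False, False),
        Candidate("Fe3O4", 0.1, True, False),
        Candidate("PtSi", 0.0, False, True),
        Candidate("Al2O3", 8.8, False, False),
    ]

    def sample(self, satisfies, n=2):
        matches = [c for c in self.POOL if satisfies(c)]
        return random.sample(matches, min(n, len(matches)))

def wanted(c):
    # The scientist's wish list: insulating, non-magnetic, common elements.
    return c.band_gap_eV > 3.0 and not c.magnetic and not c.contains_rare_elements

for c in ToyGenerator().sample(wanted):
    print(c.formula)
```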
LLMs are trained on text – words – but the foundation models that Microsoft researchers have been developing to advance discovery have mainly been trained on the languages of science: not just scientific textbooks and research papers, but also mountains of data generated by solving the equations of physics and chemistry.
Aurora, which takes weather and pollution forecasting to new levels, was trained on the language of the Earth’s atmosphere. MatterGen, which suggests new materials given prompts, and MatterSim, which predicts how new materials will behave, were trained on the language of molecules. TamGen, developed in collaboration between Microsoft Research and the Global Health Drug Discovery Institute (GHDDI), which develops drugs for infectious diseases that disproportionately affect populations in the developing world, focuses on yet other molecules: new drugs and protein inhibitors for diseases such as tuberculosis and COVID-19.
Just as some foods are better cooked by frying, others by boiling and others by baking, so, too, different scientific problems lend themselves to different AI techniques. Many recently developed AI models are generative – they generate answers and images based on natural language requests. But some AI models are emulators, which can simulate the properties or behaviors of something.
Yet each of these foundation models is broad: the materials model isn’t trying to discover only one kind of material but many, and the atmospheric model doesn’t just predict rain but also other phenomena such as pollution. This ability to do many things is key to defining an AI model as a foundation model. And the goal is to eventually link multiple models together to create even broader ones, because broader, more diverse models have outperformed narrower ones in other areas.
MatterGen for new materials
Discovering new materials might seem like a narrow field, but in fact, it’s a huge focus of R&D because there are so many kinds – alloys, ceramics, polymers, composites, semiconductors – and because the possible combinations of atoms into new molecules number in the billions. New materials are vital to reduce the impact of carbon emissions as well as to find safe replacements for materials that endanger the environment or health.
Microsoft Research’s MatterGen foundation model “can actually directly generate the materials that satisfy your design conditions,” said Tian Xie, principal research manager at Microsoft Research in Cambridge, U.K. Scientists can not only tell MatterGen the kind of material they want to create, but also stipulate mechanical, electrical, magnetic and other properties.
“It gives materials scientists a way to come up with better hypotheses for the kinds of materials they want to design,” Xie said.
This is an advance over past methods because AI is three to five orders of magnitude more efficient at generating materials than screening all the millions of potential combinations to find those that meet the scientist’s criteria, Xie said. MatterGen starts with the scientist’s criteria and builds a solution, rather than starting with every possibility and screening over and over until only a handful of matching candidates remain. And it’s far more efficient and economical than trying to create new materials in a lab through trial and error, Xie said, though lab work is still needed to synthesize the candidate materials.
MatterGen is a diffusion model, an AI architecture that has been used in image creation tools. Instead of generating pictures, MatterGen generates molecules for new materials. All the data that has accumulated over decades, even centuries, of experiments is far too meager to train a foundation model. But because scientific fields such as physics and chemistry follow well-established mathematical equations, computing those equations many times creates the necessary volume of high-quality training data. The team created training data for MatterGen by using a quantum mechanics formula called density functional theory, running on high-performance computing, to generate some 600,000 structures.
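The mechanics of a diffusion model can be sketched in a few lines. In image tools, the model learns to turn noise into a picture step by step; here, noise becomes atomic coordinates. The toy below stands in for the trained network with a function that nudges coordinates toward a fixed arrangement – a simplification of what a real model like MatterGen learns from its DFT-generated training data.

```python
# Minimal sketch of the diffusion idea: start from random noise and
# repeatedly denoise it into a structure-like arrangement of atoms.
# The "denoiser" here is a hand-written stand-in for a trained network.
import numpy as np

rng = np.random.default_rng(0)
STEPS = 100

# An idealized "stable" arrangement (fractional coordinates of 4 atoms).
TARGET = np.array([[0, 0, 0], [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5]])

def toy_denoiser(x, t):
    # A trained model would predict the noise to remove at step t;
    # this toy simply nudges coordinates toward the target structure.
    return (TARGET - x) * 0.1

x = rng.normal(size=(4, 3))           # pure noise: no structure at all
for t in reversed(range(STEPS)):
    x = x + toy_denoiser(x, t) + rng.normal(scale=0.01, size=x.shape)

print(np.round(x, 2))                 # coordinates of the generated "structure"
```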
Microsoft’s MatterGen research team is working with partners to validate some of the materials it has generated. Areas for the future include ways to recycle polymers and to create metal-organic frameworks that could be used for carbon capture. “So far we are focusing on inorganic materials, but in the future, we hope to expand it to more complex materials,” Xie said.
MatterSim for predicting how new materials will work
Even with the help of AI, creating a new material isn’t a straightforward process. MatterSim is a companion to MatterGen, simulating, or predicting, how the molecules of a new material will behave. If the result isn’t what the scientists wanted, they can do an iterative loop with MatterGen, tweaking the inputs the way one might tweak Microsoft Copilot prompts until the results meet the scientists’ requirements. Unlike MatterGen, however, MatterSim isn’t generative AI but an emulator that determines how molecules will behave under different temperatures and pressures.
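That generate-then-simulate loop can be sketched as follows. Both models are replaced here by toy stand-ins (a noisy proposal function and a property calculator with a deliberate bias), so the code only illustrates the pattern the article describes: propose a candidate, simulate it, adjust the request and repeat until the simulated behavior meets the target.

```python
# Toy sketch of the MatterGen/MatterSim iteration described above.
# Neither function is the real model; they just exercise the loop.
import random

random.seed(0)

def toy_generate(requested_band_gap):
    # Stand-in for MatterGen: proposes a candidate near the request.
    return requested_band_gap + random.gauss(0, 0.2)

def toy_simulate(band_gap):
    # Stand-in for MatterSim: "measures" the property, with a
    # systematic shift the loop has to discover and correct for.
    return band_gap * 0.9

target, request = 3.0, 3.0            # want a ~3 eV band gap
for iteration in range(10):
    candidate = toy_generate(request)
    measured = toy_simulate(candidate)
    print(f"iteration {iteration}: measured {measured:.2f} eV")
    if abs(measured - target) < 0.1:
        break
    request += (target - measured) * 0.5   # tweak the request and retry
```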
MatterSim uses the Graphormer architecture, which builds on the same transformer idea behind LLMs – models that break text into pieces in order to learn to predict the next word – but was created by Microsoft Research to capture materials’ behavior and properties. “It is trained to master the language of atoms,” said Ziheng Lu, principal researcher at Microsoft Research AI for Science in Shanghai. “Predicting the behavior of materials is critical to chemists. What is more important is the model mastering the language of atoms – learning from the entire periodic table. What does the molecule look like in the embedding space? How do you convert the structure of a molecule into a vector that the machine can understand? That is the most important thing MatterSim does, besides its power to predict materials properties.”
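One way to picture that “structure to vector” step: embed each atom according to its element, let the atoms exchange information through attention that is biased by interatomic distance (the transformer-for-atoms idea Graphormer builds on), and pool the result into a single vector. The dimensions and the form of the distance bias below are illustrative choices, not Graphormer’s actual design.

```python
# Rough sketch: turning a molecular structure into one vector.
# Illustrative only; real Graphormer details differ.
import numpy as np

rng = np.random.default_rng(0)
D = 16
element_table = rng.normal(size=(119, D))   # one learned vector per element

def structure_to_vector(atomic_numbers, positions):
    h = element_table[atomic_numbers]       # (n_atoms, D) atom features
    dist = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    scores = h @ h.T / np.sqrt(D) - dist    # nearby atoms attend more strongly
    scores -= scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return (attn @ h).mean(axis=0)          # pooled, machine-readable fingerprint

# Water-like toy input: one oxygen (Z=8) and two hydrogens (Z=1).
vec = structure_to_vector(
    np.array([8, 1, 1]),
    np.array([[0, 0, 0], [0.96, 0, 0], [-0.24, 0.93, 0]]),
)
print(vec.shape)                            # (16,)
```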
The model uses active learning, which is similar to how a student might study for a test. As the model receives a new piece of data, it assesses how uncertain it is about that data. If the uncertainty is high, the data goes into the simulation to retrain the model, like students studying the parts of a subject they don’t yet know, rather than the parts they have already learned.
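A minimal version of that loop, with a cheap polynomial “ensemble” standing in for the model and a simple function standing in for the expensive simulation, might look like this. Ensemble disagreement serves as the uncertainty signal; everything else is toy data.

```python
# Hedged sketch of uncertainty-driven active learning: query the
# expensive simulation only where the model ensemble disagrees most.
import numpy as np

rng = np.random.default_rng(1)

def expensive_simulation(x):
    # Stand-in for a costly quantum mechanics calculation.
    return np.sin(3 * x) + 0.1 * x**2

def fit_ensemble(X, y, k=5):
    # Tiny "ensemble": k polynomial fits on bootstrap resamples.
    models = []
    for _ in range(k):
        idx = rng.integers(0, len(X), len(X))
        models.append(np.polyfit(X[idx], y[idx], deg=4))
    return models

X = rng.uniform(-2, 2, 12)                   # small labeled dataset
y = expensive_simulation(X)

for round_ in range(3):
    models = fit_ensemble(X, y)
    pool = rng.uniform(-2, 2, 200)           # unlabeled candidate inputs
    preds = np.array([np.polyval(m, pool) for m in models])
    pick = pool[np.argmax(preds.std(axis=0))]  # most-disputed candidate
    X = np.append(X, pick)                   # label it, grow the dataset
    y = np.append(y, expensive_simulation(pick))
    print(f"round {round_}: queried x = {pick:.2f}")
```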
Very little data exists about the behavior of molecules, so the team used quantum mechanics calculations to create synthetic data, similar to the MatterGen example.
The result is ten times more accurate than any previous model “because we are able to generate data to cover unprecedented materials space,” Lu said. “That makes the model very accurate.”
For now, MatterSim focuses on inorganic materials, but other kinds might be added later. “MatterSim is a domain-specific foundation model. Researchers at AI for Science are moving toward a unified large foundation model that understands the entire language of science like molecules, biomolecules, DNA, materials, proteins – all these might be unified later, but for MatterSim at this moment, what we unify is the entire periodic table,” Lu said.
Aurora for atmospheric prediction
Computers have long been crucial for weather predictions, by crunching the numbers on equations in physics or fluid dynamics to try to simulate the atmospheric system. “Now AI and foundation models bring this new opportunity that is radically different,” said Paris Perdikaris, principal research manager at Microsoft Research AI for Science in Amsterdam. “Let’s go out and observe the world and collect as much data as we can. Then let’s train an AI system that can process this data, can extract patterns from this data and can be predictive in helping us forecast the weather, for example.”
The big advantage of AI is that, once trained, it doesn’t require big computing power. Currently, generating a 10-day weather forecast with a supercomputer that runs around the clock takes about two hours, Perdikaris said. Aurora, Microsoft’s foundation model for the atmosphere, can do that job in a few seconds, using a desktop computer with a GPU card. “The major difference that AI methods bring is computational efficiency and reducing the cost of obtaining those forecasts,” he said.
Aurora also improves accuracy because it uses not only data from physics-based models but also real-world data from satellites, weather stations and other sources, “which contain a more truthful representation of reality,” he said. “Because it’s exposed to all these different sources of information, Aurora has the opportunity to kind of blend them together and produce a more accurate prediction than the conventional simulation tools we have in place.”
Aurora is a large neural network, a vision transformer, that was trained on 1.2 petabytes of data – about ten times the volume of all the text on the Internet. “This is still a tiny fraction of data that is out there that describes the Earth system,” Perdikaris said.
Three typical weather questions – will it rain here in the next 10 minutes, what will the weather be across the Earth over the next 10 days, and what will the weather be months or years from now – have until now been handled by different prediction models. Aurora, and its future extensions, will be able to answer all those questions with the same model.
Aurora was trained on weather data, but by fine-tuning it with atmospheric chemistry data, the model can predict pollution levels as well.
“One of our initial hypotheses was that we could leverage what the model learns from weather and try to adapt it to new tasks that are governed by different physics, like atmospheric chemistry, then see how it does,” Perdikaris said. “To our surprise, it’s been working and gives some initial results that are quite promising.”
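The pattern Perdikaris describes – reuse what a large pretrained model has learned, then adapt a small part of it to a new target – is standard fine-tuning. Below is a minimal PyTorch sketch of that pattern with a tiny stand-in backbone and random stand-in data; Aurora’s real architecture and training setup are far larger and are not shown here.

```python
# Fine-tuning sketch: freeze a pretrained backbone, train a new head.
# The backbone below is a toy stand-in, not Aurora's vision transformer.
import torch
import torch.nn as nn

backbone = nn.Sequential(              # pretend this was pretrained on weather
    nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 128)
)
for p in backbone.parameters():
    p.requires_grad = False            # keep the pretrained knowledge frozen

pollution_head = nn.Linear(128, 1)     # new task: predict a pollutant level

opt = torch.optim.AdamW(pollution_head.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):                # toy loop on random stand-in data
    x = torch.randn(32, 64)            # "atmospheric state" features
    target = torch.randn(32, 1)        # "measured pollutant" values
    loss = loss_fn(pollution_head(backbone(x)), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```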
The benefits of AI are even more pronounced for pollution predictions, which are ten times more costly to compute than weather predictions.
Making scientific discovery more accessible
Lu noted that the models could make science much more appealing to students. When he was earning his degrees, he had to write out equations, “but now with these simulations, we can actually do the statistics using a computer or laptop. You can really see the reaction, the behaviors of the molecules and the materials in real time on the screen. It gives you a very good sense of what’s really happening, instead of just looking at equations on paper.”
Microsoft’s scientific foundation models were all built from the ground up on Azure. The company plans to make early versions of the models available to help democratize scientific discovery and get feedback from the community. This feedback will help identify practical applications that will inform and shape future iterations of the models, Kruft said.
Foundation models have the potential to transform daily life and revolutionize industries. By accelerating scientific discovery, they are expected not only to drive rapid advances in areas like medicine and materials but also to offer deeper insights into complex systems like atoms, molecules and proteins, Kruft said, adding that this, in turn, opens up vast commercial possibilities across industries.