Meta’s Llama 4 models – Llama 4 Scout and Llama 4 Maverick are here! These models can help people build more personalized multimodal experiences, based on large improvements in image and text understanding and instruction following, and can accommodate a range of use cases and developer needs. Whether you’re building apps for reasoning, summarization, or conversational AI, Llama 4 Scout and Maverick deliver powerful performance with open access. Llama 4 models can be run, fine-tuned and deployed in Oracle Cloud Infrastructure (OCI) Data Science. Whether you’re a data scientist or a developer, OCI offers the infrastructure and tools to move fast in the evolving world of Generative AI.

What are Llama 4’s improvements?

Meta’s Llama 4 family includes:

  • Llama 4 Scout: A powerful multimodal model that supports context window of up to 10M tokens with 17B active parameters, 16 experts and a total of 109B parameters that can fit on a H100 (with Int4 quantization).
  • Llama 4 Maverick: A 17B active parameter model with 128 experts and a total of 400B parameters, delivering strong performance to cost ratio for reasoning and coding while remaining open-weight and customizable and can fit on a H100.

The new Llama 4 models use a mixture of experts (MoE) architecture. In MoE models, a single token activates only a fraction of the total parameters. MoE architectures are more compute efficient for model training and inference and, given a fixed training FLOPs budget, deliver higher quality models compared to dense architectures. Llama 4 models are designed with native multimodality, incorporating early fusion to seamlessly integrate text and vision tokens into a unified model backbone.

Llama 4 Scout and Llama 4 Maverick models are available today on Meta’s website llama.com and Hugging Face, an online model repository. Oracle Cloud Infrastructure (OCI) Data Science is a platform for data scientists and developers to work with open source models powered by OCI’s compute infrastructure with features that support the entire machine learning lifecycle. You can bring in Llama 4 models from Hugging Face or Meta to use inside OCI Data Science effortlessly.

Working with Llama 4 models through the Bring-Your-Own-Container approach

OCI Data Science supports a Bring Your Own Container approach for model deployment and jobs, which enables you to deploy and fine tune the Llama 4 models.  The Bring-Your-Own-Container approach requires downloading the model from the host repository, either through the Llama website or Hugging Face, and creating a Data Science model catalog entry. Next, you would download the latest vLLM container and push it to the OCI Registry. The newly released vLLM 0.8.3 is compatible with the Llama 4 models.  Then, you can deploy the model or run a fine tuning job with the vLLM container image in the OCI Registry.  Once the model is deployed, you’re set to invoke the model with an HTTP endpoint.  For more details, please check out our tutorials Deploy LLM Models using BYOC and Batch Inferencing guide.

Llama 4 Scout is a 17 billion active parameter model with 16 experts while Llama 4 Maverick is a 17 billion active parameter model with 128 experts. Llama 4 Scout (with Int4 quantization) fits on a H100 while Llama 4 Maverick fits on a H100.  Working with a H100 in OCI Data Sciences requires a reservation for the shape.  You can do so by submitting a service request and specifying the shape and region you are interested in using the shape.  For additional information on working with GPU in OCI Data Science, please check this page.   

Source

You May Also Like

Across the globe, Apple and its teams find new ways to give

The company’s Employee Giving program has raised over $880 million, with more…

Helping Indian startups drive global app innovations with MeitY Startup Hub

India is one of the fastest-growing app markets in the world. Millions…

New immersive AR experience brings student creativity to life

Australian artists create a new immersive educational experience, inspiring global cocreation and…

Samsung Electronics Unveils Far-Reaching, Next-Generation Memory Solutions at Flash Memory Summit 2022

Samsung Electronics, the world leader in advanced memory technology, today unveiled an…

Apple lands historic first Best Picture Oscar nomination for “CODA,”and secures six Academy Award nominations including Best Actor for Denzel Washington in “The Tragedy of Macbeth” and Best Supporting Actor for Troy Kotsur in “CODA”

CUPERTINO, CALIFORNIA Apple today made history, landing six Academy Award nominations in several…

New Cisco 800G Innovations Help to Supercharge the Internet for the Future

News Summary: Cisco’s new 28.8T / 36 x 800G line card, powered…

Accelerating telco transformation in the era of AI

AI is redefining digital transformation for every industry, including telecommunications. Every operator’s…

Mars and Microsoft work together to accelerate Mars’ digital transformation and reimagine business operations, Associate experience and consumer engagement

Mars and Microsoft work together to accelerate Mars’ digital transformation and reimagine business operations, Associate experience and consumer engagement