Google launches Gemini 3.1 Flash-Lite, its “fastest and most cost-efficient” AI model
ETtech March 04, 2026 09:19 PM
Synopsis

Google has launched Gemini 3.1 Flash-Lite, its “fastest and most cost-efficient Gemini 3 series model,” available in preview via AI Studio and Vertex AI. Priced lower than Gemini 3.1 Pro, it offers faster response times, flexible ‘thinking levels,’ and strong performance for tasks from high-volume translation to complex reasoning, with early testers praising its precision.

Google has introduced Gemini 3.1 Flash-Lite, which it says is its “fastest and most cost-efficient Gemini 3 series model.”

“Starting today, 3.1 Flash-Lite is rolling out in a preview to developers via the Gemini API in Google AI Studio and for enterprises via Vertex AI,” the company said in a blog post.

Priced at $0.25 per million input tokens and $1.50 per million output tokens, Flash-Lite is significantly cheaper on input than flagship models such as Gemini 3.1 Pro ($2.00 per million input tokens and $1.50 per million output tokens).
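At these per-million-token rates, per-request cost is simple arithmetic. A minimal sketch using the Flash-Lite prices quoted above (the example token counts are illustrative, not from Google):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float = 0.25, out_price: float = 1.50) -> float:
    """USD cost of one request, given prices in dollars per million tokens.

    Defaults are the Gemini 3.1 Flash-Lite preview prices quoted above.
    """
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. a request with 10,000 input tokens and 2,000 output tokens:
# 10_000 * 0.25 / 1e6 + 2_000 * 1.50 / 1e6 = 0.0025 + 0.0030 = $0.0055
```

At this rate, a million such requests would cost roughly $5,500, which is the kind of margin that matters for the high-volume workloads the model targets.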


Google claims it “outperforms 2.5 Flash with a 2.5X faster Time to First Answer Token and 45% increase in output speed, according to the Artificial Analysis benchmark, while maintaining similar or better quality.”

What Gemini 3.1 Flash-Lite can do

The model comes with ‘thinking levels’ in AI Studio and Vertex AI, giving developers the ability to control how much the model “thinks” for each task — important for managing high-frequency workloads.
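In practice, a thinking level would be set per request in the API call's generation config. The sketch below is hypothetical: the field name "thinkingLevel" and its allowed values are assumptions for illustration only, so check the Gemini API documentation for the actual request schema.

```python
def build_request(prompt: str, thinking_level: str = "low") -> dict:
    """Assemble a generateContent-style request body (illustrative only).

    The "thinkingLevel" field and its values are assumed names, not the
    confirmed Gemini API schema.
    """
    allowed = {"low", "medium", "high"}
    if thinking_level not in allowed:
        raise ValueError(f"thinking_level must be one of {sorted(allowed)}")
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingLevel": thinking_level},
    }

# A cheap, high-frequency task might use the default "low" level, while a
# dashboard-generation task could opt into "high" for deeper reasoning.
```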

“3.1 Flash-Lite can tackle tasks at scale, like high-volume translation and content moderation, where cost is a priority. And it can also handle more complex workloads where more in-depth reasoning is needed, like generating user interfaces and dashboards, creating simulations or following instructions,” the blog post said.

Early-access developers and companies, including Latitude, Cartwheel, and Whering, are already testing Flash-Lite for large-scale problem solving. They highlighted the model's efficiency and reasoning capabilities, saying it can “handle complex inputs with the precision of a larger-tier model, plus follow instructions and maintain adherence,” according to the blog post.

Benchmarks and performance

Gemini 3.1 Flash-Lite scored an Elo of 1432 on the Arena.ai Leaderboard, outperforming other models in its tier for reasoning and multimodal understanding. It achieved 86.9% on GPQA Diamond and 76.8% on MMMU Pro, surpassing even larger Gemini models from previous generations, such as 2.5 Flash.

The model combines speed, cost efficiency, and flexible reasoning, making it suitable for both high-volume routine tasks and more complex AI workloads.