What is the Google FACTS Benchmark: If you accept the answers AI chatbots give you without checking them, this news is a warning for you. Google has recently released an important evaluation report containing striking findings about the accuracy of AI chatbots. Its new FACTS Benchmark Suite shows that even the world’s most powerful AI models are not fully reliable on facts. According to the report, no major AI model exceeds 70 percent factual accuracy. In simple terms, AI chatbots give a wrong answer roughly once out of every three.
In Google’s benchmark test, the company’s own Gemini 3 Pro model came out on top. It achieved 69 percent factual accuracy, better than all competing AI systems. Models from OpenAI, Anthropic and Elon Musk’s xAI could not reach even that level.
According to the report, Gemini 2.5 Pro and ChatGPT-5 recorded 62 percent accuracy, while Claude 4.5 Opus scored 51 percent and Grok 4 about 54 percent. Notably, on multimodal tasks, where images, charts or diagrams must be understood alongside text, most AI models performed poorly, with accuracy falling below 50 percent.
Google’s benchmark tests AI models differently from traditional methods. Typical AI evaluations have the model summarize text, answer questions, or write code. The FACTS Benchmark instead checks how true the information the AI gives actually is.
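To make that idea concrete, here is a minimal, hypothetical Python sketch of what a factual-accuracy score like the ones above amounts to: each model answer is judged against an independently verified fact, and the score is the fraction of answers judged correct. This is not Google’s actual FACTS methodology; the example data, the normalization rule and the string-match “judge” are all assumptions for illustration (real benchmark suites rely on human raters or LLM judges rather than string matching).

```python
# Hypothetical sketch of a factual-accuracy check, in the spirit of a
# FACTS-style evaluation. NOT Google's implementation: the data, the
# normalize() rule and the string-match judge are assumptions.

from dataclasses import dataclass

@dataclass
class Example:
    prompt: str        # question posed to the model
    model_answer: str  # what the chatbot replied
    ground_truth: str  # independently verified fact

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so trivial formatting
    differences don't count as factual errors."""
    return "".join(ch for ch in text.lower()
                   if ch.isalnum() or ch.isspace()).strip()

def is_factually_correct(ex: Example) -> bool:
    """Toy judge: the answer counts as correct only if it contains
    the verified fact verbatim (after normalization)."""
    return normalize(ex.ground_truth) in normalize(ex.model_answer)

def factual_accuracy(examples: list[Example]) -> float:
    """Fraction of answers judged factually correct."""
    correct = sum(is_factually_correct(ex) for ex in examples)
    return correct / len(examples)

if __name__ == "__main__":
    sample = [
        Example("Capital of Australia?", "It's Canberra.", "Canberra"),
        Example("Who wrote Hamlet?", "Charles Dickens wrote it.", "Shakespeare"),
        Example("Boiling point of water at sea level?",
                "100 degrees Celsius", "100 degrees Celsius"),
    ]
    print(f"Factual accuracy: {factual_accuracy(sample):.0%}")  # -> 67%
```

Even this toy version shows why the reported numbers matter: a model that gets two of three answers right scores 67 percent, roughly where the best models in Google’s report land.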
This benchmark is based on four practical use-cases.
Google’s report is a clear signal that treating AI chatbots’ answers as the final word remains risky. For news, medical information or other sensitive decisions in particular, it is essential to cross-check an AI’s answers.