AI search engines cite incorrect sources at an alarming 60% rate, study says
News Update March 15, 2025 06:24 PM

Recent research from Columbia Journalism Review’s Tow Center for Digital Journalism has uncovered alarming accuracy problems with AI-powered search tools that many Americans now rely on for news information.

The comprehensive study tested eight generative AI platforms with live search capabilities, revealing that these sophisticated tools incorrectly answered more than 60 percent of queries about news sources.

According to researchers Klaudia Jaźwińska and Aisvarya Chandrasekar, approximately one in four Americans now turn to AI models as alternatives to traditional search engines. This growing reliance raises serious concerns about information integrity, particularly given the substantial error rates documented in their analysis.

“What we’re seeing is not just occasional mistakes but systematic failures in basic information retrieval,” a spokesperson for the research team explained. “These platforms are being marketed as reliable information sources, but our findings suggest otherwise.”

The study exposed significant variations in error rates across different platforms. Perplexity, while performing better than most competitors, still provided incorrect information in 37 percent of test queries.

AI Search Tools Ignore Publisher Restrictions, Fabricate Information

ChatGPT Search fared considerably worse, incorrectly identifying 67 percent (134 out of 200) of articles queried. Most concerning was Grok 3, which demonstrated an astonishing 94 percent error rate.

Credits: Ars Technica

To conduct their analysis, researchers employed a straightforward methodology: they presented each AI model with direct excerpts from published news articles and asked the platforms to identify basic information including the article’s headline, original publisher, publication date, and URL.

The team ran 1,600 queries across eight different generative search tools to ensure comprehensive results.
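To make the shape of that test concrete, a rough sketch of such an evaluation loop is shown below. This is not the Tow Center's actual tooling: the query_model function, the Excerpt fields, and the prompt wording are hypothetical placeholders standing in for whatever interface each search tool exposes.

```python
# Hypothetical sketch of the excerpt-identification test described above.
# query_model() is a placeholder for each search tool's real interface;
# the Excerpt fields and prompt wording are illustrative, not the study's.

from dataclasses import dataclass

@dataclass
class Excerpt:
    text: str       # verbatim passage from a published article
    headline: str   # ground-truth headline
    publisher: str  # ground-truth publisher
    date: str       # ground-truth publication date
    url: str        # ground-truth URL

def query_model(model_name: str, prompt: str) -> dict:
    """Placeholder: send the prompt to a generative search tool and return
    its claimed headline, publisher, publication date, and URL."""
    raise NotImplementedError

def error_rate(model_name: str, excerpts: list[Excerpt]) -> float:
    """Return the fraction of excerpts the model attributes incorrectly."""
    errors = 0
    for ex in excerpts:
        prompt = f'Identify the headline, publisher, publication date, and URL of the article containing: "{ex.text}"'
        answer = query_model(model_name, prompt)
        if (answer.get("headline") != ex.headline
                or answer.get("publisher") != ex.publisher
                or answer.get("url") != ex.url):
            errors += 1
    return errors / len(excerpts)
```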

One particularly troubling pattern emerged across all tested platforms. Rather than acknowledging limitations when faced with uncertainty, these AI systems frequently produced what researchers termed “confabulations” – plausible-sounding but incorrect or entirely fabricated answers.

The study emphasized that this behavior was not isolated to any single platform but was consistent across the entire spectrum of tested models.

Perhaps most surprising was the discovery that premium paid versions of these AI search tools sometimes performed worse than their free counterparts. Specifically, Perplexity Pro ($20/month) and Grok 3’s premium service ($40/month) displayed greater confidence in delivering incorrect responses.

While these premium models correctly answered more prompts overall, their tendency to give definitive answers rather than decline uncertain queries resulted in higher overall error rates.

The research also uncovered significant issues regarding publisher control and citation practices. Evidence suggested some AI tools ignored Robot Exclusion Protocol settings, which publishers implement to prevent unauthorized access to their content.

For example, Perplexity’s free version correctly identified all ten excerpts from paywalled National Geographic content, despite National Geographic explicitly blocking Perplexity’s web crawlers from accessing their material.
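For readers unfamiliar with the Robot Exclusion Protocol, the check a compliant crawler is expected to perform can be expressed in a few lines of Python using the standard library's robotparser module. The user-agent string and URLs below are illustrative examples only, not details from the study.

```python
# Minimal illustration of how a compliant crawler consults robots.txt
# before fetching a page. The user agent and URLs are illustrative only.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example-publisher.com/robots.txt")
rp.read()  # download and parse the publisher's robots.txt

# A crawler that honors the protocol skips any URL it is not allowed to fetch.
allowed = rp.can_fetch("ExampleBot", "https://www.example-publisher.com/some-article")
print("Fetch permitted" if allowed else "Blocked by robots.txt")
```

The study's finding was that some tools appeared to retrieve and identify content even when this kind of check should have told them the publisher had opted out.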

AI Search’s Citation Crisis

Citation practices raised additional concerns. When AI search tools did provide sources, they frequently directed users to syndicated versions on platforms like Yahoo News rather than to original publisher websites. This redirection occurred even in cases where publishers had established formal licensing agreements with AI companies.

URL fabrication emerged as another significant problem. More than half of the citations provided by Google’s Gemini and Grok 3 led users to fabricated or broken URLs that resulted in error pages. In the case of Grok 3, a staggering 154 out of 200 citations tested resulted in broken links.

Credits: Ars Technica
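A broken-link check of this kind is straightforward to reproduce: request each cited URL and count the responses that land on error pages. The sketch below uses the third-party requests library and a hypothetical citation list; it is an illustration of the idea, not the researchers' actual pipeline.

```python
# Hypothetical sketch of a broken-citation check: request each cited URL
# and count responses that indicate an error page. The citation list is
# a placeholder, not data from the study.
import requests

citations = [
    "https://www.example.com/real-article",
    "https://www.example.com/fabricated-path",
]

broken = 0
for url in citations:
    try:
        resp = requests.get(url, timeout=10, allow_redirects=True)
        if resp.status_code >= 400:
            broken += 1
    except requests.RequestException:  # DNS failure, timeout, etc.
        broken += 1

print(f"{broken} of {len(citations)} citations resulted in broken links")
```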

These findings place publishers in a difficult position. Blocking AI crawlers might lead to complete loss of attribution for their content, while permitting them allows widespread reuse without driving valuable traffic back to publishers’ websites.

Mark Howard, chief operating officer at Time magazine, expressed concerns about the lack of transparency and control over how Time’s content appears in AI-generated searches. Despite these issues, Howard remains cautiously optimistic about future improvements, noting, “Today is the worst that the product will ever be,” and citing substantial investments aimed at enhancing these tools.

However, Howard also placed some responsibility on users, suggesting skepticism should be expected: “If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them.”

When approached for comment, both OpenAI and Microsoft acknowledged receipt of the research findings but did not directly address the specific issues identified.

OpenAI reiterated its commitment to supporting publishers by driving traffic through summaries, quotes, clear links, and attribution. Microsoft stated it adheres to Robot Exclusion Protocols and publisher directives.

This latest research builds upon previous findings published by the Tow Center in November 2024, which identified similar accuracy problems in how ChatGPT handled news content. The comprehensive report is available on the Columbia Journalism Review’s website for those seeking more detailed information about the study methodology and findings.

As AI search tools continue to gain popularity, this research provides an important warning about their current limitations and the potential consequences for information accuracy in an increasingly AI-mediated information landscape.
