“How are we performing in ChatGPT?” 

That question, in many different iterations, is now part of the furniture in conversations involving search marketing. The growth of AI as a search tool has been explosive and shows no signs of slowing. However, with the familiar signals of search performance notably absent for AI search, the challenge for the industry is how SEOs provide meaningful answers to questions about performance.

As the industry has responded, an increasing number of agencies, specialists, and tools are emerging in the AI visibility tracking space, promising to answer these questions with meaningful data. Battle lines are being drawn around new metrics, giving marketing teams targets to optimise towards. On the surface, this feels like progress.

The reality is that some of these tools are creating a much deeper problem. Not only are these attempts often methodologically fragile and detached from real outcomes, but they are also increasingly environmentally reckless. AI visibility tools are hitting LLMs (Large Language Models, AI systems that are capable of understanding and generating natural language, such as ChatGPT, Claude, and Gemini) with thousands of prompts to generate insight with very limited business impact.

That friction sits at the heart of SEO’s newest dirty secret. 

What does ‘AI Visibility’ mean?

Traditional search metrics cluster around the general principle of visibility: whether measuring impressions, clicks, rankings, or rich snippet features, performance evaluates how visible your brand or product is in search engines.

AI Visibility takes those principles and applies them to answers generated by LLMs: how often is your brand or product cited, paraphrased, or used as a source in AI-generated answers?

To understand AI Visibility, it is important to understand how LLMs and other AI tools generate responses. One critical concept is the idea of ‘grounding’, where AI models are given additional (usually context-specific or up-to-date) data in order to better ensure accuracy and prevent hallucinations. In search, grounding is done using techniques such as retrieval-augmented generation (RAG) to conduct a traditional web search using a relevant query or queries to retrieve the additional data required.

Not all AI responses use grounding. For prompts where grounding isn't used, the model relies purely on its training data to generate a response. If your brand or site isn't referenced in that training data, there is very little you can do in the short or medium term to impact it. But for queries where grounding is used, SEO (or GEO) can improve your chances of being cited in responses.
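To make the grounding flow concrete, here is a minimal, hypothetical sketch of the RAG pattern described above: retrieve relevant documents, then prepend them as context to the user's prompt. The retrieval here is naive keyword overlap; real systems use web search and semantic ranking, and every function and document below is an invented illustration, not any vendor's actual pipeline.

```python
# Minimal sketch of retrieval-augmented generation (RAG) grounding.
# All functions and documents are hypothetical illustrations.

def retrieve(query, documents, top_k=2):
    """Rank documents by naive keyword overlap with the query."""
    query_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(user_prompt, documents):
    """Prepend retrieved context so the model answers from fresh data
    rather than relying solely on its training data."""
    context = retrieve(user_prompt, documents)
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {user_prompt}"

docs = [
    "Acme Boots launched a waterproof hiking range in 2025.",
    "Weather in Leeds is often wet in autumn.",
    "Acme Boots stock waterproof boots for hiking.",
]
print(build_grounded_prompt("best waterproof hiking boots", docs))
```

If your pages are the ones retrieved at this step, you have a chance of being cited in the final answer; if they aren't, no amount of downstream optimisation helps.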

However, unlike traditional search, there are far fewer reliable metrics on which to base visibility. There are no rankings, no impressions in the traditional sense, and visibility isn't particularly repeatable, as recent ground-breaking research by Rand Fishkin at SparkToro outlines.

Why does AI Visibility Tracking matter? 

Given the challenges in tracking AI visibility versus traditional search, it’d be easy to wonder why many in the industry are bothering. But there are legitimate strategic and business motivations behind doing so. Search marketing teams are driven to understand how their content is being interpreted, allowing them to identify opportunities, topic coverage gaps, and even citation patterns.

But perhaps most prevalently, there is immense pressure to prove that SEO still delivers value. Businesses are undoubtedly fearful of traffic loss to their websites, where significant amounts of conversion activity take place. The value anchors that traditional search has relied on are broken for AI search, making reassurance just as important as optimisation.

How does AI Visibility Tracking work? 

With limited scope for traditional tracking, what is the state of play in the AI Visibility Tracking field?

Referral Traffic

Perhaps the most robust metric available is to look at referral traffic from AI tools. In a common marketing stack, this would look like referral traffic from ChatGPT, Perplexity, and others in Google Analytics 4. This provides real users and real outcomes, trackable through your conversion funnels much like traditional organic traffic.

The issue with referral traffic is that it is severely underreported. Research by Conductor shows that AI referral traffic now accounts for around 1% of all website traffic. The vast majority of AI prompts don’t end in a click, exacerbating a trend known as ‘zero-click searches’. As a result, referral traffic is useful context but far from a complete signal.
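In practice, tracking this in a GA4-style setup means bucketing session referrers by hostname. The sketch below shows the idea; the hostname list is an assumption for illustration (actual AI referrer hostnames vary by platform and over time, and should be verified against your own analytics data).

```python
from urllib.parse import urlparse

# Illustrative sketch: bucket referral hostnames into AI vs other sources.
# AI_REFERRER_HOSTS is an assumed, incomplete list for demonstration only.
AI_REFERRER_HOSTS = {"chatgpt.com", "chat.openai.com", "perplexity.ai"}

def classify_referrer(referrer_url):
    """Label a session's referrer URL as 'ai' or 'other'."""
    host = urlparse(referrer_url).hostname or ""
    host = host.removeprefix("www.")
    return "ai" if host in AI_REFERRER_HOSTS else "other"

# Hypothetical session referrers, as might be exported from analytics.
sessions = [
    "https://chatgpt.com/",
    "https://www.perplexity.ai/search?q=acme",
    "https://www.google.com/",
]
counts = {}
for url in sessions:
    label = classify_referrer(url)
    counts[label] = counts.get(label, 0) + 1
print(counts)  # prints {'ai': 2, 'other': 1}
```

The upside of this approach is that every counted session is a real visit you can follow through your conversion funnel; the downside, per the underreporting problem above, is that it misses every prompt that never produced a click.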

Query Fan-Out

Another school of thought exists around tracking queries discovered through query fan-out. This is where LLMs break down a prompt into several sub-queries to run as traditional searches, compiling gathered information to build a more comprehensively grounded response.

Whilst this can drive indirect impact by ensuring your site ranks well for the types of queries LLMs are making, it is incredibly difficult to scale and is not a direct metric of performance in AI-generated answers.
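As a purely hypothetical illustration of what fan-out decomposition might look like, consider one prompt split into the kinds of sub-queries a model could run as web searches. The decomposition below is invented; real fan-out behaviour is model-specific and not publicly documented in full.

```python
# Hypothetical illustration of query fan-out: one prompt decomposed into
# sub-queries an LLM might run as traditional web searches. The
# decomposition is invented for illustration only.
prompt = "What are the best waterproof hiking boots under £150?"
fan_out = [
    "best waterproof hiking boots 2025",
    "waterproof hiking boots under £150",
    "hiking boot waterproofing comparison",
]

# Ranking well for queries like these is the indirect lever SEO teams have.
for sub_query in fan_out:
    print(f"web_search: {sub_query}")
```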

Bing Webmaster Tools AI Performance

One recent development worth acknowledging is the AI Performance dashboard in Bing Webmaster Tools, released in February 2026. It provides top-level insights into citation performance across pages as well as grounding queries. Whilst this is a step in the right direction, the data remains very limited and aggregated to a top-level view.

Dedicated AI Visibility Trackers

This brings us to dedicated tracking tools for AI visibility. With brand new platforms appearing alongside fresh tools bolted on to traditional all-in-one search suites, these tools promise renewed tracking directly relevant to AI search. At first look, they are attractive for marketing teams and senior leaders alike; with metrics such as ‘Share of Voice’, ‘Citation Frequency’, and ‘Brand Mentions’ sounding the part. They mirror the language of traditional reporting and feel credible. However, underneath the surface lies an uncomfortable truth for the search marketing industry.

SEO’s Dirty Secret: The Cost of Dedicated AI Visibility Tracking 

With very little (or often no) data available from the search platforms themselves, dedicated tracking tools for AI search rely on thrashing LLMs with massive volumes of automated prompts to generate insight, repeating queries across different platforms, models, locations, and variants. Tools such as Profound and Peec AI offer to run anywhere from 2,000 to more than 30,000 prompts per month per customer.

Herein lies the issue. It is no secret that generative artificial intelligence and large language models are environmentally expensive to train and use. Google have released data suggesting that an average Gemini prompt emits 0.03g of carbon dioxide and consumes 0.26 millilitres of water. Whilst these numbers sound small, with 2.5 billion ChatGPT queries made per day, the platform is generating emissions equivalent to powering 29,000 US homes, as per IEEE Spectrum.
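The back-of-envelope arithmetic for a single tracking customer is easy to run. The sketch below uses Google's published per-prompt Gemini figures quoted above as a stand-in; actual per-prompt costs vary by model and provider, so treat these as indicative orders of magnitude rather than measurements.

```python
# Back-of-envelope sketch using Google's published per-prompt figures
# for an average Gemini prompt (0.03 g CO2, 0.26 mL water) as a proxy.
# The prompt volumes are the vendor ranges quoted above; real per-prompt
# costs vary by model and provider.
CO2_G_PER_PROMPT = 0.03
WATER_ML_PER_PROMPT = 0.26

def monthly_footprint(prompts_per_month):
    """Estimate CO2 (grams) and water (litres) for a tracking volume."""
    return {
        "co2_g": prompts_per_month * CO2_G_PER_PROMPT,
        "water_l": prompts_per_month * WATER_ML_PER_PROMPT / 1000,
    }

for volume in (2_000, 30_000):
    fp = monthly_footprint(volume)
    print(f"{volume:>6} prompts/month -> "
          f"{fp['co2_g']:.0f} g CO2, {fp['water_l']:.1f} L water")
```

Per customer these figures look modest, but they repeat every month, across hundreds of customers, multiplied by platforms, models, and location variants.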

Scaling tracking using these methods directly increases the load placed on AI search tools, and the associated environmental cost. For businesses or organisations with an eye on sustainability, such as our focus on environmental responsibility and sustainability as part of our Strategy 2030 goals, this makes understanding the cost and value of the data returned critically important. And this is where the model breaks.

The uncomfortable truth is that the data these tools provide is inherently flawed. As outlined in the SparkToro research cited above, LLM answers are inconsistent. These systems are designed to give users unique answers based on personalisation and prompt context. Without those driving factors, prompt tracking returns small, non-representative samples of data without repeatability. Feeding this data into metrics measuring citations, share of voice, or 'narrative consistency' (which is rather nonsensical in itself) is at best unstable and at worst just made up.

So, What Instead?

If visibility tracking for AI search doesn’t provide any reliable or scalable value and is environmentally destructive, what do we do instead?

The truth is that comprehensive tracking just isn’t possible yet. It might not ever be given the implications of complete prompt tracking on privacy. Bing’s move towards first-party platform data is a positive step, and many in the industry will be watching closely for Google to follow suit.

In the meantime, SEO and marketing teams should double down on the factors that influence grounded AI results. Traditional SEO signals still matter: LLMs still rely on web searches to generate answers, especially for middle- and bottom-of-funnel queries. Report on metrics you can defend, emphasising traffic quality in a world where traffic quantity is increasingly volatile.

The anxiety surrounding visibility in AI results is real and well-placed. Rather than spending marketing budget on ineffective and destructive third-party tracking tools, organisations and search marketers should channel that anxiety into building resilient search foundations through content quality, topical authority, and aligning search marketing closely with traffic quality and conversion rate optimisation.

I recommend further research and commentary on AI Search and Sustainable Marketing from Rand Fishkin and Mark Williams-Cook, which has informed the thinking behind this article.