How to Track AI Bots on Your Website: The Complete Server-Side Guide

2026-03-24

Google Analytics cannot see AI bots. Over 90% of AI crawler traffic is completely invisible to JavaScript-based analytics. If you want to know which AI bots crawl your website, you need server-side tracking, and this guide shows you exactly how to set it up.

Most website owners have no idea how often AI crawlers visit their sites, which pages get indexed, or whether their content is being used to train models. The reason is simple: every major AI crawler ignores JavaScript. Google Analytics 4, Plausible, Fathom, and every other client-side analytics tool will never register a single AI bot visit. You could have GPTBot hitting your site 10,000 times a day and your analytics dashboard would show zero.

This is not a minor gap. Lee (2026) found that AI platforms like ChatGPT and Claude perform live page fetches during search queries, while Perplexity and Gemini rely on pre-built indices from regular crawling. Either way, understanding crawl behavior is the first step to controlling your AI search visibility. Aggarwal et al. (2024) demonstrated that targeted optimization can boost generative engine visibility by up to 40%, but you cannot optimize what you cannot measure.

The Bottom Line: If you are relying on GA4 or any JavaScript-based analytics to understand AI bot activity on your site, you are flying blind. Server-side tracking is not optional. It is the only way to see what is actually happening.

For a broader look at AI search optimization strategy, see our Complete GEO Guide.

🚫 WHY GOOGLE ANALYTICS MISSES 90%+ OF AI BOT TRAFFIC

To understand why GA4 fails at bot tracking, you need to understand how it works. GA4 relies on a JavaScript snippet (gtag.js) that runs in the visitor's browser. When a human visits your page, their browser executes the script, which sends a tracking event to Google's servers.

AI crawlers do not run JavaScript. They send HTTP requests, download the raw HTML, parse the content, and leave. No browser. No JavaScript execution. No analytics event.

This applies to every single AI crawler currently operating:

  • GPTBot and OAI-SearchBot fetch raw HTML for OpenAI's training data and search features
  • ChatGPT-User fetches pages in real time during ChatGPT search sessions
  • ClaudeBot, Claude-User, and Claude-SearchBot crawl for Anthropic's products
  • PerplexityBot indexes pages for Perplexity's search engine
  • Google-Extended and GoogleOther crawl for Gemini and AI Mode (separate from Googlebot)
  • Bytespider crawls for ByteDance's AI products
  • Meta-ExternalAgent crawls for Meta AI
  • AmazonBot crawls for Alexa and Amazon Q
  • AppleBot crawls for Apple Intelligence and Siri
  • DuckAssistBot crawls for DuckDuckGo's AI features

None of these execute JavaScript. None of them will ever appear in GA4.

The Bottom Line: The 90% gap is not an estimate. It is a structural limitation. JavaScript-based analytics physically cannot detect HTTP-only crawlers. The only solution is to move tracking to the server side, where every incoming request is visible regardless of whether the client runs scripts.

🔍 THE 15+ AI BOTS YOU NEED TO MONITOR

Before setting up tracking, you need to know exactly what you are looking for. Here is the complete list of AI crawler user-agent strings currently active on the web, organized by the company that operates them:

| Bot Name | Operator | Purpose | Priority |
|---|---|---|---|
| GPTBot | OpenAI | Training data collection | Critical |
| OAI-SearchBot | OpenAI | SearchGPT index building | Critical |
| ChatGPT-User | OpenAI | Real-time page fetches during chat | Critical |
| ClaudeBot | Anthropic | Training and index building | Critical |
| Claude-User | Anthropic | Real-time fetches during conversations | High |
| Claude-SearchBot | Anthropic | Search feature crawling | High |
| PerplexityBot | Perplexity | Search index building | Critical |
| Google-Extended | Google | Gemini and AI Mode training | Critical |
| GoogleOther | Google | AI-specific crawling (separate from search) | High |
| Bytespider | ByteDance | AI model training | Medium |
| Meta-ExternalAgent | Meta | Meta AI training and features | Medium |
| AmazonBot | Amazon | Alexa and Amazon Q | Medium |
| AppleBot | Apple | Apple Intelligence and Siri | Medium |
| DuckAssistBot | DuckDuckGo | AI-powered answer features | Low |
| cohere-ai | Cohere | Model training | Low |
The six critical bots (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, and Google-Extended) account for the vast majority of AI crawler traffic on most sites. If you are only going to track a subset, start with these.

Lee (2026) found significant architectural differences between these crawlers. ChatGPT and Claude perform live page fetches when users ask questions, meaning ChatGPT-User and Claude-User visits are directly tied to real-time AI search queries. Perplexity and Gemini, on the other hand, rely on pre-built indices, so PerplexityBot and Google-Extended crawl activity reflects future search potential rather than current queries.

For a deeper dive into identifying these bots on your specific site, see our guide on how to see which AI bots crawl your website.

🖥️ METHOD 1: SERVER LOG ANALYSIS

Server logs are the most fundamental and reliable way to track AI bot activity. Every web server records every HTTP request in its access logs, including the user-agent string that identifies the client making the request.

What a Bot Visit Looks Like

A typical AI crawler entry in your Nginx access log looks like this:

66.249.73.42 - - [24/Mar/2026:10:15:32 +0000] "GET /blog/example-post HTTP/1.1" 200 45230 "-" "Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"

This tells you the IP address, timestamp, requested URL, HTTP status code, response size, and the user-agent string identifying GPTBot.
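
The fields in a combined-format line can be pulled apart with standard tools. Below is a minimal sketch, assuming Nginx's default combined log format and GNU grep (the same `-P` flag the commands later in this guide rely on); the sample line is the GPTBot entry above:

```shell
# Sample combined-format log line (same shape as the GPTBot entry above)
line='66.249.73.42 - - [24/Mar/2026:10:15:32 +0000] "GET /blog/example-post HTTP/1.1" 200 45230 "-" "Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"'

# Field positions in the default combined format:
#   $1 = client IP, $4-$5 = timestamp, $7 = request path, $9 = status, $10 = bytes
ip=$(echo "$line" | awk '{print $1}')
path=$(echo "$line" | awk '{print $7}')
status=$(echo "$line" | awk '{print $9}')

# The bot name sits after "compatible; " inside the user-agent string
bot=$(echo "$line" | grep -oP 'compatible; \K[^/;)]+')

echo "$bot hit $path from $ip (status $status)"
```

The same field positions apply to every awk one-liner in this guide, so if your server uses a custom log_format, adjust the field numbers accordingly.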

Basic Log Parsing Commands

# Count visits by each AI bot
grep -E "GPTBot|ClaudeBot|PerplexityBot|Google-Extended|ChatGPT-User|OAI-SearchBot" \
  /var/log/nginx/access.log | grep -oP 'compatible; \K[^/;)]+' | sort | uniq -c | sort -rn

# See which pages GPTBot is crawling most
grep "GPTBot" /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -20

Strengths and Limitations

Server log analysis is free, requires no third-party tools, and captures every single request. But it has real drawbacks: it requires SSH access, manual parsing, and does not scale well across multiple servers. There is no built-in alerting, no trend visualization, and no way to correlate crawl activity with AI search performance without significant custom work.

The Bottom Line: Server logs are the ground truth. Every other tracking method is built on top of the same data. But raw log analysis is a starting point, not a complete solution.

☁️ METHOD 2: EDGE MIDDLEWARE (VERCEL, CLOUDFLARE, NETLIFY)

If your site runs on a modern hosting platform, edge middleware gives you a powerful middle ground between raw server logs and full monitoring tools. Edge functions execute at the CDN level before a request reaches your origin server, which means they see every request, including AI crawlers.

Cloudflare Workers Example

Cloudflare provides built-in bot detection (Security > Bots), but you can build custom AI bot tracking with Workers:

export default {
  async fetch(request, env, ctx) {
    const ua = request.headers.get("user-agent") || "";
    const aiPatterns = ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
      "ClaudeBot", "PerplexityBot", "Google-Extended", "Bytespider"];
    const matchedBot = aiPatterns.find(bot => ua.includes(bot));
    if (matchedBot) {
      // Write to a KV namespace (bound here as BOT_LOGS) without
      // delaying the response to the crawler
      ctx.waitUntil(env.BOT_LOGS.put(`${Date.now()}-${matchedBot}`,
        JSON.stringify({ bot: matchedBot, url: request.url,
          timestamp: new Date().toISOString() })));
    }
    return fetch(request);
  }
};

Vercel Middleware Example

For Next.js sites on Vercel, middleware intercepts every request at the edge:

import { NextResponse } from "next/server";
import type { NextFetchEvent, NextRequest } from "next/server";

const AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"];

export function middleware(request: NextRequest, event: NextFetchEvent) {
  const ua = request.headers.get("user-agent") || "";
  const bot = AI_BOTS.find(b => ua.includes(b));
  if (bot) {
    // waitUntil keeps the logging request alive after the response returns;
    // a bare fire-and-forget fetch can be cancelled at the edge
    event.waitUntil(fetch("https://your-api.com/bot-log", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ bot, path: request.nextUrl.pathname })
    }));
  }
  return NextResponse.next();
}

Strengths and Limitations

Edge middleware is more structured than raw log parsing and integrates with modern deployment workflows. It works across distributed CDN nodes without aggregating log files manually. However, it requires custom code, ongoing maintenance, and you still need to build your own dashboards, alerting, and analysis pipeline.

The Bottom Line: Edge middleware is the best DIY option for teams already on Vercel, Cloudflare, or Netlify. But "best DIY" still means you are building and maintaining a custom analytics system.

📊 METHOD COMPARISON: WHICH TRACKING APPROACH IS RIGHT FOR YOU

Here is how the three primary tracking methods compare across the dimensions that matter most:

| Feature | Server Logs | Edge Middleware | Dedicated Monitoring (BotSight) |
|---|---|---|---|
| Setup complexity | Low (already exists) | Medium (custom code) | Low (add script/integration) |
| Captures all AI bots | Yes | Yes | Yes |
| Real-time alerting | No | Possible (custom) | Yes |
| Historical trends | Manual parsing | Custom database | Built-in dashboards |
| Page-level analysis | Yes (manual) | Yes (custom) | Yes (automatic) |
| Crawl frequency tracking | Manual | Custom | Automatic |
| Recrawl pattern detection | Manual | Custom | Automatic |
| Correlation with AI citations | No | No | Yes |
| Multi-site support | Difficult | Per-deployment | Built-in |
| Cost | Free | Platform costs | Subscription |
| Maintenance burden | High | Medium | Low |

For most teams, the decision comes down to scale and resources. If you have a single site and a technical team comfortable with log parsing, server logs work. If you are on a modern platform and want more structure, edge middleware is a solid middle ground. If you need actionable insights without building custom infrastructure, a dedicated monitoring tool saves significant engineering time.

For a detailed comparison of specific tools and methods, see our AI bot tracking tools and methods guide.

📈 WHAT TO LOOK FOR IN YOUR BOT TRACKING DATA

Raw traffic numbers are a starting point, not the finish line. Once you have AI bot tracking in place, here are the specific metrics and patterns that actually matter for your AI search visibility:

Crawl Frequency

How often each bot visits your site tells you how much that platform values your content. A site getting crawled daily by GPTBot is in much better shape than one getting crawled weekly. Sudden drops in crawl frequency can signal technical problems (blocked crawlers, slow response times, server errors) that need immediate attention.
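
As a rough sketch of how to measure this from logs, the following counts visits per bot per day. The three log lines are a hypothetical sample standing in for /var/log/nginx/access.log, and the format is assumed to be Nginx's default combined format:

```shell
# Hypothetical log excerpt; in practice, read /var/log/nginx/access.log
log='1.2.3.4 - - [23/Mar/2026:09:00:01 +0000] "GET /a HTTP/1.1" 200 100 "-" "(compatible; GPTBot/1.2)"
1.2.3.4 - - [23/Mar/2026:11:00:01 +0000] "GET /b HTTP/1.1" 200 100 "-" "(compatible; GPTBot/1.2)"
1.2.3.5 - - [24/Mar/2026:10:00:01 +0000] "GET /a HTTP/1.1" 200 100 "-" "(compatible; ClaudeBot/1.0)"'

# Visits per bot per day: split out the date (dropping the time) and the
# bot token from the user-agent, then count occurrences
freq=$(echo "$log" \
  | awk -F'[][]' '{split($2, t, ":"); day = t[1];
      split($0, ua, "compatible; "); split(ua[2], b, "/");
      print day, b[1]}' \
  | sort | uniq -c | sort -rn)
echo "$freq"
```

A sudden drop in these daily counts for a bot that previously visited regularly is the signal to investigate.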

Page Coverage

Which pages are getting crawled, and which are being ignored? If AI bots are only hitting your homepage and top-level category pages but skipping your blog posts and product pages, your deeper content is invisible to AI search. Coverage gaps often indicate sitemap issues, poor internal linking, or pages that are too slow to crawl efficiently.
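
One way to surface coverage gaps is to diff your sitemap's paths against the paths AI bots have actually requested. A minimal sketch with hypothetical inline data; in practice you would extract the first list from your sitemap.xml and the second from bot entries in your access log:

```shell
# Hypothetical inputs: paths listed in your sitemap, and paths AI bots requested
sitemap_paths='/blog/post-a
/blog/post-b
/products/widget'
crawled_paths='/blog/post-a
/blog/post-a
/products/widget'

# Sitemap pages no AI bot has requested (grep treats each line of the
# pattern variable as a separate fixed-string, whole-line pattern)
uncovered=$(echo "$sitemap_paths" | grep -vxF "$crawled_paths")
echo "Not yet crawled: $uncovered"
```

Pages that show up here repeatedly are the ones to check for sitemap, internal-linking, or performance issues.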

Recrawl Patterns

How frequently do bots revisit the same pages? Regular recrawls of key pages mean the AI platform is keeping its index fresh for your content. If a page was crawled once three months ago and never revisited, the AI platform is working with stale data and may not surface it in current responses.
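
A simple proxy for recrawl health is the visit count per URL for a given bot: pages crawled exactly once are candidates for stale-index risk. A sketch over hypothetical, pre-extracted (date, path) pairs:

```shell
# Hypothetical (date, path) pairs for one bot, extracted from the access log
visits='24/Feb/2026 /blog/post-a
10/Mar/2026 /blog/post-a
24/Mar/2026 /blog/post-a
02/Jan/2026 /blog/post-b'

# Recrawl count per page
recrawls=$(echo "$visits" | awk '{print $2}' | sort | uniq -c | sort -rn)
echo "$recrawls"

# Flag pages crawled exactly once: likely stale in the AI platform's index
stale=$(echo "$recrawls" | awk '$1 == 1 {print $2}')
echo "Crawled only once: $stale"
```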

Response Codes

Monitor the HTTP status codes returned to AI crawlers. A spike in 403 (forbidden) or 429 (rate limited) responses means you are actively blocking crawlers, potentially without realizing it. Soft 404s (pages that return 200 but display error content) are particularly insidious because they look fine in your logs but deliver useless content to the crawler.
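
A quick way to spot this in your logs is a status-code distribution per bot. A sketch over a hypothetical excerpt in Nginx's combined format, where GPTBot is being served 403s:

```shell
# Hypothetical log excerpt; in practice, read /var/log/nginx/access.log
log='1.2.3.4 - - [24/Mar/2026:10:00:01 +0000] "GET /a HTTP/1.1" 200 100 "-" "(compatible; GPTBot/1.2)"
1.2.3.4 - - [24/Mar/2026:10:00:02 +0000] "GET /b HTTP/1.1" 403 0 "-" "(compatible; GPTBot/1.2)"
1.2.3.4 - - [24/Mar/2026:10:00:03 +0000] "GET /c HTTP/1.1" 403 0 "-" "(compatible; GPTBot/1.2)"
1.2.3.5 - - [24/Mar/2026:10:00:04 +0000] "GET /d HTTP/1.1" 200 100 "-" "(compatible; ClaudeBot/1.0)"'

# Status-code distribution per bot ($9 is the status in combined format);
# a cluster of 403s or 429s means you are blocking that crawler
codes=$(echo "$log" \
  | awk '{split($0, ua, "compatible; "); split(ua[2], b, "/");
      print b[1], $9}' \
  | sort | uniq -c | sort -rn)
echo "$codes"
```

Note that soft 404s will not show up here, since they return 200; catching those requires inspecting the response body, not just the code.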

Crawl-to-Citation Correlation

The most valuable insight is connecting crawl activity to actual AI search citations. If GPTBot is crawling your product pages frequently but ChatGPT never cites them in relevant queries, something is wrong with your content or structure, not your crawl access. Lee (2026) identified 7 statistically significant page-level features that predict citation, including internal navigation link density (OR = 2.75), canonical tags (OR = 1.92), and schema markup (OR = 1.69).

The Bottom Line: Track the patterns, not just the numbers. A page getting crawled 100 times a day but never cited has a content problem. A page getting cited frequently but crawled rarely is at risk of going stale. The goal is to identify and close these gaps.

Check your site's current AI visibility metrics instantly with our AI Visibility Quick Check.

🛠️ BOTSIGHT: PURPOSE-BUILT AI BOT MONITORING

Building custom bot tracking from server logs or edge middleware works, but it requires ongoing engineering investment and still leaves you without the analysis layer that turns raw data into decisions. BotSight was built specifically to solve this problem.

BotSight is a dedicated AI bot monitoring platform that tracks every AI crawler visiting your site and connects that activity to AI search visibility outcomes. Here is what it provides that DIY approaches do not:

Automatic Bot Detection

BotSight identifies all 15+ AI crawlers automatically, without manual user-agent string matching or regex maintenance. When new AI bots appear (and they appear regularly), BotSight updates its detection automatically.

Crawl Analytics Dashboard

See crawl frequency, page coverage, and recrawl patterns across all AI bots in a single view. Filter by bot, date range, URL pattern, or response code. Identify which pages are getting attention from which platforms.

Visibility Score

BotSight calculates an AI visibility score that combines crawl access, crawl frequency, page coverage, and content quality signals into a single metric. This tells you not just whether bots can reach your content, but whether they are likely to cite it.

Freshness Alerts

Get notified when crawl patterns change. If GPTBot stops visiting your site, or if crawl frequency drops by more than 50%, BotSight sends an alert so you can investigate before your AI search visibility degrades.

Search Correlation

BotSight connects crawl activity to AI search performance, showing you which pages are being crawled and cited, crawled but not cited, or cited but rarely re-crawled. This is the insight that raw log analysis cannot provide without significant custom development.

For teams serious about AI search visibility, BotSight eliminates the build-versus-buy dilemma. Learn more about our monitoring capabilities on the AI Visibility services page.

🔧 GETTING STARTED: A 30-MINUTE SETUP

You do not need a full monitoring solution to start. Here is a minimal setup:

Step 1: Check your robots.txt (https://yoursite.com/robots.txt) to confirm you are not blocking GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, or Google-Extended.
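
A rough way to script this check is to scan robots.txt for a Disallow rule immediately after each bot's User-agent line. This is a deliberately naive sketch (real robots.txt groups can list several user-agents and rules per group, which a proper parser should handle) using an inline sample instead of fetching https://yoursite.com/robots.txt:

```shell
# Inline sample; in practice: robots=$(curl -s https://yoursite.com/robots.txt)
robots='User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /'

# Naive check: is the line right after each bot'"'"'s User-agent a full Disallow?
result=$(
  for bot in GPTBot ChatGPT-User ClaudeBot PerplexityBot Google-Extended; do
    if echo "$robots" | grep -A1 -i "User-agent: $bot" | grep -qi "Disallow: /$"; then
      echo "$bot: BLOCKED"
    else
      echo "$bot: allowed"
    fi
  done
)
echo "$result"
```

In the sample above only GPTBot is blocked; run the same loop against your real robots.txt before trusting your crawl data.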

Step 2: Create a daily log summary script:

#!/bin/bash
# Daily summary of AI crawler activity from the Nginx access log.
# Run it from cron, e.g.: 0 6 * * * /usr/local/bin/ai-bot-summary.sh
BOTS="GPTBot|ClaudeBot|PerplexityBot|Google-Extended|ChatGPT-User|OAI-SearchBot"
LOG=/var/log/nginx/access.log
echo "--- Visits by Bot ---"
grep -E "$BOTS" "$LOG" | grep -oP 'compatible; \K[^/;)]+' | sort | uniq -c | sort -rn
echo "--- Top 20 Pages Crawled ---"
grep -E "$BOTS" "$LOG" | awk '{print $7}' | sort | uniq -c | sort -rn | head -20

Step 3: Graduate to a full solution. Trend graphs, recrawl detection, and citation correlation require either significant custom work or a dedicated tool like BotSight. Start with the basics, then decide how much automation you need.

The Bottom Line: You can start tracking AI bots in 30 minutes with SSH access and a bash script. But the real value comes from sustained monitoring and analysis, which is where purpose-built tools earn their keep.

❓ FREQUENTLY ASKED QUESTIONS

Can Google Analytics 4 track AI bots?

No. GA4 relies entirely on client-side JavaScript execution. AI crawlers like GPTBot, ClaudeBot, and PerplexityBot do not execute JavaScript. They make HTTP requests, download the HTML response, and leave. This means GA4 will never register a single AI bot visit, regardless of your configuration. The same applies to all JavaScript-based analytics platforms including Adobe Analytics, Mixpanel, and Plausible.

Which AI bots should I prioritize tracking?

Start with the five that matter most for AI search visibility: GPTBot (OpenAI/ChatGPT), ChatGPT-User (real-time search fetches), ClaudeBot (Anthropic/Claude), PerplexityBot (Perplexity search), and Google-Extended (Gemini/AI Mode). These cover the four largest AI search platforms. Once you have these tracked, expand to OAI-SearchBot, GoogleOther, Bytespider, Meta-ExternalAgent, and AmazonBot.

How do I know if AI bots are being blocked on my site?

Check your robots.txt file at https://yoursite.com/robots.txt. Look for Disallow rules under any AI bot user-agent. Also check your server logs for 403 or 429 status codes returned to AI crawler user-agents. CDN-level bot protection (Cloudflare Bot Fight Mode, Sucuri, etc.) can also block AI crawlers even when your robots.txt allows them. If you see zero AI bot entries in your logs, your firewall or CDN may be blocking them before they reach your server.

How often should AI bots be crawling my site?

There is no universal benchmark, as crawl frequency depends on your site's size, update frequency, and authority. However, if you are publishing new content regularly and see zero visits from major AI crawlers over a 7-day period, something is likely wrong. Healthy sites typically see daily visits from at least GPTBot and PerplexityBot. Use our AI Visibility Quick Check to see how your current crawl access compares.

Does blocking AI bots hurt my regular Google search rankings?

No. Google has explicitly stated that blocking Google-Extended does not affect your traditional Google Search rankings. Google-Extended is a separate user-agent from Googlebot and only controls AI training and Gemini features. Similarly, blocking GPTBot, ClaudeBot, or PerplexityBot has no effect on Google Search rankings. However, blocking these bots will make you invisible on their respective AI search platforms, which is an increasingly significant source of traffic and visibility. For more on balancing these tradeoffs, see our guide to tracking AI bots effectively.

REFERENCES

  1. Lee, A. (2026). Query Intent, Not Google Rank: What Best Predicts AI Citation Behavior. A.I. Plus Automation. https://doi.org/10.5281/zenodo.18653093

  2. Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). GEO: Generative Engine Optimization. Proceedings of KDD 2024. https://doi.org/10.48550/arXiv.2311.09735