PerplexityBot

How PerplexityBot works

PerplexityBot crawls publicly accessible web pages to build the index used by Perplexity's search and answer features. When a user asks a question in Perplexity, the system retrieves relevant pages from this index, extracts useful passages, and synthesizes them into an answer with source citations.

Perplexity documents two crawler variants:

PerplexityBot — the primary crawler for search indexing and result surfacing.

Perplexity-User — a crawler that operates on behalf of users during real-time searches.
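The two variant names can be told apart in server logs by the token each one places in its User-Agent header. The following is a minimal sketch, assuming a simple case-insensitive substring match. The function name and the sample header strings are illustrative rather than taken from Perplexity's documentation, and a User-Agent header can be spoofed, so a match is a hint and not proof of origin.

```python
from typing import Optional

# Tokens come from the two variant names listed above.
PERPLEXITY_TOKENS = ("PerplexityBot", "Perplexity-User")


def classify_perplexity_agent(user_agent: str) -> Optional[str]:
    """Return the Perplexity variant named in a User-Agent header, or None."""
    lowered = user_agent.lower()
    for token in PERPLEXITY_TOKENS:
        if token.lower() in lowered:
            return token
    return None


# Illustrative header values (assumed shape, not copied from any documentation).
samples = [
    "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
    "Mozilla/5.0 (compatible; Perplexity-User/1.0)",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0",
]
for ua in samples:
    print(classify_perplexity_agent(ua))
```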

robots.txt configuration

PerplexityBot respects robots.txt directives. To allow PerplexityBot to crawl your site:

User-agent: PerplexityBot
Allow: /

To block PerplexityBot:

User-agent: PerplexityBot
Disallow: /

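If PerplexityBot is not mentioned in robots.txt, the rules in the User-agent: * group apply to it. If there is no such group, or that group does not disallow the path, PerplexityBot is allowed by default.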
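These cases can be checked locally before a robots.txt file is deployed. The sketch below uses Python's standard urllib.robotparser module. The helper name can_crawl and the four sample rule sets are illustrative assumptions. The last sample shows a User-agent: * group, which under the Robots Exclusion Protocol applies to any crawler that is not named in a group of its own.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, agent: str = "PerplexityBot", path: str = "/") -> bool:
    """Evaluate robots.txt text (no network access) for one agent and path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Sample rule sets (hypothetical files used only for illustration).
ALLOW = "User-agent: PerplexityBot\nAllow: /\n"
BLOCK = "User-agent: PerplexityBot\nDisallow: /\n"
UNMENTIONED = "User-agent: Googlebot\nDisallow: /private/\n"
WILDCARD_BLOCK = "User-agent: *\nDisallow: /\n"

for name, rules in [
    ("allow", ALLOW),
    ("block", BLOCK),
    ("unmentioned", UNMENTIONED),
    ("wildcard block", WILDCARD_BLOCK),
]:
    print(f"{name}: {can_crawl(rules)}")
```

Python's parser is only a stand-in for PerplexityBot's own rule handling, which may differ in edge cases such as overlapping Allow and Disallow paths.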

Does PerplexityBot use content for model training?

According to Perplexity's documentation, PerplexityBot is not used to crawl content for pre-training foundation models. Blocking PerplexityBot in robots.txt prevents your content from being indexed and surfaced in Perplexity search results.

This is similar to the distinction OpenAI makes between OAI-SearchBot (search) and GPTBot (model training) — search crawling and model training are separate concerns.

Should you allow PerplexityBot?

If AI search visibility is a strategic goal, allowing PerplexityBot is recommended. Perplexity is one of the primary AI search platforms alongside ChatGPT Search and Google AI Overviews. Blocking it removes your content from consideration as a cited source in Perplexity answers.

How to verify your current settings

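Open yourdomain.com/robots.txt and check whether PerplexityBot appears in any User-agent line. If it does, the Allow and Disallow rules in that group apply. If it does not, the rules under User-agent: * apply; with no such group, or no matching Disallow rule in it, PerplexityBot is allowed by default.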
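The manual check can also be scripted. The following is a sketch under stated assumptions, not an official tool: the function names governing_group, summarize, and fetch_robots_txt are hypothetical, group detection uses exact name matching, and the permission check relies on Python's standard urllib.robotparser, whose matching rules may not be identical to PerplexityBot's.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

AGENT = "PerplexityBot"


def governing_group(robots_txt: str, agent: str = AGENT) -> str:
    """Report which User-agent group applies: 'explicit', 'wildcard', or 'none'."""
    names = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            names.append(value.strip().lower())
    if agent.lower() in names:
        return "explicit"
    return "wildcard" if "*" in names else "none"


def summarize(robots_txt: str, path: str = "/", agent: str = AGENT) -> dict:
    """Combine the governing group with the parser's allow/deny verdict."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        "group": governing_group(robots_txt, agent),
        "allowed": parser.can_fetch(agent, path),
    }


def fetch_robots_txt(domain: str) -> str:
    """Download https://<domain>/robots.txt (requires network access)."""
    with urlopen(f"https://{domain}/robots.txt", timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Offline demonstration with a hypothetical file.
print(summarize("User-agent: *\nDisallow: /private/\n"))

# Live check; replace the placeholder with your own domain:
# print(summarize(fetch_robots_txt("yourdomain.com")))
```

A result with group "none" or "wildcard" and allowed True means PerplexityBot is permitted without being named, which is the default case described above.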

Source

Perplexity documents PerplexityBot at perplexity.ai/help-center/en/articles/10354969-how-does-perplexity-follow-robots-txt