How PerplexityBot works
PerplexityBot crawls publicly accessible web pages to build the index used by Perplexity's search and answer features. When a user asks a question in Perplexity, the system retrieves relevant pages from this index, extracts useful passages, and synthesizes them into an answer with source citations.
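Perplexity documents two user agents:
PerplexityBot: the primary crawler for search indexing and result surfacing.
Perplexity-User: a user agent that fetches pages on behalf of users during real-time searches; it is not used for web crawling. Because a user requested the fetch, Perplexity states that Perplexity-User generally ignores robots.txt rules.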
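To tell the two agents apart in server logs, a small classifier can match on the product tokens PerplexityBot and Perplexity-User. This is a sketch under assumptions: the token-only matching rule, the labels, and the shortened sample strings are illustrative, not Perplexity's published User-Agent values. User-Agent headers can also be spoofed, so a match shows what a client claims to be, not proof of where it came from.

```python
from typing import Optional

# Product tokens for the two Perplexity agents, checked case-insensitively.
# Matching on the bare token (rather than a full User-Agent string) is an
# assumption made for this sketch.
PERPLEXITY_TOKENS = {
    "perplexity-user": "Perplexity-User (user-initiated fetch)",
    "perplexitybot": "PerplexityBot (search indexing)",
}


def classify_perplexity_agent(user_agent: str) -> Optional[str]:
    """Return a label for a Perplexity agent, or None if neither token appears."""
    ua = user_agent.lower()
    for token, label in PERPLEXITY_TOKENS.items():
        if token in ua:
            return label
    return None


# Shortened, illustrative User-Agent values (not copied from real traffic).
samples = [
    "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
    "Mozilla/5.0 (compatible; Perplexity-User/1.0)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]
for ua in samples:
    print(f"{ua!r} -> {classify_perplexity_agent(ua)}")
```

The two tokens do not overlap ("perplexitybot" is not a substring of "perplexity-user"), so a simple substring check is enough to separate them.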
robots.txt configuration
PerplexityBot respects robots.txt directives. To allow PerplexityBot to crawl your site:
User-agent: PerplexityBot
Allow: /
To block PerplexityBot:
User-agent: PerplexityBot
Disallow: /
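If robots.txt has no group naming PerplexityBot, standard robots.txt rules apply: it follows the User-agent: * group if one exists, and if neither applies, it is allowed by default.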
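The allow, block, and default cases can be checked locally with Python's standard-library urllib.robotparser, which also shows how a User-agent: * group affects a crawler that is not named. This is a sketch that uses Python's parser as a stand-in; Perplexity's own parser may differ in edge cases, and example.com is a placeholder.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() also records a load time, which can_fetch() requires.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


CASES = {
    # Explicit allow group for PerplexityBot.
    "allow": "User-agent: PerplexityBot\nAllow: /\n",
    # Explicit block group for PerplexityBot.
    "block": "User-agent: PerplexityBot\nDisallow: /\n",
    # PerplexityBot is not named, but a wildcard group blocks everyone.
    "wildcard block": "User-agent: *\nDisallow: /\n",
    # PerplexityBot is not named and there is no wildcard group.
    "not mentioned": "User-agent: GPTBot\nDisallow: /\n",
}

URL = "https://example.com/blog/post"  # placeholder URL
for name, robots_txt in CASES.items():
    print(f"{name}: {is_allowed(robots_txt, 'PerplexityBot', URL)}")
```

One caveat: within a group, Python's parser applies the first matching rule, while RFC 9309 uses the most specific (longest) match. For single-rule files like these the results are the same.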
Does PerplexityBot use content for model training?
According to Perplexity's documentation, PerplexityBot is used to surface and link websites in Perplexity search results and is not used to crawl content for training AI foundation models. Blocking PerplexityBot in robots.txt prevents your content from being indexed and surfaced in Perplexity search results.
This mirrors the distinction OpenAI draws between OAI-SearchBot (search) and GPTBot (model training): search crawling and model training are handled by separate user agents that can be allowed or blocked independently.
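Because each crawler has its own user-agent token, a single robots.txt can admit search crawlers while excluding a training crawler. The policy below is a hypothetical example, not a recommendation, and Python's urllib.robotparser stands in for each vendor's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: allow the search crawlers, block OpenAI's training crawler.
# Consecutive User-agent lines share the rules that follow them.
POLICY = """\
User-agent: PerplexityBot
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(POLICY.splitlines())

url = "https://example.com/pricing"  # placeholder URL
results = {
    agent: parser.can_fetch(agent, url)
    for agent in ("PerplexityBot", "OAI-SearchBot", "GPTBot")
}
for agent, allowed in results.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Grouping both search crawlers under one set of rules keeps the file short; listing them in separate groups would behave the same.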
Should you allow PerplexityBot?
If AI search visibility is a strategic goal, allowing PerplexityBot is recommended. Perplexity is one of the primary AI search platforms alongside ChatGPT Search and Google AI Overviews. Blocking it removes your content from consideration as a cited source in Perplexity answers.
How to verify your current settings
Open https://yourdomain.com/robots.txt and check whether PerplexityBot is named in any User-agent line. If it is not, PerplexityBot follows the User-agent: * group if one exists, and is otherwise allowed by default.
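The manual check can be scripted. This is a sketch under assumptions: fetch_robots, named_user_agents, and effective_access are hypothetical helper names, yourdomain.com and example.com are placeholders, and Python's urllib.robotparser approximates how PerplexityBot interprets the file. The example runs offline on a sample file; network access is only needed when fetch_robots is called.

```python
import urllib.request
from typing import List
from urllib.robotparser import RobotFileParser

AGENT = "PerplexityBot"


def fetch_robots(domain: str) -> str:
    """Download https://<domain>/robots.txt (requires network access)."""
    with urllib.request.urlopen(f"https://{domain}/robots.txt", timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def named_user_agents(robots_txt: str, agent: str = AGENT) -> List[str]:
    """Return User-agent values that name `agent` exactly (case-insensitive)."""
    hits = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent" and value.strip().lower() == agent.lower():
            hits.append(value.strip())
    return hits


def effective_access(robots_txt: str, url: str, agent: str = AGENT) -> bool:
    """Whether `agent` may fetch `url`, including any fallback to `User-agent: *`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Offline example; for a live check, pass fetch_robots("yourdomain.com") instead.
sample = """\
User-agent: *
Disallow: /private/

User-agent: PerplexityBot
Disallow: /drafts/
"""
print(named_user_agents(sample))                                  # ['PerplexityBot']
print(effective_access(sample, "https://example.com/drafts/x"))   # False
print(effective_access(sample, "https://example.com/private/x"))  # True
```

The last result is easy to miss: once PerplexityBot has its own group, it follows only that group, so the wildcard group's Disallow: /private/ no longer applies to it.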
Source
Perplexity documents PerplexityBot at perplexity.ai/help-center/en/articles/10354969-how-does-perplexity-follow-robots-txt