host.tools

robots.txt parser

HTTP /api/v1/http/robots

Fetch and parse robots.txt — User-agent groups, Disallow/Allow, Sitemap, Crawl-delay.

https://cloudflare.com/robots.txt 200 1099 bytes 9 User-agent groups
User-agent: *
Allow
  • /
User-agent: GPTBot
Allow
  • /
User-agent: ChatGPT-User
Allow
  • /
User-agent: Google-Extended
Allow
  • /
User-agent: Anthropic-AI
Allow
  • /
User-agent: Claude-Web
Allow
  • /
User-agent: CCBot
Allow
  • /
User-agent: PerplexityBot
Allow
  • /
User-agent: Cohere-ai
Allow
  • /
Raw robots.txt
# Robots.txt for www.cloudflare.com

User-agent: *
Allow: /

# Sitemap
Sitemap: https://www.cloudflare.com/sitemap.xml

# AI/LLM friendly content
# See https://llmstxt.org for the llms.txt specification
# llms.txt provides curated content for AI assistants and LLMs

# Allow AI crawlers to access markdown versions of pages
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Anthropic-AI
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: CCBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Cohere-ai
Allow: /

# Content Signals — declare AI content usage preferences
# See https://contentsignals.org/ and https://datatracker.ietf.org/doc/draft-romm-aipref-contentsignals/
Content-Signal: ai-train=yes, search=yes, ai-input=yes

# AI-friendly content locations
# - /llms.txt - Curated overview for AI/LLMs (markdown)
# - /llms-full.txt - Full expanded content for larger context windows
# - /*.md - Markdown versions of all pages (append .md to any URL)
# - /.well-known/agents.json - Agent discovery and capabilities
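The grouping shown in the parsed view can be reproduced with a short parser. The following is a minimal sketch, not host.tools' actual implementation: the `Group` class and `parse_robots` function are hypothetical names, and the rule that consecutive User-agent lines share one group is an assumption based on common robots.txt practice. Unknown directives such as Content-Signal are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One User-agent group: the agents it names and its rules in file order."""
    user_agents: list[str] = field(default_factory=list)
    rules: list[tuple[str, str]] = field(default_factory=list)  # (directive, value)


def parse_robots(text: str):
    """Split robots.txt text into User-agent groups and a list of Sitemap URLs."""
    groups: list[Group] = []
    sitemaps: list[str] = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue  # blank line, comment-only line, or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()  # directive names are matched case-insensitively
        if key == "user-agent":
            # Consecutive User-agent lines share one group; a User-agent line
            # that follows a rule starts a new group.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.user_agents.append(value)
        elif key in ("allow", "disallow", "crawl-delay") and current is not None:
            current.rules.append((key, value))
        elif key == "sitemap":
            sitemaps.append(value)  # Sitemap lines apply to the whole file
    return groups, sitemaps
```

Run against the raw file above, a parser of this shape would be expected to yield one group per User-agent line, each with a single Allow rule for /, plus one Sitemap URL.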
How to use robots.txt parser
  1. Paste your input

    Enter the value at the top — domain, IP, URL, email, ASN, hash, whatever fits this tool. The smart input auto-detects type.

  2. Click "Inspect"

    host.tools issues real probes (DNS, HTTP, TCP, TLS, WHOIS where applicable) and renders the result in milliseconds.

  3. Open the API tab

    This tool has a sibling JSON endpoint, /api/v1/http/robots, that returns the same payload. Click copy-as-curl and you can start scripting it.

Why this matters

robots.txt is how a site tells crawlers which paths they may fetch. Checking it confirms that search engines and AI crawlers see the rules you intended.

API equivalent
/api/v1/http/robots?q=https%3A%2F%2Fcloudflare.com
curl -s '/api/v1/http/robots?q=https%3A%2F%2Fcloudflare.com'
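The same request can be made from Python with the standard library. This is a sketch under stated assumptions: the page shows only a relative path, so `BASE_URL` is a guess at the host, and the function names are hypothetical. The JSON response schema is not documented here, so the decoded body is returned as-is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the endpoint is served from this host; the page shows a relative path only.
BASE_URL = "https://host.tools"


def robots_api_url(target: str, base: str = BASE_URL) -> str:
    """Build the endpoint URL, percent-encoding the target into the q parameter."""
    return f"{base}/api/v1/http/robots?{urlencode({'q': target})}"


def fetch_robots_report(target: str, base: str = BASE_URL, timeout: float = 10.0):
    """GET the endpoint and decode the JSON body without assuming its fields."""
    with urlopen(robots_api_url(target, base), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `robots_api_url("https://cloudflare.com", base="")` reproduces the relative path shown above, which is a quick check that the encoding matches.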
Embed this tool
<iframe src="/http/robots?q={INPUT}&embed=1"
  width="100%" height="600" frameborder="0"></iframe>

Drop into any HTML page. The embed=1 flag hides nav and footer.
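If the embed is generated by a script, the {INPUT} placeholder can be filled in programmatically. The sketch below assumes the input should be percent-encoded the same way as in the API example. `embed_iframe` is a hypothetical helper, and `base` stays empty unless the embedding page lives on a different host.

```python
from html import escape
from urllib.parse import quote


def embed_iframe(target: str, base: str = "") -> str:
    """Return the iframe markup from this page with {INPUT} filled in."""
    # Percent-encode the input so ':' and '/' survive inside the query string.
    src = f"{base}/http/robots?q={quote(target, safe='')}&embed=1"
    # Escape '&' and quotes so the URL is safe inside an HTML attribute.
    return (
        f'<iframe src="{escape(src, quote=True)}"\n'
        f'  width="100%" height="600" frameborder="0"></iframe>'
    )
```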

FAQ · robots.txt parser

Common questions

Is robots.txt parser free?
Yes — every tool is free on the web with a 200/hour rate limit per IP. The matching API endpoint /api/v1/http/robots is free up to 100 requests/hour, no key required.
Where does the data come from?
Real-time probes against authoritative sources (DNS root, RIRs, registries, the target server itself), plus partner data feeds from hostinfo.com (GeoIP/ASN) and hostcheck.com (reputation).
How fresh are the results?
Live by default. Cached for 5 minutes to make repeat queries instant; pass ?nocache=1 for a forced refresh.
Can I run this from the command line?
Yes — every tool ships with a copy-as-curl. There's also an official CLI: host.tools http robots YOUR_INPUT.
Can I monitor results over time?
Pro tier lets you schedule any tool to run every 1/5/15/60 min and alert on diff. See monitors.
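For scripts that call the free API in a loop, spacing requests evenly is one simple way to stay under the 100 requests/hour limit mentioned above. This is a client-side sketch only: `HourlyThrottle` is a hypothetical helper, and how the server actually counts requests is not documented here.

```python
import time


class HourlyThrottle:
    """Keep calls at least 3600/limit seconds apart."""

    def __init__(self, limit: int = 100, clock=time.monotonic, sleep=time.sleep):
        self.interval = 3600.0 / limit  # 36 seconds between calls for 100/hour
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay
```

Call `throttle.wait()` before each request. The first call returns immediately, and later calls sleep only when they arrive sooner than the interval allows.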
host.tools Pro

Run robots.txt parser on a schedule. Get pinged when it changes.

Pro gets you bulk lookups, monitors, webhook alerts, history, exports and 10,000 API calls/day. $19/mo.

  • Schedule any tool — every 1, 5, 15, 60 min
  • Diff against last run, alert on change
  • Webhook + email + Slack + PagerDuty + OpsGenie
  • Bulk CSV upload, 1,000 inputs per job
  • Export results as CSV / NDJSON / Excel
  • 90-day history, comparison view