Documentation fetcher for AI
Fetch docs.
Get clean Markdown.
Turn any docs site into AI-ready Markdown. Built for RAG pipelines, Claude Code skills, and training datasets.
pip install docpull
Features
Secure. Fast.
AI-Ready Output
Clean Markdown with YAML frontmatter. Ready for RAG and LLM training.
Streaming Dedup
Real-time duplicate detection. O(1) lookups, minimal memory.
JS Rendering
Playwright support for SPAs and JavaScript-heavy sites.
Secure by Default
HTTPS-only, robots.txt compliant, SSRF-protected.
Incremental Updates
ETag-based caching. Resume interrupted crawls automatically.
Content Filtering
Filter by language, path, or size. Full control over output.
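The Streaming Dedup feature above can be pictured with a small sketch. This is not docpull's actual implementation: the function names `normalize` and `dedup_stream` are hypothetical, and hashing whitespace-normalized content with SHA-256 is an assumed strategy chosen only to illustrate how a set of fixed-size digests gives average O(1) lookups without keeping page bodies in memory.

```python
import hashlib
from typing import Iterable, Iterator, Set, Tuple


def normalize(text: str) -> str:
    """Collapse runs of whitespace so trivially different copies hash the same."""
    return " ".join(text.split())


def dedup_stream(pages: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (url, text) pairs whose normalized content has not been seen yet.

    Hypothetical helper for illustration; not part of docpull's API.
    """
    seen: Set[bytes] = set()  # holds 32-byte digests only, never page bodies
    for url, text in pages:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).digest()
        if digest in seen:  # average O(1) set membership test
            continue
        seen.add(digest)
        yield url, text


# Example input: the second page differs from the first only in spacing.
pages = [
    ("https://example.com/a", "Hello  world"),
    ("https://example.com/b", "Hello world"),
    ("https://example.com/c", "Something else"),
]
unique_urls = [url for url, _ in dedup_stream(pages)]
```

Because the function is a generator, pages are filtered as they arrive rather than after the whole crawl finishes, which is the sense of "streaming" assumed here.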
Profiles
Presets for common workflows.
RAG (Default)
Deduped, metadata-rich output for LLMs and vector stores.
Mirror (Archival)
Full archive with caching and resume support.
Quick (Fast)
50 pages, depth 2. For testing and sampling.
Custom (Advanced)
No presets. Full control over every parameter.
Select a profile with the --profile flag, for example --profile rag.
Examples
See what you get.
Input
docpull https://docs.stripe.com
Output
---
title: "Authentication"
source: https://docs.stripe.com/authentication
fetched: 2024-01-15T10:30:00Z
---
# Authentication
The Stripe API uses API keys to authenticate requests...
Install
pip install docpull
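Once files like the sample in the Examples section exist, a downstream RAG or indexing step usually needs the metadata separated from the body. The sketch below is one way to do that with only the standard library. The function name `split_frontmatter` is hypothetical, and it assumes the frontmatter contains only flat `key: value` lines as in the sample; anything richer would call for a real YAML parser.

```python
from typing import Dict, Tuple


def split_frontmatter(document: str) -> Tuple[Dict[str, str], str]:
    """Split a Markdown document into (metadata, body).

    Assumption: frontmatter is flat `key: value` lines between two `---` lines.
    """
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, document  # no frontmatter present
    try:
        # Index of the closing `---`, searching from the second line onward.
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        return {}, document  # unterminated frontmatter: treat as plain body
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:])
    return meta, body


# The sample output from the Examples section, reproduced as a string.
sample = (
    "---\n"
    'title: "Authentication"\n'
    "source: https://docs.stripe.com/authentication\n"
    "fetched: 2024-01-15T10:30:00Z\n"
    "---\n"
    "# Authentication\n"
    "The Stripe API uses API keys to authenticate requests..."
)
meta, body = split_frontmatter(sample)
```

Here `meta` holds the `title`, `source`, and `fetched` fields as strings, and `body` is the Markdown that would be chunked or embedded.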