docpull
Documentation fetcher for AI

Fetch docs.
Get clean Markdown.

Turn any docs site into AI-ready Markdown. Built for RAG pipelines, Claude Code skills, and training datasets.

pip install docpull

Features

Secure. Fast.

AI-Ready Output

Clean Markdown with YAML frontmatter. Ready for RAG and LLM training.

Streaming Dedup

Duplicate pages are detected in real time as they stream in, with O(1) lookups per page and minimal memory overhead.
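The sketch below shows one common way to get streaming deduplication with O(1) lookups. It assumes duplicates are found by hashing each page's whitespace-normalized Markdown and keeping only the digests in a set. The `StreamingDeduper` class, the SHA-256 choice, and the normalization rule are all assumptions for illustration, not docpull's internals.

```python
from __future__ import annotations

import hashlib
from typing import Iterable, Iterator


class StreamingDeduper:
    """Hypothetical helper: passes through pages whose content has not been seen yet."""

    def __init__(self) -> None:
        # One 32-byte SHA-256 digest per unique page; set lookups are O(1) on average.
        self._seen: set[bytes] = set()

    @staticmethod
    def _fingerprint(markdown: str) -> bytes:
        # Assumed normalization: collapse whitespace so formatting-only
        # differences still count as duplicates.
        normalized = " ".join(markdown.split())
        return hashlib.sha256(normalized.encode("utf-8")).digest()

    def is_new(self, markdown: str) -> bool:
        """Record the page and return True if its content is new."""
        digest = self._fingerprint(markdown)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True

    def unique(self, pages: Iterable[tuple[str, str]]) -> Iterator[tuple[str, str]]:
        """Yield (url, markdown) pairs as they arrive, skipping duplicates."""
        for url, markdown in pages:
            if self.is_new(markdown):
                yield url, markdown
```

Because `unique` is a generator, pages can be written out one at a time while the crawl is still running. Only the fixed-size digests stay in memory, not the page text.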

JS Rendering

Playwright support for SPAs and JavaScript-heavy sites.

Secure by Default

HTTPS only, robots.txt compliant, and protected against server-side request forgery (SSRF).
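Here is a sketch of how these three checks could be combined into a single pre-flight test, using only the standard library. `may_fetch`, the `docpull` user-agent string, and passing in an already-resolved IP address are assumptions made so the example runs offline. A real crawler would resolve the hostname itself and would also need to repeat the check on every redirect.

```python
import ipaddress
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def is_public_ip(resolved_ip: str) -> bool:
    # SSRF guard: reject loopback, private, link-local (e.g. cloud metadata
    # at 169.254.169.254), and other non-globally-routable addresses.
    return ipaddress.ip_address(resolved_ip).is_global


def robots_allows(robots_txt: str, url: str, user_agent: str) -> bool:
    # Parse robots.txt text that was already fetched, then apply its rules.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def may_fetch(url: str, resolved_ip: str, robots_txt: str, user_agent: str = "docpull") -> bool:
    """Hypothetical pre-flight check: HTTPS only, public address, robots.txt allows it."""
    if urlsplit(url).scheme != "https":
        return False
    if not is_public_ip(resolved_ip):
        return False
    return robots_allows(robots_txt, url, user_agent)
```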

Incremental Updates

ETag-based caching. Resume interrupted crawls automatically.
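The sketch below shows the conditional-request logic behind ETag caching. It has no HTTP client, so it runs offline. The `EtagCache` class and its method names are hypothetical. The protocol part is standard HTTP: the client sends the stored ETag in an `If-None-Match` header, and a `304 Not Modified` reply means the cached copy is still current.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EtagCache:
    """Hypothetical per-URL cache of (etag, body) pairs."""

    entries: dict[str, tuple[str, str]] = field(default_factory=dict)

    def request_headers(self, url: str) -> dict[str, str]:
        # Ask the server to skip the body if our stored version is still current.
        cached = self.entries.get(url)
        return {"If-None-Match": cached[0]} if cached else {}

    def resolve(self, url: str, status: int, etag: str | None, body: str) -> str:
        # 304 Not Modified for a cached URL: reuse the stored body.
        if status == 304 and url in self.entries:
            return self.entries[url][1]
        # Fresh response: remember the new validator (if any) for the next crawl.
        if etag is not None:
            self.entries[url] = (etag, body)
        return body
```

A crawler would merge `request_headers(url)` into its GET request, then pass the response status, the `ETag` header, and the body to `resolve`. Resuming interrupted crawls needs separate state, such as the set of visited URLs, and is not shown here.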

Content Filtering

Filter by language, path, or size. Full control over output.
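The sketch below shows how the three filters could combine into a single predicate. The parameter names, the glob-style path patterns, and the use of the page's declared language tag (for example, from `<html lang>`) are assumptions. They are not docpull's actual options.

```python
from __future__ import annotations

from fnmatch import fnmatch
from urllib.parse import urlsplit


def keep_page(
    url: str,
    lang: str | None,
    size_bytes: int,
    include_paths: list[str] | None = None,
    exclude_paths: list[str] | None = None,
    languages: set[str] | None = None,
    max_bytes: int | None = None,
) -> bool:
    """Hypothetical filter: True if the page passes every configured check."""
    path = urlsplit(url).path or "/"
    # Shell-style globs; note that "*" also matches "/", so "/docs/*" covers subpaths.
    if include_paths and not any(fnmatch(path, p) for p in include_paths):
        return False
    if exclude_paths and any(fnmatch(path, p) for p in exclude_paths):
        return False
    # Compare the primary language subtag: "en-US" matches "en".
    if languages is not None:
        primary = (lang or "").split("-")[0].lower()
        if primary not in languages:
            return False
    # Skip pages larger than the size limit.
    if max_bytes is not None and size_bytes > max_bytes:
        return False
    return True
```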

Profiles

Presets for common workflows.

RAG (default): deduped, metadata-rich output for LLMs and vector stores.

Mirror (archival): full archive with caching and resume support.

Quick (fast): 50 pages, depth 2. For testing and sampling.

Custom (advanced): no presets. Full control over every parameter.

Select a profile with --profile, for example --profile rag.
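The Quick profile's limits (50 pages, depth 2) suggest a bounded breadth-first crawl. Here is a minimal sketch over an in-memory link graph. It assumes depth counts link hops from the start URL, with the start page at depth 0. `bounded_crawl`, `max_pages`, and `max_depth` are illustrative names, not docpull settings.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def bounded_crawl(
    start: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 50,
    max_depth: int = 2,
) -> list[str]:
    """Breadth-first crawl that stops at max_pages pages or max_depth hops."""
    visited = {start}
    order: list[str] = []
    queue = deque([(start, 0)])
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)  # stand-in for fetching and converting the page
        if depth == max_depth:
            continue  # do not follow links past the depth limit
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order
```

With a page cap, breadth-first order keeps the pages closest to the start URL. These are often the overview pages, which makes the order a reasonable fit for sampling.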

Examples

See what you get.

Input
docpull https://docs.stripe.com
Output
---
title: "Authentication"
source: https://docs.stripe.com/authentication
fetched: 2024-01-15T10:30:00Z
---

# Authentication

The Stripe API uses API keys to authenticate requests...
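A RAG loader needs to split each output file into metadata and body. Here is a minimal standard-library sketch that assumes flat `key: value` frontmatter like the sample above. `split_frontmatter` is a hypothetical helper. For nested or multi-line values, a full YAML parser such as PyYAML's `yaml.safe_load` would be the safer choice.

```python
from __future__ import annotations


def split_frontmatter(document: str) -> tuple[dict[str, str], str]:
    """Split '---'-delimited frontmatter from the Markdown body."""
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, document  # no frontmatter
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, document  # unterminated block: treat everything as body
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```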
