// flagship · 2024 — present
Ranger Discovery
Vendor-neutral LLM extraction system for mineral rights documents
Ranger automates the extraction of structured ownership data from mineral rights documents — deeds, lease amendments, conveyance instruments — using a custom LLM pipeline built to handle real-world legal documents at scale. It routes across OpenAI, Anthropic, and Gemini through a unified interface, with strict output validation and per-token cost tracking throughout.
The problem
Mineral rights ownership in Texas is tracked through decades of paper deeds, lease amendments, and conveyance instruments stored across county clerk offices. Determining who owns what requires reading through those documents manually — a slow, expensive process that doesn't scale. Ranger was built to automate it: feed in documents, get back structured ownership data ready for analysis. The goal was to replace hours of manual document review with a pipeline that runs overnight.
// module architecture
LLM Runtime Layer
The core of Ranger is a custom orchestration engine that breaks documents into chunks and processes them in defined stages — completing all chunks at one stage before advancing to the next. Each stage is independently configurable, and execution is isolated per chunk so failures are contained and surfaced as flagged records rather than silent data loss.
Vision-First Document Processing
Legal documents are passed to the model as page images rather than pre-processed text. This skips a brittle OCR step and lets the model work from the document as it actually appears — handling handwritten annotations, non-standard layouts, and formatting that OCR tools often misread.
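A vision-first request amounts to attaching each page as an encoded image instead of OCR text. The sketch below shows one plausible message shape using the common base64 data-URL convention; the exact field names vary by provider, and `page_to_message` is an illustrative helper, not Ranger's API.

```python
import base64

def page_to_message(page_png: bytes, prompt: str) -> dict:
    """Build a chat message carrying a page image alongside the extraction prompt."""
    data_url = "data:image/png;base64," + base64.b64encode(page_png).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},  # page as it appears, no OCR
        ],
    }
```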
Vendor-Neutral Gateway
OpenAI, Anthropic, and Gemini all sit behind a single interface — swapping models requires no changes to pipeline logic. Per-token cost tracking across all providers makes it straightforward to compare model performance against actual cost, and to control spend as usage scales.
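One way to picture the gateway is a single `Protocol` that every provider adapter satisfies, with cost accounting applied at the call site. This is a simplified sketch with invented names, and the price table below uses placeholder rates, not real provider pricing.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class Completion:
    text: str
    input_tokens: int
    output_tokens: int

class Provider(Protocol):
    name: str
    def complete(self, prompt: str) -> Completion: ...

# Placeholder USD rates per 1k tokens: (input, output). Not real pricing.
PRICE_PER_1K = {
    "openai": (0.005, 0.015),
    "anthropic": (0.003, 0.015),
    "gemini": (0.001, 0.002),
}

class Gateway:
    def __init__(self) -> None:
        self.spend_usd: dict[str, float] = {}

    def call(self, provider: Provider, prompt: str) -> Completion:
        c = provider.complete(prompt)
        pin, pout = PRICE_PER_1K[provider.name]
        cost = c.input_tokens / 1000 * pin + c.output_tokens / 1000 * pout
        self.spend_usd[provider.name] = self.spend_usd.get(provider.name, 0.0) + cost
        return c
```

Because pipeline code only sees `Provider` and `Completion`, swapping models is a configuration change, and `spend_usd` gives the per-provider accounting needed to compare quality against cost.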
Extraction Pipeline
Documents run through two sequential extraction passes: first identifying parties and property identifiers, then using those results as context for conveyance reasoning. All output is validated against a strict JSON schema in code — the model produces structured data, and the pipeline enforces correctness, not the other way around.
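The two-pass flow and code-side validation might look like the sketch below. Everything here is illustrative: `llm_call` stands in for a real model invocation, and the `Conveyance` fields are invented. The point is the shape of the control flow: pass-one output feeds pass two, and the pipeline, not the model, rejects malformed output.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Conveyance:
    grantor: str
    grantee: str
    tract_id: str

REQUIRED = {"grantor": str, "grantee": str, "tract_id": str}

def validate(raw: dict) -> Conveyance:
    """Enforce the output schema in code, after the model responds."""
    for key, typ in REQUIRED.items():
        if not isinstance(raw.get(key), typ):
            raise ValueError(f"schema violation: {key!r} missing or wrong type")
    return Conveyance(**{k: raw[k] for k in REQUIRED})

def extract(pages, llm_call) -> Conveyance:
    parties = llm_call("pass-1: identify parties and property identifiers", pages)
    raw = llm_call(f"pass-2: conveyance reasoning given {parties}", pages)
    return validate(raw)
```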
PostgreSQL Schema
Extraction outputs write into a PostgreSQL schema modeling mineral tracts, conveyance instruments, and ownership chains. Schema versioning means extraction outputs are always tagged with the version that produced them — making reprocessing straightforward as the data model evolves.
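Version tagging can be as simple as stamping every row with the schema version that produced it. The sketch below shows the idea with invented names (`SCHEMA_VERSION`, `ExtractionRow`) and an arbitrary version string, not Ranger's real data model.

```python
from dataclasses import dataclass

SCHEMA_VERSION = "2024.3"  # illustrative version string

@dataclass(frozen=True)
class ExtractionRow:
    instrument_id: str
    grantor: str
    grantee: str
    schema_version: str = SCHEMA_VERSION  # stamped at write time

def needs_reprocessing(row: ExtractionRow, current: str = SCHEMA_VERSION) -> bool:
    """Rows tagged with an older schema version are candidates for a rerun."""
    return row.schema_version != current
```

Selecting stale rows then becomes a simple filter on `schema_version`, which is what makes reprocessing straightforward as the data model evolves.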
Engineering philosophy
- Deterministic extraction — prompts versioned, outputs validated against schema
- Vendor neutrality — no hard dependency on a single model provider
- Cost transparency — per-token accounting across all providers
- Frozen dataclasses and strong typing throughout the runtime
- Ruff-compliant, modular Python with clean module boundaries
- Explainable failures — edge cases surface as flagged records