// flagship · 2024 — present

Ranger Discovery

Vendor-neutral LLM extraction system for mineral rights documents

Ranger automates the extraction of structured ownership data from mineral rights documents — deeds, lease amendments, conveyance instruments — using a custom LLM pipeline built to handle real-world legal documents at scale. It routes across OpenAI, Anthropic, and Gemini through a unified interface, with strict output validation and per-token cost tracking throughout.

The problem

Mineral rights ownership in Texas is tracked through decades of paper deeds, lease amendments, and conveyance instruments stored across county clerk offices. Determining who owns what requires reading through those documents manually — a slow, expensive process that doesn't scale. Ranger was built to automate it: feed in documents, get back structured ownership data ready for analysis. The goal was to replace hours of manual document review with a pipeline that runs overnight.

// module architecture

documents/DocumentBundle — ordered page images + provenance
llm_runtime/Plan/Step engine — chunked execution, cost tracking
llm/gateway — vendor dispatch — OpenAI · Anthropic · Gemini
pipelines/entity extraction → conveyance reasoning
db/PostgreSQL — mineral ownership schema

LLM Runtime Layer

The core of Ranger is a custom orchestration engine that breaks documents into chunks and processes them in defined stages — completing all chunks at one stage before advancing to the next. Each stage is independently configurable, and execution is isolated per chunk so failures are contained and surfaced as flagged records rather than silent data loss.
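The stage-gated execution described above can be sketched in a few lines. This is an illustrative sketch, not Ranger's actual API — `Step`, `run_plan`, and the flagged-record shape are hypothetical names for the pattern: every chunk completes one stage before any chunk enters the next, and a chunk that fails is dropped from later stages and surfaced as a flag instead of aborting the run.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Step:
    """One pipeline stage: a name plus a per-chunk transform."""
    name: str
    fn: Callable[[Any], Any]

def run_plan(steps: list[Step], chunks: list) -> tuple[dict, list]:
    """Run every chunk through one stage before advancing to the next.
    Per-chunk failures become flagged records, never silent data loss."""
    results = {i: chunk for i, chunk in enumerate(chunks)}
    flags: list[dict] = []
    for step in steps:
        for i, value in list(results.items()):
            try:
                results[i] = step.fn(value)
            except Exception as exc:
                # Isolate the failure: flag it and exclude the chunk
                # from all subsequent stages.
                flags.append({"chunk": i, "stage": step.name, "error": str(exc)})
                del results[i]
    return results, flags
```

Because each stage sees only the chunks that survived the previous one, a malformed page flags exactly one record while the rest of the batch proceeds.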

Vision-First Document Processing

Legal documents are passed to the model as page images rather than pre-processed text. This skips a brittle OCR step and lets the model work from the document as it actually appears — handling handwritten annotations, non-standard layouts, and formatting that OCR tools often misread.
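A minimal sketch of what "vision-first" means in practice: page images go into the request as base64 data URLs alongside the instruction, with no OCR pass in between. The payload shape below follows the OpenAI chat-completions image format; `page_messages` and the instruction text are illustrative, not Ranger's actual gateway types.

```python
import base64

def page_messages(pages: list[bytes], instruction: str) -> list[dict]:
    """Build a vision request from raw page images — no OCR step.

    `pages` are PNG-encoded page scans; each becomes an image part
    following the instruction text.
    """
    content: list[dict] = [{"type": "text", "text": instruction}]
    for page in pages:
        b64 = base64.b64encode(page).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return [{"role": "user", "content": content}]
```

The model sees the deed as it appears on paper, so a handwritten margin note or a two-column layout is the model's problem to read, not an OCR tool's problem to mangle first.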

Vendor-Neutral Gateway

OpenAI, Anthropic, and Gemini all sit behind a single interface — swapping models requires no changes to pipeline logic. Per-token cost tracking across all providers makes it straightforward to compare model performance against actual cost, and to control spend as usage scales.
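The gateway pattern can be sketched as a structural interface plus a running cost ledger. Everything here is hypothetical — the `Provider` protocol, `Completion` shape, and especially the per-million-token rates are placeholders, not Ranger's real adapters or rate card — but it shows why swapping vendors touches no pipeline code: callers only ever see `Gateway.complete`.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class Completion:
    text: str
    input_tokens: int
    output_tokens: int

class Provider(Protocol):
    """Anything with this shape can sit behind the gateway."""
    def complete(self, prompt: str) -> Completion: ...

# USD per 1M tokens (input, output) — illustrative figures only.
RATES = {"openai": (2.50, 10.00), "anthropic": (3.00, 15.00)}

class Gateway:
    def __init__(self, providers: dict[str, Provider]) -> None:
        self._providers = providers
        self.spend_usd = 0.0  # running per-token cost ledger

    def complete(self, vendor: str, prompt: str) -> Completion:
        result = self._providers[vendor].complete(prompt)
        in_rate, out_rate = RATES[vendor]
        self.spend_usd += (result.input_tokens * in_rate
                           + result.output_tokens * out_rate) / 1_000_000
        return result
```

Because cost accrues at the gateway rather than in each pipeline, every stage's spend is comparable across vendors with no extra instrumentation.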

Extraction Pipeline

Documents run through two sequential extraction passes: first identifying parties and property identifiers, then using those results as context for conveyance reasoning. All output is validated against a strict JSON schema in code — the model produces structured data, and the pipeline enforces correctness, not the other way around.
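"Schema enforced in code" can be as simple as the sketch below: parse the model's output, reject unknown keys, and require every field with its expected type. The field names (`grantor`, `grantee`, `tract_id`) are illustrative stand-ins — the real pipeline's schema is richer — but the principle is the same: the model proposes, the validator disposes.

```python
import json

# Illustrative schema — real extraction schemas carry many more fields.
SCHEMA: dict[str, type] = {
    "grantor": str,
    "grantee": str,
    "tract_id": str,
}

def validate(raw: str) -> dict:
    """Parse model output and enforce the schema, raising on any drift."""
    data = json.loads(raw)
    unexpected = set(data) - set(SCHEMA)
    if unexpected:
        raise ValueError(f"unexpected keys: {sorted(unexpected)}")
    for key, typ in SCHEMA.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], typ):
            raise ValueError(f"{key}: expected {typ.__name__}")
    return data
```

A record that fails validation becomes a flagged record for review rather than a silently wrong row in the database.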

PostgreSQL Schema

Extraction outputs write into a PostgreSQL schema modeling mineral tracts, conveyance instruments, and ownership chains. Schema versioning means extraction outputs are always tagged with the version that produced them — making reprocessing straightforward as the data model evolves.
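Version tagging reduces to stamping every record at write time and filtering on the stamp later. A minimal sketch, assuming a flat record shape and a string version constant — both hypothetical, not Ranger's actual schema:

```python
from dataclasses import dataclass

SCHEMA_VERSION = "2024.07"  # illustrative version identifier

@dataclass(frozen=True)
class ExtractionRecord:
    """One extraction output row, stamped with the schema that produced it."""
    instrument_id: str
    grantor: str
    grantee: str
    schema_version: str = SCHEMA_VERSION

def rows_to_reprocess(rows: list[ExtractionRecord],
                      current: str = SCHEMA_VERSION) -> list[ExtractionRecord]:
    """Select records produced under an older schema version."""
    return [r for r in rows if r.schema_version != current]
```

When the data model evolves, the reprocessing job is just this filter: rerun exactly the instruments whose records predate the current version.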

Engineering philosophy

  • Deterministic extraction — prompts versioned, outputs validated against schema
  • Vendor neutrality — no hard dependency on a single model provider
  • Cost transparency — per-token accounting across all providers
  • Frozen dataclasses and strong typing throughout the runtime
  • Ruff-compliant, modular Python with clean module boundaries
  • Explainable failures — edge cases surface as flagged records

Skills

Software Engineering
Python 3.11+ · Modular architecture · Frozen dataclasses · Strong typing · Ruff · Git
AI / LLM Systems
Vision-first extraction · Chunked document prompting · Multi-model orchestration · Structured JSON extraction · Token cost modeling · Prompt versioning
Pipeline Architecture
Plan/Step orchestration · Map-only execution semantics · Execution isolation per chunk · Carry-forward context · JSON schema validation
Database & Backend
PostgreSQL · Ownership chain modeling · Conveyance modeling · Entity relationship modeling
Oil & Gas Domain
Mineral rights · Legal instrument interpretation · Conveyance reasoning · Ownership chains · RRC datasets