Engineering · 8 min read

How AI bid parsing actually works (and where it fails)

A technical walkthrough of how modern AI parses subcontractor bids — where it gets things right, where it gets things wrong, and how a review queue keeps the wrong things out of production.

By PreconIntel Engineering

Most GC estimators we talk to have heard some version of the pitch: AI will parse your sub bids automatically. Drop in a PDF, line items come out the other side, classified to CSI, scope normalized, leveling sheet built. It sounds too good because, historically, it has been.

It is getting good. The 2024–2026 generation of vision-capable LLMs (Gemini 2.5 Pro, Claude 4, GPT-4.5) genuinely can parse a messy sub bid PDF into structured data with high accuracy on well-formed bids and decent accuracy on the weird ones. This post is a technical walkthrough of how the parse actually happens — and where it falls over, because the failure modes matter more than the success stories.

The input problem

Subcontractor bids arrive in roughly eight formats, ranked by frequency:

• Attached PDF (text-extractable) — the bid was generated from Excel or Word and still has a text layer
• Attached PDF (scanned) — the bid was printed, signed, scanned, and emailed back; no text layer, just an image
• Attached Excel or Google Sheets — the sub's internal template exported to XLS
• Attached Word — rare, but it happens
• Email body (no attachment) — the sub typed or pasted the bid directly into the email
• Attached image (JPG/PNG) — a photo of a printed bid, usually from a phone
• Attached CSV — usually a sub running an ERP that exports this way
• Forwarded thread — the bid is three emails deep in a reply chain with mixed formats

A parser that handles four of these is a toy. A parser that handles all eight is product.

The pipeline

Our parse pipeline has four stages:

1. Format detection. Identify what we're looking at before touching the model. PDFs split into text-extractable vs scanned; Excel handled separately; email body and image attachments each get their own path.
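
The routing decision can be sketched as a small pure function. This is an illustrative sketch, not PreconIntel's actual code — the names (`ParsePath`, `detectFormat`) and the extension heuristics are assumptions:

```typescript
// Route an incoming attachment to a parse path before any model is invoked.
type ParsePath =
  | "pdf-text"     // PDF with a text layer — straight to extraction
  | "pdf-ocr"      // scanned PDF — needs OCR first
  | "spreadsheet"  // XLS/XLSX/CSV — keep row/column structure
  | "doc"          // Word attachment
  | "image"        // photo of a printed bid
  | "email-body";  // no attachment — the message text is the bid

function detectFormat(filename: string | null, hasTextLayer: boolean): ParsePath {
  if (filename === null) return "email-body";
  const ext = filename.toLowerCase().split(".").pop() ?? "";
  if (ext === "pdf") return hasTextLayer ? "pdf-text" : "pdf-ocr";
  if (["xls", "xlsx", "csv"].includes(ext)) return "spreadsheet";
  if (["doc", "docx"].includes(ext)) return "doc";
  if (["jpg", "jpeg", "png"].includes(ext)) return "image";
  return "email-body"; // fall back to treating the message text as the bid
}
```

Forwarded threads don't appear here because they aren't a format — each attachment and the body text get routed individually.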

2. Text extraction. Text-PDFs use PDF.js with layout preservation. Scanned PDFs go through OCR (Tesseract with construction-vocabulary post-processing). Excel keeps the sheet/row/column structure intact. The goal is to give the LLM clean text with enough structure to reason about.

3. Structured extraction. The extracted text goes to the model with a Zod schema describing what we want back: subcontractor identity, base amount, alternates, unit prices, line items with CSI classification, inclusions/exclusions/clarifications, contact info. The schema is the secret — without it, LLMs produce beautiful prose that's useless to downstream code.
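
To make the schema point concrete, here is a plain-TypeScript sketch of the extraction target. The real system expresses this as a Zod schema; the field names, the sample subcontractor, and the dollar figures below are all illustrative, not PreconIntel's actual schema or data:

```typescript
interface LineItem {
  description: string;
  csiDivision: string; // two-digit CSI division, e.g. "09" for Finishes
  amount: number;      // dollars
  confidence: number;  // 0..1, consumed by stage 4
}

interface ParsedBid {
  subcontractor: { name: string; emailDomain: string };
  baseAmount: number;
  alternates: { description: string; amount: number }[];
  lineItems: LineItem[];
  inclusions: string[];
  exclusions: string[];
  clarifications: string[];
}

// A toy parsed bid, to show the shape end to end.
const example: ParsedBid = {
  subcontractor: { name: "Acme Drywall", emailDomain: "acmedrywall.com" },
  baseAmount: 482000,
  alternates: [{ description: "Level 5 finish in lobby", amount: 18500 }],
  lineItems: [
    { description: "Metal stud framing", csiDivision: "09", amount: 260000, confidence: 0.97 },
    { description: "GWB hang and finish", csiDivision: "09", amount: 222000, confidence: 0.95 },
  ],
  inclusions: ["All interior partitions per A-series drawings"],
  exclusions: ["Exterior sheathing", "Insulation"],
  clarifications: ["Assumes two mobilizations"],
};
```

Handing the model a target shape like this is what turns "summarize this bid" into "fill in these fields" — the difference between prose and data.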

4. Validation and confidence scoring. Every field comes back with a confidence score. Dollar amounts get sanity-checked against the base amount (do the line items sum correctly? within what tolerance?). CSI classifications get cross-checked against the sub's historical bids. Anything below threshold gets flagged for human review.
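
The line-item sanity check reduces to a one-liner. A minimal sketch — the 0.5% tolerance is an assumed default, not PreconIntel's actual threshold:

```typescript
// Do the parsed line items sum to the stated base amount, within a
// relative tolerance? A miss here flags the parse for human review.
function lineItemsReconcile(
  base: number,
  items: number[],
  tolerance = 0.005 // 0.5% relative tolerance — illustrative
): boolean {
  const sum = items.reduce((a, b) => a + b, 0);
  return Math.abs(sum - base) <= tolerance * base;
}
```

A check this cheap catches a disproportionate share of real errors, because OCR mistakes and mis-aligned rows almost always break the arithmetic.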

Where it gets things right

Most things, on most bids. For a text-extractable PDF from a sub with a clean template, extraction accuracy on key fields (base amount, subcontractor, CSI divisions) is north of 98%. Line-item detail is 90%+. Scope extraction (inclusions, exclusions, clarifications as tagged lists) is 85%+.

The model is notably good at:

• Identifying the subcontractor from email domain, signature block, and letterhead — even when three subs share an office and one signed on behalf of another
• Distinguishing base amount from total amount when a bid lists both
• Catching alternates that are hidden on page 3 of a 5-page PDF
• Classifying line items to CSI divisions based on description alone, even when the sub uses their own trade language

Where it gets things wrong

Scanned PDFs are the hardest case. OCR is getting better every year but it still introduces errors, and a single digit wrong in a dollar amount is a meaningfully different bid. Our mitigation is to always show the original PDF side-by-side with the parsed output on scanned bids, and to require human confirmation before anything auto-commits.

CSI classification fails on specialty work. If the project has a lot of low-voltage, mass timber, or seismic-specific detail, the model's classification accuracy drops because it's leaning on patterns from generic commercial work. We flag these for human review rather than guess.

Scope extraction fails when scope is implied rather than stated. If a sub's bid says 'per plans and specs,' the model has no inclusions or exclusions to extract — because they're in the plans, which the model hasn't seen. This is a limit of the input, not the parse.

Unit price extraction fails on complex bid forms. Unit-price-heavy work (site work, earthwork) often has quantities in one column and unit rates in another with a multiplier. The model mis-aligns rows more often than on flat line-item bids. Better schemas help; perfection requires project-specific templates.
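
The misalignment is detectable, even when it isn't preventable, because a unit-price row carries its own arithmetic. A hedged sketch of that row-level cross-check — names and tolerance are illustrative assumptions:

```typescript
interface UnitPriceRow {
  quantity: number;
  unitRate: number;
  extended: number; // the extended total printed on the bid form
}

// Return the indices of rows where quantity × rate disagrees with the
// extended amount — the signature of a column mis-alignment.
function misalignedRows(rows: UnitPriceRow[], tolerance = 0.01): number[] {
  return rows.flatMap((r, i) =>
    Math.abs(r.quantity * r.unitRate - r.extended) >
    tolerance * Math.max(r.extended, 1)
      ? [i]
      : []
  );
}
```

A flagged row doesn't tell you which column is wrong — only that a human should look.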

Why the review queue matters more than the model

The most important architectural decision in a bid parsing product isn't which model you use. It's what happens when the model isn't sure.

A parser that commits low-confidence parses to production silently is a liability. A parser that fails catastrophically on hard inputs is frustrating but safe. A parser that commits high-confidence parses automatically and routes low-confidence parses to a review queue with the original document side by side — that's the product.

PreconIntel's Bid Inbox runs this third approach. Every parsed field has a confidence score. High-confidence parses commit. Anything below threshold lands in a queue where an estimator confirms or corrects — not re-keys from scratch, but confirms. The system tracks corrections over time, so what the model gets wrong once, it gets right next time.
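
The routing decision itself is small. A sketch under assumed thresholds — the 0.95 and 0.6 cutoffs are illustrative, not PreconIntel's production values:

```typescript
type Route = "auto-commit" | "review" | "manual";

// Route a parse by its weakest field: one low-confidence dollar amount
// should pull the whole bid into review, not slip through on the average.
function routeParse(fieldConfidences: number[]): Route {
  const weakest = Math.min(...fieldConfidences);
  if (weakest >= 0.95) return "auto-commit";
  if (weakest >= 0.6) return "review"; // pre-fill, flag the weak fields
  return "manual";                     // e.g. poor OCR on a scanned PDF
}
```

Gating on the minimum rather than the mean is the conservative choice: a bid that is 99% right in nine fields and wrong in the base amount is still a wrong bid.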

The numbers

On a mature deployment with a GC's own bid history feeding the classification model, we see:

• ~75% of parses auto-commit at high confidence
• ~20% land in review with high-confidence fields pre-filled and low-confidence fields flagged
• ~5% require substantial manual review (usually scanned PDFs with poor OCR)

The 5% matter. They're where trust gets built or destroyed. Getting the 5% right — flagging them clearly, showing the source, making correction fast — is the work.

Ready to estimate smarter?

Start your 14-day Pro trial today. No credit card required. See why GCs are switching from spreadsheets to intelligence.