Job Description
Role Overview
We are building an AI-powered system that extracts structured data from reports (PDFs,
including scanned documents), applies configurable rule logic (validity and impairment
classification), and supports a human-in-the-loop review workflow.
This is not a research role. This is a hands-on engineering role focused on building a
production-grade document intelligence pipeline.
Core Responsibilities
Build a PDF processing pipeline (machine-readable + scanned PDFs)
Implement OCR and layout-aware parsing
Extract structured test scores into JSON
Design LLM-based structured extraction with schema validation
Implement rule-based classification engine (configurable thresholds)
Add confidence scoring per field
Create evaluation framework to measure extraction and classification accuracy
Optimize for reliability and explainability (not just demo output)
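To make the validation, rule-engine, and confidence items above concrete, here is a minimal sketch of the kind of logic involved. It assumes each extracted field arrives as a dictionary with a name, a numeric value, and a confidence between 0 and 1. The field names, threshold keys, cutoff values, and labels are hypothetical placeholders, not the actual production rules.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical threshold configuration; keys and cutoffs are illustrative only.
DEFAULT_RULES: Dict[str, float] = {
    "min_confidence": 0.80,     # fields below this are routed to human review
    "impairment_cutoff": 70.0,  # scores below this are labeled as impaired
}


@dataclass
class FieldResult:
    name: str
    value: float
    confidence: float


def validate_record(raw: dict) -> FieldResult:
    """Check one extracted field for required keys and sane types."""
    for key in ("name", "value", "confidence"):
        if key not in raw:
            raise ValueError(f"missing key: {key}")
    try:
        value = float(raw["value"])
        confidence = float(raw["confidence"])
    except (TypeError, ValueError) as exc:
        raise ValueError(f"non-numeric field: {exc}") from exc
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    return FieldResult(str(raw["name"]), value, confidence)


def classify(field: FieldResult, rules: Dict[str, float] = DEFAULT_RULES) -> str:
    """Apply configurable thresholds; low-confidence fields go to review first."""
    if field.confidence < rules["min_confidence"]:
        return "needs_review"
    if field.value < rules["impairment_cutoff"]:
        return "impaired"
    return "within_range"
```

Keeping the thresholds in a plain dictionary means they can be loaded from a config file and changed without touching the classification code, and the "needs_review" label is what would feed the human-in-the-loop workflow.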
Required Technical Skills
Must Have
5+ years Python development experience
Experience with PDF parsing libraries (e.g., PyMuPDF, pdfplumber)
Hands-on OCR experience (Tesseract or cloud OCR APIs)
Experience using LLMs for structured extraction (schema-based outputs)
Experience validating and post-processing LLM output
Experience building rule engines or decision logic systems
Strong debugging and data validation skills
Good English communication skills
Strongly Preferred
Experience with document AI or form extraction systems
Experience building evaluation pipelines (precision/recall tracking)
Familiarity with medical or legal documents
Experience handling noisy scanned documents
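As an illustration of the precision/recall tracking mentioned above, the following is a minimal sketch of field-level scoring for one document. It assumes predictions and ground truth are both flat dictionaries mapping field names to values, and counts a field as correct only on an exact match. The function name and the matching rule are assumptions for illustration.

```python
from typing import Dict, Tuple


def precision_recall(predicted: Dict[str, object],
                     gold: Dict[str, object]) -> Tuple[float, float]:
    """Field-level precision and recall for a single document.

    A predicted field is correct when the same key exists in the
    gold labels with an equal value.
    """
    correct = sum(1 for key, value in predicted.items()
                  if key in gold and gold[key] == value)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

In practice, exact matching is often too strict for OCR output, so a real evaluation framework would likely normalize values (whitespace, number formats) before comparing.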
What This Role Is NOT
Not prompt engineering only
Not model fine-tuning research
Not frontend/UI focused
Not a junior ML experimenter
Confidence scoring
Accuracy report
We need someone who understands:
OCR → layout parsing → structured extraction → validation → rule logic → confidence scoring → evaluation loop
Interview process: 1-2 rounds (interview with Michael, CEO)
AI is a new team, and this position is its starting point, so the hire will act as a player-coach.
Expected start date: March
Rate: Up to $3,000
Location: Ho Chi Minh City (Hybrid/Remote within Vietnam)
Benefits besides salary: health insurance, potential equity, bonus
Type of contract: labour contract