
The Value Maximizer

Contract Data/ML Engineer - Scoring Reliability & Candidate Archetypes (Part-time)

3-5 Years

Job Description

Job: Contract Data/ML Engineer - Scoring Reliability & Candidate Archetypes - ASAP

Job Type: Part-time (for a 100-hour project)
Job Presence: Remote (Onsite and Hybrid optional in Vietnam)
Candidate Location: Vietnam, India
Joining Date: ASAP, as we're hiring this role urgently

Summary

Own the end-to-end implementation of two analytics features in Qode's multi-agent assessment stack: (1) bootstrap confidence intervals (CIs) for per-question scores to communicate stability/disagreement across evaluators, and (2) candidate archetype discovery via clustering to surface talent patterns beyond raw scores. You'll ship data plumbing, models, integrations, and lightweight reporting.

What you'll do

  • Data foundations: ensure per-candidate, per-question, per-agent criterion scores are structured and queryable; add/modify tables and JSON schemas as needed.
  • Bootstrap CIs: implement agent-level resampling, compute CI-90/CI-95, derive stability labels (high/medium/low), and persist them alongside normalized scores; batch-backfill existing records (see the bootstrap sketch after this list).
  • Archetypes: build standardized candidate feature vectors (per-question and/or per-criterion), run clustering (K-means/GMM/hierarchical), evaluate it (e.g., silhouette score), and generate human-readable labels from centroids and summaries (see the clustering sketch after this list).
  • Integrations: expose CI fields and cluster IDs/labels via API and internal dashboards; add basic charts/UX to surface stability and candidate type.
  • Reliability & performance: write unit/integration tests, add guardrails (e.g., a minimum agent count N), and ensure pipeline runtime stays within agreed budgets.
  • Docs & handoff: clear README/runbooks covering data contracts, thresholds, and ops.
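
To make the bootstrap task concrete, here is a minimal illustrative sketch, not Qode's implementation: a percentile bootstrap over agent-level scores for each candidate-question pair, with a coarse stability label derived from CI width. The column names, thresholds, and minimum-agent guardrail value are assumptions for illustration only.

    import numpy as np
    import pandas as pd

    def bootstrap_ci(agent_scores, n_boot=2000, alpha=0.05, seed=0):
        """Percentile bootstrap over agent-level scores for one candidate-question pair."""
        scores = np.asarray(agent_scores, dtype=float)
        rng = np.random.default_rng(seed)
        # Resample agents with replacement and recompute the mean each time.
        idx = rng.integers(0, len(scores), size=(n_boot, len(scores)))
        boot_means = scores[idx].mean(axis=1)
        ci_low, ci_high = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return scores.mean(), ci_low, ci_high

    def stability_label(ci_low, ci_high, narrow=0.5, wide=1.5):
        # Map CI width to a coarse label; the thresholds here are placeholders.
        width = ci_high - ci_low
        return "high" if width <= narrow else ("medium" if width <= wide else "low")

    # Toy per-candidate, per-question, per-agent criterion scores (structure assumed).
    df = pd.DataFrame({
        "candidate_id": [1, 1, 1, 1, 2, 2, 2, 2],
        "question_id":  [7, 7, 7, 7, 7, 7, 7, 7],
        "agent_id":     [1, 2, 3, 4, 1, 2, 3, 4],
        "score":        [4.0, 4.5, 4.0, 3.5, 2.0, 5.0, 1.0, 4.5],
    })

    rows = []
    for (cand, q), grp in df.groupby(["candidate_id", "question_id"]):
        if len(grp) < 3:  # guardrail: require a minimum number of agents
            continue
        mean, lo, hi = bootstrap_ci(grp["score"], alpha=0.10)  # CI-90
        rows.append({"candidate_id": cand, "question_id": q, "mean": mean,
                     "ci_low": lo, "ci_high": hi,
                     "stability_label": stability_label(lo, hi)})

    print(pd.DataFrame(rows))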
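
Likewise, a minimal sketch of the archetype step, assuming a per-candidate feature matrix of normalized question scores: standardize the features, pick k by silhouette score, and derive a crude human-readable label from each centroid's strongest dimension. The feature names, k range, and labeling heuristic are illustrative, not the required approach.

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    # Hypothetical per-candidate feature matrix: one row per candidate,
    # one column per question/criterion (names are made up for illustration).
    features = pd.DataFrame(
        np.random.default_rng(0).uniform(1, 5, size=(60, 4)),
        columns=["q1_problem_solving", "q2_system_design", "q3_coding", "q4_communication"],
    )
    X = StandardScaler().fit_transform(features)

    # Choose k by silhouette score over a small candidate range.
    best_k, best_score, best_model = None, -1.0, None
    for k in range(2, 7):
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        score = silhouette_score(X, model.labels_)
        if score > best_score:
            best_k, best_score, best_model = k, score, model

    features["cluster_id"] = best_model.labels_

    # Crude human-readable labels: name each cluster after its strongest
    # standardized dimension (a heuristic stand-in for real centroid summaries).
    centroids = pd.DataFrame(best_model.cluster_centers_, columns=features.columns[:4])
    labels = {i: f"strong_{centroids.loc[i].idxmax()}" for i in centroids.index}
    features["cluster_label"] = features["cluster_id"].map(labels)

    print(f"k={best_k}, silhouette={best_score:.2f}")
    print(features.groupby("cluster_label").size())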

Must-have skills and qualifications

  • 3-5 years of experience in a relevant role
  • Python (pandas, NumPy, scikit-learn), SQL, DB migrations (e.g., Postgres)
  • Statistical resampling (bootstrap), clustering, model selection/validation
  • Data engineering for batch jobs/backfills; API integration
  • Pragmatic product sense for labeling clusters and communicating uncertainty

Nice-to-haves

  • Airflow/dbt/Prefect
  • Grafana/Metabase
  • Experience with multi-agent/LLM evaluation pipelines
  • Cloud (GCP/AWS/Azure)
  • Docker/Kubernetes

Deliverables & acceptance criteria

  • CI service/module + persisted mean, ci_low, ci_high, stability_label for 100% of scored candidate-question rows with at least N agents; reproducible backfill completed.
  • Clustering job that assigns cluster_id and cluster_label to each candidate; labels documented with centroid profiles and example candidates.
  • API fields and minimal dashboard tiles (score CI, stability badge; candidate type with top strengths/weaknesses); see the example record below.
  • Tests (unit + E2E), monitoring hooks, and runbooks.
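
For orientation, a hypothetical shape for a persisted/API record that combines both features. Only mean, ci_low, ci_high, stability_label, cluster_id, and cluster_label come from the deliverables above; every other field name is illustrative.

    # Hypothetical persisted/API record for one scored candidate-question row.
    record = {
        "candidate_id": 123,
        "question_id": 7,
        "score": {
            "mean": 4.1,
            "ci_low": 3.7,               # CI-90 lower bound (agent-level bootstrap)
            "ci_high": 4.4,              # CI-90 upper bound
            "stability_label": "high",
            "n_agents": 5,               # input to the min-N-agents guardrail
        },
        "archetype": {
            "cluster_id": 2,
            "cluster_label": "strong_system_design",
        },
    }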

Job ID: 135885929