Madison Technologies

Machine Learning Operations (MLOps) - AI/ML Platform

Job Description

About the Role:

A dedicated startup is being formed to industrialize and scale a secure, AI-enabled, multi-source decision-support software offering. The platform is a multi-sensor fusion and agentic AI solution that connects to diverse data sources (for example, geospatial layers, imagery, video, and other operational signals). This role will support the delivery of a scalable product and help establish the processes, standards, and collaboration practices required for sustainable growth.

You will own the reliability and scalability of ML- and LLM-enabled services, building robust pipelines, deployments, monitoring, and operational controls in a fast-moving startup environment.

Responsibilities

  • Design and operate end-to-end ML/LLM delivery pipelines, from data through training/fine-tuning, evaluation, and packaging to deployment.
  • Build CI/CD for models and services, including automated testing, validation gates, and rollback strategies.
  • Standardize experiment tracking, model/version lineage, and artifact management (datasets, prompts, checkpoints, embeddings).
  • Implement monitoring and observability: latency, cost, drift, quality signals, and safety/guardrails metrics.
  • Optimize inference performance and cost (batching, caching, quantization, hardware choices).
  • Define and enforce environment and dependency management across dev/stage/prod.
  • Work with engineering on scalable serving patterns (APIs, streaming, event-driven), and with security on access controls and secrets.
  • Support release readiness: runbooks, incident response, SLOs/SLAs, and post-release stability tracking.
  • Coordinate with procurement and legal where needed for tooling, cloud services, and vendor onboarding.
  • Startup mode: stay hands-on and flexible, pivot comfortably when priorities change, and unblock teams quickly.

Interfaces / Stakeholders

  • Software engineering (platform, backend, DevOps).
  • ML/LLM engineers and applied scientists.
  • Product and delivery teams (PM/PO/BA).
  • Security, IT, procurement, and finance (as applicable).

Qualifications

  • Typically 5+ years of experience in MLOps, DevOps, or data platform roles, including production deployments of ML- and/or LLM-powered systems. Experience in fast-paced product environments is preferred.

Tools (examples)

  • ML lifecycle: MLflow, Weights & Biases, or equivalent.
  • Serving: FastAPI; Triton and Ray Serve are a plus.
  • Orchestration: Airflow or Dagster (a plus).
  • Observability: Prometheus/Grafana, OpenTelemetry, ELK.
  • Cloud: AWS, Azure, or GCP (or private cloud).

KPIs

  • Deployment frequency and lead time for model releases.
  • Production stability: incident rate, MTTR, and SLO compliance.
  • Model quality health: drift detection coverage and evaluation gate pass rate.
  • Inference cost and latency improvements.
  • Reproducibility and traceability coverage (lineage completeness).

Compensation and Benefits:

  • Competitive salary package (negotiable based on experience).
  • Opportunity for long-term growth in a leadership role.

Contact Information:

If you are interested in this position, please send your CV to: [Confidential Information]

Job ID: 148968383