
VinRobotics

Machine Learning Scientist

This job is no longer accepting applications

  • Posted 3 months ago

Job Description

Overview

You will design and train next-generation Vision-Language-Action (VLA) models that let humanoid robots understand instructions, perceive complex scenes, and act safely in real industrial environments.

Your focus is learning from limited real-world teleoperation data, and closing the distribution shift between low-data real demos and rich synthetic worlds. You'll explore new model architectures, training schemes, and loss functions, and combine them with randomized, high-fidelity simulation and world-model-based data generation (e.g., Isaac/Omniverse) to build generalizable VLA policies for humanoids in factories and logistics.

You'll work closely with our Teleoperation, RL & Controls, Simulation, and Platform teams to bring these models from research into production robots.

Key Responsibilities

Design and implement VLA architectures for humanoids

  • Build multi-modal policies that ingest RGB/Depth, language, robot state, and task history to generate actions (pose targets, motion primitives, or low-level controls).
  • Explore transformers, diffusion-style policies, hierarchical VLA, recurrent memory, and world-model-augmented controllers.

Learn effectively from scarce, noisy teleoperation data

  • Work with the teleop team to define data schemas, logging, and dataset curation from real humanoid operators.
  • Develop training strategies for low-data regimes: strong augmentations, self-/semi-supervised pretraining, contrastive objectives, multi-task learning, and behavior cloning / offline RL hybrids.
  • Propose loss designs and regularizers (e.g., action smoothness, safety margins, temporal consistency, language-grounding consistency) to mitigate overfitting and distribution shift.
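
As a purely illustrative sketch (not this team's actual code), a behavior-cloning objective with the kinds of regularizers listed above might combine an imitation term with action-smoothness and temporal-consistency penalties; the function name, weights, and batch layout here are hypothetical:

```python
import numpy as np

def bc_loss_with_regularizers(pred, expert, smooth_w=0.1, temporal_w=0.05):
    """Behavior-cloning MSE plus two simple regularizers.

    pred, expert: (batch, time, action_dim) arrays of actions.
    """
    bc = np.mean((pred - expert) ** 2)            # imitation term
    smooth = np.mean(np.diff(pred, axis=1) ** 2)  # penalize jerky step-to-step changes
    # Temporal consistency: predicted action deltas should track expert deltas.
    temporal = np.mean((np.diff(pred, axis=1) - np.diff(expert, axis=1)) ** 2)
    return bc + smooth_w * smooth + temporal_w * temporal
```

In low-data regimes the regularizer weights themselves become important hyperparameters, since over-strong smoothing can wash out the contact-rich portions of a demonstration.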

Tackle distribution shift between real-world demos and simulation / synthetic data

  • Design domain randomization and sim parameter sampling (lighting, materials, sensor noise, robot dynamics, task layouts, human styles) to cover real-world variation.
  • Set up pipelines where VLA policies are trained jointly on real teleop demos and large synthetic datasets.
  • Analyze failure modes (out-of-distribution visual scenes, unseen language instructions, contact edge cases) and iteratively refine data, models, and objectives.
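
To make the randomization idea concrete, here is a minimal hypothetical sketch of sim parameter sampling; the parameter names and ranges are invented for illustration, and real Isaac/Omniverse configs expose far richer distributions:

```python
import random

# Hypothetical randomization ranges; real sim configs differ.
RANDOMIZATION_RANGES = {
    "light_intensity": (200.0, 2000.0),  # scene lighting, lux
    "camera_noise_std": (0.0, 0.05),     # additive pixel noise
    "friction_coeff": (0.4, 1.2),        # robot/object contact dynamics
    "payload_mass_kg": (0.0, 5.0),       # task-object variation
}

def sample_sim_params(rng=random):
    """Draw one randomized scene/dynamics configuration."""
    return {name: rng.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION_RANGES.items()}
```

Covering real-world variation then reduces to choosing these distributions so that the real teleop data falls well inside the sampled support.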

Build synthetic and simulated data pipelines (Isaac / Omniverse / Cosmos)

  • Configure high-fidelity humanoid simulation environments (manipulation cells, factory workcells, shared spaces with humans).
  • Integrate or prototype workflows that use world foundation models (e.g., NVIDIA Cosmos Predict/Transfer/Reason) to generate diverse video and interaction data for downstream VLA training and evaluation.
  • Automate large-scale curriculum & scenario generation (edge cases, rare events, long-horizon tasks).

Evaluation, benchmarking, and deployment support

  • Define metrics and test suites: task success, safety violations, instruction following, sim-to-real gap, robustness to visual/language perturbations.
  • Run structured ablations (architecture, data mix, losses) and communicate findings with clear plots, reports, and logs.
  • Collaborate with RL/Controls and Platform teams to integrate VLA policies into the humanoid stack and run on real robots under safety constraints.

Required Qualifications

Core skills

  • Strong background in deep learning for sequence / multimodal modeling (e.g., transformers, diffusion models, recurrent architectures, latent world models).
  • Hands-on experience building and training vision-language or VLA-style models (e.g., VLMs, embodied LLMs, policy networks conditioned on language).
  • Solid understanding of at least one of:
      • Imitation learning / behavior cloning
      • Offline / batch RL
      • Inverse RL or preference-based learning
  • Proven ability to work in low-data regimes: data augmentation, self-supervised representation learning, regularization, careful validation design.
  • Experience with robot learning from demonstration or teleoperation data (any platform; humanoid experience is a plus).
  • Strong engineering skills in Python and modern ML frameworks (PyTorch preferred; JAX/TF is a plus), including:
      • Writing clean training loops and data pipelines
      • Profiling and debugging training/inference
      • Managing experiments at scale (config systems, logging, basic MLOps)

General

  • Bachelor's/Master's/Ph.D. in Computer Science, Robotics, EE, or related field; or equivalent industry experience.
  • Ability to work cross-functionally with controls, hardware, and teleoperation teams.

Preferred Qualifications

  • Experience with NVIDIA physical-AI stacks: Isaac (Sim/Lab), Omniverse, or NVIDIA Cosmos world foundation models for synthetic data generation and sim-to-real workflows. Comfortable designing synthetic datasets: specifying scenario distributions, parameter ranges, and validation protocols.
  • Prior work on humanoid robots (control, perception, or policy learning) or other complex articulated robots in industrial settings.
  • Contributions to embodied AI / robot learning research: publications, open-source projects, or widely used codebases.
  • Familiarity with safety-critical robotics (safe action constraints, human-in-the-loop supervision, fallbacks).
  • Experience deploying models on GPU clusters and edge devices (profiling latency, memory usage, batching, mixed precision).



Job ID: 135039433