Overview
You will design and train next-generation Vision-Language-Action (VLA) models that let humanoid robots understand instructions, perceive complex scenes, and act safely in real industrial environments.
Your focus is learning from limited real-world teleoperation data, and closing the distribution shift between low-data real demos and rich synthetic worlds. You'll explore new model architectures, training schemes, and loss functions, and combine them with randomized, high-fidelity simulation and world-model-based data generation (e.g., Isaac/Omniverse) to build generalizable VLA policies for humanoids in factories and logistics.
You'll work closely with our Teleoperation, RL & Controls, Simulation, and Platform teams to bring these models from research into production robots.
Key Responsibilities
Design and implement VLA architectures for humanoids
- Build multi-modal policies that ingest RGB/Depth, language, robot state, and task history to generate actions (pose targets, motion primitives, or low-level controls).
- Explore transformers, diffusion-style policies, hierarchical VLA, recurrent memory, and world-model-augmented controllers.
Learn effectively from scarce, noisy teleoperation data
- Work with the teleop team to define data schemas, logging, and dataset curation from real humanoid operators.
- Develop training strategies for low-data regimes: strong augmentations, self-/semi-supervised pretraining, contrastive objectives, multi-task learning, and behavior cloning / offline RL hybrids.
- Propose loss designs and regularizers (e.g., action smoothness, safety margins, temporal consistency, language-grounding consistency) to mitigate overfitting and distribution shift.
Tackle distribution shift between real-world demos and simulation / synthetic data
- Design domain randomization and sim parameter sampling (lighting, materials, sensor noise, robot dynamics, task layouts, human styles) to cover real-world variation.
- Set up pipelines where VLA policies are trained jointly on real teleop demos and large synthetic datasets.
- Analyze failure modes (out-of-distribution visual scenes, unseen language instructions, contact edge cases) and iteratively refine data, models, and objectives.
Build synthetic and simulated data pipelines (Isaac / Omniverse / Cosmos)
- Configure high-fidelity humanoid simulation environments (manipulation cells, factory workcells, shared spaces with humans).
- Integrate or prototype workflows that use world foundation models (e.g., NVIDIA Cosmos Predict/Transfer/Reason) to generate diverse video and interaction data for downstream VLA training and evaluation.
- Automate large-scale curriculum & scenario generation (edge cases, rare events, long-horizon tasks).
Evaluation, benchmarking, and deployment support
- Define metrics and test suites: task success, safety violations, instruction following, sim-to-real gap, robustness to visual/language perturbations.
- Run structured ablations (architecture, data mix, losses) and communicate findings with clear plots, reports, and logs.
- Collaborate with RL/Controls and Platform teams to integrate VLA policies into the humanoid stack and run on real robots under safety constraints.
Required Qualifications
Core skills
- Strong background in deep learning for sequence/multimodal modeling (e.g., transformers, diffusion models, recurrent architectures, latent world models).
- Hands-on experience building and training vision-language or VLA-style models (e.g., VLMs, embodied LLMs, policy networks conditioned on language).
- Solid understanding of at least one of:
- Imitation learning / behavior cloning
- Offline / batch RL
- Inverse RL or preference-based learning
- Proven ability to work in low-data regimes: data augmentation, self-supervised representation learning, regularization, careful validation design.
- Experience with robot learning from demonstration or teleoperation data (any platform; humanoids a plus).
- Strong engineering skills in Python and modern ML frameworks (PyTorch preferred; JAX/TF a plus), including:
- Writing clean training loops and data pipelines
- Profiling and debugging training/inference
- Managing experiments at scale (config systems, logging, basic MLOps)
General
- Bachelor's/Master's/Ph.D. in Computer Science, Robotics, EE, or related field; or equivalent industry experience.
- Ability to work cross-functionally with controls, hardware, and teleoperation teams.
Preferred Qualifications
- Experience with NVIDIA physical-AI stacks: Isaac (Sim/Lab), Omniverse, or NVIDIA Cosmos world foundation models for synthetic data generation and sim-to-real workflows. Comfortable designing synthetic datasets: specifying scenario distributions, parameter ranges, and validation protocols.
- Prior work on humanoid robots (control, perception, or policy learning) or other complex articulated robots in industrial settings.
- Contributions to embodied AI / robot learning research: publications, open-source projects, or widely used codebases.
- Familiarity with safety-critical robotics (safe action constraints, human-in-the-loop supervision, fallbacks).
- Experience deploying models on GPU clusters and edge devices (profiling latency, memory usage, batching, mixed precision).