VinDynamics

Data Solution Architect

  • Posted 11 days ago

Job Description

About VinDynamics

At VinDynamics, we are building a global robotics and AI infrastructure platform — combining humanoid robotics, embodied AI, and large-scale data ecosystems to power the next generation of intelligent machines.

Backed by Vingroup, Vietnam's leading technology conglomerate, VinDynamics is on a mission to accelerate the adoption of robotics worldwide through advanced AI, scalable platforms, and real-world deployment at global scale.

Our vision is to make robots more accessible, intelligent, and commercially scalable — enabling safer, more productive, and more connected lives across industries and everyday environments.

Job Summary

  • The Data Architect / Solution Architect (Data Platform) will be the mastermind behind the technical architecture of our next-generation Data Management and Processing Platform. This role is responsible for designing scalable, highly secure, and cost-effective cloud/hybrid storage systems alongside automated data pipelines capable of handling petabytes (PB) of unstructured multimedia data (e.g., massive egocentric video streams, audio, and sensor logs for humanoid robot training).
  • The ideal candidate will bridge the gap between traditional enterprise big data infrastructure and advanced AI engineering. You will architect high-throughput pipelines that leverage Computer Vision, Vision-Language Models (VLMs), and Vision-Language-Action (VLA) models for automated data pre-processing, semantic scene cutting, and pre-labeling. At the same time, you will ensure that the underlying data management infrastructure and data governance layers scale in step with this compute throughput, while maintaining rigorous anti-copying security controls.

Key Responsibilities

  • End-to-End Architecture Design: Design and implement the core data platform architecture, including data ingestion, stream/batch processing, petabyte-scale storage, and seamless data delivery layers.
  • AI-Driven Pipeline Integration (Vision/VLM/VLA): Architect and build high-throughput ETL/ELT pipelines that integrate state-of-the-art AI models (Computer Vision, VLMs, VLAs) to automate data pre-processing. This includes automated video curation, filtering out low-quality/redundant frames, semantic scene indexing, and automated pre-labeling of Action Atoms before human QA/QC verification.
  • Parallel Infrastructure Scaling: Design a framework where data processing compute power (GPU/CPU clusters for AI model inference) and data management storage/cataloging infrastructure scale seamlessly in parallel. Ensure zero bottlenecks as data volume expands towards the petabyte scale.
  • Storage Optimization & Cost Management: Optimize cloud storage topologies (primarily AWS S3 and hybrid cloud solutions) to minimize storage costs for environments of 2,000 TB and above, using lifecycle policies, storage tiering (e.g., Amazon S3 Glacier), and efficient indexing.
  • Secure Data Governance & Anti-Copying Solutions: Implement robust Data Governance, Identity and Access Management (IAM), and Confidential Computing frameworks. Design mechanisms (e.g., secure streaming, DRM, or sandboxed environments) that allow cross-functional teams (PO, BA, QA/QC, AI Engineers) to process and validate data without being able to download or make unauthorized local copies.
  • Cross-functional Alignment: Collaborate closely with the Head of Data Acquisition, POs, BAs, and AI Research teams to translate data collection business requirements and client specs into solid, future-proof technical blueprints.

Job Requirements

Relevant education and experience:

  • Education: Bachelor's or Master's degree in Computer Science, Data Science, Artificial Intelligence, Software Engineering, or a related technical field. Professional certifications (e.g., AWS Certified Data Engineer, AWS Solutions Architect Professional, or Google Cloud Professional Data Architect) are highly preferred.
  • Core Technical Experience:
    • Minimum of 5 years of experience in Data Architecture, System Architecture, or Senior Data Engineering, with a proven track record of building large-scale data platforms.
    • Strong experience with Big Data processing frameworks and ecosystems (such as PySpark, Apache Spark, Hadoop) and Python programming.
    • Hands-on mastery of cloud infrastructure (AWS preferred: S3, EC2, IAM, Lambda, Athena, EMR).
  • AI & Multimedia Pipeline Orchestration: Proven experience orchestrating AI inference within distributed data pipelines (deploying Vision models, VLMs, or LLMs at scale using frameworks like Ray, Triton Inference Server, or Kubernetes) to process high-volume unstructured video/audio data.
  • Enterprise Security Background: Direct experience in setting up enterprise-grade data security, access control, and data loss prevention (DLP) frameworks.

Preferred Qualifications

  • Domain Experience: Experience building data platforms for Autonomous Vehicles, Robotics, Computer Vision AI, or Large Multi-Modal Model (LMM) R&D startups is a strong advantage.
  • Composed & Analytical Mindset: Exceptionally composed and methodical under pressure. Able to systematically diagnose distributed computing bottlenecks, model inference latency, system failures, or security vulnerabilities and deliver structural architectural fixes.
  • Forward-Thinking & Scalability-Obsessed: Always designs systems with tomorrow's scale in mind, focusing on automation, infrastructure-as-code, and eliminating single points of failure through parallel scaling.
  • Collaborative Communicator: Capable of breaking down highly complex architectural and AI concepts into clear, understandable business terms for non-technical stakeholders (Ops, Clients, Managers).

Personality / Attitude

  • Exceptionally Composed under Complexity & Pressure: Maintains absolute calm, mental clarity, and a methodical approach when dealing with system failures, high-stakes security threats, model inference latencies, or tight deployment deadlines.
  • Parallel & Scalability-Obsessed Mindset: A forward-thinking engineer who naturally rejects short-term, manual patches. Driven by a passion for automation, infrastructure-as-code, and designing parallel systems that eliminate single points of failure.
  • Analytical & Deeply Structural Troubleshooter: Possesses an uncompromising, root-cause-analysis approach to problem-solving. Able to isolate bottlenecks systematically across distributed computing clusters, cloud storage layers, or complex AI models.
  • Collaborative & Articulate Communicator: Exceptional ability to translate highly dense architectural blueprints, data governance risks, and VLM/VLA capabilities into clear, actionable business insights for non-technical stakeholders (Ops teams, Product Owners, and clients).
  • Security-First Integrity: Demonstrates an unyielding commitment to data privacy, intellectual property protection, and secure infrastructure design, ensuring anti-copying frameworks are never compromised for convenience.

Benefits

  • Highly competitive executive salary package tailored for Senior/Lead Architect roles.
  • Deep technical ownership over a cutting-edge, petabyte-scale AI & Robotics data infrastructure from the ground up.
  • Premium private health insurance, top-tier hardware provisions (including access to high-performance compute resources), and flexible work opportunities.
  • Direct exposure to the most advanced AI and Humanoid Robotics training data paradigms in the region.

Job ID: 149327171