Working location: District 2, Thu Duc City
Salary range: Up to USD 1,600 gross
RESPONSIBILITIES
The Technical Platform Operations Specialist will ensure the platform's operational excellence, reliability, and scalability. This hands-on role reports to the CTO and covers monitoring, incident resolution, and client engagement. In the short term, the role does not involve direct team management, but it requires strong cross-functional coordination.
This role coordinates with Product Owners and developers, who report separately to the CTO on project-related tasks.
Platform Operations & Maintenance:
- Monitor and manage platform services to ensure uptime and performance SLAs are met.
- Troubleshoot and resolve platform incidents to maintain seamless operations.
- Implement monitoring tools and dashboards for real-time observability.
Incident & SLA Management:
- Own the incident management process, ensuring rapid issue resolution.
- Define SLAs and enforce compliance, tracking key service metrics.
- Lead post-incident reviews, ensuring corrective actions are implemented.
Crisis & Client Communication:
- Serve as the primary technical contact for platform-related escalations.
- Communicate effectively with internal teams, leadership, and external clients.
- Provide clear, structured updates during crises.
Security & Compliance:
- Conduct regular security audits and manage vulnerabilities.
- Implement preventative security measures to mitigate operational risks.
Platform Upgrades & Optimization:
- Plan and execute platform upgrades with minimal service disruption.
- Continuously optimize performance, scalability, and infrastructure resilience.
Database & Cloud Infrastructure Management:
- Administer and optimize MS SQL Server databases for scalability and reliability.
- Manage AWS and Kubernetes-based microservices infrastructure.
REQUIREMENTS
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Minimum 5 years of hands-on experience in platform operations, incident management, and SLA governance.
- Proven ability to lead incident resolution and conduct structured post-mortems.
- Advanced MS SQL Server expertise (query optimization, database scaling).
- Experience managing AWS cloud services and Kubernetes-based microservices.
- Familiarity with monitoring tools (e.g., Prometheus, Datadog, New Relic).
- Strong knowledge of security and compliance best practices.
- Excellent communication skills: the ability to explain technical issues to both technical and business stakeholders.
- Comfortable in client-facing engagements.
Behavioral Competencies:
- Accountability: Owns platform health and incident resolution end-to-end.
- Problem-Solving: Ability to troubleshoot and resolve platform issues efficiently.
- Adaptability: Comfortable working in a dynamic, fast-paced SaaS environment.
- Strong Communication: Can translate technical issues into clear, actionable insights for stakeholders.
Nice to have:
- Exposure to fintech or SaaS platforms.
- Certifications in AWS, Kubernetes, or MS SQL Server.
- Familiarity with CI/CD pipelines and DevSecOps practices.