Responsibilities
- Ensure Platform Stability: Monitor infrastructure and customer environments, analyze alerts, and act quickly to prevent downtime
- Lead Incident Response: Own critical incidents, troubleshoot complex issues, and coordinate with System Admins, Developers, and Security teams
- Drive Continuous Improvement: Perform root cause analysis, suggest process enhancements, and contribute to post-mortem reviews
- Mentor & Grow Teams: Support junior colleagues, share expertise, and help build a strong, autonomous team
- Maintain Knowledge & Documentation: Create, update, and consolidate internal knowledge articles and operational procedures
- Support Releases & Patches: Track system updates, monitor their impact, and ensure smooth patch deployment
- Automate & Streamline Operations: Develop and implement automation tools and scripts to streamline system administration tasks, identify opportunities for process improvement, and lead efficiency initiatives
Requirements
- 3+ years in technical support, system administration, or database administration, preferably in 24/7 operations
- Hands-on experience with Linux, MySQL, Apache/Nginx, Bash, Python, and Proxmox; bonus: Ansible, CI/CD
- Strong problem-solving and analytical skills, and clear communication
- Proven expertise in incident management, automation, and monitoring systems
- Strong understanding of data management, backup/recovery strategies, and security protocols
- Experience coordinating with teams and owning technical decision-making during critical situations
Benefits
Working location: Remote, full-time (shift-based schedule, Mauritius time)
Salary range:
- Mid-level: Up to USD 1,000 gross
- Senior: Up to USD 1,400 gross