The Backend & DevOps Engineer will be responsible for the following:
- Set up SLOs and monitoring practices. Improve system stability through database and system optimization for performance, scalability, and fault tolerance.
- Lead backend stability and performance efforts, particularly in high-transaction environments.
- Monitor system health to proactively identify and resolve issues, especially during peak usage periods.
- Provide on-call support, mitigating production issues like crashes, errors, and outages in real time.
- Analyze logs, error reports, and monitoring data to detect and address potential system problems before they escalate.
- Collaborate with cross-functional teams to ensure smooth, automated CI/CD pipelines and effective incident management.
- Implement system optimizations to improve uptime, performance, and scalability.
- Establish and maintain robust monitoring and alerting systems for rapid issue detection and response.
- Work closely with development, infrastructure, and operations teams to resolve technical issues affecting backend performance.
- Document system changes, incident reports, and troubleshooting guidelines to enhance operational workflows and knowledge sharing.
The Backend & DevOps Engineer is expected to have the following qualifications and skills:
- 5+ years of experience in backend engineering or DevOps roles, preferably in high-transaction, production-grade environments.
- Expertise in backend development with Supabase, Firebase, or similar platforms.
- Strong proficiency with Google Cloud Platform (GCP) services, including monitoring, scaling, and performance optimization, plus experience with queue systems such as GCP Pub/Sub and GCP Cloud Tasks.
- Experience optimizing database queries and ensuring data consistency and low-latency access in databases such as PostgreSQL and MongoDB.
- Hands-on experience with DevOps tools, particularly CircleCI for CI/CD pipelines and GitHub for version control.
- Strong TypeScript skills, including the ability to optimize JavaScript code (async operations, error handling, etc.).
- Frontend (optional): React with TypeScript, including optimization concepts such as memoization and state management.
- Proven experience in system monitoring, incident management, and large-scale system optimization.
- Excellent troubleshooting skills and the ability to perform root cause analysis in complex technical environments.
- Experience maintaining observability with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK).
Working time: 4pm-1am (VNT)