greennode

Senior Site Reliability Engineer (Database)

Job Description

  • We are looking for a Senior Site Reliability Engineer (SRE) with deep experience in deploying, operating, and optimizing enterprise-grade database clusters and Kubernetes (K8s).
  • You will play a key role in ensuring the data infrastructure is stable, high-performing, scalable, and proactively monitored through modern observability systems.

Key Responsibilities

  • Research, deploy, manage, and optimize database systems (SQL Server, Oracle, MongoDB, MySQL, PostgreSQL, Redis, etc.).
  • Operate, optimize, and scale Kubernetes clusters.
  • Set up and manage monitoring and alerting systems such as Prometheus, Alertmanager, Grafana, and ELK.
  • Define and fine-tune metrics, alert thresholds, SLOs/SLAs, and error budgets for database services and critical infrastructure.
  • Participate in incident response, conduct root cause analysis, and perform post-mortems to improve system reliability.
  • Automate operational processes (backup, failover, scaling, recovery, patching, CI/CD, etc.).
  • Build and standardize runbooks, playbooks, and documentation to enable fast, effective responses to emergencies.
  • Collaborate with development teams to improve database and big data products.

Requirements

  • At least 3 years of experience as an SRE, DBA, or System Engineer.
  • Proficiency in deploying, operating, and optimizing database systems (SQL Server, Oracle, MongoDB, MySQL, PostgreSQL, Redis, etc.) on-premises or in the cloud.
  • Experience deploying and operating Kubernetes on-premises or in the cloud (EKS, GKE, AKS).
  • Experience setting up metrics, alert thresholds, and dashboards for database systems and infrastructure.
  • Ability to be on-call, monitor alerts, and handle or escalate system incidents in a timely manner.
  • Proficiency with monitoring and logging tools such as Prometheus, Alertmanager, Grafana, Loki, and the ELK Stack.
  • Ability to write automation scripts in Python, Bash, or Go.
  • Good knowledge of networking, storage, performance tuning, and backup and recovery.
  • Strong systems thinking; proactive in identifying and resolving issues.

Preferred Qualifications

  • Experience operating enterprise-grade database clusters such as SQL Server, Oracle, or MongoDB Enterprise is a plus.
  • Experience operating distributed databases or high availability clusters (Patroni, Galera, Sentinel, etc.) is a plus.
  • Experience with big data systems (Kafka, ClickHouse, Elasticsearch, etc.) is a plus.
  • Relevant certifications such as MCSA/MCSE/Azure Database Administrator, Oracle Database (OCA/OCP), MongoDB Certified DBA/Developer, CKA/CKAD, AWS/GCP certifications, or other database administration certifications are a plus.

More Info

Job ID: 146640405