Job Description:
Infrastructure Automation & CI/CD
- Design, implement, and maintain CI/CD pipelines for chatbot and AI services.
- Automate environment provisioning using tools like Terraform, Ansible, or Pulumi.
- Integrate testing and deployment workflows to support agile delivery cycles.
Cloud Infrastructure Management
- Build and manage infrastructure on AWS, tailored for AI workloads.
- Implement secure and scalable architectures for real-time chatbot interactions.
Monitoring, Logging & Incident Management
- Set up centralized logging using EFK (Elasticsearch, Fluent Bit, Kibana).
- Set up monitoring tools (Prometheus, Grafana, ELK, or Datadog) for proactive alerting.
- Define and enforce SLOs/SLAs for chatbot uptime and response time.
- Lead incident response and root cause analysis for system failures.
Security & Compliance
- Ensure best practices in infrastructure security (IAM, VPC, secrets management).
- Support compliance efforts for data protection (GDPR, SOC2) in chatbot data pipelines.
- Perform ad-hoc DevOps tasks as required, including emergency patches, incident support, or rapid deployment of security updates.
AI Model Deployment
- Collaborate with teams to containerize and deploy NLP models (e.g., with Docker, Kubernetes).
- Manage GPU/TPU workloads, including dynamic scaling and resource optimization.
- Monitor model inference performance and latency across staging and production environments.
- Optimize cost, compute, and storage strategies for high-volume inference and training.
Key requirements for this position include:
- At least 5 years of experience in a Network/System Engineer position.
- Bachelor's degree in Computer Science, Information Technology, or another technical field; graduates of top universities specializing in Information Technology are preferred.
- Understanding of security concepts related to DNS, routing, authentication, VPN, proxy services, and DDoS mitigation technologies.
- Experience designing and implementing networks is required. Experience with high-availability (HA) patterns is a big advantage.
- Have knowledge of and experience with AWS (VPC, EC2, EKS, RDS, MSK, OpenSearch, ElastiCache, SES...).
- Have experience with EKS and Kubernetes, and the ability to write Helm charts.
- Have experience with MySQL and PostgreSQL databases.
- Have experience with OS hardening and troubleshooting.
- Have experience with Linux distributions such as CentOS and Ubuntu.
- Have experience with ActiveMQ, Redis, and Memcached.
- Have experience with monitoring, logging, and alerting tools.
- Have experience with CI/CD tools such as Jenkins and GitLab.
- Have experience with API Gateway and load balancing.
- Be familiar with configuring and operating Nginx, Nginx Ingress, and Apache.
Plus:
- Experience supporting LLM/chatbot-based products in production.
- Knowledge of GCP or Azure.
- Experience with Terraform/Terragrunt and Ansible.
- Knowledge of and experience with Elasticsearch and Kafka.
- Knowledge of and experience with Postfix, FTP servers, and other services.
- Knowledge of security, including checking for vulnerabilities and patching or updating operating systems and applications.