Senior System Engineer (OpenStack / VMware / SDN)

greennode

Ho Chi Minh, Vietnam

Fresher

Posted 11 hours ago

Job Description

Job Summary:

The Senior System Engineer position is responsible for operating, troubleshooting, and optimizing large-scale cloud systems based on the OpenStack platform, with a strong focus on networking, SDN data plane, kernel interaction, container/runtime behavior, automation, and system performance analysis.

Key Responsibilities:

Operate and troubleshoot OpenStack components such as Neutron, Nova, and load balancing (LB), or equivalent cloud platforms, with a focus on tenant networking, routing, NAT, security groups, and production issue resolution.
Analyze end-to-end packet flows and debug connectivity issues, packet loss, high latency, or unstable system behavior using tools such as tcpdump, iproute2, flow inspection, logs, and system traces.
Work with SDN or virtual networking technologies such as Open vSwitch (OVS), Open Virtual Network (OVN), Tungsten Fabric/Contrail, VMware NSX, or equivalent solutions; possess a strong understanding of overlay networking models such as VXLAN, MPLS, and EVPN.
Preferred additional experience includes: OpenStack Neutron, Tungsten Fabric/Contrail, EVPN/MPLS, VPN/IPSec, kernel tuning, Docker/containerd internals, or high PPS processing systems.
Investigate performance bottlenecks, including PPS limitations, CPU saturation, NIC offload behavior, MTU mismatch, RSS, NUMA/CPU pinning, kernel network stack behavior, and feature compatibility across operating systems, kernels, drivers, and platform versions.
Debug system-level issues related to the Linux kernel, Docker/container runtime behavior, differences between cgroup v1 and v2, kernel modules, driver interactions, and feature mismatches across distributions or kernel versions.
Build or use automation to collect logs, inspect system configurations, validate runtime states, compare configurations across nodes, and support large-scale operational standardization using tools such as Ansible combined with shell or Python scripts.
Handle production incidents, conduct root cause analysis, and coordinate with monitoring/logging systems to identify systemic issues and prevent recurrence.

Requirements:

Strong Linux system troubleshooting skills, with a solid understanding of how the kernel interacts with system internals, the networking stack, process and resource management mechanisms, and container runtimes such as Docker or containerd.
Strong networking fundamentals, including TCP/IP, routing, NAT, and L2/L3 operations in virtualization and overlay network environments.
Hands-on experience with virtual networking, SDN, or cloud networking platforms such as OpenStack, Kubernetes networking, VMware, or equivalent systems.
Ability to debug issues using packet-level and system-level tools, rather than relying solely on configurations, management interfaces, or vendor documentation.
Experience using automation/configuration management tools such as Ansible to collect logs, inspect system parameters, validate configuration consistency, and safely deploy operational changes across multiple nodes.
Strong programming mindset, with the ability to read, review, and troubleshoot code or logic in Python, Go, Shell, or C/C++, and analyze root causes beyond standard runbooks.