
greennode

Senior System Engineer (OpenStack / VMware / SDN)


Job Description

Job Summary:

The Senior System Engineer is responsible for operating, troubleshooting, and optimizing large-scale cloud systems built on the OpenStack platform, with a strong focus on networking, the SDN data plane, Linux kernel interaction, container runtime behavior, automation, and system performance analysis.

Key Responsibilities:

  • Operate and troubleshoot OpenStack components (such as Neutron, Nova, and load balancer (LB) services) or equivalent cloud platforms, focusing on tenant networking, routing, NAT, security groups, and production issue resolution.
  • Analyze end-to-end packet flows, debug connectivity issues, packet loss, high latency, or unstable system behavior using tools such as tcpdump, iproute2, flow inspection, logs, and system traces.
  • Work with SDN or virtual networking technologies such as Open vSwitch (OVS), Open Virtual Network (OVN), Tungsten Fabric/Contrail, VMware NSX, or equivalent solutions; possess a strong understanding of overlay networking models such as VXLAN, MPLS, and EVPN.
  • Preferred additional experience: OpenStack Neutron, Tungsten Fabric/Contrail, EVPN/MPLS, VPN/IPsec, kernel tuning, Docker/containerd internals, or high-PPS (packets per second) processing systems.
  • Investigate performance bottlenecks, including PPS limits, CPU saturation, NIC offload behavior, MTU mismatches, RSS (Receive Side Scaling), NUMA/CPU pinning, kernel network stack behavior, and feature compatibility across operating systems, kernels, drivers, and platform versions.
  • Debug system-level issues involving the Linux kernel, Docker/container runtime behavior, differences between cgroup v1 and v2, kernel modules, driver interactions, and feature mismatches across distributions or kernel versions.
  • Build or use automation to collect logs, inspect system configurations, validate runtime states, compare configurations across nodes, and support large-scale operational standardization using tools such as Ansible combined with shell or Python scripts.
  • Handle production incidents, conduct root cause analysis, and use monitoring and logging systems to identify systemic issues and prevent recurrence.

Requirements:

  • Strong Linux troubleshooting skills, with a solid understanding of kernel internals, the networking stack, process and resource management mechanisms, and container runtime behavior (e.g., Docker or containerd).
  • Strong networking fundamentals, including TCP/IP, routing, NAT, and L2/L3 operations in virtualization and overlay network environments.
  • Hands-on experience with virtual networking, SDN, or cloud networking platforms such as OpenStack, Kubernetes networking, VMware, or equivalent systems.
  • Ability to debug issues using packet-level and system-level tools, rather than relying solely on configurations, management interfaces, or vendor documentation.
  • Experience using automation/configuration management tools such as Ansible to collect logs, inspect system parameters, validate configuration consistency, and safely deploy operational changes across multiple nodes.
  • Strong programming mindset, with the ability to read, review, and troubleshoot code or logic in Python, Go, Shell, or C/C++, and analyze root causes beyond standard runbooks.

Job ID: 148596379