Senior DevOps Engineer

Salary: R$ 11,000 to R$ 20,000 per month

Area: Information Technology

Level: Senior

We are delivering scalable Kubernetes and Linux compute foundations for GPU-heavy workloads, and a Senior DevOps Engineer will help keep them reliable and fast. You will manage Kubernetes and Volcano scheduling, enforce quotas, and automate workflows using Python and UNIX Shell scripting in a client-facing delivery setup. Apply now to join the team.

Responsibilities

  • Build, configure, and operate GPU-enabled Kubernetes clusters and standalone Linux compute environments to maximize workload scheduling and performance
  • Run Volcano scheduling end-to-end, including queue creation, Pod execution, GPU assignment, and namespace quota enforcement
  • Manage Kubernetes environments comprehensively, including namespaces, RBAC, resource quotas, and workload isolation approaches
  • Create and support automation scripts in Python and Shell to streamline job submission, provisioning, and reporting
  • Partner with orchestration, optimization, and observability teams to improve scheduling efficiency, capacity utilization, and researcher workflows
  • Track infrastructure health and resource utilization, and provide data to support optimization and reporting needs
  • Recommend and drive enhancements to infrastructure, tooling, and automation workflows to improve performance, scalability, and usability
  • Maintain operational processes that enable a seamless and efficient researcher experience across AI and computational workloads

Requirements

  • Minimum 3 years of experience in DevOps or infrastructure engineering roles within complex, large-scale environments
  • Deep expertise in Kubernetes administration and orchestration, including namespaces, Pod scheduling/distribution, PVC, NFS, and resource quota management
  • Practical experience using Volcano for GPU job execution, queue configuration, and workload prioritization integrated with Kubernetes
  • Demonstrated experience running GPU cluster environments in Kubernetes and on standalone Linux compute nodes
  • Advanced skills in Python scripting for infrastructure automation and strong UNIX Shell scripting such as Bash
  • Strong Linux administration knowledge, including troubleshooting, performance tuning, and configuration management
  • Good command of infrastructure automation and orchestration concepts and related tooling
  • Fluent English communication skills (spoken and written) to work directly with clients

Nice to have

  • Working knowledge of Helm for Kubernetes application packaging
  • Experience with observability tooling such as Prometheus, Grafana and Loki
  • Exposure to Infrastructure as Code tooling, including Terraform
  • Familiarity with multi-cloud Kubernetes options such as Amazon EKS and Google GKE
  • Knowledge of Azure Networking, including VPN, ExpressRoute and network security
  • Comfort with AI-assisted coding tools like GitHub Copilot, ChatGPT and Claude
  • Understanding of hybrid (cloud and on-premises) scheduling and resource optimization