Senior DevOps Engineer

* Salary: R$ 11,000 to R$ 20,000 per month (estimated)

* The amount shown is an estimate based on public data and market benchmarks. We do not guarantee that this is the salary offered for this specific position.

Area: Information Technology

Level: Senior

We are building scalable, GPU-ready Kubernetes platforms for AI and research workloads, with a focus on reliable orchestration and performance. As a Senior DevOps Engineer, you will operate Kubernetes and Linux compute environments, run Volcano scheduling, and automate workflows with Python and UNIX shell scripting in a client-facing delivery setup. Apply now to help deliver efficient compute at scale.

Responsibilities

  • Deploy, configure, and sustain GPU-enabled Kubernetes clusters and standalone Linux compute environments to maximize scheduling efficiency and performance
  • Implement and operate Volcano job scheduling, including queue setup, Pod execution, GPU allocation, and namespace quota enforcement
  • Administer Kubernetes end-to-end, covering namespaces, RBAC, resource quotas, and workload isolation approaches
  • Create and maintain Python and shell automation to simplify job submission, resource provisioning, and system reporting
  • Collaborate with orchestration, optimization, and observability teams to improve scheduling efficiency, capacity utilization, and researcher workflows
  • Monitor platform health and resource utilization, sharing data and feedback to support optimization and reporting needs
  • Recommend and drive enhancements to infrastructure, tooling, and automation workflows to improve performance, scalability, and usability
  • Ensure operations provide a smooth and efficient experience for researchers across diverse AI and computational workloads

Requirements

  • Minimum 3 years of experience in DevOps or infrastructure engineering roles within complex, large-scale environments
  • Expert-level Kubernetes administration knowledge, including namespaces, Pod scheduling/distribution, PVC, NFS, and resource quota management
  • Hands-on experience with Volcano scheduler for GPU job execution, queue configuration, workload prioritization, and Kubernetes integration
  • Demonstrated experience running GPU cluster environments in Kubernetes and on standalone Linux compute nodes
  • Advanced Python scripting skills for infrastructure automation, plus proficiency in UNIX shell scripting (e.g., Bash)
  • Strong Linux system administration capability, including troubleshooting, performance tuning, and configuration management
  • Solid understanding of infrastructure automation and orchestration concepts and supporting tooling
  • Fluent English communication skills (spoken and written) for direct client interaction

Nice to have

  • Helm for Kubernetes application packaging and releases
  • Monitoring and observability tooling, especially Prometheus, Grafana, and Loki
  • Infrastructure as Code tools such as Terraform
  • Multi-cloud Kubernetes exposure (Amazon EKS, Google GKE)
  • Azure networking knowledge, including VPN, ExpressRoute, and network security
  • Familiarity with AI-assisted coding tools (e.g., GitHub Copilot, ChatGPT, Claude)
  • Experience with hybrid (cloud + on-premises) scheduling and resource optimization