
Middle DevOps Engineer

* Salary: R$ 3,000 to R$ 6,000 per month (estimated)

* The amount shown is an estimate calculated from public data and market benchmarks. We do not guarantee that this is the salary offered for this specific position.

Area: Information Technology

Level: Junior

We are building scalable Kubernetes and Linux infrastructure designed for GPU workloads, efficient scheduling, and repeatable automation at scale. As a Middle DevOps Engineer, you will support and enhance Kubernetes environments with Volcano, operate Linux compute nodes, and deliver automation in Python and Bash within a client-facing team. Apply to help researchers run AI jobs smoothly on reliable compute platforms.

Responsibilities

  • Install, configure, and operate GPU-enabled Kubernetes clusters and standalone Linux compute environments to maintain optimized scheduling and performance
  • Configure and administer Volcano job scheduling, including queue setup, pod execution, GPU allocation, and namespace quota enforcement
  • Manage Kubernetes end to end, covering namespaces, RBAC, resource quotas, and workload isolation approaches
  • Build and maintain Python and Shell automation to streamline job submission, resource provisioning, and system reporting
  • Partner with orchestration, optimization, and observability teams to improve scheduling efficiency, increase capacity utilization, and simplify researcher workflows
  • Track infrastructure health and resource utilization, providing data and feedback for optimization and reporting needs
  • Drive enhancements to infrastructure, tooling, and automation workflows to improve performance, scalability, and usability
  • Support operational processes that ensure a smooth and efficient experience for researchers running diverse AI and computational workloads

Requirements

  • 2+ years of hands-on experience in DevOps or infrastructure engineering within complex, large-scale environments
  • Strong expertise in Kubernetes administration and orchestration, including namespaces, pod scheduling and distribution, PersistentVolumeClaims (PVCs), NFS storage, and resource quota management
  • Practical experience with the Volcano scheduler for GPU job execution, queue configuration, and workload prioritization integrated with Kubernetes
  • Proven ability to operate GPU cluster environments in Kubernetes as well as on standalone Linux compute nodes
  • Advanced Python scripting skills for infrastructure automation, plus proficiency in UNIX Shell scripting such as Bash
  • Strong Linux system administration skills, including troubleshooting, performance tuning, and configuration management
  • Solid understanding of infrastructure automation and orchestration concepts and related tooling
  • Fluent English communication skills (spoken and written) for direct client interaction

Nice to have

  • Knowledge of Helm package management for Kubernetes applications
  • Familiarity with monitoring and observability solutions, particularly Prometheus, Grafana, and Loki
  • Skills in Infrastructure as Code tools such as Terraform
  • Background in multi-cloud Kubernetes environments including Amazon EKS and Google GKE
  • Understanding of Azure Networking including VPN, ExpressRoute, and network security
  • Familiarity with AI-assisted coding tools such as GitHub Copilot, ChatGPT, and Claude
  • Experience with hybrid (cloud and on-premises) scheduling and resource optimization
