* Salary: R$ 3,000 to R$ 6,000 per month (estimated)
* The amount shown is an estimate calculated from public data and market references. We do not guarantee that this is the salary offered for this specific position.
Area: Information Technology
Level: Junior
We are expanding our delivery team with a Middle DevOps Engineer focused on reliable Kubernetes and Linux platforms for AI and research workloads.
You will help automate and optimize GPU-enabled orchestration with Kubernetes and Volcano, supporting scheduling, quotas, and scripting in Python and Shell in a client-facing environment. Apply to help build efficient, scalable compute environments.
Responsibilities
- Deploy and operate GPU-enabled Kubernetes clusters and standalone Linux compute environments to keep scheduling and performance efficient
- Implement and support Volcano job scheduling, including queue setup, Pod execution, GPU allocation, and namespace quota enforcement
- Administer Kubernetes environments end-to-end, covering namespaces, RBAC, resource quotas, and workload isolation approaches
- Build and maintain Python and Shell automation to simplify job submission, resource provisioning, and system reporting
- Collaborate with orchestration, optimization, and observability teams to improve scheduling efficiency, capacity utilization, and researcher workflows
- Monitor platform health and resource usage, sharing data and feedback to meet optimization and reporting needs
- Recommend improvements to infrastructure, tooling, and automation workflows to boost performance, scalability, and usability
- Ensure operations provide a smooth and effective experience for researchers running diverse AI and computational workloads
Requirements
- 2+ years of hands-on experience in DevOps or infrastructure engineering roles supporting complex, large-scale environments
- Expert-level knowledge of Kubernetes administration and orchestration, including namespaces, Pod scheduling/distribution, PVCs, NFS, and resource quota management
- Practical experience with the Volcano scheduler for GPU job execution, queue configuration, workload prioritization, and Kubernetes integration
- Proven background managing GPU cluster environments in Kubernetes and on standalone Linux compute nodes
- Advanced scripting skills in Python for infrastructure automation plus proficiency with UNIX Shell scripting (e.g., Bash)
- Strong Linux system administration capability, including troubleshooting, performance tuning, and configuration management
- Solid understanding of infrastructure automation and orchestration concepts and related tooling
- Fluent English communication skills (spoken and written) for direct client interaction
Nice to have
- Helm for Kubernetes application package management
- Monitoring and observability tooling, especially Prometheus, Grafana, and Loki
- Infrastructure as Code tools such as Terraform
- Multi-cloud Kubernetes exposure, including Amazon EKS and Google GKE
- Azure Networking knowledge, including VPN, ExpressRoute, and network security
- Familiarity with AI-assisted coding tools (e.g., GitHub Copilot, ChatGPT, Claude)
- Experience with hybrid (cloud + on-premises) scheduling and resource optimization
