* Salary: R$ 2,000 to R$ 5,000 per month (estimated)
* The amount shown is an estimate calculated from public data and market references. We do not guarantee that this is the salary offered for this specific position.
Area: Other
Level: Senior
Job details
- R$ 100 per hour
- Posted 2 days ago
Qualifications
- Computer Science
- Ansible
- DevOps
- AWS Certification
- Databases
- Terraform
- New Relic
- Systems Engineering
- Angular
- Contracts
- Software Development
- GitHub
- APIs
- S3
- Apache
- Root Cause Analysis
- Cybersecurity
- Leadership
- Jenkins
- Python
- Debugging
Full job description
Senior SRE
Location: Argentina, Bolivia, Mexico, Paraguay, Colombia,
Are you looking for a career that makes a positive difference in your life and reimagines learners and educators across the globe? Do you want to work with fun and social people in a positive and engaged virtual office environment?
We are hiring a **Senior Site Reliability Engineer** who will build and support reliable, high-capacity, and well-performing systems in support of our mission to protect and improve our customer platforms, with an ever-watchful eye on reliability, security, performance, cost, and operational excellence. As a Sr Site Reliability Engineer, you will collaborate in a DevOps model with product development teams, designing, deploying, and managing automation tools that increase predictability and accelerate time to market while reducing cost.
Our cloud stack includes:
- Cloud: AWS (CloudFront, S3, EC2, ECS, SES, SQS, SNS, Load Balancing, VPC, Config, Systems Manager, Lambda, API Gateway, DB services, and many more)
- Cloud: OCI know-how a plus (ExaCS, OCI Compute, Load Balancers, Networking, VCN, Object Storage)
- Infrastructure as Code: Terraform
- Programming: Python, Golang, Bash, Ansible
- Containers: AWS ECS
- Security: Rapid7, WAF
- Web: Apache httpd, Apache Tomcat, Angular
- Config Management and provisioning: Ansible, Packer
- Telemetry: New Relic, CloudWatch, Datadog
- DevSecOps: Artifactory, Jenkins, CircleCI, SonarQube, JFrog Xray, Control Tower, GitHub Enterprise, and more
Your contributions
- Cloud Engineering
  - Collaborate with product development teams in a DevOps model, designing, deploying, and managing automation tools to enhance predictability and accelerate time to market
  - Identify the highest-impact opportunities to optimize existing systems, ensuring "right-sized" solutions in consideration of technical and business constraints
  - Drive initiatives to enhance system reliability and performance
  - Ensure repeatability, traceability, and transparency of our infrastructure automation (infrastructure-as-code, monitoring-as-code)
  - Participate in continual learning of the AWS ecosystem, game day scenarios, and professional conferences
  - Actively monitor AWS costs, using optimization tools to maximize ROI while meeting Service Level Objectives
- Observability Engineering
  - Own reliability, uptime, system security, cost, operations, capacity, and resiliency, and the performance analysis thereof
  - Lead initiatives to improve the reliability and stability of applications and platforms, using data-driven analytics to improve service levels
  - Ensure that the architecture and deployment models are adequately designed to meet SLA commitments
  - Serve as the primary point of contact during major incidents for your application, and demonstrate the ability to identify and resolve issues that trigger on-call alarms
  - Maintain and enhance telemetry systems to improve visibility into application performance and business metrics, ensuring operational workloads are effectively managed
  - Develop, communicate, collaborate on, and monitor standard processes to promote the long-term health and sustainability of operational development tasks
- DevSecOps
  - Support healthy software development practices, including complying with agile software development methodology and building standards for code reviews, work packaging, and continuous delivery
  - Partner with Cybersecurity to develop plans and automation that respond to new risks and vulnerabilities
- Resiliency Engineering
  - Collaborate with dev teams to identify failure points and blast radius of systems
  - Validate the effectiveness of monitoring and observability configurations
  - Coordinate failure injection testing
  - Observe and document steady-state production levels and growth patterns
  - Plan and forecast for seasonal growth, communicate trend lines with leadership, and enhance infrastructure scaling plans to accommodate 2x planned load
  - Coordinate improvements of existing software and infrastructure to meet resiliency goals
- Mentor and nurture engineers across varying levels of experience; foster growth by setting high-reaching goals and providing support to achieve them.
- Expand and collaborate across different levels and stakeholder groups.
- Document and share knowledge within the organization via internal forums and communities of practice.
- Kubernetes experience is good to have, whether with EKS or with self-managed Kubernetes clusters.
- Must have used Terraform to create infrastructure within AWS. Must bring an automation-first mindset to the team.
- On-call participation is required. You will lead triage bridges when necessary.
- You will be expected to monitor customer experience, application metrics such as golden signals and KPIs, and infrastructure health.
- You will need to work proactively across team boundaries on a daily basis.
Qualifications
- Experience as a software engineer, with practical experience developing, debugging, and deploying enterprise applications
- Experience with infrastructure automation technologies, preferably Terraform
- Experience in container/container-fleet-orchestration technologies, preferably EKS or ECS
- Versatility with troubleshooting diverse sets of hosting technologies: web server platforms, application platforms, operating systems, network components, virtualization technologies, storage, and database platforms.
- Experience with continuous-deployment-based software development lifecycles (e.g. CI/CD)
- Experience with application caching strategies and high concurrency workloads
- Strong communication, problem-solving, root cause analysis, and systems engineering skills
- Ability to design and manage escalation response plans that span monitoring, reaction, response, remediation, and retrospection in culturally aligned (proactive, customer-focused, collaborative, data-driven) ways
- Demonstrated expertise building and managing highly scaled production infrastructure in the cloud
- BS Degree in Computer Science (or related technical field and/or equivalent industry experience)
Job Type: Contract
Contract length: 12 months
Pay: R$100.00 per hour
Expected hours: 8 per week
Work Location: Remote
