* Salary: R$ 2,000 to R$ 5,000 per month (estimated)

* The amount shown is an estimate calculated from public data and market references. We do not guarantee that this is the salary offered for this specific position.

Area: Other

Level: Senior

Job details

R$ 100 per hour
Posted 2 days ago

Qualifications

Computer Science
Ansible
DevOps
AWS Certification
Databases
Terraform
New Relic
Systems Engineering
Angular
Contracts
Software Development
GitHub
APIs
S3
Apache
Root Cause Analysis
Cybersecurity
Leadership
Jenkins
Python
Debugging

Full job description

Senior SRE

Location: Argentina, Bolivia, Mexico, Paraguay, Colombia,

Are you looking for a career that makes a positive difference in your life and reimagines learners and educators across the globe? Do you want to work with fun and social people in a positive and engaged virtual office environment? We are hiring a **Senior Site Reliability Engineer** who will build and support reliable, high-capacity, and well-performing systems in support of our mission to protect and improve our customer platforms, with an ever-watchful eye on reliability, security, performance, cost, and operational excellence. As a Senior Site Reliability Engineer, you will collaborate in a DevOps model with product development teams, designing, deploying, and managing automation tools that increase predictability and accelerate time to market while reducing cost. Our cloud stack includes:

Cloud: AWS (CloudFront, S3, EC2, ECS, SES, SQS, SNS, Load Balancing, VPC, Config, Systems Manager, Lambda, API Gateway, DB services, and many more)

Cloud: OCI know-how is a plus (ExaCS, OCI Compute, Load Balancers, Networking, VCN, Object Storage)

Infrastructure as Code: Terraform

Programming: Python, Golang, Bash, Ansible

Containers: AWS ECS

Security: Rapid7, WAF

Web: Apache httpd, Apache Tomcat, Angular

Config Management and provisioning: Ansible, Packer

Telemetry: New Relic, CloudWatch, Datadog

DevSecOps: Artifactory, Jenkins, CircleCI, SonarQube, JFrog Xray, Control Tower, GitHub Enterprise, and more

Your contributions

Cloud Engineering

Collaborate with product development teams in a DevOps model, designing, deploying, and managing automation tools to enhance predictability and accelerate time to market

Identify the highest-impact opportunities to optimize existing systems, ensuring “right-sized” solutions in consideration of technical and business constraints

Drive initiatives to enhance system reliability and performance

Ensure repeatability, traceability, and transparency of our infrastructure automation (infrastructure-as-code, monitoring-as-code)

Participate in continual learning of the AWS ecosystem, game day scenarios, and professional conferences

Actively monitor AWS costs, using optimization tools to maximize ROI while meeting Service Level Objectives.

Observability Engineering

Take ownership of reliability, uptime, system security, cost, operations, capacity, resiliency, and performance, including the analysis of each

Lead initiatives to improve the reliability and stability of applications and platforms, using data-driven analytics to improve service levels

Ensure that the architecture and deployment models are adequately designed to meet SLA commitments

Serve as the primary point of contact during major incidents for your application, and demonstrate the ability to identify and resolve issues that trigger on-call alarms.

Maintain and enhance telemetry systems to improve visibility into application performance and business metrics, ensuring operational workloads are effectively managed

Develop, communicate, collaborate on, and monitor standard processes to promote the long-term health and sustainability of operational development tasks

DevSecOps

Support healthy software development practices, including complying with agile software development methodology, building standards for code reviews, work packaging, and continuous delivery

Partner with CyberSecurity and develop plans and automation to respond to new risks and vulnerabilities

Resiliency Engineering

Collaborate with dev teams to identify failure points and blast radius of systems

Validate the effectiveness of monitoring and observability configurations

Coordinate failure injection testing

Observe and document steady-state production levels and growth patterns

Plan and forecast for seasonal growth, communicate trend lines to leadership, and enhance infrastructure scaling plans to accommodate 2x planned load

Coordinate improvements of existing software and infrastructure to meet resiliency goals

Mentor and nurture engineers across varying levels of experience; foster growth by setting high-reaching goals, and providing support to achieve them.

Ability to expand and collaborate across different levels and stakeholder groups.

Document and share knowledge within the organization via internal forums and communities of practice.

Kubernetes experience is good to have, whether with EKS or with self-managed Kubernetes clusters.

Must have used Terraform to create infrastructure within AWS. Must bring an automation-first mindset to the team.

On-call participation is required. You will lead triage bridges when necessary.

You will be expected to monitor customer experience, application metrics such as golden signals and KPIs, and infrastructure health.

You will need to work proactively across team boundaries on a daily basis.

Qualifications

Experience as a software engineer, with practical experience developing, debugging, and deploying enterprise applications

Experience with infrastructure automation technologies, preferably Terraform

Experience in container/container-fleet-orchestration technologies, preferably EKS or ECS

Versatility with troubleshooting diverse sets of hosting technologies: web server platforms, application platforms, operating systems, network components, virtualization technologies, storage, and database platforms.

Experience with continuous-deployment-based software development lifecycles (e.g. CI/CD)

Experience with application caching strategies and high-concurrency workloads

Strong communication, problem-solving, root cause analysis, and systems engineering skills

Ability to design and manage escalation response plans that cover monitoring, reaction, response, remediation, and retrospection in culturally aligned (proactive, customer-focused, collaborative, data-driven) ways.

Demonstrated expertise building and managing highly scaled production infrastructure in the cloud

BS Degree in Computer Science (or related technical field and/or equivalent industry experience)

Job Type: Contract
Contract length: 12 months

Pay: R$100.00 per hour

Expected hours: 8 per week

Work Location: Remote
