Summary/Mission:
Perform tasks in all phases of the development cycle with little or no technical supervision. Assess problematic situations to gain an adequate understanding of the problems involved, and take responsibility for delivering complex tasks on time and in scope within the team’s plan.
Responsibilities:
Work with the team to design for the performance, capacity and high availability of infrastructure and services.
Participate in problem resolution activities.
Troubleshoot issues across the entire stack: software, database and infrastructure.
Diagnose and troubleshoot complex distributed systems handling large volumes of data and develop solutions that have a significant impact at scale.
Participate in building advanced tooling for testing, monitoring, administration and operations of multiple clusters across multiple geographically distributed data centers
Develop innovative ways to measure, monitor and report on application and infrastructure health
Improve the performance of microservices and solve scaling/performance issues
Define and monitor SLIs, SLOs and error budgets
Drive efficiencies in systems and processes: capacity planning, configuration management, performance tuning, monitoring and root cause analysis.
Requirements / Experience:
Creative when solving problems and continuously seeking improvements for processes and solutions
Facilitate knowledge sharing by creating and maintaining comprehensive documentation & diagrams
Write high quality code to deliver automated solutions across the entire stack.
Translate a passion for improvement into design & roadmap contributions, despite existing technical challenges
Partner with the Engineering community to establish metrics, review & sign off on changes and introduce new services and schema changes
Strong team player with a high degree of self-motivation
Ability to learn new systems & manage additional technical resources to meet the project requirements
Collaborate with development teams on best practices and infrastructure planning activities with a focus on reliability, performance and security
3+ years of hands-on experience with cloud computing - including infrastructure, storage, platforms and data management, preferably in AWS.
Experience with container and orchestration technologies such as Docker & Kubernetes
Hands-on experience with AWS Elastic Kubernetes Service.
Hands-on experience with GitHub Actions.
Preferred Qualifications:
BS degree in computer science or proven software engineering capability
Experience with traditional enterprise data-center technologies, including compute, storage appliances, virtual machines, and networking
Experience managing Databases: MySQL, MariaDB, SQL Server, or PostgreSQL
Experience working with scalable networking technologies such as Load Balancers/Firewalls and web standards (REST APIs, web security mechanisms, OWASP Top 10).
Broader integration and management experience with DevOps ecosystems and related deployment/orchestration tools such as Helm, Terraform, GitLab CI/CD, Jenkins, Artifactory
3+ years of experience in Linux Systems and general programming/scripting (Python, Shell, Java, Golang) and automation frameworks.
Able to identify the root cause and resolve critical issues by looking across multiple layers (storage, OS, network, and application / DB stack)
Play a part in incident management and emergency response
Location: LATAM
Pay: USD