As a Site Reliability and Operations Engineer (SRE), you'll be part of the action, working closely with application teams to automate operations, optimize infrastructure, and troubleshoot issues in an exciting, fast-paced environment. You'll play a vital role in keeping our systems reliable, scalable, and high-performing. This role is designed for driven individuals who:
- Love learning new technologies and thrive in solving complex challenges.
- Are independent, motivated, and excited to take on ambitious projects.
- Excel at collaborating with engineering teams and stay calm under pressure.
- Have a passion for delivering quality, reliable solutions in a dynamic, high-energy workplace.
Key Responsibilities:
- Build and maintain robust, highly available continuous integration (CI) pipelines.
- Automate build and deployment processes.
- Design and implement new tools that streamline manual operations and improve developer productivity in application management.
- Troubleshoot and resolve platform issues across both cloud and on-premises infrastructure.
- Improve operational readiness by performing root cause analysis of critical issues and creating long-term solutions.
- Maintain scalable logging and monitoring infrastructure.
Minimum Qualifications:
- BS degree or higher in Computer Science or a related field, with at least 4 years of demonstrated experience in Site Reliability Engineering or an infrastructure-focused role.
- Proficiency in one or more programming languages (e.g., Java, Python).
- Understanding of data structures and algorithms and of the software development life cycle (SDLC).
- Knowledge of networking, database, and system administration fundamentals.
- Understanding of version control systems such as Git and hosting platforms such as GitHub.
- Coursework in web technologies or machine learning is a plus.
- Strong problem-solving and communication skills.
Preferred Qualifications:
- Knowledge of container-based technologies such as Docker and Kubernetes, including managed services such as Amazon EKS.
- Knowledge of modern web service architectures and cloud platforms such as AWS and GCP.
- Exceptional analytical and troubleshooting skills in complex Unix/Linux environments and application implementations.
- Ability to build tools from scratch.
- Ability to work in a collaborative environment.