Centific Global Solutions is a leading information and communication technology company dedicated to delivering innovative IT support and help desk solutions. We pride ourselves on providing exceptional service to our clients whilst maintaining a collaborative and supportive workplace culture. Our team of skilled professionals is committed to solving complex technical challenges and ensuring seamless operations for businesses across various sectors.
The Cloud Server Operations Engineer maintains, troubleshoots, automates, and supports large-scale Linux-based cloud infrastructure and networking environments, with a strong emphasis on operations, incident response, and infrastructure reliability.
Key Responsibilities
- Support cloud infrastructure deployment with internal teams and cloud vendors.
- Reinstall, reboot, and maintain Linux/cloud servers and bare-metal systems.
- Troubleshoot server, network, hardware, and cloud infrastructure issues.
- Monitor infrastructure health, alerts, and asset utilization.
- Handle incident management and participate in on-call support rotations.
- Maintain operational records, tickets, repair logs, and tracking systems.
- Develop automation scripts and operational tools using Shell/Bash or Python.
- Improve operational processes and standardization.
- Troubleshoot networking issues involving TCP/IP, VLAN, DNS, IPv6, and routing.
- Support cloud fleet operations, performance management, documentation, and reporting.
Required Skills & Experience
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Experience in cloud, server operations, or data center environments.
- Strong Linux administration and troubleshooting skills.
- Knowledge of cloud platforms such as:
  - Amazon Web Services (AWS)
  - Google Cloud
  - Oracle
  - Microsoft
- Experience with OS provisioning technologies (PXE, iPXE).
- Ability to write Shell/Bash and Python scripts.
- Familiarity with automation tools:
  - Ansible
  - Terraform
  - GitLab CI/CD
- Strong networking knowledge (TCP/IP, VLAN, DNS, IPv6).
- Experience with monitoring, operational tooling, and incident management.
Nice-to-Have Skills
- Large-scale cloud fleet operations.
- GPU infrastructure support.
- Firmware lifecycle management.
- RDMA networking experience.