Job Description
Job Location: Singapore (onsite)
Job Summary
We are seeking a skilled High-Performance Computing (HPC) Engineer with 5–10 years of experience to design, deploy, manage, and optimize HPC cluster environments. The ideal candidate will have hands-on experience with cluster scheduling, monitoring, performance tuning, and supporting scientific or engineering workloads in Linux-based environments.
Key Responsibilities
· Design, deploy, and maintain HPC cluster infrastructure to ensure high availability and performance.
· Manage and configure job scheduling systems such as PBS and SLURM.
· Implement and maintain monitoring solutions using Grafana, Nagios, Prometheus, and Ganglia.
· Administer cluster management tools such as Bright Cluster Manager and xCAT, and use Puppet for infrastructure automation.
· Configure and troubleshoot high-speed networking technologies including InfiniBand and Gigabit Ethernet.
· Perform system performance analysis, profiling, and debugging using tools like Intel VTune, Valgrind, and gprof.
· Provide application support for scientific and engineering workloads built with the GNU, Intel, and CUDA compilers and the Intel MKL libraries.
· Manage virtualization environments using Proxmox and handle license management tools like FlexLM.
· Configure and maintain storage solutions including parallel file systems and enterprise object storage platforms.
· Ensure system security, patching, and compliance in Red Hat Enterprise Linux (RHEL) environments.
· Collaborate with research, engineering, and IT teams to optimize workloads and resource utilization.
· Document system architecture, processes, and troubleshooting guides.
Required Skills & Qualifications
· 5–10 years of experience in HPC systems administration or engineering.
· Strong experience with job schedulers such as PBS and SLURM.
· Hands-on experience with monitoring tools: Grafana, Nagios, Prometheus, Ganglia.
· Expertise in cluster management tools like Bright Cluster Manager, xCAT, and Puppet.
· Solid understanding of HPC networking, including InfiniBand and Ethernet.
· Experience with performance profiling and debugging tools (Intel VTune, Valgrind, gprof).
· Familiarity with compilers and libraries: GNU, Intel, and CUDA compilers; Intel MKL.
· Experience with virtualization platforms like Proxmox and license management (FlexLM).
· Knowledge of storage technologies: parallel file systems (e.g., Lustre, GPFS) and object storage.
· Strong Linux administration skills, specifically Red Hat Enterprise Linux.
· Scripting skills (Bash, Python, or similar) for automation and troubleshooting.