Full-time HPC Engineer job, salary up to SGD 141,667, Tekwissen Software Private Limited is hiring - Ricebowl

HPC Engineer

Tekwissen Software Private Limited

SGD 4,167 - SGD 141,667 per month

Singapore

Work Location

  • Singapore, Singapore

Job Description

Job Responsibilities

Job Location: Singapore (Onsite)

Job Summary

We are seeking a skilled High-Performance Computing (HPC) Engineer with 5–10 years of experience to design, deploy, manage, and optimize HPC cluster environments. The ideal candidate will have hands-on experience with cluster scheduling, monitoring, performance tuning, and supporting scientific or engineering workloads in Linux-based environments.

Key Responsibilities

·       Design, deploy, and maintain HPC cluster infrastructure to ensure high availability and performance.

·       Manage and configure job scheduling systems such as PBS and SLURM.

·       Implement and maintain monitoring solutions using Grafana, Nagios, Prometheus, and Ganglia.

·       Administer cluster management tools including Bright Cluster Manager, xCAT, and Puppet for infrastructure automation.

·       Configure and troubleshoot high-speed networking technologies including InfiniBand and Gigabit Ethernet.

·       Perform system performance analysis, profiling, and debugging using tools like Intel VTune, Valgrind, and gprof.

·       Provide application support for scientific and engineering workloads using GNU, Intel, and CUDA compilers, as well as the MKL library.

·       Manage virtualization environments using Proxmox and handle license management tools like FlexLM.

·       Configure and maintain storage solutions including parallel file systems and enterprise object storage platforms.

·       Ensure system security, patching, and compliance in Red Hat Linux environments.

·       Collaborate with research, engineering, and IT teams to optimize workloads and resource utilization.

·       Document system architecture, processes, and troubleshooting guides.

Required Skills & Qualifications

·       5–10 years of experience in HPC systems administration or engineering.

·       Strong experience with job schedulers such as PBS and SLURM.

·       Hands-on experience with monitoring tools: Grafana, Nagios, Prometheus, Ganglia.

·       Expertise in cluster management tools like Bright Cluster Manager, xCAT, and Puppet.

·       Solid understanding of HPC networking, including InfiniBand and Ethernet.

·       Experience with performance profiling and debugging tools (Intel VTune, Valgrind, gprof).

·       Familiarity with compilers and libraries: GNU, Intel, CUDA, MKL.

·       Experience with virtualization platforms like Proxmox and license management (FlexLM).

·       Knowledge of storage technologies: parallel file systems (e.g., Lustre, GPFS) and object storage.

·       Strong Linux administration skills, specifically Red Hat Enterprise Linux.

·       Scripting skills (Bash, Python, or similar) for automation and troubleshooting.

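Important Safety Guidelines

When applying for a job, never provide your bank or credit card details. Do not transfer money or complete unrelated online surveys. If you notice anything suspicious, please report this job advertisement.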