NVIDIA
Senior HPC DevOps Engineer (Remote)
Senior HPC DevOps Engineer | NVIDIA | Netherlands
What You’ll Be Doing
- Innovate and Implement: Design, implement, and maintain large-scale HPC/AI clusters with state-of-the-art monitoring, logging, and alerting systems.
- Infrastructure as Code (IaC): Utilize and develop tools to manage infrastructure as code, ensuring scalable and repeatable deployments.
- Streamline CI/CD Pipelines: Develop and maintain continuous integration and continuous delivery (CI/CD) pipelines to automate and streamline deployment processes.
- Automate Everything: Develop scripts and tools to automate deployment, configuration management, and operational monitoring.
- Enhance Monitoring: Deploy advanced monitoring solutions for servers, networks, and storage to ensure seamless operations.
- Troubleshoot Complex Issues: Perform comprehensive troubleshooting from bare metal to application level, ensuring system reliability and efficiency.
- Lead and Educate: Serve as a technical resource, developing and sharing best practices with internal teams.
- Drive Innovation: Support R&D activities and take part in proofs of concept (POCs) and proofs of value (POVs) for future improvements.
What We Need To See
- B.Sc. in Computer Science, Engineering, or a related field, and 5+ years of relevant experience.
- Deep knowledge of HPC and AI solution technologies, including CPUs, GPUs, high-speed interconnects, and supporting software.
- Advanced proficiency in programming and scripting languages, with a solid understanding of object-oriented programming principles.
- Familiarity with Jenkins, Ansible, and Puppet or Chef.
- Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu), networking, and OS-level security.
- Deep understanding of networking protocols such as InfiniBand and Ethernet.
- Experience with workload scheduling and orchestration tools such as Slurm and Kubernetes.
- Experience with multiple storage solutions like Lustre, GPFS, ZFS, and XFS.
- Expertise with virtualization platforms (VMware, Hyper-V, KVM, Citrix).
- Familiarity with cloud platforms (AWS, Azure, Google Cloud).
Ways To Stand Out From The Crowd
- Architectural Insight: Knowledge of CPU and/or GPU architecture.
- Container Expertise: Understanding of Kubernetes and container-related microservice technologies.
- GPU Focus: Experience with GPU-focused hardware/software (DGX, CUDA).
- RDMA Fabrics: Background with RDMA (InfiniBand or RoCE) fabrics.
NVIDIA is at the forefront of breakthroughs in Artificial Intelligence, High-Performance Computing, and Visualization. Our teams are made up of driven, innovative professionals dedicated to pushing the boundaries of technology. We offer highly competitive salaries, an extensive benefits package, and a work environment that promotes diversity, inclusion, and flexibility. As an equal opportunity employer, we are committed to fostering a supportive and empowering workplace for all.
JR1993265