Arc Compute operates high-performance GPU clusters and is focused on improving efficiency, throughput, and reliability at scale. We’re looking for an Embedded Software Engineer to help build the software that makes our GPU infrastructure faster and more efficient.

What You’ll Be Doing

  • Build and improve GPU performance telemetry using CUDA, DCGM and low-level profiling data.
  • Explore scheduling and optimization strategies to make multi-GPU workloads run more efficiently.
  • Analyze, optimize, and tune the performance of DL models in domains such as LLMs, multimodal, and generative AI.
  • Scale performance of DL models across different architectures and types of NVIDIA accelerators.
  • Collaborate with team members and other partners.

What We’re Looking For

  • Bachelor's or Master's degree in Computer Engineering, Electrical Engineering, or a related field, or equivalent experience.
  • 4+ years of professional software development experience, with a working knowledge of design patterns and software engineering principles.
  • At least 1 year of experience in CUDA development and GPU performance concepts.
  • C/C++ programming and software design skills. Python experience is a plus.
  • Experience with modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs, is a plus.
  • Familiarity with Linux environments and debugging on real hardware.
  • Comfortable working onsite with GPU servers and real workloads.
  • Experience with Git.

Nice to Have

  • Experience deploying or operating systems in Kubernetes, Docker-based environments, or other job orchestration frameworks.
  • Understanding of AI model serving backends, ML runtimes, or AI compilers (e.g., TensorRT, TVM, XLA).
  • Basic experience building or extending backend web services (e.g., REST APIs, data ingestion pipelines, or simple dashboards).

Apply Now

    To apply for this job, email your details to zahid.iqbal@hardbootinc.com