A company is looking for a CUDA Engineer to build performant CUDA kernels with a focus on numerical correctness and efficiency.
Key Responsibilities
Engineer and optimize CUDA kernels for training and inference of large Mixture-of-Experts (MoE) models
Profile and debug CUDA applications to identify and resolve performance bottlenecks
Collaborate with team members to tackle technical challenges in building self-improving agents
Required Qualifications
Strong understanding of CUDA programming and performance optimization techniques
Familiarity with LoRA adapters and related optimizations, such as Punica / S-LoRA kernels and FlashAttention
Experience in model training, particularly using reinforcement learning (RL)
Demonstrated ability to learn quickly and deliver impactful results
Prior experience in software development or engineering roles is preferred
Engineer • Flushing, New York, United States