Research Engineer, Performance RL
if you: Have expertise with accelerators (CUDA, ROCm, Triton, Pallas), ML framework programming (JAX or PyTorch). Have worked across the...
services in containerized / cloud-native environments (e.g., vLLM, SGLang, Triton). ▸ Deep understanding of 1M+ token context...
Infrastructure software (e.g., NVIDIA DCGM, BCM, Triton Inference), Kubernetes and Cloud. Hands-on experience with ML frameworks...
with inference optimization using vLLM, TensorRT-LLM, Triton Inference Server, or similar. DevOps & Platform Skills: Advanced...
/CuTeDSL/cutlass/Triton). Prior contributions to major LLM inference frameworks (e.g. vLLM) or prior experience with graph...
GenAI models Experience with low-level GPU programming (CUDA, Triton, NCCL) and frameworks such as PyTorch or JAX...
performance for AI operations, leveraging tools like Compute Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform...
is a plus Hands-on experience with Kubernetes, Docker, and ML-Ops platforms (e.g., MLflow, KServe, Triton) Familiarity with CUDA...
-performance inference framework (e.g. Triton and TensorRT) Deep understanding of Diffusion Architecture Experience profiling...
, Triton, TVM, or similar systems Exposure to: Neural networks Tree-based models (e.g., LightGBM) State space...