AI Inference Performance Engineer - New College Grad 2026
) on large GPU clusters. Expertise in kernel development (CUTLASS, CuTe DSL, TileLang, OpenAI Triton) or compiler/runtime paths...
Certification. Hands-on experience with NVIDIA GPUs and SDKs (e.g., CUDA, RAPIDS, Triton, etc.). System-level experience...
with relevant libraries, compilers, and languages - cuDNN, cuBLAS, CUTLASS, MLIR, Triton, CUDA, OpenCL. Experience with the...
) and generative AI workloads. Enhance performance tuning using TensorRT/TensorRT-LLM, vLLM, Dynamo, and Triton Inference Server... Knowledge of inference technologies - NVIDIA NIM, TensorRT-LLM, Dynamo, Triton Inference Server, vLLM, etc. Proficiency...
. Nice To Haves: GPU serving expertise - Experience with frameworks like NVIDIA Triton, TensorRT-LLM, ONNX Runtime, or vLLM... on scaling strategies, observability, and cost optimization. Prior contributions to OSS serving ecosystems (e.g., vLLM, Triton...
in Fairfax, VA. As a Senior Software Engineer on the Triton team, you will have the following responsibilities: Develop...
features using WebSockets or similar. Knowledge of model serving frameworks (e.g., TensorFlow Serving, Triton). Experience...
serving frameworks (TorchServe, TensorFlow Serving, Triton Inference Server). Experience deploying and managing GPU workloads...
. Experience working with high-performance inference engines (e.g., vLLM, NVIDIA Triton, TorchServe) for efficient LLM serving...
learning model serving technologies (e.g., NVIDIA Triton, Ray Serve, etc.), multi-phase ranking infrastructure, and Elasticsearch...
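The "multi-phase ranking infrastructure" these listings mention typically means pruning a large candidate pool with a cheap first-pass score, then reranking only the survivors with a more expensive model. A minimal sketch of that pattern, with hypothetical stand-in scoring functions (not any specific employer's system):

```python
import math

def cheap_score(query: str, doc: str) -> float:
    # Phase 1: simple term-overlap score, fast enough to run on every candidate.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def expensive_score(query: str, doc: str) -> float:
    # Phase 2: stand-in for a costly ML reranker -- here, overlap dampened
    # by document length, applied only to the shortlist.
    return cheap_score(query, doc) / math.log(len(doc.split()) + 2)

def rank(query: str, docs: list[str], first_pass_k: int = 3) -> list[str]:
    # Phase 1: keep only the top-k candidates by the cheap score.
    shortlist = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:first_pass_k]
    # Phase 2: rerank the shortlist with the expensive score.
    return sorted(shortlist, key=lambda d: expensive_score(query, d), reverse=True)

docs = [
    "gpu inference server deployment",
    "triton inference server on gpu clusters with autoscaling",
    "recipe for banana bread",
    "gpu kernels",
]
print(rank("gpu inference server", docs))
```

In production the phases are usually backed by a retrieval engine (e.g., Elasticsearch for phase 1) and a model server (e.g., NVIDIA Triton for phase 2), but the control flow is the same.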