Distinguished Engineer - AI
inference engines (TensorRT, vLLM, ONNX Runtime, Triton), model optimization (quantization, pruning, distillation), and serving...
, C++, or other relevant coding languages. Expertise in ML inference, model serving frameworks (Triton, Ray Serve, vLLM...
innovative solutions Model serving/inference (e.g., ONNX Runtime, vLLM, Triton, quantization, distillation, caching, dynamic...
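Several of the listings above name quantization among the model-optimization techniques. As a minimal sketch of the idea (not any specific library's implementation), the following pure-Python example shows post-training affine int8 quantization: floats are mapped to int8 codes plus a `(scale, zero_point)` pair, and dequantization recovers each value to within half a quantization step. Function names here are illustrative, not taken from the postings.

```python
def quantize_int8(values):
    """Map floats to int8 codes plus (scale, zero_point) for dequantization."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    zero_point = round(-128 - lo / scale)
    # Clamp to the int8 range after shifting by the zero point.
    return ([max(-128, min(127, round(v / scale) + zero_point)) for v in values],
            scale, zero_point)

def dequantize_int8(codes, scale, zero_point):
    """Invert the affine mapping; error is at most scale / 2 per value."""
    return [(c - zero_point) * scale for c in codes]

weights = [-0.8, -0.1, 0.0, 0.4, 1.2]
q, s, z = quantize_int8(weights)
recovered = dequantize_int8(q, s, z)
```

Production serving stacks (vLLM, TensorRT) apply the same idea per-channel or per-group with fused GPU kernels; the loop structure above is only the scalar reference.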
-level programming to maximize performance for AI operations, leveraging tools like Compute Kernel (CK), CUTLASS, and Triton...
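The kernel libraries named above (CK, CUTLASS, Triton) all build on tiling: computing an output matrix one block at a time so each block's operands fit in fast memory. As a hedged, CPU-only sketch of that loop structure (real kernels run this on GPU hardware with shared memory and registers), here is a tiled matrix multiply in pure Python; `block` and the function name are illustrative.

```python
def tiled_matmul(A, B, block=2):
    """Compute C = A @ B one block x block tile at a time (blocking sketch)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):            # tile over rows of C
        for j0 in range(0, m, block):        # tile over columns of C
            for p0 in range(0, k, block):    # march over the shared dimension
                # Inner loops touch only one tile of A, B, and C at a time,
                # which is what lets GPU kernels stage tiles in shared memory.
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        for p in range(p0, min(p0 + block, k)):
                            C[i][j] += A[i][p] * B[p][j]
    return C
```

The result is identical to a naive triple loop; only the traversal order changes, which is the whole point of the optimization.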
inference scaling across multi-node clusters using Ray Serve and Triton. Experience in leading technical projects and supporting...
in software/hardware co-design. Deep understanding of GPUs and/or other AI accelerators. Experience with CUDA, Triton...
, autoscaling, service mesh, GPU operators) and LLM serving engines (e.g., vLLM, TensorRT-LLM, Triton, KServe/Seldon, Ray Serve...
with MLIR, MLIR dialects (Linalg, Affine), PyTorch 2.0, TVM, Triton, and/or LLVM. Bachelor's degree in Computer Science...
. in Triton, with a focus on generating efficient, low-level code. Experience in workload mapping strategies such as sharding...
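Sharding, mentioned above as a workload-mapping strategy, splits one large layer across devices so each computes only its slice. As a minimal sketch under stated assumptions (plain lists stand in for devices; function names are hypothetical), the example below row-shards a weight matrix and shows that concatenating the per-shard matrix-vector products reproduces the full result, which is the pattern tensor-parallel serving engines rely on.

```python
def shard_rows(matrix, num_shards):
    """Split rows of `matrix` into num_shards near-equal contiguous shards."""
    base, extra = divmod(len(matrix), num_shards)
    shards, start = [], 0
    for s in range(num_shards):
        size = base + (1 if s < extra else 0)  # first `extra` shards get one more row
        shards.append(matrix[start:start + size])
        start += size
    return shards

def sharded_matvec(shards, x):
    """Each shard computes its slice of W @ x; concatenation is the full output."""
    out = []
    for shard in shards:
        out.extend(sum(w * v for w, v in zip(row, x)) for row in shard)
    return out
```

Row sharding needs no cross-device reduction (each output element lives on one shard); column sharding would instead require an all-reduce to sum partial products, which is the usual communication trade-off in these mappings.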