Machine Learning Engineer - Inference Optimization | Experienced Hire
compilation. Exposure to CUDA, Triton language, ROCm, PTX, CuTe, CUTLASS, FlashInfer, or similar low-level GPU programming tools...
of modern inference runtimes (vLLM, SGLang, TensorRT-LLM, Dynamo, Triton) and the optimization patterns that matter...
of experience with deployment and optimization: Kubernetes, Docker, NVIDIA TensorRT/Triton, RAPIDS, Kubeflow, MLflow, Kafka...
, and Triton Develop and optimize system components for tensor/data parallelism and disaggregated serving Implement and optimize...
integration or partner perspective. Hands-on proficiency with ad serving platforms (Triton, GAM, FreeWheel, or equivalent...
, FlashInfer etc., would be well suited. - Familiar with syntax and tile-level semantics similar to Triton. - Experience...
experience with JAX and/or PyTorch at scale. Experience writing or optimizing custom GPU kernels using Pallas (JAX) or Triton...
of multi-node, multi-GPU AI training environments Knowledge of AI inferencing platforms such as NVIDIA NIM/Triton, vLLM...
is a plus. Our Research Stack Core Research: Python, PyTorch, NumPy, Triton, and CUDA Backend & Infra: Kubernetes, GCP, and large-scale...
River, MD customer. The position provides journeyman-level executive operations support services to the PMA-262 MQ-4C Triton...