Leads deployment and optimization using model inference servers such as Triton Inference Server and vLLM for high-throughput, low-latency serving at scale. Oversees production... in large-scale environments typical of major tech firms. Hands-on experience building LLM inference engines using Triton Inference Server...
AI workloads using NVIDIA technologies including CUDA-X libraries, TensorRT-LLM, Triton Inference Server, NVIDIA NeMo, NIM... such as TensorRT-LLM, Triton Inference Server, CUDA, RAPIDS, or similar GPU acceleration technologies. Experience building or scaling...
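Several of the listings above center on deploying models behind Triton Inference Server. For context, Triton loads each model from a repository directory that carries a `config.pbtxt`. The sketch below is a minimal, hypothetical example — the model name, tensor shapes, and vocabulary size are illustrative, not taken from any listing:

```
# Hypothetical model; names and dims are placeholders.
name: "my_model"
backend: "onnxruntime"
max_batch_size: 8

input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ -1, 32000 ]
  }
]

# Coalesce requests arriving within 100 us into one batch.
dynamic_batching {
  max_queue_delay_microseconds: 100
}

# Run one instance of the model on each available GPU.
instance_group [
  { kind: KIND_GPU, count: 1 }
]
```

The `dynamic_batching` and `instance_group` blocks are the usual levers for the high-throughput, low-latency trade-off these roles describe: server-side batching raises GPU utilization, while the queue-delay cap bounds added latency.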
) and inference serving frameworks (e.g., vLLM, Triton Inference Server, TensorRT-LLM, ONNX Runtime, Ray Serve, DeepSpeed-MII). 3+ years of experience in GPU programming and optimization, with expert knowledge of CUDA, ROCm, Triton, PTX, CUTLASS...
performance for AI operations, leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform...
. Foundational understanding of NVIDIA GPU Infrastructure software (e.g., NVIDIA DCGM, BCM, Triton Inference), Kubernetes and Cloud...
with inference optimization using vLLM, TensorRT-LLM, Triton Inference Server, or similar. DevOps & Platform Skills: Advanced...
Location:
Manhattan, NY | 25/03/2026 02:03:12 AM | Salary: Not specified | Company:
GEICO if you: Have expertise with accelerators (CUDA, ROCm, Triton, Pallas), ML framework programming (JAX or PyTorch). Have worked across the...
services in containerized / cloud-native environments (e.g., vLLM, SGLang, Triton). ▸ Deep understanding of 1M+ token context...
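The memory pressure behind a 1M+ token context window is easy to quantify: the KV cache grows linearly with sequence length. A rough sketch — the model dimensions are hypothetical, loosely Llama-70B-like with grouped-query attention, and fp16 storage is assumed:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_param: int = 2) -> int:
    """Size of the KV cache: 2 tensors (K and V) per layer, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_param * seq_len

# Hypothetical dims: 80 layers, 8 GQA KV heads, head_dim 128, fp16.
total = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                       seq_len=1_000_000)
print(f"{total / 1024**3:.0f} GiB")  # ~305 GiB for a single 1M-token sequence
```

At roughly 320 KiB per token, a single 1M-token sequence exceeds the memory of several H100-class GPUs — which is why the listings pair long-context work with paged KV-cache systems such as vLLM and SGLang.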
Hands-on experience with ML frameworks...