Principal AI Forward Deployment Engineer
management, audit logging) for engineering orgs. Familiarity with GPU/inference infrastructure, vLLM/TGI/Triton, fine-tuning...
, and speculative decoding for LLM serving. Drive compiler-level optimizations using Triton, XLA, TorchInductor, or TVM, working..., TensorRT-LLM, DeepSpeed, or similar projects. Familiarity with custom kernel authoring in Triton or CUTLASS. Experience...
infrastructure and AI workloads (e.g., Triton Inference Server monitoring). Preferred Qualifications: Background in implementing... Prometheus, OpenTelemetry) applied to both standard infrastructure and AI workloads (e.g., Triton Inference Server monitoring...
environment. This position ensures that the TRITON model delivers scientifically accurate, operationally relevant...
, RAG architecture, vLLM. Experience with NVIDIA Triton Server APIs for inference at scale. Proficiency with data...
and maintain high-performance GPU kernels using Triton or CUDA to accelerate custom model layers and critical workloads. Improve... and scaling large-scale inference or serving workloads using distributed frameworks and runtime systems (e.g., Triton, vLLM...
Toolkit, Guardrails, Megatron, Framework, NIM), Nemotron, OSS, Transformer Engine, TensorRT-LLM, Triton, RAPIDS...
framework experience: NVIDIA Dynamo/Triton, vLLM, SGLang, or equivalent. Experience with distributed or disaggregated model...
, and repeatable delivery of operational flood forecasting capabilities in a remote work environment. This role ensures that TRITON...
servers such as Triton Inference Server and vLLM for high-throughput, low-latency serving at scale. Oversees production... in large-scale environments typical of major tech firms. Hands-on experience building LLM inference engines using Triton...