Staff Machine Learning Engineer - Leasing
workflows. GPU performance tuning (vLLM, TensorRT, Triton, or similar). Experience with ontology-driven systems or knowledge...
NCCL, RDMA, and high-performance networking. Implement custom operators and fused kernels in PyTorch, JAX, or Triton... Experience with Triton, CUTLASS, or other GPU kernel authoring frameworks. Familiarity with TensorRT, FasterTransformer, or vLLM...
standards. This role is accountable for Quality oversight across the manufacturing process of formulation, filling, triton...
in one or more of: NVIDIA Stack (CUDA, NeMo, Triton, TensorRT, NIM, DGX Cloud, and the broader DSX software portfolio), Inference Systems (large...
, and speculative decoding for LLM serving. Drive compiler-level optimizations using Triton, XLA, TorchInductor, or TVM, working..., TensorRT-LLM, DeepSpeed, or similar projects. Familiarity with custom kernel authoring in Triton or CUTLASS. Experience...
Hours Per Week: 40 Schedule Details/Additional Information: .5 triton surgical technologist apprenticeship/.5 work...
infrastructure (Ray Serve, KServe, Triton, FastAPI-based services, MLflow, etc.). Experience designing low-latency and high...
. Experience writing or optimizing custom GPU kernels using Pallas or Triton. Demonstrable career progression. Ability to engage...
Performance Engineer to work at the hardware-software boundary of this platform, crafting high-performance CUDA and Triton kernels... job responsibilities Design and implement high-performance CUDA and Triton kernels for quantization-aware training, sparse matrix...