AI Research Engineer, Pre-Training
, especially any of: CUDA/Triton/Pallas/CuTe DSL kernel development, lower-level PyTorch/JAX/XLA development, CUDA Graphs, FPGA...
, planning, prediction, or control. Deep familiarity with NVIDIA platforms such as DRIVE™, Jetson™, CUDA®, TensorRT™, Triton...
of inference servers (vLLM, Triton) using KServe, KubeRay, or Knative to ensure serverless-style scaling for AI workloads...
, vLLM, ONNX, TorchServe, Triton). Familiarity with quantization techniques (AWQ, GPTQ) to optimize model size/speed...
performance tuning (vLLM, TensorRT, Triton, or similar). Prior Staff-level role at a company with a significant AI infra...
AI system performance using tools such as NVIDIA Nsight, TensorRT, Triton Inference Server, or similar profiling...
and/or PyTorch at scale. Experience writing or optimizing custom GPU kernels using Pallas (JAX) or Triton. Demonstrable career...