compute, memory, and scheduling 4. Use and extend Triton / CUDA / CUTLASS, and integrate optimized kernels with PyTorch / XLA... and execution models 3. Proficiency in CUDA C++ or Triton, with the ability to independently write and optimize kernels 4...
Location: San Jose, CA | 07/02/2026 03:02:40 AM | Salary: Not specified | Company: TikTok
ecosystem (XLA, Flax, etc.) Experience in programming for GPUs or other accelerators (CUDA, Triton, Pallas...
with JAX ecosystem (XLA, Flax, etc.) Familiarity with GPU libraries and tools such as Triton, CUB, cuDNN, and cuBLAS Linux...
Crowd: Experience with Deep Learning Frameworks (PyTorch, Jax, etc.), ML compilers (XLA, Triton, etc.), GPU Technology...
supporting MQ-4C Triton Program. This position will support the design, development, and optimization of secure, scalable...
on Kubernetes/EKS, serverless FaaS, or on-prem containers. Knowledge of GPU runtime tuning or Triton-based multi-model serving...
, vLLM, TensorRT, Nvidia Triton, or PyTorch. Experience working with cloud providers like AWS and working with K8s...
) Familiarity with PyTorch ecosystem components such as TorchInductor / torch.compile, Triton, CUDA/HIP-style programming models...
, Triton, KServe, or Ray, along with experience deploying LLM workloads, is highly desirable. Knowledge of ROCm, AMD MI300...
technologies (Omniverse, Cosmos, CUDA, TensorRT, Triton Inference Server, etc.). Widely considered to be one of the technology...