Coordinate projects and programs related to AI/ML infrastructure (e.g. pre-training, post-training pipelines, inference & model serving stacks), including end-to-end planning, timelines, milestones, performance metrics, and resource needs. ...
Gather competitive and market intelligence on other AI labs and approaches, synthesize insights into succinct briefs, and translate them into recommendation memos for leadership. Drive decision cadence and governance, organizing and facilita...
They present enormous opportunities for acceleration and performance improvement on custom architectures. This Research Internship is an opportunity to work at the confluence of AI, architecture and performance optimization. Research Intern...
Design and develop large-scale distributed cloud services and solutions with a focus on high availability, scalability, robustness, and observability. Lead project development across the organization and work with subject matter experts an...
Develop and tune scalable pretraining software for NVIDIA GB200 NVL72 CX8 and AMD MIxxx architectures. Benchmark GB200 and AMD MIxxx GPU clusters. Gather data and insights to develop the pretraining compute roadmap. Care deeply about conve...
Are passionate about the role of data in large-scale AI model training. Will thrive in a highly collaborative, fast-paced environment. Have a high degree of expertise and pay close attention to details. Demonstrate a proactive attitude and ent...
Deeply understand the pipeline of collecting data, training, evaluating, and serving language models and multimodal models. Have experience working side-by-side with AI researchers and engineers. Thrive in a 0->1, scrappy, innovative env...
Design and develop large-scale distributed cloud services and solutions with a focus on high availability, scalability, robustness, and observability. Lead project development across the organization and work with subject matter experts an...
Primary Skills: Marketing (Expert), Cross-Functional Leadership (Advanced), Data Analysis (Expert), AI Utilization (Intermediate), Project Management (Advanced). Location: Remote. Contract: W2. Duration: 12 months. Pay Range: $95 - $100 Per Ho...
Design, implement, test, and optimize AI models in Python and CUDA C++ for large-scale GPU clusters. Profile, benchmark, and debug performance bottlenecks across compute, memory, and networking subsystems. Optimize collective communication ...