foundation for the customer’s AI capabilities, focusing on inference services while supporting the broader ecosystem... inference at scale. Lead the development and maintenance of production AI services and applications, including retrieval...
, and the discipline to design, automate, and operate complex services so that reliability becomes a first-class engineering... objectives (SLOs), service-level indicators (SLIs), and error budgets for critical services, and use those measures to drive...
configurations using GitOps patterns. Implement comprehensive observability using Prometheus, Grafana, Datadog, or Confluent Control.... Experience operating Kafka on Kubernetes (Strimzi, Confluent Operator). Exposure to managed Kafka services (AWS MSK, Azure Event...
on Amazon Web Services. This is a deeply hands-on engineering role spanning architecture, infrastructure-as-code, automation... environments across compute, networking, storage, identity, and managed data services, with strong attention to scalability...
platforms (Azure/AWS hybrid environments) Observability & monitoring tools (Dynatrace, Splunk, Prometheus, Grafana... and include a wide variety of professional services, project, and talent solutions. By always striving for excellence and focusing...
, and connector configurations using GitOps patterns. Implement comprehensive observability using Prometheus, Grafana, Datadog.... Experience operating Kafka on Kubernetes (Strimzi, Confluent Operator). Exposure to managed Kafka services (AWS MSK, Azure Event...