across Kubernetes, Terraform, and CI/CD pipelines. You will implement monitoring and observability tooling including Datadog, New Relic... such as Datadog or New Relic; Exposure to GitOps practices; Experience with Jenkins or GitHub Actions specifically...
(Splunk, Datadog), and strong proficiency in Linux environments Preferred qualifications, capabilities, and skills...
platforms (Datadog, Splunk), including platform governance, roadmap alignment, and operational oversight. Lead enterprise-scale... with ITSM platforms (ServiceNow). Design and manage multi-org Datadog architecture, including org structure, RBAC, SSO/SAML...
and operational excellence through proactive monitoring, alerting, logging, and performance management practices using Datadog... scripting, Docker, and Kubernetes. Experience implementing observability and DevSecOps practices using tools such as Datadog...
& Quality Production experience with APM and observability tools (Datadog, New Relic, Application Insights, or similar...
AI pipelines using Langfuse and Datadog to debug failed agent runs, detect regressions, and maintain production health. Team... AI pipelines, and utilizing AI observability tooling (e.g., Langfuse, Datadog). You bring strong engineering skills in Python...
, Prometheus, Elastic, Datadog or similar. Oversee all planned outages, assess RCA and assist with major upgrades to ensure..., Grafana, Prometheus, Datadog, Dynatrace) to ensure system performance and reliability. Proven experience managing distributed...
, security groups PostgreSQL on Amazon RDS (~15 instances) Datadog + CloudWatch (APM, logs, alerting) Java microservices / API... or similar) Strong written English for escalation + post-incident write-ups Nice-to-have: Datadog / CloudWatch fluency AWS...