Site Reliability Engineer
tooling — metrics, logging, tracing, and alerting (e.g., Cloud Monitoring, Datadog, or Prometheus/Grafana) Understanding...
& Quality Production experience with APM and observability tools (Datadog, New Relic, Application Insights, or similar...
Familiarity with microservices and event-driven systems Experience monitoring production systems using Datadog, Grafana...
AI pipelines using Langfuse and Datadog to debug failed agent runs, detect regressions, and maintain production health. Team... AI pipelines, and utilizing AI observability tooling (e.g., Langfuse, Datadog). You bring strong engineering skills in Python...
response times and overall application performance Monitor applications using tools such as New Relic, Datadog, Rollbar...
within AWS environments including S3 and EC2 Monitor applications and workflows using Datadog Engagement highlights... and attribution modeling Exposure to Datadog or similar monitoring tools Spanish proficiency is a plus...
, Logging & Incident Response Implement observability tools such as: Prometheus, Grafana, Datadog, New Relic. Configure...
, Prometheus, Elastic, Datadog or similar. Oversee all planned outages, assess RCA and assist with major upgrades to ensure..., Grafana, Prometheus, Datadog, Dynatrace) to ensure system performance and reliability. Proven experience managing distributed...
resources Monitoring & Alerts Use monitoring tools such as Datadog, Splunk, New Relic, or similar Identify issues... certification Experience supporting APIs, integrations, or SaaS platforms Exposure to monitoring tools such as Datadog, Splunk...