services, and any other characteristic protected by applicable federal, state or local law. Posting Notes: United States..., Azure). Experience with monitoring/observability/log management tools (such as Grafana, Prometheus, Thanos, Loki, Datadog...
Zscaler Internet Access (ZIA) product, which has redefined how security services are delivered in the cloud... solutions while maintaining ownership from API tiers to back-end services. Drive a culture of quality by setting coding...
Reliability Engineer to join our IT team, focusing on supporting Linux infrastructure services and advancing our CI/CD pipelines... and delivering Infrastructure services? Do you thrive in a fast-paced environment, and want to be an integral part of a truly great...
Location: Reston, VA | 13/05/2026 02:05:26 AM | Salary: $79,100 - $158,200 per year | Company: Oracle
services in the cloud. Manage and maintain our Kubernetes infrastructure. This includes optimizing cluster resource usage... as Bash, etc. Experience with monitoring and logging tools such as Datadog, Grafana and Prometheus. Strong communication...
Integration Standards: Own the OTLP, Prometheus, and JSON-log compatibility surface and validate ingestion into Datadog, Splunk.... Strong systems programming: Production Go and/or Rust preferred. Comfort across the stack, from agent code to backend services...
About NDi: Network Designs, Inc. (NDi) is a leading Federal contractor that specializes in designing, developing... and operating secure, scalable platform services in AWS GovCloud to support cloud-native application delivery within the VA Benefits...
-driven solutions that modernize mission‑critical systems for federal clients. We are seeking an experienced AI/ML Lead... analysis). MLOps integration and experience with API-based AI services. Production deployment experience including packaging...
Indicators (SLIs) for critical AI/ML services. Error Budgeting: Manage error budgets to balance the velocity of feature releases... Model Serving Reliability: Ensure the high availability of Vertex AI endpoints and custom inference services. GPU/TPU...
solutions (metrics, logs, and traces) using tools such as Prometheus, Grafana, and OpenTelemetry to ensure system reliability...) and containerized services. Programming proficiency: Strong scripting and coding skills in Bash and/or Python for building custom...
) and Service Level Indicators (SLIs) for critical AI/ML services. Error Budgeting: Manage error budgets to balance the velocity... inference services. GPU/TPU Optimization: Monitor and optimize compute resource utilization (accelerators) to ensure cost...