Job Description
🔹 Observability, Monitoring & Distributed Tracing
- Implement centralized logging using Grafana Loki and the ELK Stack.
- Build dashboards and alerts using Grafana and Datadog.
- Implement distributed tracing using OpenTelemetry to improve system visibility.
- Improve monitoring coverage and alert accuracy.
🔹 Performance & Load Testing
- Conduct load and stress testing using tools such as k6, Locust, or JMeter.
- Analyze performance bottlenecks and implement tuning strategies.
- Support capacity planning and performance optimization.
🔹 Data Streaming & Integration
- Support Change Data Capture (CDC) and real-time data streaming pipelines.
- Work with Confluent Platform / Apache Kafka to ensure reliable event-driven data flow.
🔹 Security & Secret Management
- Manage secrets securely using Google Cloud Secret Manager, Kubernetes Secrets, and HashiCorp Vault.
- Implement secure CI/CD and platform access practices.
Education
Bachelor’s degree in Computer Science, Informatics, Information Systems, Electrical Engineering, Mathematics/Statistics, or a related field.
Experience
- 0–4 years of experience in SRE, DevOps, Cloud Engineering, or Platform Engineering.
- Hands-on experience supporting production systems and cloud infrastructure.
Technical Skills
- Strong Linux system administration and networking fundamentals.
- Hands-on experience with Kubernetes and containerized environments.
- Experience designing and maintaining CI/CD pipelines.
- Infrastructure as Code experience with Terraform and Ansible.
- Helm chart development and Kubernetes deployment management.
- Monitoring, logging, and observability best practices.
- Programming/scripting skills in Bash and Python (Go is a plus).
- Familiarity with Google Cloud Platform (GCP).
