Senior Site Reliability Engineer

3 days ago


Menteng, Jakarta, Indonesia BookCabin Full time

Responsibilities

  • Design, build, and maintain scalable, reliable, and secure infrastructure across AWS (including Elastic Beanstalk) and Azure.
  • Develop and manage CI/CD pipelines using Azure DevOps, GitHub Actions, or similar tools to ensure smooth and automated deployments.
  • Operate, monitor, and troubleshoot Kubernetes clusters (EKS, AKS, or self-managed) to ensure system stability and uptime.
  • Implement comprehensive observability solutions using Prometheus, Grafana, Loki, and Alertmanager.
  • Automate infrastructure provisioning and configuration using Terraform, Helm, CloudFormation, and/or Ansible.
  • Define, measure, and improve system reliability through SLOs, SLIs, and SLAs.
  • Enhance system resilience and incident response through proactive monitoring and capacity planning.
  • Manage secrets, access control, and security policies to maintain a robust and compliant infrastructure.
  • Participate in on-call rotations, respond to incidents, and drive root cause analysis and post-incident reviews.
  • Collaborate closely with development teams to embed reliability and scalability best practices throughout the software lifecycle.

Requirements

  • 5+ years of experience in a Site Reliability, DevOps, or Cloud Engineering role.
  • Strong hands-on experience with AWS (EC2, VPC, IAM, CloudWatch, Elastic Beanstalk, RDS, S3) and familiarity with Azure services.
  • Proven experience deploying and managing containerized applications using Kubernetes (EKS/AKS) and Docker.
  • Skilled in CI/CD pipeline development and multi-cloud workflows (Azure DevOps, GitHub Actions, etc.).
  • Solid understanding of observability tools such as Prometheus, Grafana, Loki, and Alertmanager.
  • Proficiency in infrastructure-as-code tools like Terraform, CloudFormation, or similar.
  • Scripting skills in Bash, Python, or PowerShell.
  • Strong grasp of networking, Linux systems, and cloud security best practices.
  • Excellent problem-solving skills with a focus on performance, scalability, and reliability.


  • Jakarta, Indonesia Abhidi Solution Private Limited Full time

    Responsibilities: Administer production-related jobs; address production issues; improve system reliability through configuration or code changes; monitor systems and improve system observability; remove toil and automate whenever possible; solve problems, including troubleshooting production issues. Skills: Experience with cloud...


  • Jakarta, Jakarta, Indonesia AVOWS TECHNOLOGIES PRIVATE LIMITED Full time

    About the Role: We are looking for an experienced Site Reliability Engineer to design, implement, and manage our cloud-based infrastructure on Google Cloud Platform (GCP) from the ground up. The ideal candidate will ensure our systems are highly available, reliable, scalable, and efficient while collaborating closely with software engineers to deliver robust...


  • Jakarta, Indonesia Kalibrr Full time

    Your main responsibilities as a Site Reliability Engineer at Kalibrr are: Engage in and improve the whole lifecycle of the Kalibrr services: design, deployment, operation, and refinement. Practice incident response and blameless postmortems. Participate in an on-call rotation. Scale systems and operations through automation. Maintain services by monitoring...


  • Jakarta, Jakarta, Indonesia PT. Alto Network Full time

    COMPANY DESCRIPTION ALTO Network is a leading payment infrastructure provider as well as the pioneer in payment solution by always bringing the most innovative and impactful technology to connect merchants or financial institutions with their customers to grow their businesses nationwide and beyond. DESIGNATION : Senior Site Reliability...


  • Jakarta, Indonesia Global Tiket Network Full time

    We think you also hate it when a travel app gives you a headache, right? A slight piece of misinformation can ruin the trip. That is exactly what we are tackling as t-fam: making sure that our 17+ million users have the best experience in crafting their own adventure. Catch the sunrise on the top of Padar Island and see fascinating views of the boundless...


  • Jakarta, Indonesia PT ALTO Network Full time

    COMPANY DESCRIPTION ALTO Network is a leading payment infrastructure provider as well as the pioneer in payment solution by always bringing the most innovative and impactful technology to connect merchants or financial institutions with their customers to grow their businesses nationwide and beyond. DESIGNATION : Senior Site Reliability Engineer...


  • Jakarta, Jakarta, Indonesia Fazz Full time

    About the Role: The Site Reliability Engineering (SRE) team architects, builds, and maintains the rock-solid infrastructure that applications rely on. We work closely with development teams to ensure scalability, reliability, and efficiency. This collaboration empowers us to deliver exceptional customer experiences while enabling developers to focus on...


  • Jakarta, Jakarta, Indonesia StraitsX Full time

    About the Role: The Site Reliability Engineering (SRE) team architects, builds, and maintains the rock-solid infrastructure that applications rely on. We work closely with development teams to ensure scalability, reliability, and efficiency. This collaboration empowers us to deliver exceptional customer experiences while enabling developers to focus on...


  • Jakarta, Indonesia Hukumonline.com Full time

    Manage and optimize cloud infrastructure on AWS, GCP, and Azure. Administer and maintain Linux-based systems, ensuring their stability and security. Implement and maintain observability solutions, including OpenTelemetry, Honeycomb, and Sentry, to monitor system performance and diagnose issues. Configure and optimize content delivery networks, with a...

  • Site Engineering

    1 day ago


    Menteng, Jakarta, Indonesia PT. GOMEDS NETWORK Full time

    PT. GOMEDS NETWORK Site Engineer (4G BTS Telecommunications). Position: BTS/VSAT/MW/CME/Power Engineer. Main scope: installation, configuration, and O&M of 4G BTS/RBS; installation and O&M of VSAT; microwave/transmission as backhaul for BTS and POI; site CME: tower, foundation, grounding, physical infrastructure; power system: PLN (grid power), PLTS (solar power), genset, rectifier, batteries; implementation of K3 (occupational health and safety) &...