Site Reliability Engineer
2 days ago
At AccelByte, our mission is to empower game creators by providing them with the backend platform and tools required to make scalable, reliable AAA-quality games. The company was founded in 2016 by industry veterans who have engineered online systems for some of the largest game and distribution platforms in the world, including Fortnite, Epic Store, Xbox Live, PlayStation Network, and EA Origin. We are backed by top investors including SoftBank, Sony Interactive Entertainment, Galaxy Interactive, NetEase, and Krafton. Our latest Series B funding has solidified our place as a top player in the gaming industry. AccelByte's team has decades of experience building and shipping such platforms.
We believe that the best companies empower employees to make decisions, obsess about the best user experience, and are not afraid to make and learn from their mistakes. Our culture is based on humility, openness to feedback, drive, and collaboration, which we feel results in the best-performing teams. As a company that values diversity, inclusion, and employee growth, we give our employees opportunities to work with and learn from teams all over the world. We offer competitive salaries, a full range of health benefits, social activities, career growth opportunities, and an amazing team. Come join us!
**Position Summary**
As an SRE/Cloud Engineer, your primary responsibility is to enhance the observability of our infrastructure. You play an important role in optimizing resources and driving initiatives that keep infrastructure management aligned with business objectives. Your focus is on implementing tools and practices that enable comprehensive monitoring, logging, and tracing of system components and processes. In doing so, you help improve system reliability, troubleshooting efficiency, and overall operational transparency.
**Essential Functions/Responsibilities**
The SRE/Cloud Engineer is accountable for the following functions and responsibilities:
- Configure and maintain monitoring tools (Prometheus, Grafana, AWS CloudWatch) for real-time visibility into system performance and health.
- Enhance observability strategies and tools to monitor the performance, availability, and reliability of distributed systems.
- Maintain robust monitoring and alerting solutions for timely issue detection and resolution.
- Promote best practices in observability, including logging, tracing, and metrics collection within development teams.
- Utilize Kubernetes (K8s) for container orchestration, scalability, reliability, and efficient resource utilization.
- Assist in performance analysis, capacity planning, and optimizing system performance and resource utilization.
- Identify and address bottlenecks, inefficiencies, and potential failure points in the system.
- Assist in creating and enforcing cost control measures, monitor AWS resource utilization, and identify optimization opportunities to decrease infrastructure costs.
- Implement containerization strategies to improve deployment efficiency and resource utilization in the AWS environment.
- Contribute to the analysis of cloud resource usage patterns and identify opportunities for cost optimization.
- Perform other duties as assigned.
**Qualifications/Experience Required**
- Bachelor's degree, or relevant work experience, certifications, or courses.
- At least 3 years of experience specializing in roles such as Site Reliability Engineering (SRE) or similar, with a particular focus on improving observability within distributed systems.
- Experience in designing and implementing log collection, aggregation, and visualization systems using Fluentd, Fluent Bit, Promtail, Loki & LogQL, Logstash, OpenSearch, and Amazon Athena.
- Experience in designing and implementing metric collection, aggregation, and visualization solutions using technologies like Prometheus & PromQL, Grafana, cAdvisor, metrics-server, and CloudWatch.
- Practical knowledge of trace collection, aggregation, and visualization methodologies employing tools such as Grafana Tempo & TraceQL, tail sampling, and OpenTelemetry.
- Basic experience in Kubernetes, including using kubectl, Flux, and other tools to debug and modify cluster state, and an understanding of the limitations and usage of containerization technology within a Kubernetes cluster.
- Basic experience in using Infrastructure-as-Code (IaC) tools (e.g., Terraform, CloudFormation) for provisioning and configuration management, including the ability to apply, modify, or delete modules and create custom Terraform modules.
- Basic experience in performing cloud system operations on AWS infrastructure, including backups, snapshots, and other administrative tasks.
- Practical knowledge of defining budgets, forecasting expenses, and building automated tools to identify cost trends and anomalies for cloud infrastructure.
- Understanding of distributed systems.