Site Reliability Engineer I
7 months ago
At AccelByte, our mission is to empower game creators by providing them with the backend platform and tools required to make scalable, reliable AAA-quality games. The company was founded in 2016 by industry veterans who have engineered online systems for some of the largest game and distribution platforms in the world including Fortnite, Epic Store, Xbox Live, PlayStation Network, and EA Origin. We are backed by top investors including Softbank, Sony Interactive Entertainment, Galaxy Interactive, NetEase, and Krafton. Our latest Series B funding has solidified our place as a top player in the gaming industry. AccelByte's talent has decades of experience building and shipping some of the largest game and distribution platforms in the world.
We believe that the best companies empower employees to make decisions, obsess about the best user experience, and are not afraid to make and learn from their mistakes. Our culture is based on humility, openness to feedback, drive, and collaboration, which we feel results in the best-performing teams. As a company that values diversity, inclusion, and employee growth, we give our employees opportunities to work with and learn from teams all over the world. We offer competitive salaries, a full range of health benefits, social activities, career growth opportunities, and an amazing team. Come join us!
**Position Summary**
AccelByte is seeking an SRE/Cloud Engineer I - Incident Response for our 24x7 operations team dedicated to AAA multiplayer video games. This position requires a driven individual who can maintain high service reliability and identify and mitigate service-impacting problems. Coding knowledge is necessary for routine task automation and root cause analysis.
**Essential Functions/Responsibilities**
The SRE/Cloud Engineer I - Incident Response is accountable for the following functions and responsibilities:
- Collaborate within a LiveOps/L3 support team working rotating shifts.
- Proactively ensure production uptime, stability, and resiliency while providing constructive feedback on coworkers' changes.
- Ensure the continuous availability, performance, security, and scalability of infrastructure components in line with the platform SLA.
- Assist in Root Cause Analysis and identify solutions to production events.
- Apply modern Infrastructure as Code (IaC) principles and identify opportunities for efficiency through automation and process improvement.
- Contribute to the development of automation solutions, streamlining tasks, enhancing efficiency, and minimizing manual effort.
- Communicate directly with clients to understand their needs and support them as part of the team.
- Meet requirements for engineering excellence.
- Perform other duties as assigned.
**Qualifications/Experience Required**
- Bachelor's degree, or equivalent relevant work experience, certifications, or courses.
- At least 1 year of experience specializing in operations and reliability automation, with a focus on a variety of modern infrastructure and operational technologies, including Linux and AWS Cloud Infrastructure.
- Basic experience in incident management, including restoring service promptly after incidents, solving problems during production events, and following incident management processes.
- Basic experience in performing cloud system operations in an AWS environment.
- Basic experience in cloud monitoring, logging, and APM solutions, with exposure to monitoring tools such as Prometheus, Grafana, and Datadog.
- Basic experience with Kubernetes and Docker, and hands-on experience with many AWS services such as EC2, EKS, S3, ELB, RDS, DocDB, OpenSearch, ElastiCache, EBS, CloudFront, CloudWatch, CloudTrail, etc.
- Practical knowledge of scripting in programming languages such as Python, Bash, GoLang, etc.
- Practical knowledge of using support ticketing solutions like Jira Helpdesk and Zendesk, with effective communication and collaborative problem-solving skills.
- Ability to solve problems under pressure during production events while following incident management processes.
- Practical knowledge of Infrastructure as Code (IaC) using Terraform and/or CloudFormation.
- Practical knowledge of CI/CD tooling and pipelines, primarily GitLab, Jenkins, and Flux.
- Practical knowledge of similar products or services offered by AccelByte, preferably in a AAA game studio or software product company. Expected to acquire practical knowledge of how AccelByte's products are hosted within the infrastructure upon joining.
- Solid understanding and implementation of security best practices is a big plus.
- A good understanding of DevSecOps, Cloud, microservices, and containers is a big plus.
- Familiarity with web services patterns/architectures (REST, SOAP
-
Site Reliability Engineer
2 days ago
Jakarta, Indonesia | Abhidi Solution Private Limited | Full time
**Responsibilities**:
- Administer production-related jobs
- Address production issues
- Improve system reliability through configuration or code changes
- Monitor systems and improve system observability
- Remove toil and automate whenever possible
- Solve problems, including troubleshooting production issues
**Skills**:
- Experience with cloud...
-
Site Reliability Engineer
7 months ago
Jakarta, Indonesia | Pro Sigmaka | Full time
We were established in 2012. With experience in several industry sectors, a broad portfolio and technology platform, and a dedicated, highly qualified team, we enable the talent we provide to deliver fast and responsive services, making us the best choice for companies that want to increase the usability of their businesses. OUR SERVICES -...
-
Site Reliability Engineer
6 days ago
Jakarta, Indonesia | Ajaib | Full time
Company Description
**Job Description**:
- Perform day-to-day operations to support developers and DevOps.
- Create an end-to-end monitoring, logging, and alerting system.
- Provide technical assistance to improve system performance, capacity, reliability, and scalability.
- Perform root cause analysis of reliability issues.
- Document every action so your...
-
Site Reliability Engineer
2 months ago
Jakarta, Indonesia | PT. Amalura Multi Dimensi | Full time
- Manage and optimize cloud infrastructure (AWS, GCP, Azure).
- Administer Linux systems, ensuring stability and security.
- Implement observability (e.g., OpenTelemetry, HoneyComb, Sentry) to monitor performance.
- Optimize content delivery networks (e.g., Akamai) to enhance user experience.
- Design monitoring, alerting, and incident response procedures for...
-
Site Reliability Engineer
9 months ago
Jakarta, Indonesia | PT Tiga Daya Digital Indonesia (Eksad Technology) | Full time
Tiga Daya Digital Indonesia, a subsidiary of Triputra Group and DCI Group, aims to be an IT partner that enables rapid client growth. Eksad provides high-quality services based on strong experience in the industry and in technology, building the right IT service solutions to help its partners speed up business development based on digital technology by...
-
Site Reliability Engineer
7 months ago
Jakarta, Indonesia | Digital Muda Solutions | Full time
Description:
- Maintain system availability, reliability, and performance with a focus on technical infrastructure, security, and user scale.
- Collaborate with development and operations teams to design, test, and implement best practices in technology infrastructure, and make fixes and improvements as needed.
- Ensure integration...
-
Site Reliability Engineer
4 days ago
Jakarta, Indonesia | Catalyst Tech | Full time
At Catalyst, people are the heartbeat of our company. We believe that good-quality people have a positive impact on our business. We are looking for a **Site Reliability Engineer / DevOps** to join our growing team. If you are passionate about being part of the team, building some of the most critical products, working alongside teams in the industry...
-
Site Reliability Engineer
3 months ago
Jakarta, Indonesia | Paymentology | Full time
Paymentology is the first truly global issuer-processor, giving banks and fintechs the technology, team and experience to rapidly issue and process Mastercard, Visa and UnionPay cards across more than 50 countries, at scale. Our advanced, multi-cloud platform, offering both shared and dedicated processing instances, vast global presence and richer,...
-
Site Reliability Engineer
9 months ago
Jakarta, Indonesia | PT Salva Teknologi Digital | Full time
Site Reliability Engineer (Junior) - Applicants should have sufficient qualifications and relevant experience in the respective fields. "Beware of fraud schemes during the interview process. The company will not charge any fees for the interview process. Please report to us immediately if, when you are invited to an interview and...
-
Site Reliability Engineer
3 months ago
Jakarta, Indonesia | Hukumonline.com | Full time
- Manage and optimize cloud infrastructure on AWS, GCP, and Azure.
- Administer and maintain Linux-based systems, ensuring their stability and security.
- Implement and maintain observability solutions, including OpenTelemetry, HoneyComb, and Sentry, to monitor system performance and diagnose issues.
- Configure and optimize content delivery networks, with a...
-
Site Reliability Engineer
7 months ago
Jakarta, Indonesia | AccelByte | Full time
At AccelByte, our mission is to empower game creators by providing them with the backend platform and tools required to make scalable, reliable AAA-quality games. The company was founded in 2016 by industry veterans who have engineered online systems for some of the largest game and distribution platforms in the world including Fortnite, Epic Store, Xbox...
-
Senior Site Reliability Engineer
7 months ago
Jakarta, Indonesia | DKatalis | Full time
**Site Reliability Engineer**:
**About DKatalis**
DKatalis is a financial technology company with multiple offices in the APAC region. In our quest to build a better financial world, one of our key goals is to create an ecosystem-linked financial services business. DKatalis is built and backed by experienced and successful entrepreneurs, bankers, and...
-
Site Reliability Engineer
7 months ago
Jakarta, Indonesia | PT Astra Digital Mobil (mobbi) | Full time
Job Description:
- Maintain system availability, reliability, and performance by focusing on technical infrastructure, security, and user scale.
- Collaborate with development and operations teams to design, test, and implement best practices in technology infrastructure, and make fixes and improvements as needed.
- Conduct in-depth analysis of incidents and...
-
Site Reliability Engineer (DevOps)
7 months ago
Jakarta, Indonesia | Digital Muda Solutions | Full time
Description:
- Maintain system availability, reliability, and performance with a focus on technical infrastructure, security, and user scale.
- Collaborate with development and operations teams to design, test, and implement best practices in technology infrastructure, and make fixes and improvements as needed.
- Ensure integration...
-
Senior Site Reliability Engineer
7 months ago
Jakarta, Indonesia | AccelByte | Full time
At AccelByte, our mission is to empower game creators by providing them with the backend platform and tools required to make scalable, reliable AAA-quality games. The company was founded in 2016 by industry veterans who have engineered online systems for some of the largest game and distribution platforms in the world including Fortnite, Epic Store, Xbox...
-
Site Reliability Engineer
1 week ago
Jakarta, Indonesia | Zenius Education | Full time
Design and implement the architecture of the next generation of automated infrastructure following the Infrastructure as Code model. Build and maintain container-native CI/CD pipelines. Build tools and automation to improve system observability, availability, and reliability. Design and implement an observability stack for the infrastructure - System/Application...
-
Site Reliability Engineer
2 days ago
Jakarta, Indonesia | Shopee | Full time
Department: Engineering and Technology
Level: Entry Level
Location: Indonesia - Jakarta
The Engineering and Technology team is at the core of the Shopee platform development. The team is made up of a group of passionate engineers from all over the world, striving to build the best systems with the most suitable technologies. Our engineers do not merely solve...