Illumio

Staff Site Reliability Engineer

Posted 10 Days Ago
Sunnyvale, CA
Senior level
The Staff Site Reliability Engineer will design, deploy, and maintain scalable cloud infrastructure, ensuring reliability and security of services while collaborating with development teams.
The summary above was generated by AI

Location: Onsite, Sunnyvale, California (5 days a week in the office)
Onwards Together!

Illumio is the leader in ransomware and breach containment, redefining how organizations contain cyberattacks and enable operational resilience. Powered by the Illumio AI Security Graph, our breach containment platform identifies and contains threats across hybrid multi-cloud environments – stopping the spread of attacks before they become disasters.
Recognized as a Leader in the Forrester Wave™ for Microsegmentation, Illumio enables Zero Trust, strengthening cyber resilience for the infrastructure, systems, and organizations that keep the world running. 

Our Team's Vision 

The Cloud Operations team at Illumio deploys and manages our SaaS services by reducing human error, aggressively focusing on automation, and providing deep insight into application behavior and health. We do that by applying software engineering practices to infrastructure and operations problems to create and manage scalable, reliable distributed software systems.

Your Impact: 

We are looking for a backend/platform engineer or SRE with a demonstrated track record of building secure, large-scale, highly available services using automation and Infrastructure as Code, who is well versed in cloud architecture (with a focus on Kubernetes) and loves to delight the engineers they support.

This engineer will be an essential member of our Operations team, collaborating with the Platform and Data engineers to deliver the latest Illumio products.

The Cloud Platform SRE will be responsible for designing and deploying scalable, reliable, and secure cloud infrastructure. This individual must have a thorough understanding of, and experience with, AWS and/or Azure. The platform will be based on Kubernetes and built using cloud-native technologies. The Cloud Platform SRE is responsible for building, operating, and maintaining this platform, and for defining and meeting platform SLOs, capacity utilization, cost visibility, security compliance, etc. This role is critical to the success of the Multi-cloud Platform.

Key responsibilities:

  • Drive reliability improvements back into applications

  • Build code to resolve reliability/resiliency issues

  • Mentor and educate team members to aid in strengthening technical expertise

  • Collaborate closely with cloud architects to drive cloud solutions

  • Define appropriate SLIs/SLOs to accurately measure and assess error budgets

  • Embed with the development teams to assist with cloud methodologies when developing products to ensure that the deliverable is as reliable as possible

  • Work with development teams to build and strengthen application security and compliance

  • Manage high impact situations that involve technically challenging issues across diverse audiences and drive to find the root cause, mitigate, and identify a solution

  • Focus on observability

Your Toolkit: 

  • Bachelor's degree in Computer Science, Engineering, or related field; or equivalent work experience

  • 6+ years of relevant SRE, DevOps, Platform or Infrastructure Engineering experience.

  • 4+ years in a production support role in a fast-paced industry/organization

  • Experience deploying, tuning, and maintaining Linux-based, highly available, fault-tolerant web platforms in public cloud providers such as AWS, Azure, and GCP

  • Experience with common monitoring, log aggregation, and metrics-gathering platforms (Icinga, Sensu, Splunk, Telegraf/InfluxDB, et al.)

  • Experience with configuration management and orchestration tools such as Chef, Ansible, and AWS services and APIs, or equivalent

  • Experience scripting/coding with Python, Java, Ruby and/or Go.

  • Experience with MySQL, PostgreSQL, Redis, or similar

  • Solid knowledge of Linux operating systems (Ubuntu, RHEL, OEL7) is required

  • Experience with EKS and/or AKS

  • Knowledge of or experience with incident management and on-call tooling (PagerDuty)

  • Knowledge of Database Technologies, Release Management, REST, SRE, etc.

  • Knowledge of load balancers and traffic managers

  • Experience working with Kubernetes, Docker, or other virtualization & containerization technologies

  • Networking basics and troubleshooting skills

  • Good understanding of production deployment and distributed environments is required

  • Strong problem solving and operational process skills, attention to detail

  • Application support and debugging experience in a dynamic fast-paced production environment 

  • Experience with SDLC principles, architecture and operations.

  • Experience working with senior leadership both inside and outside of engineering.

  • Ability to manage multiple tasks and competing priorities to deliver projects on schedule

  • Azure certifications such as Azure Administrator, Azure Developer, or AWS/GCP certifications are a plus

Compensation: 
$192,000.00 - $230,000.00 

Our Commitment: 

Illumio believes that an environment of unique backgrounds, experiences, viewpoints, and individual contributions drives our success and makes us stronger together. We are dedicated to creating and maintaining a diverse culture and emphasizing inclusion and belonging.    

 

All official job offers from our company are extended directly by our recruitment team and will be sent through an official DocuSign document for your review and signature. Please be aware that we do not ask for any personal information in the process of extending offers of employment, such as financial details or social security numbers. Upon acceptance of any offer, we will request such information as part of the onboarding process prior to or on your first day of employment, and only after completing a background check through an authorized third-party vendor. If you receive any communication asking for personal details outside of these processes, please contact us immediately to verify the authenticity of the request. Your security is important to us, and we are committed to a safe and transparent hiring experience. 

Top Skills

Ansible
AWS
Azure
Chef
Docker
Go
Icinga
InfluxDB
Java
Kubernetes
Linux
MySQL
Postgres
Python
Redis
Ruby
Sensu
Splunk
Telegraf

Illumio Sunnyvale, California, USA Office

920 De Guigne Dr, Sunnyvale, CA, United States, 94085
