Plume Design, Inc

Staff Site Reliability Engineer

Reposted 14 Days Ago

Palo Alto, CA

Senior level

Supervise Site Reliability Engineering team, manage deployments and production infrastructure. Focus on customer satisfaction and technical leadership with strong DevOps/SRE knowledge.

Life at Plume

At Plume, we believe that technology isn't about moving faster, it's about making life’s moments better. Which is why we’ve built the world's first, and only, open and hardware-independent service delivery platform for smart homes, small businesses, enterprises, and beyond. Our SaaS platform uses WiFi, advanced AI, and machine learning to create the future of connected spaces—and human experiences—at massive scale.

We now deliver services to over 50 million locations globally and have managed over 2.5 billion devices on our platform. We’re expanding rapidly, pioneering a new category, and we achieved our Series F funding in just four years. Our customers include many of the world's largest Communications Service Providers (CSPs) who look to Plume to help them evolve their smart home offerings while gleaning insights from their own data.

With a bias for action and a love for being trailblazers, the team at Plume embodies a combination of relentless curiosity and imaginative innovation. We challenge ourselves to think in ways that other companies don't, work to do what should be done (rather than what can), and if we can’t do it exceptionally well, we don’t do it. It’s how we've assembled a team of world-class builders, thinkers, and doers. And it’s how we’re reinventing what’s possible every day.

Opportunity

We’re looking for a seasoned SRE, experienced with Customer Facing environments, to provide Technical Leadership for our Site Reliability Engineering Team. This team is focused on deployments, Production Infrastructure, Availability and Reliability. The right candidate has held several Infrastructure-oriented roles and needs to have strong technical knowledge in the DevOps/SRE technology stack while focusing on customer satisfaction.

What You’ll Do:

Supervise a team of Site Reliability Engineers who provide first-line support to Customer Clouds. Routine tasks include deployments, on-call, and application provisioning.
Run stand-ups for the team and manage tickets
Participate in sprints and close tickets with the team
Attend and conduct customer meetings for project and roadmap specification
Be able to step in and execute or triage issues. Some examples are as follows:

Provision and scale Kubernetes Infrastructure and Applications (EKS)
Deploy Software in multiple Production Environments
Own monitoring and alerting for production systems, including improvements and changes
Contribute improvements to the current automation
Contribute improvements to our on-call process and alerting

What You’ll Bring

4+ years of Kubernetes knowledge (operating clusters)
2+ years of Terraform Knowledge
Experience both setting up and utilizing Monitoring and observability tools

e.g. New Relic, Nagios/Icinga, Grafana, Prometheus

2+ years of experience with programming/scripting in one of the following

e.g. Perl, Python, PHP, GoLang, Java, etc.

8+ years of experience with modern Linux Operating systems
6+ years of experience with modern cloud infrastructure, preferably AWS
Availability to be in on-call rotation for Production issues
Availability to work with a distributed team in different timezones
Advanced communication skills
Experience leading efforts and reporting up

Desired Skill Set

10+ Years of experience with Production Troubleshooting
4+ Years of experience leading teams
Executive Communication skills
Bachelor’s degree in a related field or equivalent experience; advanced degree preferred.
This is a leadership role, but you must have Technical knowledge and working experience with:

Kubernetes (operate)
Basic Terraform Knowledge
Experience with programming/scripting in one of the following (e.g. Perl, Python, PHP, GoLang, Java, etc.)
Experience with modern cloud infrastructure, preferably AWS
Experience with modern Linux Operating systems (Enterprise Linux or Debian based)
Experience both setting up and utilizing self-managed Monitoring and observability tools (e.g. Nagios/Icinga, Grafana, Prometheus)

Differentiators

Troubleshooting production performance/service degradation or outage issues at scale
Experience with Infrastructure Troubleshooting in VMs and/or Bare Metal (ssh/Linux)
Advanced Kubernetes knowledge
Advanced Terraform knowledge
Customer Facing experience in previous roles
Experience operating Kafka in Production
Experience operating NoSQL Databases in Production
Experience operating Relational Databases in Production
Configuration Management experience

Kindly note that this is a HYBRID position, with a requirement to work in the office 3 days a week. We’re looking for candidates who are within a commutable distance. At this time, we are unable to provide relocation assistance or visa sponsorship.

The total compensation package includes an anticipated compensation range of $177,000 - $208,000 + bonus + equity + benefits. Benefits include a 401k plan with a company match, basic life insurance, plus unparalleled health, dental, vision and other benefits and perks. For more details please see: https://www.plume.com/careers

An employee’s base salary and its position within the range may depend on a number of factors including job related knowledge, education, skills, experience and other business related considerations. Published ranges are provided in good faith at the time of posting.

About Plume

As the creator of the only open, hardware-independent, cloud-controlled experience platform for CSPs and their subscribers, Plume partners with over 350 CSP customers, including some of the world’s largest such as Comcast, Charter, Liberty Global, and J:COM.

Using OpenSync, the most widely supported open-source, silicon-to-cloud framework for smart spaces, Plume’s software-defined network allows CSPs to decouple their service offerings from hardware and rapidly curate and deliver new services over a multi-vendor, open-platform architecture.

Backed by investors such as Insight Partners and SoftBank Vision Fund 2, Plume is now valued at $2.6B, having added over $500M in funding in 2021 alone.

Plume is an equal opportunity workplace that maintains a continuing policy of nondiscrimination in all employment practices and decisions, ensuring equal employment opportunities for all qualified individuals without regard to race, color, creed, religion, sex, national origin, age, physical or mental disability, sexual orientation, gender identity, marital status, pregnancy, childbirth or related individual conditions, medical conditions (as defined by state law), military or veteran status, or any other characteristic protected by federal, state or local law.

Top Skills

AWS

Grafana

Java

Kubernetes

Linux

Nagios

New Relic

Perl

PHP

Prometheus

Python

Terraform

325 Lytton Ave, Palo Alto, CA, United States, 94301
