Site Reliability Engineer

Palo Alto, CA, US | San Francisco, CA, US
Regular
Engineering
1737568

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping Pinners make their lives better in the positive corner of the internet.

The Site Reliability Engineering organization at Pinterest is accountable for ensuring overall Pinterest availability as well as enhancing Engineering teams’ capability to design, build and operate robust systems at scale.

Pinterest’s applications and infrastructure handle billions of monthly page views and petabytes of data as Pinterest continues to grow and scale. As a Pinterest SRE, you will design and build systems, platforms, tools, frameworks and methodologies to assure the reliability of our large-scale distributed systems.

What You’ll Do:

  • Develop software solutions to enable reliability and operability of large-scale distributed systems handling petabytes of data and serving
  • Build a deep understanding of how Pinterest’s systems behave, scale, interact and fail, and use that insight to identify risks and opportunities for remediation
  • Build tools and automation to eliminate toil and reduce operational overhead. Create frameworks, processes and best practices to be used across Pinterest Engineering
  • Build meaningful, insightful and actionable service level indicators (SLIs)
  • Automate critical portions of Pinterest’s engineering processes to minimize risk and maximize the speed of innovation
  • Manage capacity and performance to help scale our infrastructure on both public and private clouds around the world

What We’re Looking For:

  • Strong knowledge of Linux/Unix/BSD internals and experience working with open source software (e.g. MySQL, Hadoop, Envoy, HAProxy, Nginx)
  • Experience with technologies such as ElasticSearch, ZooKeeper, HBase, Hadoop, Memcache and Kafka with a focus on reliability, automation, operability and performance
  • 2+ years of experience with programming languages (Python, Golang, Ruby, etc.)
  • Experience with infrastructure as code is a plus (e.g. Terraform, Puppet, Chef, Ansible, Salt, Fabric, Docker)
  • Bonus points for experience deploying web apps to cloud infrastructure (AWS, etc.) and working with distributed, service-oriented architecture

 

