Site Reliability Engineer

We are a diverse team from around the world working together on a mission to make DuckDuckGo the world’s most trusted search engine, and we need your help!

Join us as a site reliability engineer at DuckDuckGo and become part of the team shaping our growing infrastructure. As a member of our small Operations team, you will work together with your peers to keep the search engine online, stable, and fast. You will leverage your expertise to challenge our assumptions about the reliability of our deployment and the effectiveness of our processes as we strive to improve.

DuckDuckGo is a remote company and our employees live all over the world! We empower our team with personal autonomy on team projects. This means you must be self-directed and self-motivated to succeed. If that seems awesome and you identify with our core values — build trust, question assumptions, and validate direction — then you’ll fit right in.

What you will do:

  • Lead high-complexity projects from scoping through deployment to production
  • Develop effective tools, alerts, and responses to identify and address reliability risks
  • Work closely with search engineers to triage production issues and determine appropriate remediation, including code changes and performance considerations
  • Share on-call responsibilities: collaborate with other engineers to triage and fix reliability issues in production, and autonomously put out fires as they arise
  • Help determine the future technical direction of our deployment, with the aim of improving reliability and performance

What we are looking for:

  • Significant experience as a site reliability engineer (2+ years).
  • Ability to identify the root causes of instability in high-traffic, distributed systems.
  • Experience with configuration and troubleshooting of Linux and NGINX.
  • Strong understanding of reliability challenges of large-scale deployments.
  • Moderate to advanced programming experience (preferably in a high-level language like Perl or Python).
  • Effective project management skills.
  • Strong decision makers. You can make a decision when faced with competing priorities and limited information.
  • Someone interested in the why, not just the how. You like to analyze situations and won’t be satisfied with a shallow analysis.
  • Creative problem solvers and risk takers. You like to take initiative in pushing a project forward but can make adjustments based on team feedback.
  • Strong communication skills. You can validate and communicate your decisions clearly.

Other things to know:

  • We are a small, remote team spread across different time zones, and we communicate with a variety of tools throughout the day. You should feel comfortable with the intricacies of this type of work situation.
  • Sometimes we meet up! You can expect to travel at least twice a year: once for our all-hands meetup and once for a team retreat (each about 4-5 days).
  • We want to have a major impact on raising the standard of trust online. To do this we believe in a focused approach, with company-wide objectives, and with each team member working on a single top priority at a time.
  • Our work philosophy is built upon empowered project management. All team members have opportunities to run projects.
  • All projects are run transparently, and we encourage everyone to participate in areas of interest throughout the company. Anyone and everyone can (and should) ask questions and offer feedback around the product and internal projects.
  • We try to exemplify our values (build trust, question assumptions, and validate direction) in everything we do.