Staff Site Reliability Engineer

Moloco, Inc.
Seattle, Washington, US
Full-time

About the Role

Is this the role you are looking for? If so, read on for more details, and make sure to apply today.

Moloco is a machine learning company that operates at massive scale (we ingest 10 petabytes of training data per day), builds blazingly fast models (they return predictions in 10 milliseconds or less), and is a profitable unicorn (we are valued at $2 billion and have been profitable for the last 17+ quarters).

We are looking for an exceptional Senior Site Reliability Engineer to help us build a state-of-the-art ML model serving infrastructure for our mobile advertising platform.

You will be part of an engineering team that manages the infrastructure that serves deep neural network machine learning (ML) models to clients, maintains the CI / CD infrastructure that deploys infrastructure updates in real time, and develops infrastructure tools and platforms that improve the productivity of engineering teams.

We are looking for someone who is passionate about solving infrastructure problems with software engineering skills and who brings a desire to grow and learn new technologies, a love of working in collaborative teams, and a commitment to customer service.

What you'll do

  • Play a role in engaging partner engineering teams to drive company-wide infrastructure adoption and standard methodologies.
  • Contribute to technical direction and decisions across the organization by conducting / leading research with other technical leaders in the organization.
  • Support traditional SRE / operational areas such as tooling and automation, monitoring, workflow management, maintaining and improving data pipelines, CI / CD, etc.
  • Actively participate in and contribute to code reviews and technical design documents to identify performance and reliability bottlenecks.
  • Partner with and support other engineering teams with operational guidance and expertise on various project initiatives.
  • Participate in capacity planning and scaling.
  • Ensure that Moloco is delivered in a highly performant manner that can handle viral traffic spikes.
  • Collaborate with others in SRE and SWE to leverage tools, processes and techniques to improve service reliability.
  • Reduce business risk in areas such as infrastructure and configuration management, provisioning, capacity modeling and planning, and incident handling, mitigation, root cause analysis, and post-mortems.
  • Identify common patterns in the challenges of operating services in production, and work with others in SRE and SWE to design and implement reusable solutions and / or other cross-functional work that reduces the complexity, difficulty, cost, and risk of operating the business.

What you’ll need to succeed

  • Hands-on experience working with GCP or other cloud platforms (e.g. AWS, Azure)
  • Practical, proven knowledge of a high-level language (e.g. Go, Python)
  • Experience working with infrastructure-related software (e.g. Kubernetes, Helm, Terraform, etc.)
  • Experience developing infrastructure, configuration and deployment scripting and automation for large scale / high complexity services in a microservices environment
  • At least 5 years of experience in large-scale software development
  • Passion for operational excellence and the ability to thrive in an environment where you provide extremely high levels of customer support
  • Tenacious problem solving, with end-to-end ownership of issues through to full resolution

6 days ago