Staff Site Reliability Engineer

Moloco, Inc.
Seattle, Washington, US
Full-time

About the Role

Is this the role you are looking for? If so, read on for more details, and make sure to apply today.

Moloco is a machine learning company that operates at massive scale (we ingest 10 petabytes of training data per day), builds blazingly fast models (they return predictions in 10 milliseconds or less), and is a profitable unicorn (we are valued at $2 billion and have been profitable for the last 17+ quarters).

We are looking for an exceptional Senior Site Reliability Engineer to help us build a state-of-the-art ML model serving infrastructure for our mobile advertising platform.

You will be part of an engineering team that manages the infrastructure that serves deep neural network machine learning (ML) models to clients, maintains the CI/CD infrastructure that deploys infrastructure updates in real time, and develops infrastructure tools and platforms that improve the productivity of engineering teams.

We are looking for someone with a passion for solving infrastructure problems with software engineering skills, a desire to grow and learn new technologies, a love of working in collaborative teams, and a commitment to customer service.

What you'll do

  • Work with partner engineering teams on company-wide infrastructure adoption and standard methodologies
  • Contribute to technical direction and decisions across the organization by conducting / leading research with other technical leaders in the organization
  • Work on traditional SRE and operational support areas such as tooling and automation, monitoring, workflow management, maintaining and improving data pipelines, CI/CD, etc.
  • Actively participate in and contribute to code reviews and technical design documents to identify performance and reliability bottlenecks.
  • Partner with and support other engineering teams with operational guidance and expertise on various project initiatives.
  • Participate in capacity planning and scaling
  • Ensure that Moloco is delivered in a highly performant manner that can handle viral traffic spikes.
  • Collaborate with others in SRE and SWE to leverage tools, processes and techniques to improve service reliability.
  • Reduce business risk in areas such as infrastructure and configuration management, provisioning, capacity modeling and planning, and incident handling, mitigation, root cause analysis, and post-mortems.
  • Identify common patterns in the challenges of operating services in production, and work with others in SRE and SWE to design and implement reusable solutions and / or other cross-functional work that reduces the complexity, difficulty, cost, and risk of operating the business.

What you’ll need to succeed

  • Hands-on experience working with GCP or other cloud platforms (e.g. AWS, Azure)
  • Practical, proven knowledge of a high-level language (e.g. Go, Python)
  • Experience working with infrastructure-related software (e.g. Kubernetes, Helm, Terraform, etc.)
  • Experience developing infrastructure, configuration and deployment scripting and automation for large scale / high complexity services in a microservices environment
  • At least 5 years of experience in large-scale software development
  • Passion for operational excellence and the ability to thrive in an environment where you provide extremely high levels of customer support
  • Tenacious problem-solving, taking ownership of issues end to end through to full resolution

