Job Description
Salary: Base salary range $135k-$180k
Infrastructure Team Mission
Creatively support Berkshire Grey's mission by providing compute, network, and cloud resources to enable BG operations, development, deployment, and production support in a robust, reliable, and repeatable manner.
Role Description
As the Lead Cloud SRE Engineer on Berkshire Grey's Infrastructure Team, you will manage and improve the system effectiveness of BG's product infrastructure to increase customer satisfaction, both internal and external. You will take a data-driven and programmatic approach to understanding customer pain points and system failure points. You will collaborate with various customer teams, ensuring a high level of service for both our internal and external customers.
Responsibilities
- Drive system effectiveness and customer success through proactive quality improvements and issue resolution.
- Champion testing across the team; integrate automated testing into development workflows and CI/CD pipelines.
- Improve customer experience and ensure customer success with every component of the product infrastructure.
- Determine and track KPIs to ensure that product goals are being met.
- Build tools and automation for defect detection, performance benchmarking, and regression prevention.
- Manage issue resolution through prioritization, reproduction, and proactive mitigation.
- Generate key artifacts and deliverables, including issue Pareto analyses, test plans, and test reports.
- Collaborate with internal teams to gather data and feedback from customer sites.
- Communicate KPIs, issues, plans, and progress toward resolution to internal stakeholders via meetings and reports.
- Administer and partner on cloud infrastructure with a focus on performance, availability, security, and cost optimization.
- Apply an SRE mindset: implement actionable monitoring, logging, and alerting solutions; manage error budgets, incident postmortems, and resilience practices.
- Identify defects and bottlenecks to make data-driven decisions to improve system performance and availability.
- Implement actionable monitoring and alerting solutions and create dashboards to provide overviews of performance and health.
- Create visibility and increase collaboration; act as facilitator and communicator within the DevOps team, collaborating with development, support, and operations teams and other stakeholders to understand requirements and provide technical guidance toward better performance and stability.
Background and Experience (Required)
- 7 or more years of related work experience, preferably in Design Quality, Systems Engineering, or DevOps.
- Highly skilled with analysis tools and methods; experience with tools like Prometheus, Grafana, and Elastic, and exposure to their query languages.
- 3 or more years of hands-on experience coding automation and scripting tasks in a shell language and/or Python.
- Understanding of Infrastructure as Code concepts; hands-on Terraform experience preferred.
- Strong knowledge of and experience with verifying and validating requirements and the product development process.
- Confidence to stand firm on critical-to-customer and critical-to-quality requirements, and willingness to dive deep into design and testing.
- Strong organization and communication skills.
- Hands-on administration or management of GCP (preferred) and AWS cloud services.
- Understanding of Kubernetes concepts.
- Familiarity with networking concepts.
Why Berkshire Grey?
- Opportunity to work with cutting-edge AI-powered robotic solutions that are transforming the supply chain and logistics industry.
- A culture of innovation and collaboration, with a commitment to professional development and growth.
- Competitive compensation and comprehensive benefits package.
6310-2501JB
This job is not eligible for visa sponsorship, with the exception of H-1B transfers.