Site Reliability Engineer II

Bank of America Corporation
Plano, TX
Full-time

About us :

At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection.

Responsible Growth is how we run our company and how we deliver for our clients, teammates, communities and shareholders every day.

One of the keys to driving Responsible Growth is being a great place to work for our teammates around the world. We're devoted to being a diverse and inclusive workplace for everyone.

We hire individuals with a broad range of backgrounds and experiences and invest heavily in our teammates and their families by offering competitive benefits to support their physical, emotional, and financial well-being.

Bank of America believes both in the importance of working together and offering flexibility to our employees. We use a multi-faceted approach for flexibility, depending on the various roles in our organization.

Working at Bank of America will give you a great career with opportunities to learn, grow and make an impact, along with the power to make a difference. Join us!

Job Description :

This job is responsible for partnering with engineering and technology teams to implement measures as prescribed by lead / senior SRE engineers.

Key responsibilities include ensuring appropriate instrumentation, tooling, ticketing, alerting and on call routines are in place for key services, identifying root causes of issues through production triage efforts, and suggesting code enhancements to technology teams to automate services and improve reliability and efficiency.

Job expectations include using software development skills to improve efficiency and to address gaps in reliability.

Overview :

This is a Site Reliability Engineer II (Hadoop Admin) role supporting NextGen Platforms built around Big Data technologies (AI / ML, Hadoop, Jupyter Notebook, Spark, Kafka, Impala, HBase, Docker containers, Ansible and many more).

Requires experience in cluster management of vendor-based Hadoop and Data Science (AI / ML) products like C3, Cloudera, Talend, Trifacta, Selerity, ELK, KPMG Ignite etc.

The analyst is involved in the full life cycle of an application and is part of an agile development process. The role requires the ability to interact, develop, engineer, and communicate collaboratively at the highest technical levels with clients, development teams, vendors and other partners.

The following section is intended to serve as a general guideline for each relative dimension of project complexity, responsibility, and education / experience within this role.

Responsibilities :

  • Develops and maintains reliability scripts, tools and libraries and leverages them for common instrumentation, automation, and operational needs, as well as when mentoring Site Reliability Engineer (SRE) resources on reliability practices and established tools / capabilities
  • Collaborates with Development and Infrastructure teams to understand technical solutions and implement monitoring capabilities outlined in the application and system monitoring designs put forward by the SRE Lead
  • Partners to implement code changes to make use of common reliability libraries and tools and helps Application Production Services and Application Development teammates understand how to use them
  • Identifies vulnerabilities and opportunities for reliability improvement, such as investigating low level error rates and 'noise' in monitoring, and defines solutions to reduce manual support effort and / or improve system reliability
  • Engages as a subject matter expert in major incident triage efforts and failure scenario modelling, and works with the Problem Manager to diagnose root causes in major incident / problem management investigations
  • Participates regularly in an on-call rotation with Production Support teammates to learn more about reliability issues affecting their portfolio
  • Works on complex, major or highly visible tasks in support of multiple projects that require multiple areas of expertise
  • Provides subject matter expertise in managing Hadoop and Data Science Platform operations, with a focus on Cloudera Hadoop, Jupyter Notebook, OpenShift, and Docker container cluster management and administration
  • Integrates solutions with other applications and platforms outside the framework
  • Manages platform operations across all environments, including upgrades, bug fixes, deployments, metrics / monitoring for resolution and forecasting, disaster recovery, and incident / problem / capacity management
  • Serves as a liaison between client partners and vendors in coordination with project managers to provide technical solutions that address user needs

Required Qualifications :

  • 5+ years of combined Technology experience in an Enterprise environment
  • Experience with Docker, OpenShift / Kubernetes, Database (SQL, Cassandra, Postgres), Jupyter Notebook
  • Strong technical knowledge: Unix / Linux, Database (Sybase / SQL / Oracle), Java, Python, Perl, Shell scripting, Infrastructure
  • Experience in Monitoring & Alerting, and Job Scheduling Systems
  • Comfortable with frequent, incremental code testing and deployment
  • Strong grasp of automation / DevOps tools: Ansible, Jenkins, SVN, Bitbucket

Desired Qualifications :

  • Bachelor's degree or equivalent, preferably in a technical or engineering discipline
  • Cloudera Big Data Stack: Hadoop, Impala, Hive, Spark, Kafka, HBase

Skills :

  • Analytical Thinking
  • Automation
  • Collaboration
  • Production Support
  • Result Orientation
  • Application Development
  • Architecture
  • Influence
  • Project Management
  • Solution Design
  • Adaptability
  • DevOps Practices
  • Risk Management
  • Solution Delivery Process
  • Stakeholder Management

Shift :

1st shift (United States of America)

Hours Per Week :

2 days ago