Talent.com
Senior Site Reliability Engineer (Remote)
Xcel Engineering • Oak Ridge, TN, US
Full-time • Remote

Company overview

XCEL Engineering, Inc. is an award-winning small business that provides trusted information technology, engineering, consulting and project management solutions and services to federal agencies and organizations. Originally founded in 1971 by professional engineers at the University of Tennessee, XCEL was acquired in 2003 by U.S. Army and Navy veterans and in 2023 became a MartinFed company. XCEL Engineering is part of IT Lab Partners (ITLP), which was created to help a leading research facility in the East Tennessee region recruit the best and brightest technical talent.

Job overview

XCEL Engineering is seeking a qualified applicant for a Senior Site Reliability Engineer role. As a Senior Site Reliability Engineer, you'll play a key role in the HPC Infrastructure and Platforms group, ensuring the reliability and performance of the supercomputer center. The center's container orchestration platform, built on Kubernetes and Red Hat OpenShift, powers both critical operational applications and user-managed persistent applications that run alongside the supercomputers and HPC clusters.

Essential functions

  • Lead ongoing improvements in reliability and scalability for our Kubernetes- and Linux-based applications and services.
  • Contribute as a senior technical resource to define and implement best practices and standards for the center.
  • Provide primary operational support and engineering for production applications.
  • Define and implement KPIs and processes, and drive continuous improvement.
  • Influence the architecture and implementation of solutions.
  • Tune operating systems and applications to increase performance and reliability of services.
  • Mentor junior staff and set them up for success.
  • Diagnose system operational problems quickly and effectively.
  • Participate in an on-call rotation providing 24/7 support and covering off-hours maintenance windows.
  • Coordinate with vendors to resolve hardware and software problems.
  • Deliver ORNL's mission by aligning behaviors, priorities, and interactions with our core values of Impact, Integrity, Teamwork, Safety, and Service. Promote diversity, equity, inclusion, and accessibility by fostering a respectful workplace.

Qualifications

Basic Qualifications

  • United States citizen with the ability to obtain a security clearance.
  • Bachelor's degree in Computer Science, Information Technology or a related technical field.
  • A minimum of eight (8) years of relevant experience, or an equivalent combination of education and experience.

Desired Qualifications

  • Strong working knowledge of Unix system fundamentals and common network protocols.
  • Experience managing Linux / UNIX operating systems in a heterogeneous environment.
  • Solid understanding of networked computing environment concepts.
  • Excellent understanding of networking, particularly Linux and Kubernetes networking.
  • Experience with instrumenting bare metal and VMware infrastructure.
  • Ability to develop and maintain programs and scripts that aid in operations and automation, using shell (primarily Bash) and high-level languages (Python or Go).
  • Ability to proactively identify performance issues, problems, and areas for improvement.
  • Ability to identify requirements and to define, plan, and implement requisite solutions.
  • Ability to plan, organize, prioritize tasks, and complete assigned projects with minimal supervision.
  • Experience with continuous integration and continuous deployment software methodologies and how they apply to SRE / systems engineering.
  • An understanding of code review and familiarity with tools like GitHub and GitLab.
  • Experience using tools such as Nagios, Grafana and Prometheus to monitor systems and metrics and to create dashboards.
  • Experience designing and implementing highly available systems / services utilizing virtual machines and Kubernetes resources.
  • Experience participating in an open-source community with patches accepted upstream.
  • Experience deploying and maintaining automated configuration management software such as Puppet or Ansible.
  • Experience implementing systems-level security technologies like SELinux and following security best practices.