Talent.com
Site Reliability Engineer

Purple Drive
Omaha, NE, USA
30 days ago
Job type
  • Full-time
Job description

SRE - Java, Splunk, Kibana, Grafana - Payment Domain

Job Summary:

We are seeking a skilled Site Reliability Engineer (SRE) with expertise in Java backend systems, observability tools (Splunk, Kibana, Grafana), and experience in the Payment Domain. The role involves ensuring system reliability, performance optimization, and efficient incident management for mission-critical payment applications.

Key Responsibilities:

Develop and maintain backend services using Java (Spring Boot, REST APIs) aligned with SRE principles.

Implement observability frameworks using Splunk, Kibana, and Grafana for monitoring application health, performance, and incident detection.

Automate operational tasks, deployment pipelines (CI/CD), and system health checks to ensure high availability and scalability.

Drive incident response processes: detection, troubleshooting, post-mortem, and root cause analysis.

Collaborate with DevOps, QA, and Application teams to design resilient systems with proactive monitoring and alerts.

Manage log aggregation, metrics, dashboards, and visualizations for proactive issue detection and capacity planning.

Perform performance tuning, load testing, and scalability assessments for payment transaction flows.

Participate in on-call rotations to support production systems and drive continuous improvements.

Primary Skills (Must-Have):

Core Java, Spring Boot, RESTful API Development

Splunk (Dashboard creation, alerting, log analysis)

Kibana (Elasticsearch log visualization & dashboards)

Grafana (Metrics dashboards, Prometheus integration)

CI/CD (Jenkins, GitLab, or equivalent)

Strong understanding of SRE concepts: SLIs, SLOs, Error Budgets

Experience in the Payment/FinTech domain

Secondary Skills (Good-to-Have):

Prometheus, OpenTelemetry

Kubernetes, Docker

Incident Management Tools (PagerDuty, Opsgenie)

Knowledge of SQL & NoSQL Databases

AWS Cloud (EC2, S3, CloudWatch)

Work Environment:

Fast-paced production support and monitoring environment

Collaborative teams working in an Agile/Scrum setup

High-availability payment systems processing transactions

Additional Preferences (Optional):

Experience in PCI-DSS Compliance environments

Exposure to Payment Gateways, Real-time Transaction Processing Systems

SRE Certifications (Google SRE, AWS DevOps)
