SRE - Java, Splunk, Kibana, Grafana - Payment Domain
Job Summary :
We are seeking a skilled Site Reliability Engineer (SRE) with expertise in Java backend systems, observability tools (Splunk, Kibana, Grafana), and experience in the Payment Domain. The role involves ensuring system reliability, performance optimization, and efficient incident management for mission-critical payment applications.
Key Responsibilities :
Develop and maintain backend services using Java (Spring Boot, REST APIs) aligned with SRE principles.
Implement observability frameworks using Splunk, Kibana, and Grafana for monitoring application health, performance, and incident detection.
Automate operational tasks, deployment pipelines (CI / CD), and system health checks to ensure high availability and scalability.
Drive incident response processes: detection, troubleshooting, post-mortems, and root cause analysis.
Collaborate with DevOps, QA, and Application teams to design resilient systems with proactive monitoring and alerts.
Manage log aggregation, metrics, dashboards, and visualizations for proactive issue detection and capacity planning.
Perform performance tuning, load testing, and scalability assessments for payment transaction flows.
Participate in on-call rotations to support production systems and drive continuous improvements.
Primary Skills (Must-Have) :
Core Java, Spring Boot, RESTful API Development
Splunk (Dashboard creation, alerting, log analysis)
Kibana (Elasticsearch log visualization & dashboards)
Grafana (Metrics dashboards, Prometheus integration)
CI / CD (Jenkins, GitLab, or equivalent)
Strong understanding of SRE concepts : SLIs, SLOs, Error Budgets
Experience in the Payment / FinTech domain
Secondary Skills (Good-to-Have) :
Prometheus, OpenTelemetry
Kubernetes, Docker
Incident Management Tools (PagerDuty, OpsGenie)
Knowledge of SQL & NoSQL Databases
AWS Cloud (EC2, S3, CloudWatch)
Work Environment :
Fast-paced production support and monitoring environment
Collaborative teams working in Agile / Scrum setup
High-availability payment systems processing transactions
Additional Preferences (Optional) :
Experience in PCI-DSS Compliance environments
Exposure to Payment Gateways, Real-time Transaction Processing Systems
SRE Certifications (Google SRE, AWS DevOps)
Site Reliability Engineer • Omaha, NE, USA