
Director, IT Incident and Problem Management

Smarsh
Atlanta
$200K-$250K a year
Full-time

The Director, IT Incident and Problem Management is responsible for overseeing the processes related to incident and problem management within the organization.

This role ensures that incidents are resolved efficiently and effectively and that root causes of problems are identified and addressed to prevent recurrence.

The Director will lead a team and collaborate with various departments, including Engineering, Product Management, and Customer Support, to maintain a high standard of service delivery. You will leverage your expertise in the ITIL framework and Google Site Reliability Engineering (SRE) methodologies to maintain high availability and reliability of our SaaS platform through effective incident response and robust problem management strategies.

What will you do?

  • Provide strategic direction and oversight for the IT incident and problem management function, ensuring 24/7 coverage and effective response to incidents.
  • Develop and refine IT incident and problem management strategies aligned with ITIL and Google SRE methodologies to enhance service reliability and minimize business impact.
  • Lead major incident and problem resolution efforts, conducting thorough root cause analysis and implementing preventive actions based on Google SRE principles.
  • Collaborate closely with cross-functional teams including IT operations, development, and customer support to ensure coordinated incident and problem resolution efforts.
  • Define and monitor key performance indicators (KPIs) and metrics related to incident and problem management, driving continuous improvement initiatives.
  • Present incident and problem management reports to stakeholders, including senior executives and Product Managers, offering insights into trends, risks, and opportunities for improvement.
  • Additionally, develop and deliver customer-facing metrics and reports.

What will you bring?

  • Experience in IT Incident, Problem Management, or SRE Roles : - years of experience in IT, with at least 5 years in incident management, problem management, or SRE and at least 3 years in a managerial position.
  • Experience in SaaS Environments : Proven experience in IT incident, problem management or SRE for B2B SaaS providers, ideally within the FinTech sector.
  • Leadership : Proven track record in senior leadership roles, with the ability to inspire and empower cross-functional teams to achieve operational excellence and drive continuous improvement.
  • IT Incident Management : Deep understanding of the ITIL framework, with extensive hands-on experience in incident identification, prioritization, resolution, and escalation.
  • Problem Management : Expertise in leading comprehensive root cause analysis and problem resolution efforts, incorporating Google SRE principles for preventive actions.
  • Google SRE Methodologies : In-depth knowledge of Google SRE philosophies, including error budget management, service level indicators / objectives (SLIs / SLOs), and effective incident response strategies.
  • Technical Acumen :
      • Broad technical understanding across IT infrastructure, networks, and applications, and their incident and problem management practices.
      • Broad technical understanding of modern cloud technologies (AWS, Azure, GCP) and their incident and problem management practices.
  • Analytical Skills : Strong ability to analyze incidents and problems, identify root causes, and drive the implementation of effective solutions.
  • Communication and Stakeholder Management : Excellent communication skills, with the ability to engage and influence stakeholders at all levels, including technical teams and senior management.
  • Collaboration : Effective collaboration skills to work with cross-functional teams and stakeholders.
  • Strategic Thinking : Strong analytical and strategic thinking abilities, capable of driving alignment between incident and problem management processes and organizational goals.

$200K-$250K a year. The above salary range represents Smarsh's good faith and reasonable estimate of the range of possible base compensation at the time of posting.

Any applicable bonus programs will be discussed during the recruiting process. The salary for this role will be set based on a variety of factors, including, but not limited to, internal equity, experience, education, location, specialty, and training.

Local cost of living assessments are done for each new hire at the time of offer.

30+ days ago