CrowdStrike outage affecting Abrigo applications
Incident Report for Abrigo
Postmortem

CrowdStrike Outage

Incident Overview

Incident Commenced: July 19, 2024

Product Family Affected: All Products

Abrigo Reference#: IM-125

Incident Summary: Beginning at 1:30 am ET on July 19, Abrigo-hosted software applications became unavailable due to a global outage caused by a faulty update from our service provider, CrowdStrike. This update led to issues with the Falcon sensor software, affecting our systems. Once remediation steps were provided by CrowdStrike, Abrigo promptly tested and implemented the fix, restoring functionality to our applications.

Incident Timeline

 

Remediation Summary and Current Status:

Resolution Summary: CrowdStrike provided two resolution options: reboot the affected machine up to 15 times, or boot into safe mode and delete any .sys file beginning with c-00000291. Microsoft suggested an alternative: restoring systems from backups taken on prior days. Abrigo applied these options in order of complexity, starting with repeated reboots. If reboots did not resolve the issue, we either restored the system from backup or deleted the identified .sys files. We prioritized production environments and then addressed UAT environments over the weekend.
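
The escalation described above can be sketched roughly as follows. This is an illustrative outline only, not the tooling Abrigo used: the callbacks (is_healthy, reboot, restore_from_backup), the driver_dir parameter, and the returned labels are hypothetical names introduced for the example, and the choice to try file deletion before a backup restore is an assumption, since the summary does not say which of the two came first. Only the 15-reboot limit and the c-00000291 prefix come from the report.

```python
from pathlib import Path
from typing import Callable, List

MAX_REBOOTS = 15              # upper bound quoted in the resolution summary
FAULTY_PREFIX = "c-00000291"  # file-name prefix quoted in the resolution summary


def find_faulty_channel_files(driver_dir: Path) -> List[Path]:
    """Return .sys files in driver_dir whose names start with the faulty prefix.

    Matching is case-insensitive. driver_dir is supplied by the caller
    (hypothetical parameter; the report does not name the directory).
    """
    return sorted(
        p for p in driver_dir.iterdir()
        if p.is_file()
        and p.suffix.lower() == ".sys"
        and p.name.lower().startswith(FAULTY_PREFIX)
    )


def remediate(
    is_healthy: Callable[[], bool],
    reboot: Callable[[], None],
    driver_dir: Path,
    restore_from_backup: Callable[[], None],
) -> str:
    """Try the least complex fix first and escalate only if it fails."""
    # Step 1: repeated reboots, up to the quoted limit.
    for _ in range(MAX_REBOOTS):
        reboot()
        if is_healthy():
            return "recovered_by_reboot"

    # Step 2 (assumed order): delete the matching files, then reboot once more.
    faulty = find_faulty_channel_files(driver_dir)
    for path in faulty:
        path.unlink()
    if faulty:
        reboot()
        if is_healthy():
            return "recovered_by_file_deletion"

    # Step 3: fall back to restoring from an earlier backup.
    restore_from_backup()
    return "restored_from_backup"
```

Passing the health check and reboot action in as callbacks keeps the sketch independent of any particular cloud or virtualization API, which the report does not specify.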

Root Cause Analysis

Root Cause Statement: On July 19 at 04:09 UTC, CrowdStrike released an update to the Falcon sensor software on Windows PCs and servers that contained a faulty configuration. The update included a change to the configuration file responsible for monitoring named pipes, specifically Channel File 291. This modification led to an out-of-bounds memory read in the Windows sensor client, triggering an invalid page fault. Consequently, affected machines either experienced continuous reboot cycles or entered recovery mode.

Mitigation Summary: To mitigate the impact of similar issues going forward, Abrigo will update our break-glass process to ensure rapid and efficient responses. We will audit our backup settings in AWS to enhance reliability and ensure quick recovery. Additionally, we will review and refine our recovery precedence procedures to handle multiple simultaneous production failures more effectively. These measures will strengthen our resilience and help maintain our critical operations.

Remediation Steps

Posted Jul 30, 2024 - 14:56 EDT

Resolved
This incident has been resolved.
Posted Jul 19, 2024 - 19:08 EDT
Update
All systems are performing as expected.
Posted Jul 19, 2024 - 19:07 EDT
Update
FinCEN DirectFile is now Operational
Sageworks API is now Operational
BAM+ is currently Degraded
We continue to work toward full resolution of this issue.
Posted Jul 19, 2024 - 18:31 EDT
Update
Loan Loss Analyzer and ValuCast are operational.
Posted Jul 19, 2024 - 18:02 EDT
Update
IQ AutoScan is operational.
Posted Jul 19, 2024 - 17:36 EDT
Update
IQ AutoScan has been restored to operational
Sageworks API has been restored to operational
BAM+ is currently in a partial outage
Direct File and IQAS are currently Degraded.
We are continuing to make significant progress on the BAM+ Suite and have successfully completed the majority of recovery tasks with work continuing to ensure the availability of all functionality.
Posted Jul 19, 2024 - 17:36 EDT
Update
fileservice.abrigo.com is functioning as expected.
Abrigo continues to implement the necessary remediation steps to bring systems online and has made significant progress. Some systems have been restored, and we are closely testing and monitoring performance to ensure full operation. Our engineers continue to work through the remaining systems as quickly as possible.
Posted Jul 19, 2024 - 15:17 EDT
Update
We are continuing to work on a fix for this issue.
Posted Jul 19, 2024 - 15:03 EDT
Update
Online Portal Now is operational, but is not running at full capacity yet and may still experience some slowness. Full functionality is available.
Posted Jul 19, 2024 - 13:47 EDT
Update
Abrigo ID is now operational. We will continue to update this incident as products come back online.
Posted Jul 19, 2024 - 13:32 EDT
Update
Abrigo ID has been restored to operation.
Posted Jul 19, 2024 - 13:24 EDT
Update
Abrigo continues to implement the necessary remediation steps to bring systems online and has made significant progress. Some systems have been restored and we are closely testing and monitoring performance to ensure full operation. Our engineers continue to work through the remaining systems in order to restore as quickly as possible. Your patience during this process is appreciated.

We will update this page within 2 hours, and we recommend subscribing to receive notifications as updates occur.
Posted Jul 19, 2024 - 13:00 EDT
Update
Sageworks Analyst has been restored to operation.
Posted Jul 19, 2024 - 12:55 EDT
Update
Abrigo continues to implement the necessary remediation steps to bring systems online. We are working through automating the process so progress will accelerate moving forward.

We will update this page within 2 hours, and we recommend subscribing to receive notifications as updates occur.
Posted Jul 19, 2024 - 11:00 EDT
Update
CrowdStrike outage affecting Abrigo applications

A global outage at our service provider, CrowdStrike, has made many Abrigo applications unavailable. Abrigo is in the process of remediating the issues. CrowdStrike has provided a resolution, but implementation requires multiple steps per impacted machine.

We will continue to update this page, and we recommend subscribing to receive notifications as updates occur.
Posted Jul 19, 2024 - 10:03 EDT
Identified
CrowdStrike Engineering has identified a content deployment related to this issue and reverted those changes.
Posted Jul 19, 2024 - 03:14 EDT
Investigating
We are currently experiencing multiple service failures across numerous devices in AWS. The root cause appears to be general connectivity issues to certain EC2 instance types, caused by a recent update to the CrowdStrike agent that triggers a stop error within the Windows operating system. We are actively investigating and working with AWS Support to remediate as quickly as possible. This event appears to be impacting numerous AWS customers running Windows OS and CrowdStrike.
Posted Jul 19, 2024 - 01:48 EDT
This incident affected: Abrigo ID (login.abrigo.com), BAM+, File Transfer (fileservice.abrigo.com), FinCEN DirectFile (fincendirectfile.abrigo.com), IQ AutoScan (iqautoscan3.com), Loan Loss Analyzer (lla.abrigo.com), Online Portal Now, Sageworks Analyst (www.sageworksanalyst.com), Sageworks API (api.sageworks.com), and ValuCast CECL (www.valucastcecl.com).