# RE: Sludge Pump Alarm Investigation Report
> **Thread Summary:** The email thread discusses a spurious alarm event on March 2, 2026, caused by a momentary communication loss between the SCADA system and the PLC controlling the WWTP Primary Sludge area, leading to false alarms. Mason Radke provided a detailed investigation report identifying the root cause and recommended network infrastructure inspection and alarm configuration review. Mike Arias inquired about adding alarm delays to reduce false alarms, but Mason advised against it for now to avoid masking issues, suggesting addressing alarms case-by-case while the underlying network problem is investigated; Kevin Seifert plans to perform on-site testing and network analysis next week.
---
## 1. From: Mike Arias <Arias@sslocsd.us> — Wed, 4 Mar 2026 16:24:51 +0000
Good Morning Mason,
Thank you for the detailed analysis of the alarm/callout. The layout of the report made it very easy to digest the information. One of the recommendations was an alarm configuration review. Would a slightly longer delay in alarm conditions address this type of issue while still creating the alarm if the communication error persists? If this is a viable solution, would we want to review the system as a whole or address issues as they arise? I would be hesitant to automatically shelve alarms, but maybe a system-wide review would be valuable.
Kind regards,
Michael Arias
Michael J. Arias
Operations Supervisor
Grade III Operator
South San Luis Obispo County Sanitation District
1600 Aloha Place, Oceano, Ca.
805-489-6666
From: Mason Radke <mason@autosysnet.com>
Sent: Tuesday, March 3, 2026 8:12 PM
To: Mike Arias <Arias@sslocsd.us>; Kevin Seifert <kevin@autosysnet.com>
Subject: Sludge Pump Alarm Investigation Report
ALARM INVESTIGATION REPORT
WWTP Primary Sludge System - Spurious Alarm Event

| Field | Value |
| --- | --- |
| Date of Event | March 2, 2026 |
| Time of Event | 1:53:18 AM |
| System | WWTP Primary Sludge Pumping |
| Server | WWTP_AE |
| Area | WWTP.Primary (RNA://SGlobal/SSLOCSD_HMI) |
| Prepared By | Mason |

1. Incident Summary
At 1:53:18 AM on March 2, 2026, the SCADA system generated three simultaneous Priority 900 (Urgent) alarms in the WWTP Primary area. All three alarms were in a TRIP condition and cleared shortly after, with the system returning to normal operation by approximately 2:02 AM.
Alarms Generated

| Alarm Tag | Description | Alarm Class | Condition Quality |
| --- | --- | --- | --- |
| P2312_Alm_DriveFault | Primary Sludge Pump 2 Drive Fault | P_PF52x | Bad Quality - Communication Failure |
| P2322_Alm_DriveFault | Primary Sludge Pump 4 Drive Fault | P_PF52x | Bad Quality - Communication Failure |
| LT_3210_Alm_LoLo | Secondary Sump Low-Low Alarm | P_PF52x | Bad Quality - Communication Failure |

2. Investigation & Analysis
2.1 Alarm Detail Review
Detailed examination of the alarm properties for all three alarms revealed that each carried a Condition Quality of "Bad Quality - Communication Failure." This designation indicates that the SCADA system flagged the alarm data as unreliable due to a loss of communication, rather than representing a genuine field condition.
2.2 Drive Fault Alarms (P2312 & P2322)
The Primary Sludge Pump 2 and Pump 4 drive fault alarms are triggered by a drive fault status bit communicated from the variable frequency drives (VFDs) to the PLC over Ethernet. When the SCADA system lost communication with the PLC, the drive fault status values defaulted to their fail-safe state, which the alarm system interpreted as active faults.
2.3 Secondary Sump Low-Low Level Alarm (LT_3210)
The sump level transmitter (LT_3210) is a hardwired 4-20 mA analog instrument connected directly to the PLC input module. Under normal circumstances, a communication loss between SCADA and the PLC would not affect the actual field signal. However, during the communication disruption, the SCADA system was unable to read the level value from the PLC. The value momentarily defaulted to zero (or was flagged as bad quality) on the SCADA side, which triggered the Low-Low alarm. Trend data confirms the level returned to its normal value immediately after communication was restored, indicating the actual sump level never dropped.
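
The mechanism described in 2.2 and 2.3 can be illustrated with a minimal Python sketch. This is not the actual SCADA configuration: the `ScadaRead` type, the function names, the 10.0 setpoint, and the 42.0 field level are all assumptions made up for the example. It only shows how a value that defaults to zero on the SCADA side can satisfy a Low-Low comparison while the field signal stays the same.

```python
from dataclasses import dataclass


@dataclass
class ScadaRead:
    """Value as seen by SCADA for one PLC tag (hypothetical structure)."""
    value: float
    comm_ok: bool


def read_from_plc(field_value: float, comm_ok: bool, default: float) -> ScadaRead:
    # When the SCADA-to-PLC link is down, SCADA cannot see the field value
    # and falls back to a default; the field signal itself is unchanged.
    return ScadaRead(field_value if comm_ok else default, comm_ok)


def lolo_tripped(read: ScadaRead, setpoint: float) -> bool:
    # Naive check: compares the value only and ignores communication status.
    return read.value < setpoint


LOLO_SETPOINT = 10.0  # assumed setpoint, percent of span
FIELD_LEVEL = 42.0    # assumed steady sump level, percent of span

before = read_from_plc(FIELD_LEVEL, comm_ok=True, default=0.0)
during = read_from_plc(FIELD_LEVEL, comm_ok=False, default=0.0)

print(lolo_tripped(before, LOLO_SETPOINT))  # False
print(lolo_tripped(during, LOLO_SETPOINT))  # True, although FIELD_LEVEL never changed
```
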
2.4 Hypotheses Considered & Eliminated
1. PLC Power Loss: If the PLC had experienced a power interruption, all I/O values from that controller would have dropped to zero simultaneously. Review of trend data confirmed that other values associated with the same PLC remained stable throughout the event. This hypothesis was eliminated.
2. Ethernet Communication Loss (Drive Network Only): A communication failure on the drive Ethernet network could explain the VFD fault alarms, as the drive fault status is transmitted over Ethernet. However, this would not account for the simultaneous level transmitter anomaly, since LT_3210 is hardwired to the PLC and does not rely on Ethernet communication. This hypothesis was eliminated as the sole cause.
3. SCADA-to-PLC Communication Loss: A momentary loss of communication between the SCADA server and the PLC would explain all three alarms simultaneously. Both the drive fault status (read from the PLC, which receives it over Ethernet from the VFDs) and the sump level value (read from the PLC, which receives it via hardwired analog input) would become unavailable to SCADA at the same time. The Bad Quality - Communication Failure tag on all three alarms confirms this as the root cause.
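
The elimination above can also be written out as a small consistency check. This is only an illustrative sketch: the observation keys and the `consistent_hypotheses` helper are hypothetical names, and `None` marks a point where the report states no prediction for that hypothesis.

```python
from typing import Dict, List, Optional

# Observations as stated in the report.
OBSERVED: Dict[str, bool] = {
    "vfd_fault_alarms": True,
    "hardwired_level_alarm": True,
    "other_plc_values_dropped": False,
}

# What each hypothesis predicts; None means the report states no prediction.
PREDICTIONS: Dict[str, Dict[str, Optional[bool]]] = {
    "PLC power loss": {
        "vfd_fault_alarms": None,
        "hardwired_level_alarm": None,
        "other_plc_values_dropped": True,
    },
    "Drive Ethernet loss only": {
        "vfd_fault_alarms": True,
        "hardwired_level_alarm": False,
        "other_plc_values_dropped": None,
    },
    "SCADA-to-PLC communication loss": {
        "vfd_fault_alarms": True,
        "hardwired_level_alarm": True,
        "other_plc_values_dropped": None,
    },
}


def consistent_hypotheses(
    observed: Dict[str, bool],
    predictions: Dict[str, Dict[str, Optional[bool]]],
) -> List[str]:
    """Return the hypotheses whose stated predictions all match the observations."""
    return [
        name
        for name, predicted in predictions.items()
        if all(v is None or v == observed[k] for k, v in predicted.items())
    ]


print(consistent_hypotheses(OBSERVED, PREDICTIONS))
# ['SCADA-to-PLC communication loss']
```
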
3. Root Cause Determination
The root cause of all three alarms is a momentary communication loss between the SCADA system (Server: WWTP_AE) and the PLC controlling the WWTP Primary Sludge area. This communication disruption lasted only seconds and caused the SCADA system to temporarily lose visibility of process values from the PLC, resulting in spurious alarms as default/fail-safe values triggered alarm conditions.
4. Timeline of Events

| Time | Event |
| --- | --- |
| 1:53:18 AM | SCADA-PLC communication loss occurs. Three alarms triggered simultaneously: P2312 Drive Fault, P2322 Drive Fault, LT_3210 Low-Low Level. All tagged Bad Quality - Communication Failure. |
| 1:55:35 AM | Secondary Sump Low-Low alarm acknowledged by operator. |
| 1:55:52 AM | Pump 4 Drive Fault acknowledged. |
| 1:56:09 AM | Pump 2 Drive Fault acknowledged. |
| 2:00:17 AM | CCT Mid Channel ORP Low Alarm triggered (P_AE110B_ORP, Val=455.4). Separate issue, unrelated to communication event. |
| ~2:01-2:02 AM | All drive fault and sump level alarms clear. Communication restored. System returns to normal. |

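
For reference, the elapsed times in the timeline can be worked out with a short Python sketch. The timestamps are taken from the timeline; the `seconds_between` helper is a hypothetical name used only for this illustration.

```python
from datetime import datetime

TIME_FORMAT = "%I:%M:%S %p"  # e.g. "1:53:18 AM"


def seconds_between(start: str, end: str) -> int:
    """Elapsed whole seconds between two clock times on the same day."""
    t0 = datetime.strptime(start, TIME_FORMAT)
    t1 = datetime.strptime(end, TIME_FORMAT)
    return int((t1 - t0).total_seconds())


ONSET = "1:53:18 AM"
ACKNOWLEDGED = {
    "Secondary Sump Low-Low": "1:55:35 AM",
    "Pump 4 Drive Fault": "1:55:52 AM",
    "Pump 2 Drive Fault": "1:56:09 AM",
}

for alarm, ack_time in ACKNOWLEDGED.items():
    print(f"{alarm}: acknowledged {seconds_between(ONSET, ack_time)} s after onset")
```

With the times above, the three acknowledgments come out at 137 s, 154 s, and 171 s after the onset of the event.
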
5. Recommendations
1. Network Path Investigation: Inspect the network infrastructure between the SCADA server (WWTP_AE) and the affected PLC, including managed switches, fiber connections, and any intermediate network devices, for signs of intermittent failures or errors.
2. Network Switch Log Review: Review switch logs and port diagnostics for the timeframe of 1:50-2:05 AM on 3/2/2026 to identify any port flaps, CRC errors, or link state changes.
3. Alarm Configuration Review: Consider implementing alarm suppression or shelving during communication quality events (Bad Quality) to prevent spurious alarms from generating operator distractions.
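
As a rough idea of what the alarm configuration item could look like, here is a minimal Python sketch of a quality-gated Low-Low check. It is an assumption-laden illustration, not the SCADA platform's own mechanism: `AlarmState`, `evaluate_lolo`, and the 10.0 setpoint are hypothetical. Bad quality is reported as its own state rather than silently dropped, so a communication problem still stays visible.

```python
from enum import Enum


class AlarmState(Enum):
    NORMAL = "normal"
    ALARM = "alarm"
    COMM_FAIL = "comm_fail"  # value cannot be trusted; no process alarm raised


def evaluate_lolo(value: float, quality_good: bool, setpoint: float) -> AlarmState:
    """Evaluate a Low-Low condition only when the tag quality is good."""
    if not quality_good:
        # Suppress the process alarm but surface the communication problem.
        return AlarmState.COMM_FAIL
    return AlarmState.ALARM if value < setpoint else AlarmState.NORMAL


LOLO_SETPOINT = 10.0  # assumed setpoint, percent of span

print(evaluate_lolo(42.0, True, LOLO_SETPOINT))   # AlarmState.NORMAL
print(evaluate_lolo(5.0, True, LOLO_SETPOINT))    # AlarmState.ALARM
print(evaluate_lolo(0.0, False, LOLO_SETPOINT))   # AlarmState.COMM_FAIL
```
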
-Mason Radke
-Autosys, LLC
## Attachments
- ![[20260304_19cb9aaaf812_image001.png]]
---
## 2. From: Kevin Seifert <kevin@autosysnet.com> — Thu, 5 Mar 2026 17:10:26 +0000
Thank you Mason.
I will be on-site next week to perform some testing and network analysis.
-Kevin Seifert, AutoSys LLC
________________________________
From: Mason Radke <mason@autosysnet.com>
Sent: Wednesday, March 4, 2026 9:42 AM
To: Mike Arias <Arias@sslocsd.us>
Cc: Mychal Jones <Mychal@sslocsd.us>; Kevin Seifert <kevin@autosysnet.com>
Subject: Re: Sludge Pump Alarm Investigation Report
Good Morning Michael,
Thank you for the feedback. I'm glad the report was easy to follow.
To your question, yes, adding a slight delay to the alarm conditions could technically filter out these brief communication blips. However, I'd consider that more of a band-aid than a fix. My concern is that we'd be masking a symptom rather than addressing the underlying issue.
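
For context, the kind of delay being discussed can be sketched as a simple on-delay timer in Python. This is only an illustration under assumptions: the `OnDelayAlarm` class and the 10-second delay are hypothetical and do not reflect the actual alarm settings. The alarm is raised only if the condition stays true for the full delay, so a short blip is filtered out while a persistent condition still alarms.

```python
from typing import Optional


class OnDelayAlarm:
    """Raise the alarm only after the condition has held for delay_s seconds."""

    def __init__(self, delay_s: float) -> None:
        self.delay_s = delay_s
        self._active_since: Optional[float] = None

    def update(self, condition: bool, now_s: float) -> bool:
        if not condition:
            # Condition cleared: reset the timer, no alarm.
            self._active_since = None
            return False
        if self._active_since is None:
            # Condition just became true: start timing.
            self._active_since = now_s
        return (now_s - self._active_since) >= self.delay_s


alarm = OnDelayAlarm(delay_s=10.0)  # assumed delay

# A 3-second blip never reaches the delay, so no alarm is raised.
print(alarm.update(True, now_s=0.0))    # False
print(alarm.update(True, now_s=3.0))    # False
print(alarm.update(False, now_s=4.0))   # False

# A condition that persists for the full delay does raise the alarm.
print(alarm.update(True, now_s=20.0))   # False
print(alarm.update(True, now_s=30.0))   # True
```
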
I believe this event is related to the larger network issue we've been troubleshooting starting at the CP4000. Until we can identify and resolve that root cause, I'd recommend holding off on making alarm configuration changes or any other system-wide modifications. Introducing delays or other workarounds right now could make it harder to spot these events as they occur, which is actually valuable diagnostic information while we're still investigating.
My suggestion would be to sit tight on the alarm side for now and continue to address these on a case-by-case basis as they come up. That way we can use each occurrence as a data point to help us zero in on the root cause. Once we've resolved the network issue, we can circle back and discuss whether any alarm tuning makes sense at that point.
—Mason Radke
—Autosys, LLC
## Attachments
- ![[20260305_19cbefac809a_image001.png]]
---