Clients/SSLOCSD/emails/2026/march/Sludge_Pump_Alarm_Investigation_Report.md

Content

# RE: Sludge Pump Alarm Investigation Report

> **Thread Summary:** The thread covers a spurious alarm event on March 2, 2026. A momentary communication loss between the SCADA system and the PLC controlling the WWTP Primary Sludge area produced three false alarms. Mason Radke's investigation report identified this root cause and recommended inspecting the network infrastructure and reviewing the alarm configuration. Mike Arias asked whether adding alarm delays would reduce false alarms. Mason advised against it for now, since a delay could mask the underlying problem, and suggested handling alarms case by case while the network issue is investigated. Kevin Seifert plans to perform on-site testing and network analysis the following week.

---

## 1. From: Mike Arias <Arias@sslocsd.us> — Wed, 4 Mar 2026 16:24:51 +0000

Good Morning Mason,
Thank you for the detailed analysis of the alarm/callout. The layout of the report made it very easy to digest the information. One of the recommendations was an alarm configuration review. Would a slightly longer delay in alarm conditions address this type of issue while still creating the alarm if the communication error persists? If this is a viable solution, would we want to review the system as a whole or address issues as they arise? I would be hesitant to automatically shelve alarms, but maybe a system-wide review would be valuable.

Kind regards,
Michael Arias

Michael J. Arias
Operations Supervisor
Grade III Operator
South San Luis Obispo County Sanitation District
1600 Aloha Place, Oceano, Ca.
805-489-6666

From: Mason Radke <mason@autosysnet.com>
Sent: Tuesday, March 3, 2026 8:12 PM
To: Mike Arias <Arias@sslocsd.us>; Kevin Seifert <kevin@autosysnet.com>
Subject: Sludge Pump Alarm Investigation Report

ALARM INVESTIGATION REPORT
WWTP Primary Sludge System - Spurious Alarm Event
- Date of Event: March 2, 2026
- Time of Event: 1:53:18 AM
- System: WWTP Primary Sludge Pumping
- Server: WWTP_AE
- Area: WWTP.Primary (RNA://SGlobal/SSLOCSD_HMI)
- Prepared By: Mason

1. Incident Summary
At 1:53:18 AM on March 2, 2026, the SCADA system generated three simultaneous Priority 900 (Urgent) alarms in the WWTP Primary area. All three alarms were in a TRIP condition and cleared shortly after, with the system returning to normal operation by approximately 2:02 AM.
Alarms Generated

| Alarm Tag | Description | Alarm Class | Condition Quality |
| --- | --- | --- | --- |
| P2312_Alm_DriveFault | Primary Sludge Pump 2 Drive Fault | P_PF52x | Bad Quality - Communication Failure |
| P2322_Alm_DriveFault | Primary Sludge Pump 4 Drive Fault | P_PF52x | Bad Quality - Communication Failure |
| LT_3210_Alm_LoLo | Secondary Sump Low-Low Alarm | P_PF52x | Bad Quality - Communication Failure |

2. Investigation & Analysis
2.1 Alarm Detail Review
Detailed examination of the alarm properties for all three alarms revealed that each carried a Condition Quality of "Bad Quality - Communication Failure." This designation indicates that the SCADA system flagged the alarm data as unreliable due to a loss of communication, rather than representing a genuine field condition.
2.2 Drive Fault Alarms (P2312 & P2322)
The Primary Sludge Pump 2 and Pump 4 drive fault alarms are triggered by a drive fault status bit communicated from the variable frequency drives (VFDs) to the PLC over Ethernet. When the SCADA system lost communication with the PLC, the drive fault status values defaulted to their fail-safe state, which the alarm system interpreted as active faults.

2.3 Secondary Sump Low-Low Level Alarm (LT_3210)
The sump level transmitter (LT_3210) is a hardwired 4-20 mA analog instrument connected directly to the PLC input module. Under normal circumstances, a communication loss between SCADA and the PLC would not affect the actual field signal. However, during the communication disruption, the SCADA system was unable to read the level value from the PLC. The value momentarily defaulted to zero (or was flagged as bad quality) on the SCADA side, which triggered the Low-Low alarm. Trend data confirms the level returned to its normal value immediately after communication was restored, indicating the actual sump level never dropped.
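
The behavior described in 2.3 can be illustrated with a minimal Python sketch. It is not the SCADA platform's alarm logic: the `TagSample` type, the `LOLO_SETPOINT` value, the sample readings, and both function names are hypothetical, chosen only to show how a value that defaults to zero during a communication loss can trip a Low-Low alarm unless tag quality is also checked.

```python
from dataclasses import dataclass

# Hypothetical Low-Low setpoint in arbitrary level units (not from the report).
LOLO_SETPOINT = 2.0


@dataclass
class TagSample:
    value: float        # level value as seen on the SCADA side
    good_quality: bool  # False while SCADA cannot read the PLC


def lolo_naive(sample: TagSample) -> bool:
    """Alarm on value alone; a defaulted zero looks like an empty sump."""
    return sample.value <= LOLO_SETPOINT


def lolo_quality_aware(sample: TagSample) -> bool:
    """Alarm only when the value is low and flagged as good quality."""
    return sample.good_quality and sample.value <= LOLO_SETPOINT


# Illustrative readings only; neither value is taken from the trend data.
normal = TagSample(value=6.5, good_quality=True)
comm_loss = TagSample(value=0.0, good_quality=False)

print(lolo_naive(normal), lolo_naive(comm_loss))
print(lolo_quality_aware(normal), lolo_quality_aware(comm_loss))
```

In this sketch the naive check trips on the defaulted zero while the quality-aware check does not, which is consistent with the trend data showing no real drop in sump level.
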
2.4 Hypotheses Considered & Eliminated

1. PLC Power Loss: If the PLC had experienced a power interruption, all I/O values from that controller would have dropped to zero simultaneously. Review of trend data confirmed that other values associated with the same PLC remained stable throughout the event. This hypothesis was eliminated.

2. Ethernet Communication Loss (Drive Network Only): A communication failure on the drive Ethernet network could explain the VFD fault alarms, as the drive fault status is transmitted over Ethernet. However, this would not account for the simultaneous level transmitter anomaly, since LT_3210 is hardwired to the PLC and does not rely on Ethernet communication. This hypothesis was eliminated as the sole cause.

3. SCADA-to-PLC Communication Loss: A momentary loss of communication between the SCADA server and the PLC would explain all three alarms simultaneously. Both the drive fault status (read from the PLC, which receives it over Ethernet from the VFDs) and the sump level value (read from the PLC, which receives it via hardwired analog input) would become unavailable to SCADA at the same time. The Bad Quality - Communication Failure tag on all three alarms confirms this as the root cause.
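
The elimination reasoning above can be written out as a small Python sketch. The dictionary keys and the `surviving` helper are hypothetical names introduced for illustration, and each hypothesis is encoded only with the predictions the report itself states.

```python
# Observations taken from the report's alarm and trend review.
observed = {
    "drive_fault_alarms": True,       # P2312 and P2322 alarmed
    "level_alarm": True,              # LT_3210 Low-Low alarmed
    "other_plc_values_stable": True,  # other values on the same PLC held steady
}

# What each hypothesis predicts, limited to claims made in the report.
hypotheses = {
    "PLC power loss": {"other_plc_values_stable": False},
    "Drive Ethernet loss only": {"drive_fault_alarms": True, "level_alarm": False},
    "SCADA-to-PLC communication loss": {"drive_fault_alarms": True, "level_alarm": True},
}


def surviving(hypotheses, observed):
    """Keep the hypotheses whose every prediction matches the observations."""
    return [
        name
        for name, predictions in hypotheses.items()
        if all(observed[key] == expected for key, expected in predictions.items())
    ]


print(surviving(hypotheses, observed))
```

Only the SCADA-to-PLC communication loss remains after the comparison, matching the conclusion in item 3.
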

3. Root Cause Determination
The root cause of all three alarms is a momentary communication loss between the SCADA system (Server: WWTP_AE) and the PLC controlling the WWTP Primary Sludge area. This communication disruption lasted only seconds and caused the SCADA system to temporarily lose visibility of process values from the PLC, resulting in spurious alarms as default/fail-safe values triggered alarm conditions.
4. Timeline of Events

| Time | Event |
| --- | --- |
| 1:53:18 AM | SCADA-PLC communication loss occurs. Three alarms triggered simultaneously: P2312 Drive Fault, P2322 Drive Fault, LT_3210 Low-Low Level. All tagged Bad Quality - Communication Failure. |
| 1:55:35 AM | Secondary Sump Low-Low alarm acknowledged by operator. |
| 1:55:52 AM | Pump 4 Drive Fault acknowledged. |
| 1:56:09 AM | Pump 2 Drive Fault acknowledged. |
| 2:00:17 AM | CCT Mid Channel ORP Low Alarm triggered (P_AE110B_ORP, Val=455.4). Separate issue, unrelated to communication event. |
| ~2:01-2:02 AM | All drive fault and sump level alarms clear. Communication restored. System returns to normal. |
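
As a worked example of the timeline arithmetic, the following Python sketch computes how many seconds after the 1:53:18 AM onset each alarm was acknowledged. The timestamps come from the timeline; the variable and function names are illustrative.

```python
from datetime import datetime

FMT = "%I:%M:%S %p"  # 12-hour clock with AM/PM, as used in the timeline
onset = datetime.strptime("1:53:18 AM", FMT)

# Acknowledgment times copied from the timeline.
acknowledgments = {
    "Secondary Sump Low-Low": "1:55:35 AM",
    "Pump 4 Drive Fault": "1:55:52 AM",
    "Pump 2 Drive Fault": "1:56:09 AM",
}


def seconds_after_onset(stamp: str) -> int:
    """Elapsed whole seconds between the alarm onset and the given time."""
    return int((datetime.strptime(stamp, FMT) - onset).total_seconds())


ack_delays = {name: seconds_after_onset(stamp) for name, stamp in acknowledgments.items()}
print(ack_delays)
# {'Secondary Sump Low-Low': 137, 'Pump 4 Drive Fault': 154, 'Pump 2 Drive Fault': 171}
```

All three acknowledgments fall within roughly three minutes of the onset.
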

5. Recommendations

1. Network Path Investigation: Inspect the network infrastructure between the SCADA server (WWTP_AE) and the affected PLC, including managed switches, fiber connections, and any intermediate network devices, for signs of intermittent failures or errors.

2. Network Switch Log Review: Review switch logs and port diagnostics for the timeframe of 1:50-2:05 AM on 3/2/2026 to identify any port flaps, CRC errors, or link state changes.

3. Alarm Configuration Review: Consider implementing alarm suppression or shelving during communication quality events (Bad Quality) to prevent spurious alarms from generating operator distractions.
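
One possible way to approach the switch log review is sketched below in Python. The log line format (`YYYY-MM-DD HH:MM:SS | message`), the keyword list, and every line in `sample_log` are assumptions invented for illustration; they are not real output from the plant's switches, and an actual export would need its own parsing.

```python
from datetime import datetime

# Review window from the recommendation: 1:50-2:05 AM on 3/2/2026.
WINDOW_START = datetime(2026, 3, 2, 1, 50)
WINDOW_END = datetime(2026, 3, 2, 2, 5)

# Hypothetical keywords for port flaps, CRC errors, and link state changes.
KEYWORDS = ("link down", "link up", "crc", "flap")


def events_in_window(lines):
    """Return log lines inside the window that mention a keyword of interest."""
    hits = []
    for line in lines:
        stamp, _, message = line.partition(" | ")
        try:
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        in_window = WINDOW_START <= when <= WINDOW_END
        if in_window and any(word in message.lower() for word in KEYWORDS):
            hits.append(line)
    return hits


# INVENTED sample lines, used only to exercise the filter.
sample_log = [
    "2026-03-02 01:41:07 | port 3 link up",
    "2026-03-02 01:53:17 | port 7 link down",
    "2026-03-02 01:53:19 | port 7 link up",
    "2026-03-02 01:58:00 | config saved",
    "not a log line",
]

for hit in events_in_window(sample_log):
    print(hit)
```

The filter keeps only entries that are both inside the window and match a keyword, so routine messages and events outside 1:50-2:05 AM drop out.
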

-Mason Radke
-Autosys, LLC

## Attachments
- ![[20260304_19cb9aaaf812_image001.png]]

---

## 2. From: Kevin Seifert <kevin@autosysnet.com> — Thu, 5 Mar 2026 17:10:26 +0000

Thank you Mason.

I will be on-site next week to perform some testing and network analysis.

-Kevin Seifert, AutoSys LLC

________________________________
From: Mason Radke <mason@autosysnet.com>
Sent: Wednesday, March 4, 2026 9:42 AM
To: Mike Arias <Arias@sslocsd.us>
Cc: Mychal Jones <Mychal@sslocsd.us>; Kevin Seifert <kevin@autosysnet.com>
Subject: Re: Sludge Pump Alarm Investigation Report

Good Morning Michael,

Thank you for the feedback; I'm glad the report was easy to follow.

To your question, yes, adding a slight delay to the alarm conditions could technically filter out these brief communication blips. However, I'd consider that more of a band-aid than a fix. My concern is that we'd be masking a symptom rather than addressing the underlying issue.

I believe this event is related to the larger network issue we've been troubleshooting starting at the CP4000. Until we can identify and resolve that root cause, I'd recommend holding off on making alarm configuration changes or any other system-wide modifications. Introducing delays or other workarounds right now could make it harder to spot these events as they occur, which is actually valuable diagnostic information while we're still investigating.

My suggestion would be to sit tight on the alarm side for now and continue to address these on a case-by-case basis as they come up. That way we can use each occurrence as a data point to help us zero in on the root cause. Once we've resolved the network issue, we can circle back and discuss whether any alarm tuning makes sense at that point.
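
For reference, the alarm delay discussed above behaves like an on-delay timer: the condition must stay active for a set time before the alarm is raised. The Python sketch below is a generic illustration and not the alarm server's implementation; the function name, the sample data, and the 10-second delay are hypothetical.

```python
def on_delay_alarm(samples, delay_s):
    """Return the time at which the alarm would raise, or None if it never does.

    samples: iterable of (time_in_seconds, condition_active) pairs in time order.
    The alarm raises once the condition has been continuously active for delay_s.
    """
    active_since = None
    for t, active in samples:
        if not active:
            active_since = None  # condition cleared, restart the timer
            continue
        if active_since is None:
            active_since = t
        if t - active_since >= delay_s:
            return t
    return None


# Hypothetical one-second samples: a brief blip and a persistent fault.
blip = [(0, False), (1, True), (2, True), (3, False), (4, False)]
persistent = [(t, t >= 1) for t in range(15)]

print(on_delay_alarm(blip, delay_s=10))        # None
print(on_delay_alarm(persistent, delay_s=10))  # 11
```

The brief blip never raises an alarm, which is the filtering effect Mike asked about and also the reason such events would no longer be visible as diagnostic data.
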

—Mason Radke
—Autosys, LLC
From: Mike Arias <Arias@sslocsd.us>
Date: Wednesday, March 4, 2026 at 8:24 AM
To: Mason Radke <mason@autosysnet.com>
Cc: Mychal Jones <Mychal@sslocsd.us>, Kevin Seifert <kevin@autosysnet.com>
Subject: RE: Sludge Pump Alarm Investigation Report

Good Morning Mason,

Thank you for the detailed analysis of the alarm/callout. The layout of the report made it very easy to digest the information. One of the recommendations was an alarm configuration review. Would a slightly longer delay in alarm conditions address this type of issue while still creating the alarm if the communication error persists? If this is a viable solution, would we want to review the system as a whole or address issues as they arise? I would be hesitant to automatically shelve alarms, but maybe a system-wide review would be valuable.

Kind regards,

Michael Arias

Michael J. Arias

Operations Supervisor

Grade III Operator

South San Luis Obispo County Sanitation District

1600 Aloha Place, Oceano, Ca.

805-489-6666



## Attachments
- ![[20260305_19cbefac809a_image001.png]]

---

Extracted Entities

| Type | Key | Value | Confidence | Evidence |
| --- | --- | --- | --- | --- |
| contact | Mike Arias Email | Arias@sslocsd.us | 100% | From: Mike Arias <Arias@sslocsd.us> |
| contact | Mike Arias Phone | 805-489-6666 | 100% | 805-489-6666 |
| contact | Mason Radke Email | mason@autosysnet.com | 100% | From: Mason Radke <mason@autosysnet.com> |
| contact | Kevin Seifert Email | kevin@autosysnet.com | 100% | From: Kevin Seifert <kevin@autosysnet.com> |
| server | SCADA Server | WWTP_AE | 100% | The root cause of all three alarms is a momentary communication loss between the SCADA system (Server: WWTP_AE) and the PLC |
| site | Client Location | 1600 Aloha Place, Oceano, Ca. | 100% | 1600 Aloha Place, Oceano, Ca. |
| site | Client Name | South San Luis Obispo County Sanitation District | 100% | South San Luis Obispo County Sanitation District |
| system | SCADA System | WWTP Primary Sludge Pumping | 100% | System: WWTP Primary Sludge Pumping |
| task | Network Path Investigation | Inspect network infrastructure between SCADA server WWTP_AE and PLC for intermittent failures | 100% | Inspect the network infrastructure between the SCADA server (WWTP_AE) and the affected PLC |
| task | Network Switch Log Review | Review switch logs and port diagnostics for 1:50-2:05 AM on 3/2/2026 | 100% | Review switch logs and port diagnostics for the timeframe of 1:50-2:05 AM on 3/2/2026 |
| task | Alarm Configuration Review | Consider alarm suppression or shelving during communication quality events | 100% | Consider implementing alarm suppression or shelving during communication quality events |
| task | On-site Testing and Network Analysis | Kevin Seifert to perform on-site testing and network analysis next week | 100% | I will be on-site next week to perform some testing and network analysis. |

Updated: 2026-03-11 23:57:58.882256