Clients/SSLOCSD/emails/.raw/2026/march/Re_Sludge_Pump_Alarm_Investigation_Report.md

Source: gmail
Chunks: 3
Entities: 11
Type: Doc

Content

Thank you Mason. I will be on-site next week to perform some testing and network analysis.

-Kevin Seifert, AutoSys LLC

________________________________

From: Mason Radke <mason@autosysnet.com>
Sent: Wednesday, March 4, 2026 9:42 AM
To: Mike Arias <Arias@sslocsd.us>
Cc: Mychal Jones <Mychal@sslocsd.us>; Kevin Seifert <kevin@autosysnet.com>
Subject: Re: Sludge Pump Alarm Investigation Report

Good Morning Michael,

Thank you for the feedback, I'm glad the report was easy to follow.

To your question, yes, adding a slight delay to the alarm conditions could technically filter out these brief communication blips. However, I'd consider that more of a band-aid than a fix. My concern is that we'd be masking a symptom rather than addressing the underlying issue.

I believe this event is related to the larger network issue we've been troubleshooting starting at the CP4000. Until we can identify and resolve that root cause, I'd recommend holding off on making alarm configuration changes or any other system-wide modifications. Introducing delays or other workarounds right now could make it harder to spot these events as they occur, which is actually valuable diagnostic information while we're still investigating.

My suggestion would be to sit tight on the alarm side for now and continue to address these on a case-by-case basis as they come up. That way we can use each occurrence as a data point to help us zero in on the root cause. Once we've resolved the network issue, we can circle back and discuss whether any alarm tuning makes sense at that point.

-Mason Radke
-Autosys, LLC

From: Mike Arias <Arias@sslocsd.us>
Date: Wednesday, March 4, 2026 at 8:24 AM
To: Mason Radke <mason@autosysnet.com>
Cc: Mychal Jones <Mychal@sslocsd.us>, Kevin Seifert <kevin@autosysnet.com>
Subject: RE: Sludge Pump Alarm Investigation Report

Good Morning Mason,

Thank you for the detailed analysis of the alarm/callout, the layout of the report made it very easy to digest the information.
One of the recommendations was alarm configuration review. Would a slightly longer delay in alarm conditions address this type of issue while still creating the alarm if the communication error persists? If this is a viable solution, would we want to review the system as a whole or address issues as they arise? I would be hesitant to automatically shelve alarms but maybe a system wide review would be valuable.

Kind regards,
Michael Arias

Michael J. Arias
Operations Supervisor Grade III Operator
South San Luis Obispo County Sanitation District
1600 Aloha Place, Oceano, Ca.
805-489-6666
[South San Luis Obispo County Sanitation District]

From: Mason Radke <mason@autosysnet.com>
Sent: Tuesday, March 3, 2026 8:12 PM
To: Mike Arias <Arias@sslocsd.us>; Kevin Seifert <kevin@autosysnet.com>
Subject: Sludge Pump Alarm Investigation Report

ALARM INVESTIGATION REPORT
WWTP Primary Sludge System - Spurious Alarm Event

Date of Event: March 2, 2026
Time of Event: 1:53:18 AM
System: WWTP Primary Sludge Pumping
Server: WWTP_AE
Area: WWTP.Primary (RNA://SGlobal/SSLOCSD_HMI)
Prepared By: Mason

### 1. Incident Summary

At 1:53:18 AM on March 2, 2026, the SCADA system generated three simultaneous Priority 900 (Urgent) alarms in the WWTP Primary area. All three alarms were in a TRIP condition and cleared shortly after, with the system returning to normal operation by approximately 2:02 AM.

Alarms Generated

| Alarm Tag | Description | Alarm Class | Condition Quality |
| --- | --- | --- | --- |
| P2312_Alm_DriveFault | Primary Sludge Pump 2 Drive Fault | P_PF52x | Bad Quality – Communication Failure |
| P2322_Alm_DriveFault | Primary Sludge Pump 4 Drive Fault | P_PF52x | Bad Quality – Communication Failure |
| LT_3210_Alm_LoLo | Secondary Sump Low-Low Alarm | P_PF52x | Bad Quality – Communication Failure |

### 2. Investigation & Analysis

#### 2.1 Alarm Detail Review

Detailed examination of the alarm properties for all three alarms revealed that each carried a Condition Quality of "Bad Quality – Communication Failure."
This designation indicates that the SCADA system flagged the alarm data as unreliable due to a loss of communication, rather than representing a genuine field condition.

#### 2.2 Drive Fault Alarms (P2312 & P2322)

The Primary Sludge Pump 2 and Pump 4 drive fault alarms are triggered by a drive fault status bit communicated from the variable frequency drives (VFDs) to the PLC over Ethernet. When the SCADA system lost communication with the PLC, the drive fault status values defaulted to their fail-safe state, which the alarm system interpreted as active faults.

#### 2.3 Secondary Sump Low-Low Level Alarm (LT_3210)

The sump level transmitter (LT_3210) is a hardwired 4–20 mA analog instrument connected directly to the PLC input module. Under normal circumstances, a communication loss between SCADA and the PLC would not affect the actual field signal. However, during the communication disruption, the SCADA system was unable to read the level value from the PLC. The value momentarily defaulted to zero (or was flagged as bad quality) on the SCADA side, which triggered the Low-Low alarm. Trend data confirms the level returned to its normal value immediately after communication was restored, indicating the actual sump level never dropped.

#### 2.4 Hypotheses Considered & Eliminated

1. PLC Power Loss: If the PLC had experienced a power interruption, all I/O values from that controller would have dropped to zero simultaneously. Review of trend data confirmed that other values associated with the same PLC remained stable throughout the event. This hypothesis was eliminated.
2. Ethernet Communication Loss (Drive Network Only): A communication failure on the drive Ethernet network could explain the VFD fault alarms, as the drive fault status is transmitted over Ethernet. However, this would not account for the simultaneous level transmitter anomaly, since LT_3210 is hardwired to the PLC and does not rely on Ethernet communication. This hypothesis was eliminated as the sole cause.
3. SCADA-to-PLC Communication Loss: A momentary loss of communication between the SCADA server and the PLC would explain all three alarms simultaneously. Both the drive fault status (read from the PLC, which receives it over Ethernet from the VFDs) and the sump level value (read from the PLC, which receives it via hardwired analog input) would become unavailable to SCADA at the same time. The Bad Quality – Communication Failure tag on all three alarms confirms this as the root cause.

### 3. Root Cause Determination

The root cause of all three alarms is a momentary communication loss between the SCADA system (Server: WWTP_AE) and the PLC controlling the WWTP Primary Sludge area. This communication disruption lasted only seconds and caused the SCADA system to temporarily lose visibility of process values from the PLC, resulting in spurious alarms as default/fail-safe values triggered alarm conditions.

### 4. Timeline of Events

| Time | Event |
| --- | --- |
| 1:53:18 AM | SCADA-PLC communication loss occurs. Three alarms triggered simultaneously: P2312 Drive Fault, P2322 Drive Fault, LT_3210 Low-Low Level. All tagged Bad Quality – Communication Failure. |
| 1:55:35 AM | Secondary Sump Low-Low alarm acknowledged by operator. |
| 1:55:52 AM | Pump 4 Drive Fault acknowledged. |
| 1:56:09 AM | Pump 2 Drive Fault acknowledged. |
| 2:00:17 AM | CCT Mid Channel ORP Low Alarm triggered (P_AE110B_ORP, Val=455.4). Separate issue, unrelated to communication event. |
| ~2:01–2:02 AM | All drive fault and sump level alarms clear. Communication restored. System returns to normal. |

### 5. Recommendations

1. Network Path Investigation: Inspect the network infrastructure between the SCADA server (WWTP_AE) and the affected PLC, including managed switches, fiber connections, and any intermediate network devices, for signs of intermittent failures or errors.
2. Network Switch Log Review: Review switch logs and port diagnostics for the timeframe of 1:50–2:05 AM on 3/2/2026 to identify any port flaps, CRC errors, or link state changes.
3. Alarm Configuration Review: Consider implementing alarm suppression or shelving during communication quality events (Bad Quality) to prevent spurious alarms from generating operator distractions.

-Mason Radke
-Autosys, LLC

## Attachments

- ![[20260305_19cbefac809a_image001.png]]
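
The on-delay idea raised in the thread (annunciate an alarm only if its condition persists, while still keeping a record of the brief events) might look roughly like the Python sketch below. It is illustrative only and is not the SCADA system's actual configuration or API: the `OnDelayAlarm` class, the `first_annunciation` helper, the 10-second delay, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class OnDelayAlarm:
    """Hypothetical on-delay filter: annunciate only after the raw
    condition has been continuously active for delay_s seconds."""
    delay_s: float
    annunciated: bool = False
    _since: Optional[float] = None  # time the condition first went active
    # (start, end) of every event that cleared before the delay expired
    blips: List[Tuple[float, float]] = field(default_factory=list)

    def update(self, t: float, active: bool) -> bool:
        if active:
            if self._since is None:
                self._since = t
            if t - self._since >= self.delay_s:
                self.annunciated = True
        else:
            # Condition cleared: log it if it never reached the operator,
            # so the occurrence is still available as a diagnostic data point.
            if self._since is not None and not self.annunciated:
                self.blips.append((self._since, t))
            self._since = None
            self.annunciated = False
        return self.annunciated


def first_annunciation(trace, delay_s):
    """Replay (time, active) samples; return the first annunciation time
    (or None) and the list of filtered brief events."""
    alarm = OnDelayAlarm(delay_s)
    first = None
    for t, active in trace:
        if alarm.update(t, active) and first is None:
            first = t
    return first, alarm.blips


# Made-up traces, sampled once per second (assumption, not plant data).
brief = [(0.0, True), (1.0, True), (2.0, True), (3.0, False), (4.0, False)]
persistent = [(float(t), True) for t in range(13)]

print(first_annunciation(brief, 10.0))       # brief event is filtered but logged
print(first_annunciation(persistent, 10.0))  # persistent condition still alarms
```

In this sketch the `blips` list is what keeps a filtered event visible for troubleshooting; a delay without that kind of record would hide the occurrences the thread describes as useful diagnostic information.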

Extracted Entities

| Type | Key | Value | Confidence | Evidence |
| --- | --- | --- | --- | --- |
| contact | Michael Arias Email | Arias@sslocsd.us | 100% | To: Mike Arias <Arias@sslocsd.us> |
| contact | Mychal Jones Email | Mychal@sslocsd.us | 100% | Cc: Mychal Jones <Mychal@sslocsd.us> |
| contact | Kevin Seifert Email | kevin@autosysnet.com | 100% | Cc: Kevin Seifert <kevin@autosysnet.com> |
| contact | Mason Radke Email | mason@autosysnet.com | 100% | From: Mason Radke <mason@autosysnet.com> |
| contact | Michael Arias Phone | 805-489-6666 | 100% | 805-489-6666 |
| server | SCADA Server | WWTP_AE | 100% | Server: WWTP_AE |
| site | Client Plant | South San Luis Obispo County Sanitation District | 100% | South San Luis Obispo County Sanitation District 1600 Aloha Place, Oceano, Ca. |
| system | SCADA System | WWTP Primary Sludge Pumping | 90% | System: WWTP Primary Sludge Pumping |
| task | Network Path Investigation | Inspect network infrastructure between SCADA server WWTP_AE and PLC | 95% | Network Path Investigation: Inspect the network infrastructure between the SCADA server (WWTP_AE) and the affected PLC |
| task | Network Switch Log Review | Review switch logs and port diagnostics for 1:50–2:05 AM on 3/2/2026 | 95% | Network Switch Log Review: Review switch logs and port diagnostics for the timeframe of 1:50–2:05 AM on 3/2/2026 |
| task | Alarm Configuration Review | Consider alarm suppression or shelving during communication quality events | 90% | Alarm Configuration Review: Consider implementing alarm suppression or shelving during communication quality events |

File: Clients/SSLOCSD/emails/.raw/2026/march/Re_Sludge_Pump_Alarm_Investigation_Report.md
Updated: 2026-03-11 23:56:52.114533