Sunday, August 4, 2013

Could a Counter Interrogation Service bring the European Power or Gas Networks down?

Good question! Easy to answer: Yes! It depends on the standard and implementation used.

In early May 2013 it almost happened in Europe. What? During a test of a new control center communication and application, an IEC 60870-5-101 or -104 broadcast “Counter interrogation” command went out to interrogate the counters of ALL RTUs that were somehow “connected”. The command was received and answered by all these RTUs. Obviously one RTU responded with a “broadcast” response … and obviously there was a “loop” somewhere in the network … which ended up flooding the network for days!!!

The operators had severe problems getting status and measurements from the process. First, the network was sending bunches of messages back and forth and around. Second, when experts started to “break” the “loops” and disconnect from the neighboring network, they could “cool” down the traffic but lost some awareness of the system’s situation. After a few days they fixed some software … but, according to a report from experts involved, they had not yet found the device that caused the trouble.

Hm!? That’s really a crucial issue with a standard protocol in operation for 15 or 20 years.

Here is why this could happen at all: In the days when IEC 60870-5-101 was designed, people thought that communication would be strictly hierarchical and look like a tree (top-down) – see next figure from 101:


For counter interrogation the broadcast is often used in order to capture the counter values at a certain time, let’s say 20:00 h. To do so, the control center has to send out a broadcast counter interrogation that freezes the values at 20:00 h (+/- some seconds, due to travel time …).

Next it can send another command to start sending the values from the RTUs to the control center.
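The two steps above (broadcast freeze, then read) can be sketched as a toy model in Python. This is only an illustration, not a 101/104 implementation: the `Rtu` class, the count rates and the travel times are made-up assumptions, and the only protocol detail borrowed is the broadcast common address 0xFFFF used with a two-octet address field.

```python
from dataclasses import dataclass
from typing import Optional

BROADCAST = 0xFFFF  # broadcast common address (two-octet address field)


@dataclass
class Rtu:
    """Hypothetical RTU with one running counter (illustration only)."""
    address: int
    rate: float                     # counts per second (made-up value)
    delay: float                    # message travel time in seconds (made-up value)
    frozen: Optional[int] = None
    frozen_at: Optional[float] = None

    def running(self, t: float) -> int:
        """Running counter value at time t (seconds since midnight)."""
        return int(self.rate * t)

    def on_freeze(self, dest: int, sent_at: float) -> None:
        """Step 1: freeze the counter when the command arrives."""
        if dest in (BROADCAST, self.address):
            arrival = sent_at + self.delay
            self.frozen = self.running(arrival)
            self.frozen_at = arrival

    def on_read(self, dest: int) -> Optional[int]:
        """Step 2: return the frozen value if this RTU is addressed."""
        if dest in (BROADCAST, self.address):
            return self.frozen
        return None


rtus = [Rtu(address=1, rate=2.0, delay=0.5), Rtu(address=2, rate=3.0, delay=1.5)]
t_2000 = 20 * 3600.0  # 20:00 h as seconds since midnight

for rtu in rtus:
    rtu.on_freeze(BROADCAST, sent_at=t_2000)

for rtu in rtus:
    print(rtu.address, rtu.frozen_at - t_2000, rtu.on_read(BROADCAST))
```

Each RTU freezes at the moment the command reaches it, so the frozen values belong to slightly different instants. That is the “+/- some seconds” mentioned above.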

That means: A lot of messages have to be sent at the same time … to reach all RTUs … in star topologies, or “looped” networks, … how to control such a process if you have hundreds of RTUs … owned by different utilities … blablabla …
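Why a “loop” matters so much can be shown with a small simulation. The sketch below assumes a very naive forwarding rule (every node passes a broadcast on to all neighbours except the one it came from, with no duplicate detection); real devices and gateways may behave differently. The node names and the `max_hops` cut-off are hypothetical and only there to keep the simulation finite.

```python
from collections import deque


def count_transmissions(links, source, max_hops):
    """Count link transmissions caused by one broadcast sent from `source`.

    Assumed rule: a node forwards the message to every neighbour except
    the one it received it from. `max_hops` only bounds the simulation.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    queue = deque((source, n, 1) for n in neighbours.get(source, []))
    sent = 0
    while queue:
        sender, receiver, hops = queue.popleft()
        sent += 1
        if hops >= max_hops:
            continue
        for nxt in neighbours[receiver]:
            if nxt != sender:
                queue.append((receiver, nxt, hops + 1))
    return sent


# Strict tree: control center CC, two sub-stations, two RTUs below A.
tree = [("CC", "A"), ("CC", "B"), ("A", "A1"), ("A", "A2")]
# Same network with one extra link between A and B, which closes a loop.
looped = tree + [("A", "B")]

for hops in (10, 100):
    print(hops, count_transmissions(tree, "CC", hops),
          count_transmissions(looped, "CC", hops))
```

In the tree the broadcast dies out by itself after it has crossed every link once. With the extra link the traffic keeps circulating, and the count is limited only by the artificial `max_hops` value.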

The issue here is: People thought that you could start system-wide synchronous functions by sending messages at just the right time. That may work in simple topologies … but … in Smart Grid systems with many (many) meters, it is unlikely that this approach will work reliably.

How does IEC 61850 solve that requirement? It defines a concept of time-synchronized RTUs (or generally speaking IEDs). The control center can send the freeze command well in advance, an hour or two, so that no message shower will occur around 20:00 h. The IEC 61850 server stores the time when it has to freeze the corresponding value(s). The server can then send the frozen values via a data set and report control block, or the data set can be read on request or logged.

The synchronization is completely decoupled from the freezing and retrieving process.
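A minimal sketch of this decoupling, again as a toy model in Python. It assumes that all IED clocks are already synchronized; the `CounterIed` class and its fields are hypothetical, with names only loosely inspired by counter attributes such as a freeze start time, a frozen value and a freeze time stamp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CounterIed:
    """Hypothetical IED with a synchronized clock and one counter."""
    rate: int                          # counts per second (made-up value)
    freeze_time: Optional[int] = None  # stored time at which to freeze
    frozen_value: Optional[int] = None
    frozen_at: Optional[int] = None

    def actual_value(self, now: int) -> int:
        """Running counter value at time `now` (seconds since midnight)."""
        return self.rate * now

    def schedule_freeze(self, freeze_time: int) -> None:
        """Command received well in advance; only the time is stored."""
        self.freeze_time = freeze_time

    def tick(self, now: int) -> None:
        """Called by the local clock; freezes once the stored time is reached."""
        if (self.freeze_time is not None and self.frozen_at is None
                and now >= self.freeze_time):
            self.frozen_value = self.actual_value(now)
            self.frozen_at = now


ieds = [CounterIed(rate=2), CounterIed(rate=3)]
t_2000 = 20 * 3600

# The commands arrive at different times around 18:00 h; that does not matter.
arrival = {0: 18 * 3600, 1: 18 * 3600 + 300}

for now in range(18 * 3600, t_2000 + 5):
    for i, ied in enumerate(ieds):
        if now == arrival[i]:
            ied.schedule_freeze(t_2000)
        ied.tick(now)

# Retrieval can happen any time later, e.g. spread over several minutes.
for ied in ieds:
    print(ied.frozen_at == t_2000, ied.frozen_value)
```

The freeze instant depends only on the local clock and the stored time, not on when the command or the later read request travels through the network.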

The process is configured using the common data class BCR (Binary Counter Reading):


This model really is based on the (bad) experience with 101 and 104 … and … it works … and does not flood the network!

The broadcast command in 101 and 104 SHOULD be REMOVED … at least utilities should no longer rely on it!!! Take this very seriously … as many other utility experts do.
