next up previous
Next: Conclusion Up: Model-Based Alarm Correlation in Previous: Modeling Alarms in Cellular

Implementation and First Results

The DRUM-2 Engine

In the first generic systems for model-based diagnosis [5], poor performance on complex applications was the price paid for flexibility. Early systems, based on logical inference, suffered from repeated recomputation during diagnosis. After several years of research and intermediate results [6, ], Raiman, de Kleer and Saraswat succeeded in constructing a diagnosis engine that could solve large combinatorial problems by focusing computation and avoiding the recomputation of values [12].

DRUM-2 emerged from a different line of research. It treats a model of the system as a data structure and computes diagnoses by manipulating this model. It is similar in spirit to the work of Chou and Winslett on model-based belief revision [2]. By directly manipulating the model, DRUM-2 avoids recomputing values (which can be accessed through the model) and also avoids symbolic operations, which tend to slow down computation. Using this approach, DRUM-2 is able to solve combinatorial benchmark circuits with more than 3000 components efficiently [11]. A detailed description of the semantics can be found in [7].

First Results

In this section we report the performance of our model-based approach on a library of 32 representative alarm cases. We are currently extending this to 512 alarm cases from our GSM subnetwork. Current results, test cases and references can be found on our web page http://www.kbs.uni-hannover.de/project/alarm.html.

The alarm cases represent a wide range of alarm patterns for the subnetwork shown in the previous examples (tex2html_wrap_inline700 to tex2html_wrap_inline702). As we already stated, due to the heavy traffic caused by defects of microwave links (alarm bursts), the probability of suppressed or lost BTS failure alarms is rather high. For the test cases we assumed a probability of 0.1 for a lost alarm message and 0.01 for a faulty microwave link. The probabilities are much lower in reality, but for the purpose of diagnosis the exact values do not matter. They are needed only to discriminate between more plausible diagnoses (those assuming fewer lost messages and fewer faulty microwave links) and less plausible ones.
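To illustrate how these two probabilities can discriminate between candidate diagnoses, the following minimal sketch scores a diagnosis by the number of faulty links and lost messages it assumes. It is not the DRUM-2 computation: the function name `diagnosis_score`, the three candidates, and the simplification of treating all assumptions as independent and ignoring the factors for working links and delivered messages are assumptions made only for illustration.

```python
# Probabilities assumed for the test cases in the text.
P_LOST = 0.1         # probability that an alarm message is lost
P_LINK_FAULT = 0.01  # probability that a microwave link is faulty


def diagnosis_score(n_faulty_links, n_lost_alarms):
    """Unnormalised plausibility of a diagnosis (illustrative only).

    Assumes independent events and ignores the (1 - p) factors for
    working links and delivered messages.
    """
    return (P_LINK_FAULT ** n_faulty_links) * (P_LOST ** n_lost_alarms)


# Hypothetical candidates: (number of faulty links, number of lost alarms).
candidates = {
    "one faulty link, one lost alarm": (1, 1),
    "two faulty links, no lost alarm": (2, 0),
    "one faulty link, three lost alarms": (1, 3),
}

# Most plausible candidate first.
ranked = sorted(
    candidates,
    key=lambda name: diagnosis_score(*candidates[name]),
    reverse=True,
)

for name in ranked:
    print(f"{name}: {diagnosis_score(*candidates[name]):.0e}")
```

Under these assumptions the ordering depends only on how many faults and lost messages a diagnosis assumes, which is consistent with the remark that the exact values do not matter for ranking.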

In all test cases except one the system identifies the correct diagnosis, either as the single plausible diagnosis, or as the most probable diagnosis. We comment on the only exception in the next section.

The running time of our prototype is also very encouraging. Table 1 shows the typical running time of our system for one test case. In the first row we show the time for the subnetwork used above. The second row shows the running time on the complete network of a large German city. All times were measured on a SUN Ultra 1 workstation.

 

Network          # BTSs   # MLs   Time
One Subnetwork   5        5       0.8 s
A City Network   22       20      2.5 s

Table 1: Running time of our system

Some Case Studies

Let us now examine three of our test cases in more detail to discuss the scope of our current approach. The first example obeys our first deterministic model as well as the probabilistic approach:

tex2html_wrap722

In this example, microwave link 1 is faulty and error messages are generated for all base transceiver stations located downstream. Since none of these messages is lost, the example can also be explained by the deterministic model. In the next example the message from tex2html_wrap_inline702 is lost:

tex2html_wrap724

This case is still handled correctly by the probabilistic model, because the other minimal diagnoses tex2html_wrap_inline706, tex2html_wrap_inline708, tex2html_wrap_inline710 and tex2html_wrap_inline712, tex2html_wrap_inline714, tex2html_wrap_inline716, tex2html_wrap_inline718, tex2html_wrap_inline710 are less likely (since they assume more lost messages). Using the most probable diagnosis approach, our system is able to handle 31 out of 32 alarm cases correctly. In the following case it produces no diagnosis, since all relevant alarm messages were lost or suppressed.

tex2html_wrap726
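The three cases can be mimicked with a small sketch. Everything in it is hypothetical: the chain topology in `DOWNSTREAM`, the identifiers `ml1` to `ml3` and `bts1` to `bts3`, and the restriction to single-link faults are assumptions for illustration and do not reproduce the actual subnetwork or the DRUM-2 algorithm.

```python
# Hypothetical chain topology: a fault on a link cuts off every BTS behind it.
DOWNSTREAM = {
    "ml1": {"bts1", "bts2", "bts3"},
    "ml2": {"bts2", "bts3"},
    "ml3": {"bts3"},
}


def explain(observed):
    """Map each single-link fault that covers the observed alarms to the
    number of lost alarm messages it would have to assume."""
    if not observed:
        # No alarm arrived, so there is nothing to explain.
        return {}
    return {
        link: len(predicted - observed)
        for link, predicted in DOWNSTREAM.items()
        if observed <= predicted
    }


# Case 1: all downstream alarms arrive; the deterministic model suffices.
print(explain({"bts1", "bts2", "bts3"}))
# Case 2: one alarm is lost; the fault on ml1 is still the only single-link candidate.
print(explain({"bts1", "bts3"}))
# Case 3: every relevant alarm is lost; no candidate is produced.
print(explain(set()))
```

In this simplified setting the last call returns an empty result because no observation remains to be explained, which corresponds to the one test case in which no diagnosis is produced.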

