Because microwave links can be installed quickly and cost-efficiently, new operating companies realize most of the connections in their Base Station Subsystem (BSS) with microwave links. The remaining links are leased lines. The resulting network topology is a star configuration in which the traffic to several base transceiver stations (BTS) is distributed over a chain of microwave links and leased lines. Figure 3 shows an example. Cross-connects are not displayed in Figure 3 because we treat them, once initialized, as fixed and transparent connections. There is only one transmission path from a BTS to its BSC.

Figure 3: Star configuration of a base station subsystem
Alarm messages of BSC and BTS indicating link faults can be classified into three groups:
>1E-3, FAREND_ALARM_1, PCM_FAILURE etc.)
OSI-layer two and upper layers (FAILURE_IN_D-CHANNEL_ACTIVATION, LAPD_LINK_FAILURE, BTS_OMU_LINK_FAILURE).
The performance of microwave systems depends on the weather. Dense fog, heavy rain or snow can increase the bit error rate to the point where the connection breaks down. Such a breakdown of a physical connection interrupts voice connections as well as connections for control messages. As a consequence, up to 100 alarms are generated and transmitted to the OMC for a single failure.
The operators in the OSS face several problems. First, a large number of alarms is forwarded to the OSS and has to be handled by the staff. Second, more important alarms have to be separated from less important ones. Third, the original failure has to be identified, because no alarm indicates it directly. As a result, operators are under stress, the reaction time to faults increases, and important alarms are misinterpreted or overlooked.
Observations in the OSS have shown that alarm patterns belonging to the same fault do not match exactly, because the original alarm patterns are disturbed by noise. This noise can result from other faulty or fluttering devices or from delays in the transmission of alarms to the OSS. Prefiltering mechanisms in mediation devices and overload (e.g. in the transmitting system or mediation device) can also generate noise.
A tool that supports the operating staff in alarm and fault management is therefore necessary to shorten the reaction time to faults. To achieve better quality of service, this tool has to condense alarms, correlate them and precisely diagnose their initial causes. It is connected to the network management platform in the OSS.
In the coding approach [14], each link failure in the managed network is represented by an alarm vector. This binary vector contains a 1 for every alarm that the failure generates and a 0 otherwise. The alarm vectors for all links are collected in a codebook. At runtime the observed alarm vector is compared with the vectors in the codebook by calculating the Hamming distance. The fault whose vector has the smallest Hamming distance to the observed alarm vector is assumed to be the cause of the observed failure. The coding approach speeds up alarm correlation compared with the usual rule-based approaches. However, for large networks or multiple faults the codebooks can become huge, and they have to be regenerated after each topology change.
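The codebook lookup can be illustrated with a minimal sketch. The link names, the alarm ordering and the vectors below are hypothetical and chosen only for illustration; they are not taken from [14] or from a real network.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical alarm ordering: position i of every vector refers to ALARMS[i].
ALARMS = ("PCM_FAILURE", "FAREND_ALARM_1",
          "LAPD_LINK_FAILURE", "BTS_OMU_LINK_FAILURE")

# Hypothetical codebook: one binary alarm vector per link failure.
CODEBOOK: Dict[str, Tuple[int, ...]] = {
    "link_A": (1, 1, 0, 0),
    "link_B": (1, 0, 1, 1),
    "link_C": (0, 1, 1, 0),
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions in which two equally long vectors differ."""
    if len(a) != len(b):
        raise ValueError("alarm vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def diagnose(observed: Sequence[int],
             codebook: Dict[str, Tuple[int, ...]] = CODEBOOK) -> str:
    """Return the fault whose codebook vector is closest to the observation.

    Ties are resolved in favour of the first codebook entry, which is an
    arbitrary choice made for this sketch.
    """
    return min(codebook, key=lambda fault: hamming(observed, codebook[fault]))


# One alarm of link_B's pattern is missing (noise); link_B is still closest.
observed = (1, 0, 1, 0)
print(diagnose(observed))
```

Because the match is by nearest distance rather than exact equality, a single lost or spurious alarm does not necessarily change the diagnosis. The sketch also shows the drawback named above: every entry of `CODEBOOK` has to be rebuilt when the topology changes.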
Approaches to rule-based alarm correlation are known from the literature. In [1] an expert system for a transport network is presented; [10] describes an intelligent filter for an SDH network. Problems of rule-based approaches are the evaluation of rules and real-time diagnosis. A prototype expert system for alarm correlation in GSM networks was built by several of the authors. Linked to an existing network management platform, this rule-based expert system was developed in the CLIPS programming language. Our prototype contains special rules and facts to deal with the network management platform. The behavior of the network elements in response to link breakdowns is described by user-defined rules; topology information is represented by user-defined facts. Incoming alarms are forwarded from the management system to the prototype, where alarm facts are created. Results from the expert system (e.g. cancellation of alarms) are returned to the management system.
The prototype can filter alarms on programmable criteria and diagnose the breakdown of links in the case of single faults. To accommodate configuration changes, the knowledge that depends on topology information is modeled by generic rules. Multiple faults and noise are not yet handled well.
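The separation of topology facts from a generic rule can be sketched as follows. This is an illustration in Python, not the CLIPS rule base of the prototype; the element names, the `UPSTREAM` table and the single rule ("a link failure raises an alarm at every BTS whose path to the BSC uses that link") are assumptions made for this example. It relies on the tree-shaped topology in which each BTS has exactly one path to its BSC.

```python
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]  # (downstream element, upstream element)

# Hypothetical topology facts: each element and its upstream neighbour.
UPSTREAM: Dict[str, str] = {
    "BTS1": "BSC",
    "BTS2": "BTS1",
    "BTS3": "BTS2",
    "BTS4": "BTS1",
}


def path_links(bts: str, upstream: Dict[str, str]) -> List[Link]:
    """Links on the unique path from a BTS to the BSC (assumes a tree)."""
    links: List[Link] = []
    node = bts
    while node in upstream:
        links.append((node, upstream[node]))
        node = upstream[node]
    return links


def affected_by(link: Link, upstream: Dict[str, str]) -> Set[str]:
    """Generic rule: a failed link cuts off every BTS routed over it."""
    return {bts for bts in upstream if link in path_links(bts, upstream)}


def diagnose_single_fault(alarming: Set[str],
                          upstream: Dict[str, str]) -> Optional[Link]:
    """Return the link whose predicted alarm set equals the observed one."""
    for link in upstream.items():
        if affected_by(link, upstream) == alarming:
            return link
    return None


print(diagnose_single_fault({"BTS2", "BTS3"}, UPSTREAM))  # exact pattern
print(diagnose_single_fault({"BTS2"}, UPSTREAM))          # one alarm lost
```

A topology change only requires editing the facts in `UPSTREAM`; the rule itself stays the same. The second call returns `None` because the sketch demands an exact match, which mirrors the limitation that noisy alarm patterns are not handled well by this kind of rule.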
The IMPACT system described in [8] uses a hierarchy of network element types as well as a network configuration model and message classes, which makes it more configurable and modular than other systems. However, correlation of alarms is still done by heuristic rules; no explicit description of alarm behavior and alarm propagation is used.
The system most closely related to ours is the AIM system developed within the RACE project AIM. The goal of this system (described in [9]) is the maintenance of telecommunication networks, for example broadband ISDN networks. The system shows the typical characteristics of a model-based system: it uses an explicit model of the telecom network as well as a model of its behavior. It therefore exhibits the advantages of model-based systems discussed in the next section, including easy maintenance, reconfiguration and extension. However, having been developed six years ago, it relies on techniques (based on a simple ATMS) that do not scale well to large networks. In addition, the network it diagnoses consists of rather unintelligent network elements, which cannot diagnose local faults and generate alarms for such faults.