Determining Dependencies with Neural Networks

Our new approach is based on neural networks and is thus very different from the traditional approaches explained further above. As input data for the neural networks we use any kind of values that express an object's activity within small time periods (of about 1 to 10 seconds each). We further restrict our selection to values that are relatively easy to collect and available for most types of services, hosts, etc. Examples of such values include, but are not limited to:

CPU activity of hosts (mainly useful if the selected hosts are objects of interest themselves or carry only one main service);
CPU usage of an application relative to the CPU power available over a certain period of time (useful for measuring applications in scenarios other than the one above);
communication bandwidth used by a system during each of the short time intervals;
sum (or other appropriate function) of activities of sub-components (if the activity of an object is not directly measurable as one value, like for distributed applications or domain objects).

Generally speaking, this is data taken from lower layers like the operating system, middleware or transport system.
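How such activity values might be derived from raw measurements can be sketched as follows. This is a minimal illustration, not the collection mechanism used in the project: the event format `(timestamp_seconds, amount)`, the function names, and the default interval of 5 seconds are assumptions made for this example.

```python
from collections import defaultdict


def activity_series(events, interval=5.0, n_intervals=None):
    """Bin (timestamp_seconds, amount) events into fixed-length intervals.

    Entry i of the result is the summed amount observed in
    [i * interval, (i + 1) * interval). The event format is an
    assumption of this sketch.
    """
    bins = defaultdict(float)
    for timestamp, amount in events:
        bins[int(timestamp // interval)] += amount
    if n_intervals is None:
        n_intervals = max(bins) + 1 if bins else 0
    return [bins.get(i, 0.0) for i in range(n_intervals)]


def combined_activity(series_list):
    """Sum the activity of several sub-components interval by interval."""
    return [sum(values) for values in zip(*series_list)]
```

The second function corresponds to the last item of the list above: when an object has no directly measurable activity, the series of its sub-components are combined, here with a plain sum, although another function may be more appropriate.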

**Figure:** Neural network decides per pair of objects

As depicted in the figure above, two streams (time series) of activity data are fed into a pre-trained neural network for each relevant pair of objects. The neural network decides whether a dependency exists and, if required, yields further information about that dependency, such as the assumed dependency strength. Of course, the activity values do not show a dependency explicitly. Simplifying what happens inside the neural network, however, one can imagine that peaks of activity that often occur in both input streams with similarly repeating patterns over time (as depicted in the example below) allow a dependency to be inferred.

Neural networks were chosen because of advantages such as:
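The interface of such a per-pair decision can be sketched with a tiny feed-forward network. Everything specific here is an assumption for illustration: the text does not state the network topology, so the single hidden layer, the tanh and sigmoid activations, the peak normalization, and the use of the output score as "dependency strength" are hypothetical choices, and the weights would have to come from training.

```python
import math


def normalize(window):
    """Scale a window of activity values to [0, 1] by its peak value."""
    peak = max(window)
    return [v / peak for v in window] if peak > 0 else [0.0] * len(window)


def forward(window_a, window_b, hidden_weights, hidden_bias, out_weights, out_bias):
    """Forward pass: two activity windows in, one score in (0, 1) out.

    The topology (one tanh hidden layer, sigmoid output) is an assumption
    of this sketch, not the architecture used in the project.
    """
    x = normalize(window_a) + normalize(window_b)  # concatenate both streams
    hidden = [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(hidden_weights, hidden_bias)
    ]
    z = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return 1.0 / (1.0 + math.exp(-z))


def decide(score, threshold=0.5):
    """Turn the score into (dependency exists?, assumed strength)."""
    return score >= threshold, score
```

With untrained (all-zero) weights the score is exactly 0.5, which carries no information; only training on labeled pairs makes the output meaningful.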

dealing with uncertain information, and
robustness to noise in the input data.

These advantages are necessary to overcome the lack of explicitly useful information in the simple input values, as well as problems such as small temporal displacements of values at certain managed objects (e.g., due to poorly synchronized clocks). The second point is especially important because, depending on the kind of values that express activity, there is potentially a lot of "internal" activity, meaning that actions are performed which are completely unrelated to other objects outside.

In our project we constructed and trained neural networks with data collected from real environments for which the results (whether dependencies between the objects exist or not) were known. For adequate decision quality, the training set had to contain data from at least two distinct types of service implementations and different sources of activity data (to obtain samples of positive training cases) as well as pairs of unrelated services (negative cases). Each of them was observed under various usage conditions and during times of high and low service utilization. Using data from real environments led to the problem of noisy training data, but thanks to the neural networks' ability to generalize from the input data our requirements could still be met. As a positive consequence, the design and installation of a special test field was not necessary.
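Assembling positive and negative training cases from such recordings might look like the following sketch. The window width, the step size, and the representation of an example as a concatenated input vector with a 0/1 label are assumptions for illustration; the text does not specify how the training samples were encoded.

```python
def sliding_windows(series, width, step):
    """Cut a time series into windows of `width` values, `step` apart."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


def training_examples(series_a, series_b, dependent, width=4, step=2):
    """Pair time-aligned windows of two objects and attach the known label.

    `dependent` is the ground truth for this pair (True for a positive
    case, False for a pair of unrelated services).
    """
    label = 1 if dependent else 0
    windows_a = sliding_windows(list(series_a), width, step)
    windows_b = sliding_windows(list(series_b), width, step)
    return [(a + b, label) for a, b in zip(windows_a, windows_b)]
```

Recordings taken under high and low utilization would simply contribute further calls to `training_examples`, so that the resulting set covers the usage conditions mentioned above.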

In later tests we successfully verified the results in different environments without retraining the neural network. However, we do not rule out that it may be necessary to improve the neural networks in other cases, e.g., with the help of special reinforcement learning techniques that can be applied even while the network is in use. Further studies on robustness and general reusability are currently in progress.

The figure below shows two example plots of data collected from two hosts during the same period of time with a sampling interval of 5 seconds. The values shown represent the intensity of the hosts' IP communication with others during time intervals of five seconds. It is of course just a very small excerpt of the real time series fed into the neural network.

**Figure:** Time series of two hosts' activity values

**Figure:** Architecture's hierarchy of probes and agents

The high spikes within the plots are of special interest. At three time intervals (numbered 1785, 1801 and 1826 for the first host, and 1784, 1800 and 1825 for the second) both hosts show activity of nearly the same intensity and shape, indicating a possible relationship. The plot of host 1 additionally shows further significant activity (e.g., at 1794 and 1837), which turns out to be noise for the investigation of the two hosts' relationship.
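The intuition behind these observations can be reproduced with a simple peak-matching routine. Note that this is not the neural network itself but a deliberately simplified stand-in: it matches peaks whose interval numbers differ by at most a tolerance, which is an assumed way of absorbing the one-interval displacement visible in the example.

```python
def match_peaks(peaks_a, peaks_b, tolerance=1):
    """Greedily match peak positions that lie within `tolerance` intervals.

    Returns (matched pairs, unmatched peaks of a, unmatched peaks of b).
    """
    matched, unmatched_a = [], []
    remaining_b = sorted(peaks_b)
    for a in sorted(peaks_a):
        partner = next((b for b in remaining_b if abs(a - b) <= tolerance), None)
        if partner is None:
            unmatched_a.append(a)
        else:
            matched.append((a, partner))
            remaining_b.remove(partner)
    return matched, unmatched_a, remaining_b


# Interval numbers taken from the example plots discussed above.
host1 = [1785, 1794, 1801, 1826, 1837]
host2 = [1784, 1800, 1825]
matched, noise_host1, noise_host2 = match_peaks(host1, host2)
# matched     == [(1785, 1784), (1801, 1800), (1826, 1825)]
# noise_host1 == [1794, 1837]
# noise_host2 == []
```

Unlike this sketch, the trained network also takes the intensity and shape of the peaks into account and tolerates noise without a hand-set tolerance.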

The fact that two services show activity at the same time does not, of course, by itself justify the conclusion that they are dependent, but after observing this behavior several times with similar peak patterns, such a decision becomes plausible. Further algorithms are used to find groups of dependencies (occurring together or in a row, respectively) that belong to one type of transaction, and to distinguish two temporally unrelated dependencies involving common objects (A $\rightarrow$ B; B $\rightarrow$ C) from real transitive ones (A $\rightarrow$ B $\rightarrow$ C).
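The text does not describe these further algorithms, so the following is only a hypothetical heuristic illustrating the distinction: it measures how often an observed occurrence of A → B is followed, within a few intervals, by an occurrence of B → C. The function name, the occurrence lists (interval numbers), and the `max_lag` parameter are all assumptions of this sketch.

```python
def chained_fraction(occurrences_ab, occurrences_bc, max_lag=2):
    """Fraction of A->B occurrences followed by B->C within `max_lag` intervals.

    A value near 1 suggests a chain A -> B -> C; a value near 0 suggests
    two dependencies that merely share the object B.
    """
    if not occurrences_ab:
        return 0.0
    followed = sum(
        1
        for t in occurrences_ab
        if any(0 <= u - t <= max_lag for u in occurrences_bc)
    )
    return followed / len(occurrences_ab)
```

For instance, `chained_fraction([10, 20, 30, 40], [11, 21, 30, 55])` yields 0.75, whereas `chained_fraction([10, 20, 30, 40], [15, 26, 47])` yields 0.0.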

A possible disadvantage of pairwise calculation is that it needs O(n²) time for n elements. For large n, special techniques must be applied. One simple possibility is to exclude in advance those pairs that are either not of interest or between which dependencies are not possible anyway. In the web server scenario, one could omit all calculations for pairs of web clients. These pairs usually make up a significant percentage, given the huge number of clients compared with the smaller number of servers. Further reduction comes from applying the domain concept: smaller models are generated per domain of interest. Additionally, the activity of a whole domain is condensed into a single domain activity, which allows dependencies to be calculated between domains and also between a single object in one domain and (other) "outside" domains.
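The effect of excluding client-client pairs can be made concrete with a short sketch. The role labels and the `candidate_pairs` helper are assumptions for illustration; any other criterion for uninteresting or impossible pairs could be plugged in the same way.

```python
from itertools import combinations
from math import comb


def candidate_pairs(roles, excluded=(("client", "client"),)):
    """Yield object pairs, skipping pairs whose role combination is excluded.

    `roles` maps an object name to a role label such as "client" or "server".
    """
    skip = {frozenset(pair) for pair in excluded}
    for a, b in combinations(sorted(roles), 2):
        if frozenset((roles[a], roles[b])) in skip:
            continue
        yield a, b


# Hypothetical scenario: 1000 web clients and 10 web servers.
roles = {f"client{i}": "client" for i in range(1000)}
roles.update({f"server{i}": "server" for i in range(10)})

total_pairs = comb(len(roles), 2)                 # 509545 pairs without pruning
kept_pairs = sum(1 for _ in candidate_pairs(roles))  # 10045 pairs remain
```

In this assumed scenario, pruning removes 499,500 of 509,545 pairs, so roughly 2% of the pairwise calculations remain.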

At the same time, the complexity argument also supports the use of neural networks, as they are able, once trained, to calculate the results faster than traditional correlation analyses.

Looking back at the requirements listed at the beginning of this section, a method for deciding about dependencies is by itself obviously not enough for comprehensive dependency modeling. It has to be supplemented with an architecture that allows clean integration into real IT environments. Such an architecture is presented in the following section.
