Parallel method for assembling neural networks in a fault prediction subsystem of high-performance computing complexes

The article addresses how to make artificial neural networks more effective at predicting faults and failures of high-performance computing complexes in real time. Particular attention is paid to multicomponent systems with complex switching and massive parallelism. To improve the accuracy of fault and failure prediction, a parallel method for assembling neural networks is proposed, based on identifying the internal logical relationships between elements of the complex and on a special fragmentation of the training samples for each tier. An experimental complex based on a cluster system is built, and various configurations of the prediction module are tested comparatively. Timely diagnosis and prediction of malfunctions in specialized digital complexes is especially important, since serious incidents can cause not only equipment downtime but also the loss of critical data.

Authors: V. Yu. Meltsov, A. K. Krutikov

Direction: Informatics, Computer Technologies And Control

Keywords: computing complex, cluster system, diagnostics, failure prediction, artificial neural network, fragmented sampling, cascade architecture, event log, software prototype
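
The abstract's idea of fragmenting training samples and training the networks of one tier in parallel can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the event-log records in EVENTS, the two features per record, the single-neuron models, and the function names are hypothetical, and a plain maximum stands in for the upper tier of the cascade. ThreadPoolExecutor is used only to show the structure; in CPython it does not speed up CPU-bound training.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical event-log records: (component, (load, error_rate), failed_soon).
EVENTS = [
    ("node", (0.9, 0.8), 1), ("node", (0.8, 0.9), 1),
    ("node", (0.2, 0.1), 0), ("node", (0.1, 0.3), 0),
    ("switch", (0.7, 0.9), 1), ("switch", (0.9, 0.6), 1),
    ("switch", (0.1, 0.2), 0), ("switch", (0.3, 0.1), 0),
]


def fragment_by_component(events):
    """Split the training sample into one fragment per component."""
    fragments = {}
    for component, features, label in events:
        fragments.setdefault(component, []).append((features, label))
    return fragments


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, epochs=500, lr=0.5):
    """Train a single logistic neuron with stochastic gradient descent."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log-loss w.r.t. the pre-activation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x):
    """Return the predicted failure probability for one feature vector."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_tier_in_parallel(fragments):
    """Train one model per fragment; the fragments are independent."""
    names = list(fragments)
    with ThreadPoolExecutor() as pool:
        models = list(pool.map(train_neuron, (fragments[n] for n in names)))
    return dict(zip(names, models))


def complex_risk(models, readings):
    """Stand-in for the upper tier: the highest per-component risk."""
    return max(predict(models[name], x) for name, x in readings.items())
```

A call such as complex_risk(train_tier_in_parallel(fragment_by_component(EVENTS)), {"node": (0.9, 0.9), "switch": (0.1, 0.1)}) returns the risk of the most endangered component. Because each fragment is trained independently, the same structure could be moved to separate processes or cluster nodes.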

