NEURAL NETWORK CLASSIFIER OF TEXT INFORMATION

This paper considers the theoretical foundations of machine classification of text information, a topic that has recently attracted growing interest. It identifies the main stages and principal difficulties of such problems and presents results obtained with a simple text classification algorithm. Preliminary filtering of texts, the formation of feature vectors, and the structure and training principles of the neural network are discussed. The F-measure is used to evaluate the results. Results are compared across three text collections for different preliminary filter parameters, numbers of neurons in the hidden layer, and network training times. The proposed classifier model solves the classification problem with an accuracy of more than 80%, and the quality of the training data makes a decisive contribution to that accuracy. Conclusions about the quality of the results and directions for further research on this topic are presented.
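For readers unfamiliar with the F-measure named in the abstract, the following is a minimal sketch of the standard definition as the weighted harmonic mean of precision and recall. The function name `f_measure`, its parameters, and the default `beta = 1.0` are illustrative assumptions; the abstract does not state which variant of the measure or which averaging scheme the authors used.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score for one class.

    tp, fp, fn are counts of true positives, false positives and
    false negatives. beta = 1.0 (an assumed default) weights
    precision and recall equally.
    """
    if tp == 0:
        # No correct positive predictions: precision and recall are
        # both zero (or undefined), so the score is taken as zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With hypothetical counts of 8 true positives, 2 false positives and 2 false negatives, precision and recall are both 0.8, so the score is also 0.8.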

Authors: E. N. Karuna, P. V. Sokolov

Keywords: Classification, machine learning, thematic analysis, neural network, stemming
