TOWARDS APPLICATION OF TEXT MINING TECHNIQUES TO THE ANALYSIS OF THE WEBSITE’S ONLINE CONSENTS

Websites' online privacy consents inform end users about how their personal data are collected, processed, and shared with third parties by web services. However, in most cases they are written in an unclear and non-transparent manner. In this paper, the authors investigate the application of two different text mining approaches to the analysis of privacy policy texts. To detect different personal data usage scenarios, latent semantic analysis was applied with several statistical text models. To establish links between the elements of a data usage scenario, part-of-speech based analysis was used, which makes it possible to reveal logical sequences that describe data practices. The authors tested the selected algorithms against a set of labelled privacy policies generated within the Usable Privacy Project and discuss the obtained results.

Authors: M. D. Kuznetzov, V. S. Myadzel, E. S. Novikova

Direction: Informatics, Computer Technologies And Control

Keywords: Text analysis, latent semantic analysis, synonym search, context-free grammars, natural language processing, internal data convention, morphological analysis

