A crawler for forming a dataset of user agreements on the use of personal data

The collection and use of personal data to meet users' digital needs is an extremely common scenario today. Users actively provide their personal data to improve the quality of digital services. At the same time, user agreements are the only tool that informs users which personal data are used and how. There are different approaches to making user agreements more transparent, but most of them require data both for experiments and for training deep learning models. Currently, there are few datasets for research on user agreements, and the available ones do not cover the Internet of Things market. Smart devices generate a huge amount of personal data traffic, so their user agreements deserve as much attention as website agreements. In this paper, the authors propose a new way of forming a dataset of user agreements and present a corresponding tool that, in addition to its main functions, includes a number of improvements for bypassing blocking, bans, and CAPTCHAs.

Authors: M. D. Kuznetsov, E. S. Novikova

Direction: Informatics, Computer Technologies and Control

Keywords: personal data agreements, crawler, dataset, data collection, data cleaning

