Pursuant to Article 57(1)(b) of the GDPR, on May 20, 2024, the Italian Data Protection Authority (“Italian DPA”) adopted guidelines [LINK] on web scraping. The guidelines aim to provide guidance to operators of websites and online platforms that act in Italy as data controllers of personal data made available online to the public.
For the purposes of the guidelines, the Italian DPA refers to web scraping as the mass collection of personal data from the web to train generative artificial intelligence models. Whenever such collection involves information traceable to an identified or identifiable natural person, it raises a data protection issue: identifying an appropriate legal basis for processing those data.
According to the guidelines, the lawfulness of web scraping must be assessed on a case-by-case basis. Personal data are first made available on the web through a primary processing carried out by operators of online platforms, acting as data controllers. Only afterwards may third parties, often web robots or “bots” scraping the web, gather such data for different purposes. This is why the Italian DPA addresses its guidelines to operators of online platforms: they are, in fact, the parties best placed (i) to evaluate how data are used after being scraped from their platforms and (ii) to implement measures on their platforms that may prevent or mitigate web scraping for the purpose of training algorithms.
The precautions identified by the Italian DPA include the following:
- Creating restricted areas that can only be accessed after registration, thereby removing certain personal data from public availability;
- Including ad hoc clauses in the platform's terms of service that expressly prohibit the use of web scraping techniques;
- Monitoring network traffic to detect abnormal data flows and adopting traffic limits as a countermeasure;
- Intervening directly on bots (e.g. inserting CAPTCHA checks on websites, or monitoring log files to block unwanted users).
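For operators considering the traffic-monitoring measure, the following is a minimal sketch of one common way to "adopt limits": a per-client sliding-window rate limiter. It assumes that each request can be attributed to a client identifier such as an IP address. The class name, the thresholds and the choice of a sliding window are illustrative assumptions, not requirements taken from the Italian DPA guidelines.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Hypothetical helper: flags clients whose request count within a
    rolling time window exceeds a threshold.

    max_requests and window_seconds are illustrative values only; they are
    not prescribed by the Italian DPA guidelines.
    """

    def __init__(self, max_requests: int = 100, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of recently accepted requests, keyed by client identifier
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Record a request; return False if the client is over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Discard timestamps that have fallen outside the rolling window
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            # Abnormal flow: the caller may block, throttle, or show a CAPTCHA
            return False
        hits.append(now)
        return True


# Usage: at most 3 requests per 10 seconds for a documentation-range IP address
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
results = [limiter.allow("203.0.113.7", now=t) for t in (0.0, 1.0, 2.0, 3.0)]
print(results)  # [True, True, True, False]
```

In this sketch, rejected requests are not recorded, so a client regains access as soon as older requests leave the window; an operator wishing to penalize persistent scrapers might count rejected requests too, or combine the limiter with the log-file monitoring and CAPTCHA checks listed above.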
Such measures should be adopted by the data controller following its own independent assessment, in compliance with the accountability principle, which increasingly appears to govern new data protection legislation and strategies. In any event, the Italian DPA acknowledges that, while useful, none of these measures can be expected to prevent web scraping entirely.
