
Synthetic data and data protection

#Syntheticdata (SD) is artificial data generated from real data. It allows secondary analysis (e.g. research, consumer behaviour studies, etc.) in a privacy-friendly manner, which makes it a privacy-enhancing technology (#PET). To create it, a «synthesizer» (an artificial neural network or another machine learning algorithm) takes an original (real) dataset and builds a model that characterizes the distributions and relationships in that data. New records are then generated from that model.

Because the generated data retains the statistical properties of the original data, it mimics real data and can serve as a proxy for it.
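To make the idea concrete, here is a minimal sketch of a synthesizer in Python. It is an illustration under stated assumptions, not how any particular product works: it uses a simple two-column Gaussian model instead of a neural network, and the function names (`fit_gaussian`, `sample_synthetic`, `correlation`) and the age/income columns are hypothetical.

```python
import math
import random


def fit_gaussian(rows):
    """Fit the 'model': means, variances and covariance of two numeric columns."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def sample_synthetic(model, n, rng):
    """Draw n new records from the fitted model (assumes non-constant columns)."""
    (mx, my), (sxx, sxy, syy) = model
    # Cholesky factor of the 2x2 covariance matrix, so that the
    # sampled columns keep the fitted variances and covariance.
    a = math.sqrt(sxx)
    b = sxy / a
    c = math.sqrt(max(syy - b * b, 0.0))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append((mx + a * z1, my + b * z1 + c * z2))
    return rows


def correlation(rows):
    """Pearson correlation between the two columns."""
    _, (sxx, sxy, syy) = fit_gaussian(rows)
    return sxy / math.sqrt(sxx * syy)


# Made-up "real" dataset: age and an income that depends on age.
rng = random.Random(0)
ages = [rng.gauss(40, 12) for _ in range(2000)]
real_rows = [(age, 1000 + 50 * age + rng.gauss(0, 300)) for age in ages]

model = fit_gaussian(real_rows)
synthetic_rows = sample_synthetic(model, 2000, rng)

print("correlation in real data:     ", round(correlation(real_rows), 3))
print("correlation in synthetic data:", round(correlation(synthetic_rows), 3))
```

No synthetic record is copied from a real one, yet the relationship between the two columns is preserved, which is what lets the synthetic set act as a proxy.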

Is SD #personaldata?
– It depends 🙂
– SD does not have a 1:1 relationship to real data, which reduces the chance that it will be considered personal data.
– However, if the model overfits the real data, it will replicate that data, and the output will then be considered personal data.
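The overfitting risk can be illustrated with a simple technical check: measure how many synthetic records sit on top of a real record. The sketch below is only one possible heuristic, and its names (`nearest_distance`, `replicated_share`), the `tolerance` parameter and the sample records are hypothetical. A low score is a technical signal, not a legal conclusion about whether the data is personal.

```python
import math


def nearest_distance(record, real_rows):
    """Euclidean distance from one synthetic record to its closest real record."""
    return min(math.dist(record, real) for real in real_rows)


def replicated_share(synthetic_rows, real_rows, tolerance=1e-9):
    """Share of synthetic records that (almost) exactly reproduce a real record.

    `tolerance` is an illustrative threshold. In practice the columns would
    usually be put on a common scale before distances are compared.
    """
    hits = sum(
        1 for record in synthetic_rows
        if nearest_distance(record, real_rows) <= tolerance
    )
    return hits / len(synthetic_rows)


# Made-up records: (age, income).
real_rows = [(34.0, 52000.0), (45.0, 61000.0), (29.0, 48000.0)]

# An overfitted synthesizer that simply memorised its training data.
overfitted_output = list(real_rows)

# A synthesizer that produced new, merely similar records.
generalised_output = [(33.2, 51000.0), (46.1, 60250.0), (30.4, 47100.0)]

print("replicated share (overfitted): ", replicated_share(overfitted_output, real_rows))
print("replicated share (generalised):", replicated_share(generalised_output, real_rows))
```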

Key legal questions to evaluate before creating SD:
– Is the use of the original dataset to generate and/or evaluate a synthetic dataset regulated by law?
– Is sharing the original dataset with a third-party service provider to generate the synthetic dataset regulated?
– Does the law regulate the resulting synthetic dataset?

The European Data Protection Supervisor (EDPS) will host a webinar on this topic: “Synthetic data: what use cases as a privacy enhancing technology?” (16.06.21)

 

