Synthetic data and data protection

6 · 04 · 21

#Syntheticdata (SD) is artificial data generated from real data. It allows secondary analysis (e.g. for research or consumer-behaviour studies) in a privacy-friendly manner, which makes it a privacy-enhancing technology (#PET). A model (the «synthesizer», an ANN or another ML algorithm) is trained on an original (real) dataset to characterize the distributions and relationships in that data, and new records are then generated from that model.

Because the generated data retains the statistical properties of the original data, it mimics real data and can serve as a proxy for it.
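
As a minimal sketch of the idea, assume a purely numeric dataset and use a multivariate Gaussian as a stand-in for the «synthesizer» (a real one could be an ANN or another ML model). The code fits the mean and covariance of the real data and samples new records from that fit. The function name `synthesize` and the toy age/income data are hypothetical, invented only for illustration.

```python
import numpy as np

def synthesize(real, n_samples, seed=0):
    """Fit a multivariate Gaussian to `real` and draw synthetic rows from it."""
    rng = np.random.default_rng(seed)
    mean = real.mean(axis=0)              # per-column means
    cov = np.cov(real, rowvar=False)      # relationships between columns
    return rng.multivariate_normal(mean, cov, size=n_samples)

# Toy "real" dataset with two correlated columns (assumed: age and income).
rng = np.random.default_rng(42)
age = rng.normal(40, 10, size=1000)
income = 1000 * age + rng.normal(0, 5000, size=1000)
real = np.column_stack([age, income])

synthetic = synthesize(real, n_samples=1000)

# The correlation between the columns should be similar in both datasets,
# although no synthetic row is taken from the real data.
print(np.corrcoef(real, rowvar=False)[0, 1])
print(np.corrcoef(synthetic, rowvar=False)[0, 1])
```

A Gaussian only captures means and linear correlations, so this is far simpler than the synthesizers used in practice, but it shows why the output can act as a statistical proxy for the original.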

Is SD #personaldata?
– It depends 🙂
– SD does not have a 1:1 relationship to real data, which reduces the chances of it being considered personal data.
– However, if the model overfits the real data, it will replicate that data, and the output will then be considered personal data.
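
One rough way to spot the replication problem is to measure how close each synthetic record lies to its nearest real record. The sketch below is only an illustration under stated assumptions: the helper `nearest_real_distance`, the toy records and the "distance of zero means copied" rule are all hypothetical, and such a check is a technical indicator, not a legal test of whether data is personal.

```python
import numpy as np

def nearest_real_distance(synthetic, real):
    """For each synthetic row, Euclidean distance to its closest real row."""
    # Pairwise differences via broadcasting: shape (n_syn, n_real, n_features).
    diffs = synthetic[:, None, :] - real[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

# Toy real records (assumed columns: age, income).
real = np.array([[34.0, 52000.0], [45.0, 61000.0], [29.0, 39000.0]])

# Output of a hypothetical overfitted synthesizer:
# the first row is a verbatim copy of a real record, the second is not.
synthetic = np.array([[34.0, 52000.0], [40.0, 58000.0]])

dist = nearest_real_distance(synthetic, real)
copied = np.isclose(dist, 0.0)   # True where a synthetic row replicates a real one
print(copied)
```

In practice the columns would need to be scaled before computing distances, and near-copies (small but non-zero distances) would deserve attention as well.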

Key legal questions to evaluate before creating SD:
– Is the use of the original dataset to generate and/or evaluate a synthetic data set regulated by law?
– Is sharing the original data set with a third-party service provider to generate the synthetic data set regulated?
– Does the law regulate the resulting synthetic data set?

The European Data Protection Supervisor (EDPS) will host a webinar on this topic: “Synthetic data: what use cases as a privacy enhancing technology?” (16.06.21)
