Question 1 – A definition of scientific research.
In the GDPR it is foreseen that certain requirements for controllers do not apply or are modified in case of ‘processing of personal data for scientific research purposes’ (research exemptions). In order to rely on said exemptions it is important to know what criteria to use to determine what - under the GDPR - can and cannot be considered ‘scientific research purposes’ and/or ‘scientific research’. The GDPR does not provide a definition and/or clear criteria. Both EDPB and EDPS have given an opinion on the matter. But making a distinction, for instance based on whether the scientific research is in the ‘public interest’ remains controversial.
What are your ideas/suggestions on how to clarify this issue?
The Rathenau Institute’s work is based on an institutional decision by the Dutch government. We perform research and organize debates relating to science, innovation, and new technologies.
As part of our 2020 report “Datasolidariteit voor gezondheid” (translated: “Data solidarity for health”) we examined the legal conditions for research with health data from medical files. These conditions are laid down in Dutch law and predate the GDPR as well as the national implementation of the EU Data Protection Directive 95/46/EC. In this context, the Dutch legislator has clarified what it considers research in the public interest. We set forth these considerations below, as these may serve as a source of inspiration.
First, if it is expected that the proposed research project will not lead to new scientific insights, then the processing is most likely not in the public interest. The same is true for research driven by a personal hobby (‘amateurism’) or research that solely serves industrial or commercial purposes. According to the Dutch legislator, the latter does not mean that for-profit research cannot be in the public interest or that scientific research must be fully financed by public means.
In the context of research with health data from medical records, the Dutch legislator explained that research ‘in the public interest’ means that it aims to promote or protect public health. More specifically, this means that it is expected with a certain degree of probability that the research will significantly benefit a group of a reasonable size. In any case, according to the predecessor of the Dutch Data Protection Authority, the research should be in the public interest from the outset.
Despite the abovementioned conditions, clarification remains needed regarding the question whether the research results must be publicly available. Not all research leads to a publication, and recital 159 of the GDPR seems to leave this option open. The recital merely states that “specific conditions should apply in particular as regards the publication (…)”. In contrast, the former Dutch Minister of Medical Care argued that the research results should be published and that the FAIR principles should be taken into account (https://www.go-fair.org/fair-principles/).
Our research shows that it can be difficult to determine whether research is in the public interest, especially in the case of modern research with AI, such as machine learning or natural language processing. For instance, what would be the difference between scientific research with AI and the (further) development or optimization of an AI tool in the context of product development? There are no easy answers, since AI models are continuously developed and improved as they are fed more (personal) data. This makes it difficult to establish where research ends and product development begins.
Question 2 – Research exemptions and the need to provide for additional safeguards.
In most ‘research exemptions’ in the GDPR, reliance on such an exemption is made conditional on the provision of additional and/or compensatory safeguards and measures to protect the rights and interests of data subjects. Article 89(1) GDPR also - in general terms – requires appropriate safeguards in case of processing for scientific research purposes.
What would you consider to be appropriate safeguards and measures to protect (which) rights and interests of data subjects in (which) cases of processing personal data for scientific research purposes?
Apart from the safeguards already mentioned in the guidance documents of the EDPB and the Article 29 Working Party (such as anonymization, data minimization, etc.), our research shows that safeguards are needed when commercial parties aim to collect personal data under the façade of ‘scientific research’ while in fact the processing does not entirely benefit the general public. For example, in the UK commercial third parties wanted to obtain health data from medical records with the aim of developing AI applications to combat COVID-19. The parties involved wanted to claim (intellectual) property rights to the data, effectively keeping the data to themselves. Under pressure from an independent global media organization, the data sharing arrangements were rolled back.
To avoid such situations, there should be transparency about the parties involved in a research project that uses sensitive data such as health data from medical files. Such information should be provided proactively, without the need for citizens or organizations to first start, for instance, a FOIA procedure. A best practice could be to actively engage with data subjects in order to discuss the intended data processing for research purposes (see also article 35(9) GDPR). In cases outside the medical field the EDPB could clarify when increased transparency is needed, but also when this would hinder research purposes and how researchers should act in such situations.
Regarding research with health data from medical files, there should be written agreements in place on the processing purposes and the processing activities between the organization that will provide the data (e.g., a hospital) and the party that will analyze the data (e.g., a research institute). Indeed, the GDPR mandates a data processing agreement (article 28) or a joint controllership arrangement (article 26), but in practice the hospital would be neither a processor nor a (joint) controller. It would be helpful to data subjects if the EDPB prescribed written arrangements regarding the data provision, given that data subjects would then be able to obtain information about the essence of the arrangements (see also article 26(2) GDPR).
Lastly, a safeguard recognized by the Dutch legislator is the use of a trusted third party (‘TTP’). The TTP acts as an intermediary between the original collector of the personal data (e.g., a hospital) and the researchers. The TTP would be the only party that is technically capable of identifying data subjects and acts on the instructions of the hospital. This safeguard is not mandatory. However, if a research project involves a TTP, the Dutch legislator expects that arrangements with the TTP are in place. These arrangements should contain the conditions that apply to the provision of data to the researchers.
Question 3 – Further processing of personal data for scientific research purposes
Both the concept of ‘broad consent’ and the ‘compatibility presumption’ could play a role in facilitating the further processing of personal data for scientific research purposes.
What are your ideas on how to reconcile the use of such concepts with essential GDPR principles pertaining to the specificity of consent and to purpose limitation?
Before we dive into the concept of ‘broad consent’, we observe – based on our research – that a public debate about the compatibility presumption is needed, especially regarding the question of which types of further processing are always incompatible with the original processing purposes (and thus consent would be needed in the absence of a law that specifies the further processing). For example, if a research project aims to fulfil multiple purposes, such as research and the (commercial) development of an AI tool, then consent would be needed, as not all parts of the processing operation would benefit from the compatibility presumption for scientific research. If it is too late to obtain consent, then it follows from the GDPR that the processing may not take place. This could hinder legitimate research purposes. Is this, according to the public, a desirable outcome? Should the GDPR provide leniency in the case of research projects that serve multiple purposes? Why (not)? These questions should be answered as part of a public debate.
In the Netherlands, the processing of health data for research purposes must, as a rule, be based on consent. Only under certain conditions may the processing take place without consent; for instance, the data subject must have had the possibility to object to the processing, among other things. We are aware that the EDPB deems a broad consent not easily compatible with the GDPR (EDPB Guidelines 05/2020). However, Dutch authors in the medical-legal domain claim that the EDPB’s strict opinion does not apply to consent in a medical research context, since this national legal regime provides for a type of consent which is different from the one mentioned in the GDPR.
The EDPB should clarify when consent is needed. For instance, Dutch law allows the medical professional to conduct ‘own research’ with medical data without the consent of the patient (see also Opinion WP131 of the Article 29 Working Party, chapter 6 regarding statistical processing by health professionals). However, what are the boundaries of ‘own research’? This remains an open question. Also, obtaining consent could lead to selection bias, thus hindering the research purposes. How should researchers cope with such situations? Can they refrain from obtaining consent?
It can also be undesirable to obtain consent as part of international data transfers (article 49(1)(a) GDPR) if national law allowed the initial research processing without consent. In this scenario, the parties involved would be too late to obtain consent and the international data transfer would have to be halted. In a worst-case scenario, this could hinder useful international research if no other derogations are available to legally authorize the data transfers.
Question 4 – Transparency
In case of the storage of personal data in large databases, for long-term use for unspecified scientific research purposes, transparency on and control over the use of such personal data can be at risk.
What are your ideas on how data subjects can be provided with appropriate information and means to maintain control in such cases? What types of governance would you consider appropriate for such situations?
It is important that researchers and other parties involved are transparent about what they are going to do with the data about individuals. Who is going to use the data and for what purposes? If commercial motives apply, then this should be clear from the outset. In reality, this is not always clear. In 2014 the Rathenau Institute called for mandatory transparency regarding the business model of health apps. Other research of the Rathenau Institute, published in 2019, showed that many health apps lack transparency regarding their data use and business models. In our ‘Datasolidarity-report’ of 2020, we observe that such apps are also used as part of research projects, both on a national and international level. Information about the business models should be available to data subjects, also in a research context.
Preferably, information about research projects should be easily accessible to data subjects, as centralized as possible. After all, data subjects cannot be expected to frequently check numerous websites or paper brochures to learn about certain data processing activities for research purposes. In the context of health data, the Dutch government endorses an initiative where data subjects can maintain their own “Personal Health Environments” (‘PGO’s’). The PGO’s act as digital data vaults and contain all health-related information about the data subjects, e.g., data collected by health apps or copies of their medical files.
PGO’s can indeed be a perfect opportunity to inform data subjects about the processing purposes of their data, including scientific purposes. However, the data collected in these PGO’s are not protected by the professional secrecy of health professionals. This also means that, in practice, there will be no health professional to advise the data subject on whether it would be wise to share their data. For these reasons, the Netherlands Patients Federation fears that PGO’s may be abused by parties that want to collect health data based on consent. Data subjects may feel pressured to give their consent.
Question 5 - Codes of conduct.
What issues do you consider suitable to be clarified and/or elaborated upon in a code of conduct? Who could take the lead in such a process? What would be determinants in successfully pursuing such a route?
In our view, there should not be one single code of conduct, since it would not be desirable or even possible to address all types of research. For instance, research with health data from medical files is different from research based on social media messages.
In any case, a code of conduct should provide answers to important questions; we raised some of these in this questionnaire. For example, if the code of conduct deals with health research, the code should make clear which types of research are not allowed, especially if commercial motives may apply. In addition, the code of conduct should establish best practices that need to be followed by the parties to the code. These best practices could go beyond the GDPR, for instance by offering the option to object to the data processing of the research project more than once. Lastly, the code should specify how certain processing operations should take place, for example how and when parties should anonymize and/or pseudonymize the personal data.
Preferably the code of conduct applies to international research performed within the EU/EEA and with parties outside the Union. Non-EEA actors should be able to adhere to the code of conduct.
Data subjects or their representatives should be involved in the drafting process of the code. For example, if the code of conduct concerns health data research, then the organizations representing patients should be involved. If not, then the code poses the risk to be drafted in a one-sided manner.
The lead should be taken by national or European organizations whose members have proved themselves to be reliable and trustworthy, such as organizations that represent academic research institutes.
Question 6 – Agreements between joint controllers or between controller and processor.
Partners collaborating in research projects, especially in large research consortia, should be aware of their respective responsibilities under the GDPR and make the appropriate agreements. For the field of ‘scientific research’, what would you consider good examples and/or models of joint controller agreements and/or ‘controller-processor agreements?’
During our research we did not encounter specific examples of agreements. However, we did come across several models in the (medical) research domain which could be of interest to the EDPB.
For instance, research that falls under the Dutch Medical Research with Human Subjects Act (abbreviated as “WMO”) must be reviewed by an independent committee of experts. Without a positive decision by the committee, the research may not begin. This medical ethical review procedure could act as model for non-WMO research in which personal data are involved. It can also be considered as an appropriate safeguard (see question 2).
Another good model would be the Personal Health Train (PHT). PHT is a metaphor for the set of arrangements, the IT architecture and the implementation of responsible health data use. PHT has been endorsed by the Dutch Ministry of Health. Based on the PHT model, algorithms ‘visit’ decentralized datasets, which act as ‘stations’. Individuals and organizations determine themselves which data will be available and for which purposes, e.g., for scientific research. The ‘stations’ prepare the requested data in a standardized manner. By doing so, the data are not transferred to the algorithm; rather, the algorithm is taken to (a copy of) the data. PHT can be beneficial in terms of GDPR compliance, since it can result in data minimization by processing only the data necessary for the (research) purposes. Moreover, the algorithm can be instructed to respect the principle of purpose limitation. Lastly, PHT is a good example of data protection by design.
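To make the PHT idea concrete, the following is a minimal, hypothetical sketch in Python (the class and function names are our own illustration, not part of the actual PHT infrastructure): a visiting algorithm is sent to each ‘station’, the station checks the declared purpose against its own policy, releases only the requested fields, and returns an aggregate result rather than the raw records.

```python
# Hypothetical sketch of the Personal Health Train idea (illustrative names;
# not an implementation of the real PHT infrastructure).
from dataclasses import dataclass
from statistics import mean


@dataclass
class Station:
    """A decentralized dataset that controls its own access policy."""
    name: str
    permitted_purposes: set
    records: list  # e.g., [{"age": 50, "hba1c": 6.0}, ...]

    def run(self, purpose, fields, algorithm):
        # Purpose limitation: the station refuses purposes it has not allowed.
        if purpose not in self.permitted_purposes:
            raise PermissionError(f"{self.name}: purpose '{purpose}' not permitted")
        # Data minimization: only the requested fields are exposed to the
        # visiting algorithm, and only its aggregate output leaves the station.
        minimized = [{f: r[f] for f in fields} for r in self.records]
        return algorithm(minimized)


def mean_age(records):
    # The 'visiting' algorithm: computes an aggregate, never exports raw rows.
    return mean(r["age"] for r in records)


hospital_a = Station("hospital_a", {"scientific_research"},
                     [{"age": 50, "hba1c": 6.0}, {"age": 70, "hba1c": 7.2}])
hospital_b = Station("hospital_b", {"scientific_research"},
                     [{"age": 60, "hba1c": 6.5}])

# The algorithm visits each station in turn; the raw records never leave them.
results = [s.run("scientific_research", ["age"], mean_age)
           for s in (hospital_a, hospital_b)]
print(results)  # → [60, 60]
```

In this sketch, a request for a non-permitted purpose (e.g., "product_development") raises a `PermissionError` at the station, mirroring how the PHT model lets data holders enforce purpose limitation locally.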
Finally, the EDPB could take a look at health data cooperative models (such as the Swiss MIDATA platform). These cooperatives are non-profit entities that enable citizens to securely store, manage and control access to their data, while also offering the opportunity to contribute their health information for research purposes. Another example would be Patients Like Me, a non-profit organization that transformed into a commercial party.