Data protection and data science

Data is becoming increasingly important in many fields. It is now indispensable for marketing strategies, IoT optimization and machine learning. Data has become so valuable that securing it against malicious actors is essential. In data science, the use of personal data is becoming ubiquitous. So what are the European rules for protecting personal data, and how can a company put such a policy in place? Answers below.


What is data protection?

Personal data is any information relating to a natural person, whether that person is identified directly or indirectly. It can be your last name, first name, photos, postal or IP address, geolocation, email address, social media profiles, etc.

The General Data Protection Regulation (GDPR), which applies in France and across Europe, obliges companies to secure this data. The rules apply as soon as an organization, a company, or even a natural or legal person collects, retains or uses personal data, whether in digital or physical form.

To comply, data collection, retention and use must meet the following conditions:

  • Information on the data collected: the organization collecting the data must inform users of the nature of the data collected;
  • Collection must be based on consent: individuals must consent to the use of their personal data and be able to access it. They must also be able to correct their data and to object to its use;
  • Collection must have a specific and justified purpose: the purpose for which the data is used must be defined and relevant. In addition, the use of personal data cannot drift away from that purpose and must stay limited to it;
  • The data collected must be of high quality: the data must be adequate, accurate and kept up to date;
  • Retention must be temporary: the data collected may only be kept for a limited time, and users have the right to request the erasure or delisting of their data. For example, a person has the right to ask Google to remove search results associated with their last name;
  • Data must be secure: data collected by a company must be highly secured and cannot be disclosed to a third party or organization. If security is nevertheless breached and the disclosure poses a significant risk to individuals, the data controller must notify the CNIL within 72 hours and inform the individuals concerned as soon as possible.
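The temporary-retention condition above can be illustrated with a small check that flags records kept past their allowed period. This is only a sketch: the `StoredRecord` type, the `RETENTION` table and the durations in it are hypothetical, and real retention periods depend on each company's own policy and legal basis.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredRecord:
    user_id: str
    purpose: str
    collected_at: datetime


# Hypothetical retention periods per purpose (assumption, not a legal reference).
RETENTION = {
    "newsletter": timedelta(days=365),
    "support_ticket": timedelta(days=90),
}


def expired_records(records, now):
    """Return the records whose retention period has elapsed at time `now`."""
    return [r for r in records if now - r.collected_at > RETENTION[r.purpose]]


records = [
    StoredRecord("u1", "newsletter", datetime(2023, 1, 10)),
    StoredRecord("u2", "support_ticket", datetime(2024, 5, 1)),
    StoredRecord("u3", "support_ticket", datetime(2024, 1, 1)),
]
to_delete = expired_records(records, now=datetime(2024, 6, 1))
print([r.user_id for r in to_delete])  # ['u1', 'u3']
```

Tying the retention period to the purpose, rather than to the record, keeps the check aligned with the rule that data is only kept for as long as its declared purpose requires.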

Your rights in case of non-compliance with data protection

If a company or organization does not respect the protection of your personal data, the GDPR gives you the option to:

  • Request compensation for material or non-material damage: if a company fails to comply with the GDPR, anyone who suffers damage as a result can ask the data controller for compensation;
  • Initiate or join a group action: in the event of a data breach, you can mandate a data protection body or association to lodge a claim or seek compensation on your behalf.

Corporate obligations

To comply with the European General Data Protection Regulation, every company must be able to:

  • Respect the rules on privacy and personal data protection from the design stage of the project, as defined in the GDPR;
  • Identify all data processing operations and record them in a register kept for this purpose;
  • Demonstrate that the data is processed properly. To do this, employees can take certification training or sign codes of conduct;
  • Notify the CNIL within 72 hours, and the persons concerned as soon as possible, in the event of a personal data breach;
  • Carry out a privacy impact assessment if the processing performed carries a risk;
  • Ensure that users are well informed about how their personal data is used, how long it will be kept, their rights and the remedies available in the event of a violation.
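As an illustration of the register mentioned in the list above, the sketch below keeps one entry per processing operation and lists those flagged as risky, which are the candidates for an impact assessment. The class names, fields and the `high_risk` flag are assumptions made for the example; the GDPR does not prescribe this structure.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingActivity:
    name: str
    purpose: str
    data_categories: list
    retention_days: int
    high_risk: bool = False  # assumed flag, set by whoever assesses the processing


@dataclass
class ProcessingRegister:
    activities: list = field(default_factory=list)

    def add(self, activity):
        """Record a processing operation in the register."""
        self.activities.append(activity)

    def needing_impact_assessment(self):
        """Names of the activities flagged as high risk."""
        return [a.name for a in self.activities if a.high_risk]


register = ProcessingRegister()
register.add(ProcessingActivity(
    "churn_model", "customer retention analysis",
    ["email", "purchase history"], retention_days=365))
register.add(ProcessingActivity(
    "geo_profiling", "location-based recommendations",
    ["geolocation"], retention_days=180, high_risk=True))

print(register.needing_impact_assessment())  # ['geo_profiling']
```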

How do you set up data protection in data science?

To implement a data protection policy in data science, several steps must be followed.

List the data

The first step is to gather all the data in a repository called a data lake. The data must then be brought into compliance and sorted by type, so that each type can be sent to the department that needs it, such as marketing, human resources or sales. Data discovery tools make this task less tedious.
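To give an idea of what a data discovery pass does, here is a minimal sketch that scans rows for values that look like email or IPv4 addresses and reports which columns contain them. The two regular expressions and the sample rows are illustrative assumptions; dedicated tools use much richer detection than this.

```python
import re

# Deliberately simple patterns, for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def detect_personal_data(rows):
    """Map each column name to the kinds of personal data found in its values."""
    findings = {}
    for row in rows:
        for column, value in row.items():
            for kind, pattern in PATTERNS.items():
                if pattern.search(str(value)):
                    findings.setdefault(column, set()).add(kind)
    return findings


sample_rows = [
    {"contact": "alice@example.com", "last_login_from": "192.0.2.10", "basket_total": 42.5},
    {"contact": "bob@example.org", "last_login_from": "198.51.100.7", "basket_total": 13.0},
]
findings = detect_personal_data(sample_rows)
print(findings)  # {'contact': {'email'}, 'last_login_from': {'ipv4'}}
```

The result can serve as a first map of the data lake: columns that appear in `findings` hold personal data and need to be handled accordingly, while the others can be routed more freely.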


Data anonymization

Once the data lake is mapped, you can reduce the risk of data leaks and unlawful use by anonymizing the data. You can use a proxy server or technical parsing for this. If the level of anonymization is high enough that the data can no longer be traced back to individuals, the GDPR no longer applies. However, full anonymization requires extensive analysis of the data, as well as sorting out which data is most relevant to the data science work.

This requires significant human and financial resources. As an alternative, you can use an intermediate process called pseudonymization, in which the data is fully encrypted while the decryption key is retained. With this process, however, you remain within the scope of the GDPR.
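The sketch below shows one possible way to pseudonymize an identifier. Note that it swaps encryption for a keyed hash (HMAC-SHA256) combined with a lookup table stored apart from the dataset: together, the secret key and the table play the role of the retained "key" that makes re-identification possible. The `Pseudonymizer` class, the demo key and the 16-character truncation are assumptions for the example, not a recommended production setup.

```python
import hashlib
import hmac


class Pseudonymizer:
    """Replace identifiers with stable pseudonyms; reversal needs the lookup table."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._lookup = {}  # pseudonym -> original; keep it apart from the dataset

    def pseudonymize(self, identifier: str) -> str:
        # Keyed hash: the same input and key always give the same pseudonym.
        digest = hmac.new(self._key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
        pseudonym = digest[:16]  # truncated for readability (assumption)
        self._lookup[pseudonym] = identifier
        return pseudonym

    def reidentify(self, pseudonym: str) -> str:
        """Recover the original identifier; only possible with the lookup table."""
        return self._lookup[pseudonym]


pseudonymizer = Pseudonymizer(secret_key=b"demo-key-change-me")
rows = [{"email": "alice@example.com", "basket_total": 42.5}]
safe_rows = [
    {"user": pseudonymizer.pseudonymize(r["email"]), "basket_total": r["basket_total"]}
    for r in rows
]
print(safe_rows)
```

Because whoever holds the key and the table can still link a pseudonym back to a person, data treated this way remains personal data, which is why the GDPR continues to apply.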

Define and enforce security measures

To carry out a data science project that complies with data protection rules, you will need to define a data security strategy and apply it strictly. In a large-scale project, data can be exchanged, hosted and manipulated with processing tools located both domestically and abroad.

Every transfer carries a risk of data leakage. It is therefore important to anticipate these risks before the project begins.

To do this, you can document every step of the data processing, along with the strategies for encrypting, decrypting and protecting the data.

Define how individuals can exercise their rights

The data controller must set out its information notices, informing users that data science is being used and describing individuals' rights and the remedies available.

When a user requests access to their information, this means access to the data that has been collected, but also to the information computed by the data science work. The person then has the right to correct and erase their raw data, but also their data after analysis. These rights are unconditional, and users do not need to justify their request for it to succeed.
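A minimal sketch of what this implies for storage: access, correction and erasure requests have to cover both the collected data and the values computed from it. The `PersonalDataStore` class and its in-memory dictionaries are hypothetical, and dropping derived values after a correction is just one possible design choice (they would then be recomputed from the corrected data).

```python
class PersonalDataStore:
    """Toy in-memory store keeping collected data and model outputs per user."""

    def __init__(self):
        self.raw = {}      # user_id -> data collected from the person
        self.derived = {}  # user_id -> values computed by the data science work

    def access(self, user_id):
        """Return everything held about the user, raw and derived."""
        return {
            "raw": dict(self.raw.get(user_id, {})),
            "derived": dict(self.derived.get(user_id, {})),
        }

    def rectify(self, user_id, field, value):
        """Correct a raw field and drop derived values, which are now stale."""
        self.raw.setdefault(user_id, {})[field] = value
        self.derived.pop(user_id, None)

    def erase(self, user_id):
        """Delete raw and derived data; report whether anything was removed."""
        had_raw = self.raw.pop(user_id, None) is not None
        had_derived = self.derived.pop(user_id, None) is not None
        return had_raw or had_derived


store = PersonalDataStore()
store.raw["u1"] = {"email": "alice@example.com", "city": "Lyon"}
store.derived["u1"] = {"churn_score": 0.27}

report = store.access("u1")         # contains both raw and derived data
erased = store.erase("u1")          # True: something was deleted
after_erasure = store.access("u1")  # {'raw': {}, 'derived': {}}
```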

The General Data Protection Regulation gives users a degree of security. There are many ways to implement a data protection policy within a company, which is why it is worth calling on data science experts like Ryax to avoid breaching the GDPR.

The Ryax Team.