Labelled ten years ago as the sexiest job of the 21st century, the role of data scientist remains highly attractive. Beyond strong job prospects, data scientists command high salaries and are much in demand. Yet the contours of this relatively young profession are still unclear. What makes a good data scientist? Which programming languages do you need to master? What soft skills do recruiters and companies look for? This article summarises the basic skills of a good data scientist.
Data scientist, a job with a future
The demand for data scientists continues unabated. It is currently growing by almost 40% per year. The median salary of a data scientist in the United States is $125,000 a year and the potential for growth is high. In France, an entry-level data scientist can expect to earn around 45,000 euros per year.
With such prospects, the profession is attractive, and the typical candidate is well trained. Many data scientists have five years of higher education or even a PhD, and most aspirants also undergo additional training to specialise in the field. Although there is no shortage of jobs, competition for positions with the best employers is fierce. The most sought-after skills include:
- Statistics and mathematics, and mastery of the R language;
- Analytical thinking, and the ability and willingness to solve problems;
- Mastery of Python, with other programming languages as additional assets;
- Knowledge of distributed computing solutions such as Hadoop and Spark;
- Excellent communication skills and mastery of data visualization tools;
- Knowledge of Machine Learning and Artificial Intelligence.
We go into more detail about these skills in the following paragraphs. However, it should be kept in mind that the job remains fluid and sometimes difficult to circumscribe. Considerable effort has gone in recent years into defining what constitutes the essence of a data scientist. IBM, aware of this vagueness, has long worked on a competency model for data scientists. This model has been validated by the US Department of Labor.
According to this competency model, data science is a multidisciplinary job at the crossroads of statistics, computer programming and specific expertise related to visualisation and problem solving.
Intellectual curiosity and willingness to solve problems
This is the very foundation of the data scientist's profession.
The data scientist must be able to:
- Identify and characterize a particular problem;
- Formulate hypotheses and understand their limits;
- Deploy a method for analysing and solving the problem; and
- Plan the execution of the solution.
An analytical mind is therefore essential to carrying out this mission successfully.
Statistics and mathematics
Statistical concepts are fundamental to understanding the mechanisms governing data processing. A solid mathematical basis also makes it easier to grasp the underlying concepts. Many data scientists come from mathematical or statistical backgrounds.
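As a small illustration of why these concepts matter in practice, the sketch below uses Python's standard `statistics` module to compare the mean and the median of a salary sample. The figures are hypothetical and chosen only for illustration; they are not taken from this article. The point is simply that a single outlier pulls the mean upwards while the median barely moves, which is one reason salary surveys often quote medians.

```python
import statistics

# Hypothetical annual salaries in thousands of euros (illustrative only).
salaries = [42, 45, 47, 50, 52, 55, 200]

mean_salary = statistics.mean(salaries)      # sensitive to the 200 outlier
median_salary = statistics.median(salaries)  # middle value, robust to the outlier
spread = statistics.stdev(salaries)          # sample standard deviation

print(f"mean={mean_salary:.1f} median={median_salary} stdev={spread:.1f}")
```

Knowing which summary to report, and why, is exactly the kind of judgement a statistical background provides.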
Programming languages for the data scientist
A data scientist must master certain analytical data manipulation tools such as R or SAS. The R language, which is designed for solving statistical and mathematical problems, is quickly proving indispensable in the field of data science.
The Python language is also indispensable for data scientists. A grounding in C++ or Java is strongly recommended. As a general rule, mastery of further programming languages is an additional asset.
Finally, SQL (Structured Query Language) is hard to avoid when working as a data scientist. This language, which is dedicated to querying and manipulating data stored in relational databases, helps to make the most of the available data.
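To give a feel for how SQL and Python work together, here is a minimal sketch using the `sqlite3` module from Python's standard library. The `orders` table, its columns and its rows are hypothetical and exist only for this example. It shows a typical pattern: let the database do the aggregation and bring only the summary back into Python.

```python
import sqlite3

# In-memory database; the table, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 35.5), ("alice", 80.0), ("carol", 15.0)],
)

# Aggregate inside the database instead of loading every row into Python.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```

The same `GROUP BY` query would run largely unchanged against most relational databases, which is why SQL remains such a portable skill.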
Data processing, the essential frameworks for data scientists
The Hadoop platform remains omnipresent in the world of data. Data scientists do not always need expert knowledge of Hadoop, but a certain familiarity is indispensable. Mastery of a tool like Apache Spark is a clear plus, and in recent years recruiters seem increasingly to prefer it. More and more models require real-time data processing, which Hadoop's batch-oriented MapReduce engine was not designed to perform. In any case, understanding these two frameworks, their uses and their differences is imperative.
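To make the batch model behind these frameworks more concrete, the sketch below imitates the map, shuffle and reduce phases of a word count in plain Python. It uses only the standard library and is not the Hadoop or Spark API; the function names and input lines are made up for illustration. In a real cluster each phase would run in parallel across many machines, and the whole dataset would be processed as one batch job.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input; a real job would read files from distributed storage.
lines = ["Spark and Hadoop", "Hadoop stores data", "Spark processes data"]

pairs = [pair for line in lines for pair in map_phase(line)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```

Because the reduce step only starts once all input has been mapped and grouped, this style suits large offline jobs better than a continuous stream of incoming events.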
Communication skills and mastery of data visualisation tools
A data scientist who fails to turn conclusions into concrete action has failed. A fundamental part of the task is therefore to communicate with non-specialists and to convince them. This requires leadership.
Mastery of visualisation tools that turn complex data into simple, readable elements is therefore absolutely essential.
Machine Learning and Artificial Intelligence
A grasp of the concepts of Machine Learning and Artificial Intelligence (AI) is a valuable asset. This area of expertise is increasingly valued by companies as these segments are growing rapidly.
According to the 2020 Dice Tech Job report, 68% of job offers for data scientists currently require machine learning skills.
Alongside mastery of Python, R and Apache Spark, Machine Learning is therefore one of the four key competencies of the data scientist profile.
The data scientist, a superman?
In view of this impressive list of competences, one may indeed wonder whether the data scientist comes from another galaxy. In fact, few data scientists possess all of these skills, and it is sometimes better to specialise in one direction or another.
A good data scientist is above all someone who thinks in a structured way and who, at the outset of a problem, can already see a path to the solution. The data are like millions of notes forming a score, and the data scientist is the one who can draw the right harmony from them.
Thanks to its data processing platform, Ryax helps data scientists to put their models into production and thus achieve concrete results. To find out more, consult our product sheet or contact the Ryax Team.
The Ryax Team.