Scalability: the challenge of scaling up data projects

The majority of companies consider data integration and big data projects to be a priority. Future decisions will be "data-driven" or will not be made at all. Because data keeps multiplying, thinking about scalability from the outset of a data processing project is essential. How do you scale? How can you best ensure that scalability? In this article, executives and business leaders will find some answers to these critical questions.


The challenge of scalability

Scalability and scaling refer to the same concept: the ability of data processing systems to increase their processing capacity as the volume of data grows.

The problem is known to many: the mega wave of data is overwhelming and there are only two options: learn to surf it or drown. The International Data Corporation estimates that global data volume will reach 175 zettabytes by 2025. Take 175 and add 21 "zeros" behind it, and you reach the order of magnitude we are talking about. For the visually minded, the figure looks like this: 175,000,000,000,000,000,000,000 bytes of data. If, somewhere in this data, you need to locate the relevant elements and process them properly, it is in your best interest to design a powerful system today.

The era of big data is characterized by the three “V”s:

  • Data Volume;
  • Data Velocity;
  • Data Variety.

Scalability therefore corresponds to the ability of systems to evolve in order to process data of varying nature, in ever larger quantities and ever faster. If latency keeps growing and internal or external customers are complaining about the slowness of your systems, it is time to focus on the issue of scalability.

For the record, cryptocurrency enthusiasts are familiar with the problem. The Bitcoin blockchain, for example, has a design flaw in terms of scalability that limits the speed of transactions. This shortcoming has led to the creation of many other cryptocurrencies, as users are not satisfied with Bitcoin's lack of flexibility.

Principles to be followed to ensure scalability

When setting up a data processing solution, there are some basic principles to ensure scalability.

  1. Consider that it is impossible to increase the resources allocated to data (budget, team, hardware, etc.) at the same rate as the volume of data increases. This would be like comparing the reproductive rate of a human being to that of a mouse. The system must therefore anticipate the exponential curve of the data;
  2. Define a clear and well-known data governance strategy within the company. This is a key step in ensuring that scaling targets are defined and achieved;
  3. Simplify the chosen solution as much as possible to allow for scalability. Too often in companies, systems have piled up on top of one another, and interfaces have multiplied, sometimes redundantly;
  4. Accept the fact that no system is infallible. It is important to define exit strategies as well as alert systems to detect anomalies in real time (a sketch of such an alert follows this list). The testing phase should not be overlooked, but it has certain limitations that need to be considered;
  5. Try to automate as many functions as possible from the start to enable more efficient management in the future;
  6. Keep in mind the ultimate goal: to provide the right information to the right people in a timely manner. This is what data analysis is all about.
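
As an illustration of point 4, here is a minimal sketch of a real-time anomaly alert on processing latency. It is only a sketch under simple assumptions: the threshold, the window size and the `alert` channel are all hypothetical, and a production setup would normally rely on dedicated monitoring tooling.

```python
import statistics
import time

# Hypothetical values: tune the threshold and window to your own service levels.
LATENCY_THRESHOLD_MS = 500
WINDOW_SIZE = 100

def alert(message: str) -> None:
    """Placeholder alert channel; replace with e-mail, chat or paging integration."""
    print(f"[ALERT {time.strftime('%H:%M:%S')}] {message}")

def watch_latencies(latency_stream_ms):
    """Raise an alert whenever the rolling median latency exceeds the threshold."""
    window = []
    for latency in latency_stream_ms:
        window.append(latency)
        if len(window) > WINDOW_SIZE:
            window.pop(0)
        median = statistics.median(window)
        if median > LATENCY_THRESHOLD_MS:
            alert(f"median latency {median:.0f} ms over the last {len(window)} requests")
```

Fed with a stream of per-request latencies, `watch_latencies` flags slowdowns as they happen instead of waiting for users to complain.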

How do you achieve a scalable system?

There are generally two ways to ensure the scalability of a system:

  • Scaling up is vertical scaling. This approach involves increasing hardware performance by adding resources (CPUs, memory, or storage capacity) as the system saturates. The underlying principle is to keep adding ever more powerful components. The vertical approach quickly shows its limits when the amount of data explodes, and it comes at a high cost. This approach is not very popular today, at least not on its own;
  • Scaling out is horizontal scaling. Processing capacity is increased by adding more machines, so the workload is distributed across a larger number of servers (see the sketch below).
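
As a rough illustration of scaling out, the sketch below spreads a toy workload over a configurable number of worker processes. The `process_record` function and its timings are hypothetical, and a real deployment would distribute work across machines rather than local processes.

```python
from multiprocessing import Pool
import time

def process_record(record: int) -> int:
    """Stand-in for one unit of data processing work."""
    time.sleep(0.01)  # simulate processing latency
    return record * 2

def run(records, workers: int) -> float:
    """Process all records with a pool of `workers` and return the elapsed time in seconds."""
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        pool.map(process_record, list(records))
    return time.perf_counter() - start

if __name__ == "__main__":
    records = range(200)
    # Adding "machines" is simulated here by adding worker processes.
    for workers in (1, 2, 4, 8):
        print(f"{workers} worker(s): {run(records, workers):.2f} s")
```

Doubling the number of workers roughly halves the elapsed time until the workload itself becomes the bottleneck, which is exactly the behaviour horizontal scaling aims for.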

If you want to transport goods from point A to point B, you can change your truck model and buy an ever larger vehicle to increase your transport capacity. This approach can work if your company grows slowly, but it has several limitations if it grows fast. At some point, the size of your truck will prevent it from driving properly on the roads, resulting in a loss of efficiency. Moreover, if your truck breaks down, none of the goods will arrive safely. Finally, if demand suddenly declines, your gigantic semi-trailer keeps carrying a minimal load and unnecessary costs result.

On the other hand, if you select a vehicle of the right size, simple and robust, you just have to buy more trucks as your orders increase. If one of the trucks breaks down, you can use another one, and you can even recover its parts. If demand decreases, simply resell a truck or rent it out. If one truck is blocked, the others can collect the goods.

By scaling horizontally, you give your systems greater agility and the flexibility to adapt the processing of your data to particular or unexpected circumstances.

Ryax and scalability

The open source engineering platform created by Ryax helps data teams deploy, execute and scale their production models. Aware of the stakes, we have placed scalability at the heart of our approach since the creation of our platform. We make sure you can process a growing data stream with minimal latency. If you want to learn more about the features of our software and its performance, please contact our teams. We will look for the right solution to help your enterprise face the challenge of scalability.

The Ryax Team.