The volume of data moving around the world breaks records day by day. Cisco Systems' Complete Forecast Update, 2017–2022 predicts that by 2022 global web traffic will reach 4.8 zettabytes per year, going from 122 exabytes per month in 2017 to 396 exabytes per month in 2022. It is not surprising, then, that according to Statista the worldwide market value of big data will nearly double in just seven years: from $55 billion in 2020 to $103 billion in 2027. This is a challenge for storage solutions, which are forced to adapt at full speed to meet the demands of organizations.
Data storage technology for big data must be prepared not only to house a large amount of information, but also to respond to needs such as these:
Many organizations have found a satisfactory solution to all of these big data requirements in the implementation of tiered data storage.
Tiered data storage is based on segmenting information according to its importance: the most valuable records are placed in safer, more stable locations with high processing capacity, while those of lower value are relegated to layers that are harder to access and offer lower performance. This allows companies to save costs, gain profitability and optimize the computing resources available for big data management.
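The placement rule described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the tier names, thresholds and the "value score" assigned to each record are assumptions made for the example.

```python
# Sketch of tiered data placement: each record is assigned to a storage
# tier according to a hypothetical "value" score between 0 and 1.
# Tier names and thresholds are illustrative assumptions.

TIERS = [
    ("hot", 0.8),   # fast, safe, expensive storage for the most valuable data
    ("warm", 0.4),  # intermediate tier
    ("cold", 0.0),  # cheap, slower storage for low-value records
]

def assign_tier(value_score: float) -> str:
    """Return the first tier whose threshold the score meets."""
    for name, threshold in TIERS:
        if value_score >= threshold:
            return name
    return TIERS[-1][0]  # fallback: lowest tier

# Hypothetical records with their value scores.
records = {"invoice_2024": 0.9, "old_log": 0.1, "report_q3": 0.5}
placement = {name: assign_tier(score) for name, score in records.items()}
print(placement)  # the invoice lands in "hot", the old log in "cold"
```

In practice the score would come from access frequency, regulatory requirements or business rules, and the tiers would map to concrete media (SSD arrays, HDDs, tape or cold cloud storage).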
Defining the data storage strategy is one of the responsibilities of the Chief Information Officer (CIO). Broadly speaking, CIOs who opt for tiered storage tend to:
Depending on the characteristics of each company, CIOs can resort to different data storage technologies that adapt to the demands of big data:
Handling large volumes of data in big data projects also requires work environments that allow the data to be managed, queried and organized. One of the most popular is Hadoop, an open-source project whose Hadoop Distributed File System (HDFS) provides a powerful distributed file store. HDFS divides the information to be saved into blocks (usually 128 or 256 MB each) and distributes them across the nodes of a cluster, replicating each block on several nodes to minimize the risk of loss.
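The block-splitting and replication behavior described above can be sketched as follows. This is a simplified model, not HDFS code: the node names and round-robin placement are assumptions for illustration, though the 128 MB block size and factor-of-three replication are HDFS defaults.

```python
import itertools

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the HDFS default block size
REPLICATION = 3                  # HDFS default replication factor

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE):
    """Return (offset, length) pairs covering the file, HDFS-style."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

def place_replicas(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes.

    Real HDFS uses rack-aware placement; round-robin is a stand-in here.
    """
    ring = itertools.cycle(range(len(nodes)))
    placement = {}
    for i, _ in enumerate(blocks):
        start = next(ring)
        placement[i] = [nodes[(start + r) % len(nodes)] for r in range(replication)]
    return placement

# A 300 MB file splits into three blocks: 128 MB + 128 MB + 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
placement = place_replicas(blocks, nodes)
print(len(blocks), placement[0])
```

Because every block lives on several machines, the loss of a single node does not make any part of the file unreadable; the cluster re-replicates the affected blocks from the surviving copies.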