The volume of data moving around the world breaks records day by day. Cisco Systems' Complete Forecast Update, 2017–2022 predicts that by 2022 global web traffic will reach 4.8 zettabytes per year, going from 122 exabytes per month in 2017 to 396 exabytes per month in 2022. It is not surprising, then, that according to Statista the worldwide market value of big data will nearly double in just seven years: from $55 billion in 2020 to $103 billion in 2027. This is a challenge for storage solutions, which are forced to adapt at full speed to meet the demands of organizations.
Data storage technology for big data must be prepared not only to house a large amount of information, but also to respond to needs such as these:
Many organizations have found a satisfactory solution to all of these big data requirements in the implementation of tiered data storage.
Tiered data storage is based on segmenting information according to its importance: the most valuable records are placed in safer, more stable locations with high processing capacity, while those of lower value are relegated to layers that are harder to access and offer lower performance. This allows companies to save costs, gain profitability and optimize the computing resources available for big data management.
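The placement rule described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the tier names, thresholds and the "value score" assigned to each record are assumptions made for the example.

```python
# Sketch of tiered data placement: each record is assigned to a storage
# tier according to a hypothetical "value" score between 0 and 1.
# Tier names and thresholds are illustrative assumptions.

TIERS = [
    ("hot", 0.8),   # fast, safe, expensive storage for the most valuable data
    ("warm", 0.4),  # intermediate tier
    ("cold", 0.0),  # cheap, slower storage for low-value records
]

def assign_tier(value_score: float) -> str:
    """Return the first tier whose threshold the score meets."""
    for name, threshold in TIERS:
        if value_score >= threshold:
            return name
    return TIERS[-1][0]  # fallback: lowest tier

# Hypothetical records with their value scores.
records = {"invoice_2024": 0.9, "old_log": 0.1, "report_q3": 0.5}
placement = {name: assign_tier(score) for name, score in records.items()}
print(placement)  # the invoice lands in "hot", the old log in "cold"
```

In practice the score would come from access frequency, regulatory requirements or business rules, and the tiers would map to concrete media (SSD arrays, HDDs, tape or cold cloud storage).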
Defining the data storage strategy is one of the responsibilities of the Chief Information Officer (CIO). Broadly speaking, CIOs who opt for tiered storage tend to:
Depending on the characteristics of each company, CIOs can resort to different data storage technologies that adapt to the demands of big data:
Handling large volumes of data in big data projects also requires work environments that allow the data to be managed, queried and organized. One of the most popular is Hadoop, an open-source project whose Hadoop Distributed File System (HDFS) provides a powerful distributed file store. HDFS divides the information to be saved into blocks (usually 128 or 256 MB each) and distributes them across the nodes of a cluster, replicating each block on several nodes to minimize the risk of loss.
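The block-splitting and replication behavior described above can be sketched as follows. This is a simplified model, not HDFS code: the node names and round-robin placement are assumptions for illustration, though the 128 MB block size and factor-of-three replication are HDFS defaults.

```python
import itertools

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the HDFS default block size
REPLICATION = 3                  # HDFS default replication factor

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE):
    """Return (offset, length) pairs covering the file, HDFS-style."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

def place_replicas(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes.

    Real HDFS uses rack-aware placement; round-robin is a stand-in here.
    """
    ring = itertools.cycle(range(len(nodes)))
    placement = {}
    for i, _ in enumerate(blocks):
        start = next(ring)
        placement[i] = [nodes[(start + r) % len(nodes)] for r in range(replication)]
    return placement

# A 300 MB file splits into three blocks: 128 MB + 128 MB + 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
placement = place_replicas(blocks, nodes)
print(len(blocks), placement[0])
```

Because every block lives on several machines, the loss of a single node does not make any part of the file unreadable; the cluster re-replicates the affected blocks from the surviving copies.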