The enormous quantity and variety of data available today, thanks to new technologies such as Big Data, has transformed the way the world works. This has created new challenges related to handling all this information.
One of the main questions facing those who handle this data is how best to represent it graphically so that it is easy to interpret. This is where Data Visualization comes into play: the discipline that covers the creation and study of graphical representations of data.
To produce these representations, Data Visualization relies on tools such as statistical charts, maps and schematic diagrams of various kinds.
Data Visualization has centuries of history. The publication of data in graphic form to make it easier to understand goes back to the 17th century. One of the best-known historical examples is the map that Charles Minard drew in 1869 of Napoleon's invasion of Russia in 1812, which shows the size of the French army along its advance and retreat and relates it to variables such as the temperature during the withdrawal.
Importance and relevance of data visualization
Over the years, with the birth and expansion of information technology, the collection, storage, management and interpretation of data have evolved to the point of changing the way we see the world. Data Visualization is currently one of the pillars of Business Intelligence and is used in practically every sector of society. Thanks to it, we can identify areas that need special attention, determine the factors that explain certain events, predict behaviors of various kinds, and so on.
Some classical concepts are changing to adapt to a data-driven world. Traditional training separates the technical processing of information from the methods used to narrate it visually; Data Visualization sits between the two.
To represent data appropriately, in addition to being clear about where and how to show it, it is necessary to consider factors such as simplicity, comparability, diversity and the purpose that justifies the visualization.
It is important to show the data using only the variables that are necessary for our purpose; this makes it easier to interpret and compare at first sight. In addition, the same information can be expressed graphically in several ways, all of them valid, and this diversity of representations can make it easier to reach conclusions. None of this makes sense, however, unless we first establish why we are visualizing the data and what we intend to obtain from it.
There is a wide variety of ways to represent data. The most basic are those we are taught from an early age (bar charts, line charts, pie charts, etc.). These are very useful for structured data, that is, data that can easily be stored in a database table.
However, when dealing with unstructured data, such as images, videos or text files, other types of representation are needed. Some of the most common in these situations are relational graphs, word clouds, heat maps, cartographic images and scatter plots.
Data Visualization has also been shown to influence its target audience psychologically. This influence is stronger when the viewer has not been previously conditioned.
Recommendations for data visualization
When it comes to expressing a large amount of data graphically, it is advisable to follow a few guidelines that make the task easier.
Depending on what you intend to show, one type of chart will be more appropriate than another. For example, bar charts are useful for comparisons and line charts for showing trends, while pie charts are not appropriate when the differences between slices are small.
It is recommended not to truncate the coordinate axes, as this may lead to erroneous interpretations of the results. Use only the colors needed to convey information, because too many can cause confusion. As with colors, make sure the charts look as clean as possible, without unnecessary additions. Finally, everything that appears on the chart must be labeled.
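The effect of truncating an axis can be illustrated with a little arithmetic. The sketch below is only an illustration: the function name `drawn_height_ratio` and the sales figures are hypothetical, and it assumes a simple bar chart in which a bar's drawn height is proportional to its value minus the axis minimum.

```python
def drawn_height_ratio(a, b, axis_min=0.0):
    """Ratio between the drawn heights of two bars with values a and b
    when the value axis starts at axis_min instead of zero."""
    if min(a, b) <= axis_min:
        raise ValueError("axis_min must be below both values")
    return (a - axis_min) / (b - axis_min)


# Hypothetical figures: sales grow from 100 to 105 (a 5% increase).
sales_before, sales_after = 100.0, 105.0

# Axis starting at zero: the second bar is drawn 1.05 times as tall.
true_ratio = drawn_height_ratio(sales_after, sales_before)

# Axis truncated at 95: the second bar is drawn twice as tall.
truncated_ratio = drawn_height_ratio(sales_after, sales_before, axis_min=95.0)

print(f"axis from 0:  {true_ratio:.2f}x")
print(f"axis from 95: {truncated_ratio:.2f}x")
```

With the axis starting at 95, a 5% difference is drawn as if one value were double the other, which is exactly the kind of misreading the recommendation tries to avoid.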
The data sector is currently estimated to be worth about 4 billion dollars. However, given its rapid growth, it is expected to be worth around 7 trillion dollars by 2022.
With the emergence of Big Data, one of the great challenges for the future is capturing and analyzing large volumes of data in real time. In addition, so much data is received that one of the current open fronts is the use of data reduction techniques, which make it possible to work with a more manageable amount. Techniques such as sampling, filtering and aggregation are being developed for this purpose.
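The three reduction techniques mentioned above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not a production approach: the `readings` data set, the sensor names and the threshold of 90 are all invented for the example.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical data set: 10,000 (sensor_id, value) readings from 4 sensors.
rng = random.Random(42)  # fixed seed so the example is reproducible
readings = [(f"s{i % 4}", rng.uniform(0, 100)) for i in range(10_000)]

# Sampling: keep a random subset of fixed size.
sample = rng.sample(readings, k=500)

# Filtering: keep only the records that meet a condition.
high = [(sensor, value) for sensor, value in readings if value >= 90.0]

# Aggregation: replace many records with one summary value per group.
groups = defaultdict(list)
for sensor, value in readings:
    groups[sensor].append(value)
averages = {sensor: mean(values) for sensor, values in groups.items()}

print(len(readings), "readings reduced to:")
print(" ", len(sample), "sampled readings")
print(" ", len(high), "readings at or above 90")
print(" ", len(averages), "per-sensor averages")
```

Each technique trades detail for size in a different way: sampling keeps individual records but not all of them, filtering keeps only the records of interest, and aggregation keeps a summary instead of the records themselves.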
Visualization tools have been built on top of these reduction techniques. Among the most prominent are data cubes and Nanocubes, which are multidimensional extensions of two-dimensional tables; imMens, which groups variables according to defined schemes; and ScalaR, which applies data reduction so that the data can be represented graphically as efficiently as possible.
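The general idea behind a data cube, pre-aggregating a measure for every combination of dimensions so that a chart can read totals directly, can be sketched as follows. This is a toy illustration of the concept only, not the way Nanocubes or any of the tools above are actually implemented; the `build_cube` function and the sales rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dims, measure):
    """Pre-aggregate the sum of `measure` for every subset of `dims`.

    Returns a dict mapping each subset of dimensions (a tuple of names)
    to a dict of {dimension values: total}.
    """
    cube = {}
    for size in range(len(dims) + 1):
        for subset in combinations(dims, size):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in subset)
                totals[key] += row[measure]
            cube[subset] = dict(totals)
    return cube


# Hypothetical two-dimensional table of sales by region and year.
rows = [
    {"region": "north", "year": 2020, "sales": 10.0},
    {"region": "north", "year": 2021, "sales": 15.0},
    {"region": "south", "year": 2020, "sales": 7.0},
    {"region": "south", "year": 2021, "sales": 8.0},
]

cube = build_cube(rows, dims=("region", "year"), measure="sales")

print(cube[("region",)])         # totals per region
print(cube[("year",)])           # totals per year
print(cube[("region", "year")])  # totals per region and year
print(cube[()])                  # grand total
```

Once the cube is built, a visualization can switch between levels of detail by looking up a different key instead of scanning the raw rows again. The cost is that the number of pre-computed groupings doubles with each added dimension.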
The fastest and most efficient way to appreciate results is to view them graphically. To do this we use Data Visualization techniques, which offer a wide range of options for representing data.
With Data Visualization we can use classic graphical representations as well as others that have gained importance more recently. Creating these representations requires processing ever larger amounts of data, which makes it increasingly necessary to develop techniques that facilitate this process.
Some experts say that data is the new oil. This statement makes us reflect on the relevance of data in an increasingly digitalized society, in which almost everything can be measured and interpreted. It is a reflection widely supported by the economic figures of the sector.