Good data visualization is key to explaining data to stakeholders in any organization. Data that is presented in an understandable manner can drive decisive actions and policy changes inside the organization. Visualizations can be evaluated using different metrics. One such metric is the data-ink ratio. In this article, we will discuss the data-ink ratio with an example.
What is The Data-ink Ratio?
The data-ink ratio is a metric introduced by Edward Tufte. Edward Tufte has defined the data-ink ratio as the amount of data-ink divided by the total ink required to print the graphic. In other words, the data-ink ratio is the ratio of elements in a visual representation conveying information to the total elements in the image.
We can also define the data-ink ratio as 1 minus the proportion of the visualization that can be erased without losing any information.
The data-ink ratio can be used to make sure that every element displayed in a visualization is relevant to the information being conveyed and that no element in the chart is redundant.
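As a rough illustration of the definition, the ratio can be computed from an inventory of chart elements. The sketch below assumes that each element can be given an approximate ink amount (for example, a pixel count) and a flag saying whether it carries information. The `ChartElement` class, the element names, and all the numbers are hypothetical and only show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class ChartElement:
    name: str
    ink: float          # approximate ink used, in hypothetical units such as pixels
    is_data_ink: bool   # True if erasing the element would lose information


def data_ink_ratio(elements):
    """Return data-ink divided by total ink for a list of chart elements."""
    total_ink = sum(e.ink for e in elements)
    if total_ink == 0:
        raise ValueError("the chart has no ink at all")
    data_ink = sum(e.ink for e in elements if e.is_data_ink)
    return data_ink / total_ink


# Hypothetical inventory for a decorated bar chart.
decorated = [
    ChartElement("bars", 400, True),
    ChartElement("bar labels", 100, True),
    ChartElement("background fill", 300, False),
    ChartElement("grid lines", 120, False),
    ChartElement("border", 80, False),
]

# The same chart after erasing everything that is not data-ink.
minimal = [e for e in decorated if e.is_data_ink]

print(data_ink_ratio(decorated))  # 0.5
print(data_ink_ratio(minimal))    # 1.0
```

In practice, deciding which elements count as data-ink is a judgment call, so a number like this is a guide for comparing versions of a chart and not an exact measurement.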
Having only the elements necessary for conveying information makes the information easy to consume and understand. The elements selected to represent the information also depend on the audience to which the information is being conveyed. If you are presenting information to stakeholders who are not used to dealing with data visualizations, you can use fewer elements to convey the information so that no confusion occurs. Having fewer elements in the visualization also helps the viewers consume the information in less time. Similarly, you will also save time while creating the visualization.
Suggested Reading: You can read this article on the visualization wheel by Alberto Cairo, which discusses the different components of a visualization and which component you should use in a given situation.
Why Does the Data-ink Ratio Become Low in a Visualization?
There are various reasons why the data-ink ratio decreases in a data visualization.
- The use of 3-D effects and shadow effects in a data visualization doesn’t add any extra information. Hence, it decreases the data-ink ratio.
- The use of background images can also be unnecessary in the visualizations and may decrease the data-ink ratio.
- Unnecessary borders and grid lines don’t convey any information to the user. Most of the time, grids and borders are redundant and decrease the data-ink ratio.
- Adding redundant legends, bold labels, and other decorative elements also reduces the data-ink ratio.
How to Maximize the Data-ink Ratio?
To maximize the data-ink ratio, Edward Tufte suggested two principles to erase redundant elements from data visualization.
- Erase non-data ink within reason: Elements like 3-D effects, grids, annotations, colors, and borders that don’t add any information should be deleted from the visualization.
- Erase redundant data ink within reason: In a chart, different elements can convey the same information. In such a case, we can remove the redundant elements that don’t add any unique information to the visualization. The elements that often fall in this category are legends, labels, and information unrelated to the visualization. For example, if every bar in a bar chart is labeled directly and the chart also has a legend, the legend becomes redundant. This is illustrated in the example later in this article.
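The two principles above can be sketched in code. The article does not name a plotting library, so Matplotlib is an assumption here, and `maximize_data_ink` is a hypothetical helper name. The bar values are placeholders that only give the helper something to clean up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def maximize_data_ink(ax):
    """Erase common non-data ink and a redundant legend from an Axes, in place."""
    # Principle 1: erase non-data ink.
    ax.set_facecolor("none")          # plot-area background fill
    ax.grid(False)                    # grid lines
    for spine in ax.spines.values():  # border around the plot area
        spine.set_visible(False)
    ax.tick_params(length=0)          # tick marks (tick labels are kept)

    # Principle 2: erase redundant data ink.
    legend = ax.get_legend()
    if legend is not None:            # a legend that repeats the bar labels
        legend.remove()
    return ax


# Placeholder data for illustration only.
fig, ax = plt.subplots()
ax.bar(["A", "B", "C"], [3, 5, 2], label="value")
ax.grid(True)
ax.legend()

maximize_data_ink(ax)
```

Both principles say "within reason", so a helper like this is a starting point. Grid lines or a legend should stay whenever the audience actually needs them to read the chart.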
Five Laws of Data Ink
Edward Tufte stated five laws of data ink for representing data in a visualization, as given below.
- Above all else, show the data: Keep in mind that we need to show the data to the viewer. Hence, we should show all the relevant data in the chart. The data should be the number one priority.
- Maximize the data-ink ratio: While presenting the data, we should focus on maximizing the data-ink ratio.
- Erase non-data ink: To increase the data-ink ratio, we should erase all the elements of the visualization that don’t contribute any information.
- Erase redundant data ink: We should also delete the elements that show redundant data from the visualization.
- Revise and edit: While creating any visualization, we should critically evaluate it and make sure that we have proper elements in the visualization and that there is no redundancy.
Data-ink Ratio Maximization Example
Now that we have discussed the basics of the data-ink ratio, let us look at an example of how to maximize the data-ink ratio in a visualization. This example has been taken from the course Applied Plotting, Charting & Data Representation in Python by the University of Michigan on Coursera.
The following image contains a bar chart of calories per 100 grams for different foods. You can observe that the image has many decorative elements. We will reduce the elements in the visualization step by step to maximize the data-ink ratio.
First of all, the background color of the image adds no value to the chart. Similarly, the gray background of the plot area provides no information about the data. Hence, we will remove both backgrounds from the image. After processing, the image looks as follows.
You can observe that there are many redundant labels in the image. For example, we have all the bars in the bar chart labeled. So, we will remove the legend. Similarly, the bars themselves represent different foods. So, we don’t need to explicitly label the x-axis. Hence, we will remove the label “Type of Food” from the x-axis. We will remove the label on the y-axis too as it adds no information. As a result, we get the following image.
Borders in the image also don’t provide any information. Hence, we will remove the borders too. You can observe that the resultant image still provides the same information as the initial image. However, it uses far fewer elements to represent the data. Hence, we are gradually increasing the data-ink ratio.
In the above image, all the bars have been labeled explicitly. Hence, the colors given to the bars provide no information, and we can reduce the number of colors. The choice of colors is a tricky issue. If we are presenting to people who are color-blind, the choice of colors becomes really important. We can highlight certain bars with color to emphasize a certain point, but it is best to keep the number of colors to a minimum. For instance, we can highlight the bar representing Bacon and keep the rest of the bars without color, as shown below.
At this point, you might think that the data-ink ratio has been maximized. However, we can still perform many optimizations. For instance, we can remove the 3-D effects from the bars as they provide no information. Similarly, we can also remove the drop shadows of the bars from the visualization. The resultant image is shown below.
There are grid lines present in the chart. Grid lines can be valuable for reading data points. However, most of the time, they add no value to the chart. For instance, you cannot exactly determine the calories per 100 grams for potato chips or chili dogs using the grid lines. Hence, the grid lines are of no use in this case. Instead of using them, we can directly label the bars.
After labeling the bars directly, we can also drop the y-axis as it is of no use to us. To further increase the data-ink ratio, we can lighten the labels too. As a result, we will get the following image.
The above image is a minimal version of the first image in terms of the elements in the visualization. However, the bar chart gives the same information as the first image. Hence, we can say that the final image has a higher data-ink ratio than the first image.
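A chart in the spirit of the final image can be sketched as follows. Matplotlib is an assumption, since the article only shows images and not the code behind them. The food names come from the text, but the calorie values are made-up placeholders and not the figures from the course example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Placeholder values for illustration only, not the real data.
foods = ["Potato chips", "Bacon", "Chili dog"]
calories = [300, 500, 200]
highlight = "Bacon"  # the single bar we want to emphasize

fig, ax = plt.subplots(figsize=(6, 3))

# One accent color for the highlighted bar, a neutral gray for the rest.
colors = ["tab:red" if food == highlight else "lightgray" for food in foods]
bars = ax.bar(foods, calories, color=colors)

# Label each bar directly with its value, so the y-axis and grid are not needed.
for bar, value in zip(bars, calories):
    ax.text(
        bar.get_x() + bar.get_width() / 2,  # horizontal center of the bar
        bar.get_height(),                   # top of the bar
        str(value),
        ha="center",
        va="bottom",
        color="dimgray",                    # lightened label
    )

# Erase the remaining non-data ink: y-axis, borders, and tick marks.
ax.yaxis.set_visible(False)
for spine in ax.spines.values():
    spine.set_visible(False)
ax.tick_params(axis="x", length=0, colors="dimgray")
ax.set_title("Calories by food (placeholder values)", loc="left", color="dimgray")
```

Calling `fig.savefig("chart.png")` afterwards writes the result to a file. The accent color is a design choice, and a palette that is safe for color-blind viewers may be a better fit for some audiences.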
The data-ink ratio is certainly an important metric to watch while creating any visualization. You should focus on presenting the data in an efficient manner. Adding any decorative element to the visualization should be considered carefully. If an element doesn’t improve the understanding of the data, we should avoid adding it to the visualization.
Finally, no metric is absolute. You can review the visualization yourself or ask a friend or colleague to read the data from it. If they are able to understand it, there is no need to add anything else to the visualization. Otherwise, you should iteratively revise the visualization to make it easier to understand while keeping the data-ink ratio as high as possible.