Talks Tech #42: Principles of Good Data Visualization
Over the years, I've learned some key design principles that support me in creating impactful visualizations. I love data visualization and design. What does good data viz look like? Take a moment to think back to some data visualizations you've seen or maybe ones you've made. What do you think makes them good? Why does good data viz matter? These aren't simple questions with straightforward answers. Good data viz presents accurate information in context. It considers the audience and presents information in an attractive way while being clear and efficient. It involves creating a meaningful design that enables understanding of the data. Each element of the design should serve a purpose and distractions should be removed. Our overall goal with good data viz is greater clarity and simplification.
When our audience sees a complicated design, they automatically assume that it's going to be challenging to understand. They're less likely to engage with it. On the other hand, when they see a clean, simple, beautiful design, they assume it will be easy to understand and they're more likely to want to engage. When we create data visualizations that make our audience feel at ease as they read them, they will more quickly understand the key message. They're more likely to take the desired action we recommend. When we're intentional with our design choices, the audience notices a difference. They are more likely to want to engage with something perceived as well-designed. Fonts, color choices, the type of chart, the titles, grid lines, and alignment are all design choices that have an impact on how our audience experiences the data viz and whether or not they understand our key message. Good data viz matters because it makes our audience engage with the data, and it helps us more effectively communicate the information.
One of the ways to do this is by using the design principles that we will cover. Pre-attentive attributes are visual encodings that our brains process without our conscious awareness. These help focus the audience's attention. The Gestalt principles of visual perception were established by the Gestalt school of psychology in the early 1900s, based on research on how people interact with and create order from visual stimuli. These reduce clutter and the cognitive load for the audience. They also help us understand how the audience will interpret and make meaning from the data viz. We'll discuss some basic color principles because when we use color consistently and according to best practices, it makes it easier for the audience to understand and interpret a visualization. By understanding these three design topics, we can create effective data visualizations that are easy for the audience to understand and engage with. The goal of using these concepts is to reduce the cognitive load of the data viz, which means that the viewer doesn't have to work so hard to understand it.
Knowing design principles in relation to data visualization makes creating charts and graphs easier. You have guidelines to fall back on that are proven to help draw attention to the main point you're trying to make with your data. The design principles we'll discuss are based on how the human brain processes and organizes information. By taking advantage of these automatic, rapid processing steps in the brain, we can create data visualizations that immediately provide key information and takeaways. These principles also apply in any tool that you use to create data visualizations. You don't have to be able to code or use a specific software package to make use of them. Design can be used to make good data viz regardless of whether you use Excel, Looker Studio, R, Python, Tableau, Adobe, or some other tool to create your graphs and charts.
Pre-attentive attributes can be used to help people understand data, focus the viewer's attention and create a visual hierarchy. They can also be used to put emphasis on the most important pieces of data by providing visual cues that will automatically draw the viewer's attention. Pre-attentive attributes help us enable our audience to see what we want them to see, before they even know that they're seeing it. There are 11 pre-attentive attributes that apply to static data visualizations: length, width, size, shape, orientation, curvature, added marks, enclosure, intensity, hue and position. For each of these attributes, if you were looking at a visualization, you'd immediately see the element that is different. It could be longer than the other elements, a different shape, oriented differently, have an extra mark, or be a different color. Your eye is drawn to it, and you don't have to search for it, because our brains are hard-wired to quickly pick up differences we see in our environments. Pre-attentive attributes can also be used in text. If you're reading a paragraph and you see a bold word, that's going to stand out to you. That's a pre-attentive attribute.
One of the pre-attentive attributes is length, which means that an object of a different length stands out. We usually see this in column charts and bar charts, where the part of the chart that is the shortest or the longest is going to stand out to us. Orientation is another pre-attentive attribute. This means that an object with a different orientation stands out. Usually, we see this in line charts, or if we're looking at a column and bar chart together. When the slope changes, we notice a difference. Added marks are another pre-attentive attribute. If we put an extra line on a graph, or we have a line graph and put some points on top of part of the line, that's going to make that part of the line stand out. This also happens in scatter plots when we use a different symbol for part of the graph. One that we see quite frequently is intensity, so an object with a different intensity or saturation stands out. That means that an object or part of the graph that has a more or less intense color is going to stand out.
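As a rough illustration of using intensity to direct attention, here is a minimal Python sketch that builds a color list in which one bar gets a dark color and every other bar gets a light gray. The function name `highlight_colors`, the sample `sales` numbers, and the two hex colors are all hypothetical choices made for this example, not something from the talk.

```python
def highlight_colors(values, highlight_index, highlight="#1f4e79", muted="#c8c8c8"):
    """Return one color per value: a dark color for the highlighted
    item and a light gray for everything else."""
    if not 0 <= highlight_index < len(values):
        raise IndexError("highlight_index is out of range")
    return [highlight if i == highlight_index else muted
            for i in range(len(values))]


# Hypothetical data: emphasize the largest value so the eye lands there first.
sales = [12, 18, 9, 25, 14]
colors = highlight_colors(sales, sales.index(max(sales)))
print(colors)
```

A list like this can be handed to most plotting tools as the per-bar color setting. In matplotlib, for example, the `color` argument of `bar` accepts a list of colors, one per bar.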
The Gestalt principles of visual perception, as I mentioned earlier, were established by the Gestalt school of psychology in the early 1900s, based on research on how people interact with and create order from visual stimuli. The psychologists involved in this research theorized that we tend to group elements, look for patterns and reduce complex images to their simplest forms. We can use these principles to design data visualizations that the audience or the viewer can more easily understand. When we implement these principles in data viz, we can reduce the cognitive load, or the amount of processing the viewer must take on to understand the data. The principles are based on how people naturally interpret and process visual elements. One thing to note about these Gestalt principles and how we view data in general is that we're always looking for structure and relationships. It's important to keep this in mind as we design data visualizations because our viewers will look for structure and relationships, whether or not they're actually there.
There's interpretation involved when viewing a data viz that extends beyond our immediate perceptions. It's important to understand the pre-attentive attributes and the Gestalt principles, because they influence a person's initial engagement with the data viz. We can use these principles to highlight important information, draw attention to certain areas of the visualization and create a hierarchy of visual elements. Using these principles, we can also help reduce the cognitive load for processing a data viz. There are six Gestalt principles often seen in data viz: proximity, similarity, enclosure, closure, continuity and connection. All of the principles are about how we group things together and how we make meaning from visual stimuli. Proximity means that elements that are close together are thought of as belonging to a group. This can be used in creating a table, because we could add more space between rows or more space between columns to help the viewer more easily see the table and connect parts of it together. This principle also applies in scatter plots because we're going to naturally find groups of data that are close together. Those groupings may or may not be meaningful, but our brain is going to interpret them as groups nonetheless.
Another Gestalt principle is similarity. This means that elements that look similar are thought of as belonging to a group. They could be a similar color, a similar shape or a similar size. This, again, could be used in table design by shading alternate rows to help our audience connect all of the data in a row together. It can also be used in plotting data by giving data points of different categories different colors or different shapes, to help our audience associate all of the points of one category together. Enclosure is another Gestalt principle, where elements that are physically enclosed are thought of as belonging to a group. Enclosure is also a pre-attentive attribute, which means that when we see part of a graph enclosed, whether it's shaded or a box is drawn around it, our attention is going to be immediately drawn to that area of the graph. Enclosure can be used to frame or shade a section of a table or a graph to draw attention to it or to help the viewer group the enclosed elements together. This is particularly useful when we want to draw a distinction in the data.
Color can be used to draw attention, highlight information and group data points together, as it's part of both the pre-attentive attributes and the Gestalt principles. There are also some more specific uses of color in data viz. When color is used in a data visualization or a dashboard, we want to make sure that it has a clear purpose, is used consistently, and is accessible. If one color is used to indicate a specific category, we want to use that same color consistently each time that category is included. If one color is used to highlight an important data point, we want to use that color for highlighting across visualizations in a dashboard, report or presentation. This consistency helps the viewer make meaning from the use of color and makes it easier for them to understand the important information. This also means that we don't want to reuse colors on a single dashboard if they aren't representing the same categories. Color should be used with a purpose and not arbitrarily added to a data viz, because using too many colors actually defeats the purpose of associating numbers with colors.
Research shows that most people's short-term memory will only retain up to five pieces of information at one time. The more colors you use to represent your data, the harder it becomes to read it quickly. If you need more than five colors in a chart, you might consider using an alternate chart type. The main ways that color is used in data viz are to highlight a data point, bringing the viewer's attention immediately to it, and to designate different categories or labels, which helps the user differentiate between different groups of data. The first one is more of a pre-attentive attribute, and the second one relies more on the Gestalt principles. Color can also be applied to quantitative values in either a sequential or a diverging manner. Sequential color is used to show low to high values on a continuous scale, and diverging color is used to show values above or below a midpoint on a continuous scale. We want to remember that we should only add color when it has meaning and helps with understanding. Color should be used consistently and sparingly.
If we were to use red to visualize an increase, that actually increases the audience's cognitive load, or the amount of work they have to do to understand the visualization. It goes against the pre-existing association of the color red with negative numbers, and it makes it harder for them to understand the graph. Another meaning that we perceive in color is that darker means greater or more. When highlighting information, it works best to use a darker color for the important piece and a lighter color for the other pieces. This also applies to sequential and diverging color gradients. You always want to use the darkest color for the largest value and the lightest color for the smallest value. One other important factor to consider when using color in visualization is color blindness and accessibility. For example, to make sure that your color use is accessible, you should generally avoid using red and green together without providing other visual cues such as down and up arrows or minus and plus signs. If you want to use red and green to highlight negative and positive change, you could also just highlight the negative change with red and leave the positive change in a default font color other than green.
There are several different online checkers that you can run color palettes through or upload an image of your graph to check that the colors are accessible for different variations of color blindness. Another way to test your final visualization for effective use of color is to try looking away or closing your eyes and then looking back at the visualization you've created to see where your eyes are drawn. You could also ask a colleague to look at a visualization and tell you what they notice and where their eyes go first. That will help you determine if the color used in your visualization is drawing the user's attention to the area you intended or if it is distracting and focusing their attention elsewhere.
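One check that is easy to script yourself is luminance contrast, which asks whether two colors still differ in lightness when hue is ignored. The Python sketch below uses the relative luminance and contrast ratio formulas from the WCAG 2.x guidelines. It is not a color blindness simulation and does not replace the online checkers mentioned above. It is only a rough proxy, and the function names are made up for this example.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a '#rrggbb' color: 0 for black, 1 for white."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, from 1 (no lightness difference) to 21."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


# Compare a red/green pair with a dark-blue/light-gray pair.
print(round(contrast_ratio("#ff0000", "#008000"), 2))
print(round(contrast_ratio("#1f4e79", "#c8c8c8"), 2))
```

A ratio close to 1 means the two colors are nearly the same lightness, so a viewer who cannot tell the hues apart has little else to go on. WCAG suggests at least 3:1 between adjacent graphical elements, which is a reasonable starting threshold for chart colors.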
The first step is to identify your audience. Who is your audience for your visualization? What questions do they have? What do they want to know and what do you want them to know? What is their familiarity with the data? Are there key metrics they are looking for? Once you explore your data, keep an eye out for specific messages you find and ask yourself, is there a specific action you want the audience to take after seeing the data? You'll also want to think about the data literacy of your audience. Are they comfortable looking at data? If not, you'll need to spend some time orienting them to the data first. Their comfort with data might also influence your chart choices. The second step is to explore the data for yourself. See what types of data are available and what questions you could answer with the data. Based on the questions you want to answer and the type of data you have, you can pick a couple of different chart types to experiment with.
The third step is to test out different graphs and designs. Here you're going to use the pre-attentive attributes, Gestalt principles and color best practices to draw the audience's attention to the key message. Test out a couple of variations of the same message by using different chart types or applying different principles. Ask a friend or colleague to look at one or two of your visualizations and tell you what they notice first and what message they got from it. If their takeaways don't match what you intended, then make some tweaks to your design. If you don't have someone else to show your data viz to, you can put a graph or chart aside for a little while, then look back at it and pay attention to where your eyes are drawn first. Is that where you want your audience to look first? The fourth step is to refine and finalize your visualizations. You'll take the feedback from your testing and implement it to improve your data viz. You also want to do some final checks based on the principles and concepts that I shared today, to make sure that you don't have any extra elements that aren't adding value to your visualization and message. Then you're ready to present your data viz. Creating data visualizations is an iterative process and you'll constantly learn new things. When you present data, pay attention to how it's perceived and the messages your audience takes away. If that doesn't align with what you intended, think about what you can adjust next time.