Do’s and Don’ts for Data Visualization
Making data visualizations is easy, but making effective visualizations is what will set you apart.
Making data visualizations is an essential skill to have in the quest to become a well-rounded data scientist. With so many different types and styles to choose from, it can be easy to try to do too much when creating your visualizations. Unlike many tasks in data science, where you know your code will lead to a correct answer, there is no real “right” or “wrong” way to display your data. That said, there are definitely some good habits you’ll want to develop in data visualization, along with some habits that you’ll want to avoid. In this post, I will go over some of the good and bad habits that I’ve learned and feel are the most important to share!
Do #1 — Use Preattentive Attributes
So, first question… what exactly is a preattentive attribute? According to datasavvy.me, “Preattentive attributes are visual properties that we notice without using conscious effort to do so. Preattentive processes take place within 200ms after exposure to a visual stimulus, and do not require sequential search.” Basically, these are things that our eyes are immediately drawn to when we see them — such as the bold text used above, or when something is written in italics. Using preattentive attributes makes whatever you are trying to point out in your data clear and obvious to whoever it is being presented to. Below are some examples of what preattentive attributes can look like.
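As a rough illustration of the idea, here is a minimal Python sketch, assuming matplotlib is installed, that uses color as a preattentive attribute: every bar is a muted gray except the one you want your audience to notice. The helper names (`highlight_colors`, `plot_highlighted`) and the sample numbers are hypothetical, made up for this example.

```python
def highlight_colors(labels, focus, base="lightgray", accent="tab:red"):
    """Return one color per label: the accent for the focus label, the muted base for the rest."""
    return [accent if label == focus else base for label in labels]


def plot_highlighted(labels, values, focus):
    """Draw a bar chart in which only the focus bar is colored."""
    # Imported here so the color helper above works even without matplotlib
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(labels, values, color=highlight_colors(labels, focus))
    ax.set_title(f"{focus} stands out")
    return fig, ax


# Made-up sample data, for illustration only
regions = ["North", "South", "East", "West"]
sales = [120, 95, 180, 110]
colors = highlight_colors(regions, focus="East")
```

Calling `plot_highlighted(regions, sales, focus="East")` returns the figure and axes, so you can save the result with `fig.savefig(...)`. The same trick works with other attributes, such as a heavier line width or a larger marker on the one series you care about.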
Do #2 — Choose the Right Visualization
I know, duh. This sounds like common sense, and that’s because it is, which is exactly why I’m pointing it out as a Do. As I mentioned in the intro, sometimes in the attempt to make ourselves stand out, we try to make some big, fancy visualization that is too busy and convoluted, or we use the wrong type of visualization entirely. It is much better to have a simple, plain visualization that is the right type and portrays your conclusions accurately than to have a groundbreaking, beautiful one that isn’t accurate. If you like your viz but feel you just need to spice it up, use a library like seaborn to make it pop off the screen.
Don’t #1 — Don’t Get Your Numbers Wrong
This is another obvious one, but it has to be mentioned. Let me make this very clear: Double.Check.Your.Math. If you are making a big-time presentation to some big-time people, you do not want to be the guy who thinks 35+35=60. That, more than anything, can leave a black mark on you in all sorts of ways, whether it labels you as someone who lacks attention to detail, someone who rushes through projects, or someone who is lazy. None of these are qualities that are, or should be, associated with professional data scientists.
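One cheap way to double-check is to let the code do it before anything gets plotted. The sketch below is only one possible approach: `check_shares` is a hypothetical helper, and the survey numbers are invented. It confirms that a set of percentages adds up to the total you expect and raises an error otherwise.

```python
import math


def check_shares(shares, expected_total=100.0, tolerance=0.01):
    """Raise ValueError if the shares do not add up to the expected total."""
    total = math.fsum(shares.values())
    if not math.isclose(total, expected_total, abs_tol=tolerance):
        raise ValueError(f"Shares sum to {total}, expected {expected_total}")
    return total


# Made-up numbers, for illustration only
survey = {"Yes": 35.0, "No": 35.0, "Undecided": 30.0}
total = check_shares(survey)
```

Running a check like this right before the plotting call means a slide claiming that 35 and 35 make 60 fails loudly on your machine instead of in front of your audience.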
Don’t #2 — Don’t Use 3-Dimensional Data Visualizations
More specifically, you don’t want to do this when presenting your findings to an audience. It’s possible that a 3D visualization can, maybe, be helpful for you as the data scientist in some form of EDA, and that’s really only if you can interpret it. Personally, I can definitely admit that reading 3D visualizations can be incredibly challenging, and that there are better, simpler ways to explore your data and understand what you’re working with. But under no circumstance should you use 3D visualizations when presenting to an audience. Take a look below:
For someone who has no experience working in data science (which is very much possible, if not likely, when you’re giving a non-technical presentation), reading something like the graph above is going to be confusing and frustrating, no matter how well you explain it to them. Stick to 2D visualizations and make everything as clear as possible on the screen.
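If your data really does have a third variable, one option (of several) is to keep the plot in 2D and encode that variable as marker size instead of a z-axis. The sketch below assumes matplotlib is installed; `marker_sizes` and `plot_bubbles` are hypothetical helper names, and the default size range is an arbitrary choice.

```python
def marker_sizes(values, smallest=20.0, largest=300.0):
    """Linearly rescale values to marker areas (in points squared) for a 2D scatter plot."""
    low, high = min(values), max(values)
    if high == low:
        # All values are equal, so give every point the same mid-sized marker
        return [(smallest + largest) / 2 for _ in values]
    span = high - low
    return [smallest + (v - low) / span * (largest - smallest) for v in values]


def plot_bubbles(x, y, z):
    """Plot x against y in 2D, using marker size to show the third variable z."""
    # Imported here so the sizing helper above works even without matplotlib
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y, s=marker_sizes(z), alpha=0.6)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    return fig, ax


sizes = marker_sizes([1, 2, 3])
```

Color is another common choice: passing the third variable to the `c` argument of `ax.scatter` and adding a colorbar keeps the chart flat while still showing all three variables.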
Don’t #3 — NO PIE CHARTS
This is not a personal vendetta against pie charts. They’re just not the best tool to use when trying to display your results. According to geckoboard.com, “The basic premise is that pie charts are poor at communicating data. They take up more space and are harder to read than the alternatives. The brain’s not very good at comparing the size of angles and because there’s no scale, reading accurate values is difficult.” The common theme with each of the three don’ts is that they all involve having unclear, busy, or complicated visuals. Pie charts are the prime example of this. There are many better ways to visually explain your data, such as scatter plots, histograms, bar graphs, or line graphs, that show the differences in your data more clearly. So when the thought crosses your mind of whether you should use a pie chart, just remember the following visual:
Thank you for reading! I hope this helps you in building solid data viz habits or breaking some bad ones.