Visualizations: what are they and what do we use them for?
Visualizations are a key part of many walks of life, and my quest to learn Data Science is no exception. We've got numbers, we've got other numbers, and we've got more numbers. But reading them off in a list rarely does them justice; it is often far more helpful, and much faster, to display them visually. One of the main types of data visualization is the line-plot.
What is a line-plot?
According to the wiki,
"A line chart or line plot or line graph is a type of chart which displays information as a series of data points called 'markers' connected by straight line segments. It is a basic type of chart common in many fields."
Well, okay, that sounds pretty fancy. What does this fancy graph look like?
**Google image search "line plot": first result**
Hmm. It has "markers", and we could say the points are "connected by straight line segments". Functionally, though, this is closer to a histogram: it shows counts of values along a single dimension. What about the line chart from the wiki definition?
**Google image search "line chart": first result**
Now that certainly looks more "Data Science"-ey, doesn't it? What's the difference? This line chart is doing a lot. If we look at just one line (let's start with the blue one), it depicts six values: the bear population in each of six years. The markers represent the individual data points, and the lines connecting them give the viewer a sense of the rate of change between points. That adds a dimension: one axis measures the years and the other measures the population. The graph also depicts not one line's worth of data but three! Each colored line shows the population over time for a different species. So it is really four graphs in one: a population trend for each of the three species, plus a comparison of those three trends.
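To make the "rate of change between points" idea concrete, here is a small Python sketch (standard library only) that computes the slope of each straight segment joining consecutive markers, for several series at once. The yearly counts and the `species_b`/`species_c` names are made-up placeholders, not values read off the chart, and `segment_slopes` is a hypothetical helper for illustration.

```python
# Hypothetical yearly population counts for three species.
# These numbers are illustrative only; they are NOT taken from the chart above.
years = [2015, 2016, 2017, 2018, 2019, 2020]
populations = {
    "bears": [50, 54, 61, 58, 66, 70],
    "species_b": [120, 115, 118, 110, 104, 100],
    "species_c": [30, 32, 31, 35, 37, 36],
}


def segment_slopes(xs, ys):
    """Return the slope of each straight segment joining consecutive markers."""
    return [(y1 - y0) / (x1 - x0) for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])]


# Each line on the chart is one series; each segment's steepness is its rate of change.
for species, counts in populations.items():
    print(species, segment_slopes(years, counts))
```

Six markers give five segments per line, and comparing those lists side by side is essentially what your eye does when it compares the colored lines.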
When is this visualization effective?
This wildlife population graph is a good example of a line chart's wheelhouse. The lines make it very easy for the human eye to compare rates of change over time, in other words, trends. Time series like these are also useful for comparing trends between categories, such as the populations of different species, or social media traffic.
A comparison of trends
Where might line charts fall short?
A line chart shows every point and is therefore sensitive to statistical noise. If you want to display an overall rate of change, a single linear regression line is easier to understand, because the human eye can get distracted by the different slopes of each segment in a line chart. Also, comparing the slopes of several lines does not give a clear sense of their combined trend.
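A minimal sketch of that trade-off, assuming a made-up noisy monthly series: the segment-by-segment changes that a line chart draws jump up and down, while an ordinary least-squares fit summarizes the same points with one slope. `least_squares_slope` is a hand-rolled helper for illustration; Python 3.10+ also ships `statistics.linear_regression`, which returns the slope and intercept.

```python
# Hypothetical noisy monthly values with an upward drift (illustrative only).
months = list(range(8))
values = [10, 14, 11, 17, 15, 20, 18, 23]

# What a line chart shows: the change along each segment (months are 1 apart).
jumps = [b - a for a, b in zip(values, values[1:])]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance_x


print("segment changes:", jumps)
print("fitted slope:", least_squares_slope(months, values))
```

Here the segment changes are 4, -3, 6, -2, 5, -2 and 5, flipping sign almost every month, while the fitted slope is 68/42, about 1.6 per month: a steady upward trend that is easy to miss in the zig-zag.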
Where do line charts excel?
Line charts are perfect for time series analysis and are commonly used for anything dollar-related. A line chart is literally the first thing I think of when I picture my savings. They are also the classic way to show stock prices, and they are a great tool for analyzing trends all over the social media world.
Any change in a feature over time will be ripe for a line chart. Think political approval ratings, the price of milk & bread, crime rates, population, greenhouse gas levels, heart rate, historical revenue and so on.
Weaknesses compared to other graphs:
A good visualization is a form of communication, and to be effective it should be clear and concise. The wildlife graph displays a lot of information quickly by adding more lines in different colors, which lets it compare categories: the viewer gets a sense of the relative trend for each species. But while you can compare categorical data by adding more colored lines, bar charts are often shorter and more to the point for categorical topics.
On a similar note, you could technically get a sense of the spread of data from a line chart, but it would probably feel messy and would not be your best tool. For getting a sense of the distribution of data, a histogram is hard to beat. Line charts are pretty flexible and can do many nice things, but for distributions, histograms are sweeter and quicker to the punch.
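For comparison, this is roughly what a histogram does under the hood: it drops each value into a bin and counts. The sketch below assumes bins that include their left edge but not their right, except the last bin, which includes both (the same convention NumPy's `numpy.histogram` uses); `histogram_counts` is a hypothetical helper and the heart-rate readings are made up.

```python
def histogram_counts(values, bin_edges):
    """Count values per bin; bins are [lo, hi) except the last, which is [lo, hi]."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            is_last = i == len(counts) - 1
            if lo <= v < hi or (is_last and v == hi):
                counts[i] += 1
                break  # each value lands in at most one bin
    return counts  # values outside the outer edges are not counted


# Hypothetical resting heart-rate readings (beats per minute).
heart_rates = [62, 65, 70, 71, 72, 75, 78, 80, 85, 90]
print(histogram_counts(heart_rates, [60, 70, 80, 90]))  # → [2, 5, 3]
```

The time order of the readings is thrown away entirely; only the shape of the distribution is left, which is exactly why a histogram answers "how spread out is this?" faster than a line chart.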
How important are speed and readability? Well, it depends on the task for the visualization. A graph could be used for a quick dip into EDA (exploratory data analysis) or as evidence in a pivotal presentation. If you are exploring a collection of data, a simpler, less complicated graph (a histogram, for example) will help you maintain your workflow. I haven't found myself line-plotting many variables. Once you start forming hypotheses and using data as evidence, however, you can delve into the more complicated line chart and get more bang for your buck. Try making your own line chart to depict something in your world that changes over time. Maybe your trivia team vs. your rivals!