Visualizing and displaying complex data is hard. Understanding complex data is harder. Rapidly making operational decisions based upon complex data is extremely hard.
Historically, operational security analysts have relied on alerts, tables, and charts on dashboards or in email to pull potentially useful information out of the vast sea of data pouring into their analytic systems. This has always been problematic due to the combination of false positives and the effort of reconstructing the context of the data in the analyst's head. Most of the standard methodologies for displaying complex information make it harder, not easier, for humans to understand the information they seek in a timely and operationally useful manner.
Everyone has seen dashboards with a wall of text in tables, interspersed with colorful, many-fruited pie charts nesting near clusters of yummy looking fruit pops of varying lengths or heights in bar and column charts, next to a row of radial tachometer-style meters running redline like a series of engines about to blow. These are proudly displayed on giant monitors centered on SOC walls to impress all and sundry. Meanwhile, the genuinely useful dashboards are subsets of those giant walls of information overload, and even they remain hard to use: the dizzying array of information and data makes it difficult for a human operator to zero in upon the most salient and useful bits needing further investigation or action.
In “Death by Information Overload,” an article in the September 2009 issue of Harvard Business Review, email is used as an example of individual information overload causing significant productivity losses for organizations. Like email, the volume of information coming at an analyst on SOC dashboards every day is confusing and time consuming to interpret and manage, often more so than even a day's deluge of email. Speeding up the decision-making process on security incidents leads to labor savings, shorter mean time to discovery and response, and faster determinations of true vs. false positives. As a result, any methodology that reduces the time and effort for an analyst to interpret, understand, and then act upon useful security related data has the potential for notable impacts on the organization: cost savings from increased productivity; lower employee stress levels and burnout rates; and reduced risk due to improved security incident response and higher quality analysis.
To illustrate how the presentation of complex data affects a human being's ability to discern the information within it, below is a series of visualization methods looking at the exact same data using the same search in Splunk.
The specific search to obtain the source events is long and complex and distracts from the points herein. It is enough to describe it as a search that presents all failed and successful authentication events across a variety of data sources.
NOTE: The specific search used was originally derived from an expansion of a Splunk Enterprise Security data model search, resulting in the final pipe section using … | "Authentication.action"=failure OR "Authentication.action"=success followed by a fun series of adventurous hoop jumping to arrive at our final data points.
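For the curious, a minimal sketch of such a search, assuming the stock Enterprise Security Authentication data model (the actual expansion used here was considerably longer), might start like this:

| datamodel Authentication Authentication search | search "Authentication.action"=failure OR "Authentication.action"=success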
The first evolution of condensed information display is to distill something down to simple data points.
Enter the single value charts. Behold the power of simple number displays:
At once simple and elegant, this visualization has no dizzying arrays of information to distract from the simple counts. However, these numbers without context are meaningless. What the analyst can take from this is “Wow, there was an order of magnitude more failures than successes!” There is no way to know, given this display, whether either of these numbers is alarming or simply normal for a week. Single numbers are highly useful in certain circumstances, as are radial dials, thermometer gauges, and other methods of quickly looking at a valuable metric that presents in context as a single value. This use case is clearly not one of those circumstances.
NOTE: Recent Splunk single value visualizations are far more useful in that they can show trend lines and how much the value has increased or decreased since the last reporting period.
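For reference, each single value panel requires nothing more than a count over the same base events; a minimal sketch (swap failure for success for the second panel) looks like this:

… | search "Authentication.action"=failure | stats count AS "Auth Failures"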
This proves context is of paramount importance in providing relative value for these data to an analyst. Skipping several iterations of metric-finding methodologies for brevity (an amusing term for a blog post of over 2,000 words), the result is looking at these data in the context of whether they are statistical anomalies, based on simple mean and standard deviation (also called sigma) across several weeks' time.
(Yes, the stats geeks will be adamant that these are worthless until the data is proven to be normally distributed or means of means are used to force a normal distribution; at which time the security analysts panic at the loss of fidelity of the original data; and other such perfectly valid arguments shall ensue. For the purposes of simplifying the example of visualizing valuable information, this blog assumes the data in question is normally distributed, even though that is most certainly an unlikely scenario in the supplied dataset.)
Next up, we have the ubiquitous data table:
In this example, we see the data points clearly laid out (rounded to whole numbers). The actual column holds the same figures seen in the single value dashboard, with last week's number under goal. The forecast is an arbitrarily chosen value of 2 standard deviations above the mean, merely for illustrative purposes. The low, med, and high values are 3 standard deviations below the mean (with negative values, being meaningless in this context, replaced with 0); the mean; and 3 standard deviations above the mean, respectively, for the sake of having some type of fairly standard benchmark. (For the non-stats folks, this means that 99.7% of all values should land at or below the high value. For the purposes of aiding the analyst, this is likely sufficient for visual reference.)
This, too, is difficult to read and interpret. The columns could be rearranged in some fashion, perhaps programmatically placing the actual, goal, and forecast values in between the low, med, and high values relative to their numerical value. That would slightly improve a human's ability to read this information, but it remains quite cumbersome and slow for an analyst's decision-making process; a simple static version is sketched below.
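As a rough sketch of that idea (a static reordering rather than a true per-row programmatic sort), the final table command could simply list the columns in a more readable order:

… | table title range_low goal actual forecast range_med range_high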
Next, there is the ever-popular column chart:
This is far easier to interpret quickly compared to the table. The low, med, and high marks are placed as dotted overlays (due to being single value parameters as opposed to multiple values). Failures are above and successes below, with the goal, actual, and forecast as columns. The relative values compared to the low, med, and high ranges are instantly obvious, though slightly hard to see due to being single points. The current week is clearly higher than last week on both charts, and the forecast is shown to be optimistic and likely inaccurate at best. For a more accurate forecast value, use Splunk's predict command or other more complex and accurate methods. For this purpose, we'll use this data point as an assumed-accurate benchmark.
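In Simple XML, that dotted overlay treatment can be approximated with the charting overlay options; a hedged sketch (these option values are illustrative, not taken from the original dashboards):

<option name="charting.chart">column</option>
<option name="charting.chart.overlayFields">range_low,range_med,range_high</option>
<option name="charting.fieldDashStyles">{"range_low":"dash","range_med":"dash","range_high":"dash"}</option>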
There are, however, still problems with this chart. It is difficult to ascertain general numerical values because the chart is squashed to fit a reasonable screen space. Anything smaller would render it useless, and a taller visualization becomes difficult to view in one piece without scrolling.
To conquer these problems, the bar chart comes into play:
With this visualization it is far easier to discern relative values, and roughly specific values, without further investigation. One can easily see that the actual successes are not far from last week's on a relative level, and last week's failures were far out of line with this week's numbers. However, this chart carries over the other problems from the column chart.
In addition to the other issues discussed with the column and bar charts, both visualizations use multiple colors. Not only is using multiple colors potentially difficult for people with color vision deficiencies, but humans process different colors as information more slowly than they do gradients of a single color. Noah Iliinsky is one of many experts talking and writing about how humans perceive information, a subject that has been studied rather extensively. He has a great chart illustrating his research on his site at http://complexdiagrams.com/properties, which is explained in detail in his 2013 IBM white paper Choosing visual properties for successful visualizations. In short, there are faster and slower visual stimuli for humans to interpret, depending on the nature of the information. Generally speaking, the type of numerically ranked data used in the examples herein is ordered with many data points and is, therefore, better served by displays using length, size, or area, combined with differences in saturation or brightness rather than different colors.
The previous charts use different lengths to moderate effect, but they fall short on several other points, and the table is simply arcane and difficult to interpret at all, let alone quickly.
Stephen Few has studied, written, and spoken extensively on the subject of visualization in a larger context. He has developed a highly effective method for quickly viewing and interpreting the type of data used to create the visualizations above: the bullet graph.
Not only is this display highly condensed on the screen, it allows for extremely rapid interpretation of these complex data relationships. Each chart is clearly labeled, with numerical values close to the visual bands for easy estimations. The gray shaded background areas indicate the low, med, and high ranges, with the Auth Failures chart showing no low band due to that value being set to 0. The vertical bar is the goal, or last week's values in this context. The light blue line is the forecast value, and the dark blue line is the actual value. The blue shades are different enough from the gray shades that this could be rendered in grayscale or viewed by someone with color vision deficits with no loss in presentation quality or value.
With bullet graph visualizations of these data, an analyst can quickly discern whether something seems anomalous or within normal bounds and make more rapid operational decisions, driven by higher quality data and better understanding.
This specific graph quickly and clearly shows that the Auth Failures count is generally between zero and 75,000 in a week, with the lower bound statistically below zero, so there is no darker shade at the beginning of the graph. Last week, shown by the vertical bar, was notably smaller, down in the 20,000 range, but the current week is closing in on 150,000, as shown by the dark blue bar. The projected value is near 200,000, which again was chosen arbitrarily for this example, but it could be based on any valid calculation for the given data in another use case.
The Auth Successes chart shows three ranges based on standard deviation calculations, with the low range crossing to mid-range around 1,800 and the high range starting around 6,200, based on the three gray shaded areas. The number of successes from last week was around 4,500, as shown by the vertical bar, and this week's number is nearly 8,000, based on the darker blue line. The projection is over 9,000, based on the chosen metric.
The upshot is that an analyst can quite quickly see that the full range of possible values for Auth Failures is fairly wide, with last week being fairly anomalous by comparison and this week being rather high but generally within the range of normal trends. Therefore, there is a likelihood that something strange happened last week to produce so few Auth Failures compared to this week. However, if this chart used Splunk's predict command, there would likely be a clearer understanding of the projected final count for the week, which would provide a stronger relative sense of normal vs. anomalous.
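A minimal sketch of that predict-based approach, assuming a daily count feeding a one-week projection (the span, algorithm, and horizon would need tuning against real data):

… | search "Authentication.action"=failure | timechart span=1d count AS failures | predict failures AS forecast future_timespan=7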
The Auth Successes portion of the chart indicates a fairly high number of related events, but the overall trend indicates this is well within normal at the time the chart was drawn.
Background Notes:
Few provides the specifications for bullet graphs on his site at https://www.perceptualedge.com/articles/misc/Bullet_Graph_Design_Spec.pdf, including many grayscale and color examples.
The Splunk Bullet Graph Custom Visualization app is available on Splunkbase at https://splunkbase.splunk.com/app/3144/ with extensive documentation at http://docs.splunk.com/Documentation/CustomViz/1.0.0/BulletGraph/BulletGraphIntro.
Implementing the bullet graph app, and other amazing visualization apps, requires Splunk Enterprise 6.4’s new Custom Visualization framework, documented at http://docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/CustomVizDevOverview, which is one of the best innovations in Splunk Enterprise for combating information overload in this data-driven age.
The examples above relied upon the late and dearly missed David Carasso’s timewrap command, found at https://splunkbase.splunk.com/app/1645/.
The data dashboards used this base search (skipping all the scary auth stuff to pull the specific auth related events):
… | search "Authentication.action"=failure OR "Authentication.action"=success | timechart span=1w count by Authentication.action | eventstats mean(failure) AS FailureMeanPast, stdev(failure) AS FailureStdevPast, mean(success) AS SuccessMeanPast, stdev(success) AS SuccessStdevPast | timewrap w
The subsequent searches for the two charts in each dashboard appended the following (except the single value dashboard, which merely counted one action or the other):
eval title="Auth Successes" | eval actual=success_latest_week | eval goal=success_1week_before | eval range_low=SuccessMeanPast_latest_week-(SuccessStdevPast_latest_week*3) | eval range_low=if(range_low<0,0,range_low) | eval range_med=SuccessMeanPast_latest_week | eval range_high=SuccessMeanPast_latest_week+(SuccessStdevPast_latest_week*3) | eval forecast=SuccessMeanPast_latest_week+(SuccessStdevPast_latest_week*2) | table title goal range_low range_med range_high actual forecast
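The Auth Failures charts presumably used the analogous evals with the failure fields from the base search swapped in; sketched here for completeness:

eval title="Auth Failures" | eval actual=failure_latest_week | eval goal=failure_1week_before | eval range_low=FailureMeanPast_latest_week-(FailureStdevPast_latest_week*3) | eval range_low=if(range_low<0,0,range_low) | eval range_med=FailureMeanPast_latest_week | eval range_high=FailureMeanPast_latest_week+(FailureStdevPast_latest_week*3) | eval forecast=FailureMeanPast_latest_week+(FailureStdevPast_latest_week*2) | table title goal range_low range_med range_high actual forecast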
The table used a variation of the above with the values rounded.
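One hedged way to do that rounding in SPL (the field list is assumed to match the table above) is with the foreach command:

… | foreach goal range_* actual forecast [ eval <<FIELD>> = round('<<FIELD>>', 0) ]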