Data Breaches: The threat of “unknown unknowns”
Posted by Juliet Sigmann on Mon, Jul 07, 2008
The recently published Verizon data breach study, the 2008 Data Breach Investigations Report, offers some remarkable analysis of data breaches. Based on a collection of 500 forensic investigations conducted over four years, the study describes patterns that emerge from a detailed examination of individual breaches. It is important to note that while the Verizon study has a large sample size, it is biased because it includes only investigations of Verizon customers. Even so, several of its statistics are worth noting. One of them concerns the threat of "unknown unknowns" and the need for data risk assessment.
What is an unknown unknown?
The Verizon study defines an unknown unknown along one of four dimensions - a system, a data store, a network connection, or privileged access.
1. Unknown system - a system unknown to the organization or business group affected by the breach
2. Unknown data - a data store that the organization did not know existed on that system
3. Unknown network connection - an unknown network connection or access path used in the data breach
4. Unknown privileged access - a system that had unknown accounts or privileges
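The four definitions share one idea: an element counts as "unknown" when it was involved in a breach but is missing from the organization's own inventory. A minimal Python (3.9+) sketch of that idea follows. The Elements schema, the tuple keys, and the sample names are hypothetical illustrations, not something taken from the Verizon study.

```python
from dataclasses import dataclass, field
from enum import Enum


class Unknown(Enum):
    """The four kinds of "unknown unknown" described in the study."""
    SYSTEM = "unknown system"
    DATA = "unknown data"
    CONNECTION = "unknown network connection"
    PRIVILEGE = "unknown privileged access"


@dataclass
class Elements:
    """A set of system elements (hypothetical schema, not from the study)."""
    systems: set[str] = field(default_factory=set)
    data_stores: set[tuple[str, str]] = field(default_factory=set)  # (system, store)
    connections: set[tuple[str, str]] = field(default_factory=set)  # (source, system)
    accounts: set[tuple[str, str]] = field(default_factory=set)     # (system, account)


def unknowns_in(breach: Elements, inventory: Elements) -> set[Unknown]:
    """Return the kinds of unknown a breach involved: anything it used
    that does not appear in the organization's own inventory."""
    found: set[Unknown] = set()
    if breach.systems - inventory.systems:
        found.add(Unknown.SYSTEM)
    if breach.data_stores - inventory.data_stores:
        found.add(Unknown.DATA)
    if breach.connections - inventory.connections:
        found.add(Unknown.CONNECTION)
    if breach.accounts - inventory.accounts:
        found.add(Unknown.PRIVILEGE)
    return found


inventory = Elements(
    systems={"db1"},
    data_stores={("db1", "orders")},
    connections={("app1", "db1")},
    accounts={("db1", "app_user")},
)
breach = Elements(
    systems={"db1"},
    data_stores={("db1", "orders"), ("db1", "card_backup")},  # card_backup was never inventoried
    connections={("10.0.0.99", "db1")},                       # neither was this source host
    accounts={("db1", "app_user")},
)
print(sorted(u.value for u in unknowns_in(breach, inventory)))
# prints ['unknown data', 'unknown network connection']
```

Modeling each element as a set makes "unknown" a plain set difference, so a single breach can naturally fall into several categories at once.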
How often is there an unknown unknown involved in a data breach?
90% of data breaches involve an unknown unknown! This, I think, is a truly remarkable statistic. In essence, unknown unknowns appear to be a leading indicator of data breaches.
How is the threat of unknown unknowns distributed?
The figure shows the distribution of unknown unknowns: 66% of data breaches involve unknown data, 27% involve an unknown network connection, 10% involve an unknown system, and 7% involve unknown privileged access. The large majority involve unknown data - data that the organization did not know was present on the system.
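A quick check of these figures: the four shares sum to more than 100%, so, assuming all four are percentages of the same set of breaches, the categories are not mutually exclusive and some breaches involve more than one kind of unknown. The same arithmetic yields the 44% attributed to the three non-data unknowns in Step 2. Because that number is a sum of overlapping shares, it is an upper bound on the fraction of distinct breaches involved rather than an exact count.

```python
# Share of breaches involving each kind of unknown, as quoted above (percent).
shares = {
    "unknown data": 66,
    "unknown network connection": 27,
    "unknown system": 10,
    "unknown privileged access": 7,
}

total = sum(shares.values())
# The three non-data categories together (a sum of shares, so overlaps are double-counted).
non_data = total - shares["unknown data"]

print(total)     # 110
print(non_data)  # 44
if total > 100:
    print("categories overlap: some breaches involve more than one kind of unknown")
```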
Implications for risk & threat management: What can we do about unknowns?
Threat management may be more black and white than gray
In a sense, this suggests that most breaches stem from inadequate visibility into the most critical elements of an organization's systems: its critical applications, its critical data assets, its network connectivity, or its privileged access flows. This problem is relatively straightforward to fix, since it requires visibility into a relatively small number of critical elements. This is good news. Imagine if it were not the case: if data breaches mostly involved known elements, both detection and risk mitigation would have to rely on far more sophisticated analysis of behavior.
Step 1 - Data: The dominant unknown
This analysis shows that 66% of data breaches involve unknown data or an unknown data store. This points to the need for data discovery (to find databases) and data classification, particularly for critical data assets such as Social Security numbers (SSNs), credit card numbers, and other personally identifiable information (PII). Once such stores are found and inventoried, they can be brought under the organization's existing security controls, so their security rises almost automatically. Discovering the unknown data is the simplest first step, and because it addresses the largest share of breaches, it offers a large return on investment.
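The classification half of this step can be sketched as a pattern scan over files, as below. This is a minimal illustration, assuming the unknown data sits in plain-text files; finding database instances themselves is not attempted here. The regular expressions and the helpers classify_text and scan_tree are hypothetical, and the SSN pattern checks format only. Card candidates are filtered with the standard Luhn checksum to cut down on false positives from arbitrary digit runs.

```python
import os
import re

# Hypothetical patterns: a US SSN written as NNN-NN-NNNN, and a 13-16 digit
# card number that may be separated by spaces or hyphens.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to discard digit runs that cannot be card numbers."""
    digits = [int(d) for d in number]
    parity = len(digits) % 2
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == parity:  # every second digit, counting left from the check digit
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_text(text: str) -> dict[str, list[str]]:
    """Return candidate SSNs and Luhn-valid card numbers found in text."""
    cards = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            cards.append(digits)
    return {"ssn": SSN_RE.findall(text), "credit_card": cards}


def scan_tree(root: str) -> dict[str, dict[str, list[str]]]:
    """Walk a directory tree and report files that appear to hold critical data."""
    findings = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    result = classify_text(f.read())
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            if any(result.values()):
                findings[path] = result
    return findings


print(classify_text("SSN 123-45-6789, card 4111 1111 1111 1111, order 1234567890123"))
# prints {'ssn': ['123-45-6789'], 'credit_card': ['4111111111111111']}
```

The 13-digit order number matches the card pattern but fails the Luhn check, so it is not reported. Each file flagged by scan_tree becomes a candidate entry for the data inventory.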
Step 2 - Data activity risk assessment: cover all unknown unknowns
The next important step is a data activity risk assessment, that is, monitoring the flows around the critical data stored in the databases and fileservers discovered in Step 1. These flows show all access activity around the stored data. From this activity, simple risk assessment reports can be generated that shed light on the remaining three unknowns, which account for 44% of data breaches:
- Unknown privileges - Discovering privileged activity and classifying it by role should be part of any basic risk assessment exercise. In addition, high-risk privileged activity (such as failed logins, privilege escalation, error codes, and access to customer data) should be flagged as a high-priority risk and escalated.
- Unknown connections - Discovering and tracking connections by access protocol, application, IP address, and host should be another part of risk assessment. High-risk connections (such as ad hoc activity outside normal applications, or activity from outside the typical set of hosts or IP addresses) should likewise be flagged as high-priority risks and escalated.
- Unknown system - Discovering all systems (such as databases and fileservers) that have critical assets should be a basic part of risk assessment.
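The checks above can be sketched as simple rules applied to monitored access events. This is a minimal sketch under several assumptions: activity has already been captured as per-event records with a hypothetical schema (AccessEvent), the Baseline sets stand in for the inventory from Step 1 plus observed normal activity, and the action names are placeholders rather than any particular database's commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """One observed access to a monitored database or file server (hypothetical schema)."""
    account: str
    source_ip: str
    application: str
    target: str                  # system being accessed
    action: str                  # e.g. "login", "select", "grant"
    success: bool = True
    error: bool = False          # the server returned an error code
    customer_data: bool = False  # touches data classified as critical in Step 1


@dataclass
class Baseline:
    """What the organization already knows about."""
    accounts: set[str]
    privileged_accounts: set[str]
    hosts: set[str]
    applications: set[str]
    systems: set[str]


# Placeholder action names standing in for privilege-escalation commands.
ESCALATION_ACTIONS = {"grant", "alter_role"}


def assess(event: AccessEvent, base: Baseline) -> list[str]:
    """Return the risk findings for one event (empty list if none)."""
    findings = []
    # Unknown systems and unknown accounts
    if event.target not in base.systems:
        findings.append("unknown system")
    if event.account not in base.accounts:
        findings.append("unknown account")
    # High-risk privileged activity
    if event.action == "login" and not event.success:
        findings.append("failed login")
    if event.action in ESCALATION_ACTIONS:
        findings.append("privilege escalation")
    if event.error:
        findings.append("error code")
    if event.customer_data and event.account in base.privileged_accounts:
        findings.append("privileged access to customer data")
    # Unknown connections
    if event.source_ip not in base.hosts:
        findings.append("connection from unknown host")
    if event.application not in base.applications:
        findings.append("access outside normal applications")
    return findings


def risk_report(events: list[AccessEvent], base: Baseline) -> list[tuple[AccessEvent, list[str]]]:
    """Events with findings: privileged accounts first, then those with the most findings."""
    flagged = [(e, f) for e in events if (f := assess(e, base))]
    flagged.sort(key=lambda ef: (ef[0].account not in base.privileged_accounts, -len(ef[1])))
    return flagged


base = Baseline(accounts={"app_user", "dba"}, privileged_accounts={"dba"},
                hosts={"10.0.0.5"}, applications={"billing"}, systems={"db1"})
events = [
    AccessEvent("app_user", "10.0.0.5", "billing", "db1", "select", customer_data=True),
    AccessEvent("ghost", "10.0.0.5", "billing", "db2", "login", success=False),
    AccessEvent("dba", "10.0.0.99", "sql_console", "db1", "select", customer_data=True),
]
for event, findings in risk_report(events, base):
    print(event.account, findings)
```

Routine application access to customer data produces no findings, while the ad hoc session by the privileged dba account is escalated first. Sorting privileged accounts ahead of the rest is one simple way to express "high priority"; a real assessment would likely weight findings more carefully.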