Privacy Enhancing Data Anonymization Techniques

The following article analyses how well anonymization and various other obfuscation techniques defend against privacy threats.


Personally Identifiable Information (PII) & Privacy Enhancing Technologies (PET)

Personally Identifiable Information (PII)

PII refers to any data or information that could be used to identify a specific individual. There are two types of PII: non-sensitive and sensitive. The only difference is that sensitive PII carries a likelihood of harming the individual whose information is compromised. The category of Personally Identifiable Information also includes information that could assist in distinguishing individuals or in a de-anonymization process, turning anonymous information into personally identifiable data.

Privacy Enhancing Technologies (PET)

PETs cover any method “that protects or enhances an individual’s privacy, including facilitating individual’s access to their rights under the Data Protection Act” (Shen, Y. and Pearson, S., 2011). They are traditionally used in combination with training, HR processes and higher-level classification of company policies.

Privacy Enhancing Technologies (PETs) are primarily a means of alleviating the security issues traditionally associated with protecting Personally Identifiable Information, and they are typically deployed to help resolve organizational data privacy concerns. PET methods include anonymization, privacy-aware data processing, defense against network intrusion, improved Identity Management processes and the introduction of specific privacy-enhancing policies.

Privacy Enhancing Data Anonymization Techniques

Most businesses today recognize the importance of taking privacy into account and of handling Personally Identifiable Information (PII) safely. Information about individual employees, business clients and third-party business partners requires protection, and data anonymization is one of the ways to accomplish this.

Data Anonymization

Data Anonymization is one of the most efficient privacy enhancing technologies. It works by obscuring personally identifiable data stored in clear (readable) form and turning it into anonymous information that, ideally, cannot later be de-anonymized.

Two of the most common methods of anonymizing personally identifiable data sets are ‘Data Encryption’ and ‘Removal of PII’. Both techniques work mainly by sanitizing the data, which in turn increases the overall protection of privacy within a data set.

Data Encryption

Transforming an entire data set, or at least its personally identifiable elements, into unreadable encrypted information is one of the most secure ways of preserving privacy. Encryption usually yields very secure outcomes, mainly because without the encryption key the data remains unreadable, which dramatically lowers the chance that a penetration, infiltration or intrusion attempt exposes it. A strong passphrase or key that cannot easily be cracked is one of the best ways to preserve the confidentiality of data and provides an excellent way of blocking unauthorized access.
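To make the idea concrete, here is a minimal sketch that encrypts only the PII fields of a record. To stay within Python's standard library it uses a one-time pad (XOR with a random key as long as the data), which is a genuine cipher but is only secure if each key is truly random, kept secret and never reused; a production system would instead use an established algorithm such as AES through a vetted cryptography library. The record, its field names and the helper functions are hypothetical.

```python
import secrets
from typing import Dict, Tuple


def otp_encrypt(plaintext: str) -> Tuple[bytes, bytes]:
    """One-time-pad encryption: XOR the UTF-8 bytes with an equally long random key."""
    data = plaintext.encode("utf-8")
    key = secrets.token_bytes(len(data))  # fresh random key; must never be reused
    ciphertext = bytes(d ^ k for d, k in zip(data, key))
    return ciphertext, key


def otp_decrypt(ciphertext: bytes, key: bytes) -> str:
    """XOR again with the same key to recover the original text."""
    if len(key) != len(ciphertext):
        raise ValueError("key and ciphertext must have the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, key)).decode("utf-8")


# Hypothetical record: only the PII fields are encrypted, "visit" stays readable.
record = {"name": "Jane Doe", "email": "jane.doe@example.com", "visit": "2017-03-05"}
keys: Dict[str, bytes] = {}
for field in ("name", "email"):
    record[field], keys[field] = otp_encrypt(record[field])

print(record["name"])                             # random-looking bytes
print(otp_decrypt(record["name"], keys["name"]))  # Jane Doe
```

Unlike masking, this is reversible by design: whoever holds the keys gets the original values back, which is why protecting the keys matters as much as the cipher itself.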

Data Masking (Removal of PII)

Data masking refers primarily to redaction: the actual removal (stripping) or obfuscation (replacement, masking, covering over) of the sensitive elements of a data set. The method deals mostly with PII such as credit card numbers, names, email addresses, IP addresses, computer IDs and mailing addresses, and in this way protects most PII elements. It is important to note that in most cases the procedure is not reversible.
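A minimal redaction sketch for free text, assuming simple regular expressions are good enough for the patterns shown: every email address, card-like number and IPv4 address is replaced with a fixed label, so the original values cannot be recovered from the output. The patterns, labels and the `redact` function are illustrative assumptions, not a complete PII detector; names and mailing addresses, for example, generally need dictionaries or named-entity recognition rather than regexes.

```python
import re

# Illustrative patterns only; a real deployment needs far broader rules.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    # 13 to 16 digits, optionally separated by single spaces or dashes
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,15}\b")),
    ("IPV4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
]


def redact(text: str) -> str:
    """Replace every match with a fixed label; the original value is discarded."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


log_line = "User jane.doe@example.com paid with 4111 1111 1111 1111 from 192.168.0.12"
print(redact(log_line))  # User [EMAIL] paid with [CARD] from [IPV4]
```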

Real World Example – Delphix Corporation


Delphix Corp is one of the companies that has built a fantastic set of tools for the niche market of data anonymization. The company quickly recognized that data masking is one of the most efficient ways to ensure data security in a variety of environments. Delphix Data Masking is a product that automates the data masking process.

How does it work? Delphix Data Masking has a built-in masking algorithm and is deployed as an application either in cloud infrastructure or in an enterprise’s private data center. The application constantly monitors information in flight, in file systems and inside databases, and immediately substitutes any sensitive data with comparable but fabricated data, essentially turning real PII into entirely fictitious values. The company claims that its solution is 10 times faster than those of its competitors, has a minimal network footprint and, “unlike encryption measures that can be bypassed through schemes to obtain user credentials, masking irreversibly protects data in downstream, non-production environments.” (Corp, D., 2017).
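Delphix’s actual masking algorithms are not described in the source, so the following is only a hypothetical sketch of the general idea of substituting comparable but fabricated values. Each real value seeds a pseudo-random generator through a keyed hash (HMAC-SHA256), so the same input always maps to the same fake value and masked tables can still be joined; names are drawn from a small invented pool, and card digits are replaced while keeping length and separators. The secret, the name pools and the function names are all assumptions.

```python
import hashlib
import hmac
import random

# Hypothetical masking key; keep it out of the environment that receives masked data.
SECRET = b"replace-with-a-real-secret"
# Invented pools of fictitious values (a real tool would use far larger ones).
FAKE_FIRST = ["Alex", "Sam", "Robin", "Jordan", "Casey", "Morgan"]
FAKE_LAST = ["Smith", "Lee", "Garcia", "Brown", "Khan", "Novak"]


def _rng_for(value: str) -> random.Random:
    """Seed a PRNG from an HMAC of the value, so equal inputs give equal fakes."""
    digest = hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).digest()
    return random.Random(digest)


def mask_name(name: str) -> str:
    """Substitute a fictitious full name for the real one."""
    rng = _rng_for(name)
    return f"{rng.choice(FAKE_FIRST)} {rng.choice(FAKE_LAST)}"


def mask_card(card: str) -> str:
    """Replace each digit with a fabricated one, keeping length and separators."""
    rng = _rng_for(card)
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in card)


row = {"name": "Jane Doe", "card": "4111-1111-1111-1111", "visit": "2017-03-05"}
masked = {**row, "name": mask_name(row["name"]), "card": mask_card(row["card"])}
print(masked)  # "visit" is unchanged; "name" and "card" are fictitious
```

Because the mapping is keyed, anyone who obtains the secret could re-run the masking on guessed real values and compare the results, so the key must not travel with the masked data.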

Data Anonymization Issues

However, can we always effectively turn PII into anonymous information?

In the mid-1990s, the Massachusetts Group Insurance Commission (GIC) decided to help researchers by releasing some of its data, more specifically a data set containing information about state employees and their hospital visits. To reassure the public that all proper measures had been taken, William Weld, then Governor of Massachusetts and a resident of Cambridge, Massachusetts, confidently pledged in public that the data was safe. Weld claimed that GIC had patient privacy completely under control and that all personally identifiable information had been anonymized or removed from the data sets.

Dr. Latanya Arvette Sweeney, now acting Director of the Data Privacy Lab at Harvard (Institute for Quantitative Social Science), requested a copy of the data. She then paid a mere twenty dollars for the Cambridge voter roll, which contained names, addresses, ZIP codes, birth dates and other information, and started correlating the two data sets. This simple re-identification exercise succeeded when Sweeney found Governor Weld’s health records: in all of Cambridge, only six people shared Weld’s date of birth, only three of them were men, and only one of those men lived in his ZIP code. To conclude the exercise, “Dr. Sweeney sent the Governor’s health records (which included diagnoses and prescriptions) to his office.” (Anderson, N., 2009). Sweeney’s research also found that about 87 percent of all Americans can be uniquely identified using only three pieces of information: ZIP code, date of birth and gender (Anderson, N., 2009).

The above story should serve as a good reminder that protecting the privacy of individuals in Big Data applications by erasing or anonymizing personally identifiable information, such as names, credit card numbers or social security numbers, does not necessarily lead to complete protection of privacy, because the attributes that remain can still be linked with other data sets.
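The re-identification in the GIC story is essentially a join on the attributes both data sets share (ZIP code, date of birth and gender), keeping only combinations that match exactly one person. The sketch below reproduces that logic on a handful of invented records; the names, values and function names are hypothetical and are not taken from the GIC data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Invented toy data: the "anonymized" release has names removed but keeps
# ZIP code, date of birth and sex; the voter roll is public and includes names.
medical = [
    {"zip": "02138", "dob": "1950-01-15", "sex": "M", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1950-01-15", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02138", "dob": "1962-07-04", "sex": "F", "diagnosis": "diabetes"},
]
voters = [
    {"name": "A. Example", "zip": "02138", "dob": "1950-01-15", "sex": "M"},
    {"name": "B. Sample", "zip": "02139", "dob": "1950-01-15", "sex": "F"},
    {"name": "C. Person", "zip": "02138", "dob": "1962-07-04", "sex": "F"},
    {"name": "D. Person", "zip": "02138", "dob": "1962-07-04", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "sex")


def qi_key(record: Dict[str, str]) -> Tuple[str, ...]:
    """The combination of attributes present in both data sets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(released: List[Dict[str, str]],
         auxiliary: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (name, diagnosis) for every released record matching exactly one voter."""
    voters_by_key: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for person in auxiliary:
        voters_by_key[qi_key(person)].append(person["name"])
    reidentified = []
    for record in released:
        candidates = voters_by_key.get(qi_key(record), [])
        if len(candidates) == 1:  # a unique combination pins down one person
            reidentified.append((candidates[0], record["diagnosis"]))
    return reidentified


print(link(medical, voters))
# [('A. Example', 'hypertension'), ('B. Sample', 'asthma')]
```

The third medical record is not re-identified because two voters share its combination of attributes; the same narrowing-down, applied to real data, is what singled out Governor Weld.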

Summary

We have learned that “data in transit or at rest can be extremely vulnerable to a breach” (Simpson, J., 2017). We have also seen that data removal, masking and obfuscation, as well as data encryption, are efficient approaches to protecting data and the sensitive information it may contain.

To conclude, in my opinion the best way to prevent a data breach is not to rely on any single method, but rather to deploy several methods of protecting the privacy of data sets, for example using data encryption and data removal/masking in unison. By combining the strengths of two or more methods, we are likely to gain the most benefit and better ensure the protection of data sources.

References

Big Data Security (2014) University of Liverpool. Available at: https://elearning.uol.ohecampus.com/bbcswebdav/institution/UKL1/201740JAN/MS_CKIT/CKIT_525/readings/UKL1_CKIT_525_Week08_LectureNotes.pdf (Accessed: 4 March 2017).

Cloud Security Alliance (2013) Top Ten big data security and privacy challenges. Available at: https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Big_Data_Top_Ten_v1.pdf (Accessed: 4 March 2017).

Shen, Y. and Pearson, S. (2011) Privacy enhancing technologies: A review. Available at: http://www.hpl.hp.com/techreports/2011/HPL-2011-113.pdf (Accessed: 5 March 2017).

Anderson, N. (2009) ‘Anonymized’ data really isn’t—and here’s why not. Available at: https://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/ (Accessed: 5 March 2017).

Simpson, J. (2017) Data masking and encryption are different. Available at: http://www.iri.com/blog/data-protection/data-masking-and-data-encryption-are-not-the-same-things/ (Accessed: 5 March 2017).

Corp, D. (2017) Data masking products. Available at: https://www.delphix.com/products/delphix-data-masking (Accessed: 5 March 2017).
