Date Published: November 15, 2022
Comments Due:
Email Questions to:
Author(s)
Simson Garfinkel (NIST), Phyllis Singer (U.S. Census Bureau), Joseph Near (University of Vermont), Aref Dajani (U.S. Census Bureau), Barbara Guttman (NIST)
Announcement
De-identification removes identifying information from a data set so that the remaining data cannot be linked to specific individuals. Government agencies can use de-identification to reduce the privacy risks associated with collecting, processing, archiving, distributing, or publishing government data. Previously, NIST published NIST Internal Report (IR) 8053, De-Identification of Personal Information, which provided a survey of de-identification and re-identification techniques. This document provides specific guidance to government agencies that wish to use de-identification.
Six years have passed since NIST released the second draft of SP 800-188. During this time, there have been significant developments in privacy technology, specifically in the theory and practice of differential privacy. While this draft reflects some of those advances, it remains focused on de-identification, as differential privacy is still not sufficiently mature to be used by many government agencies. Where appropriate, this document cautions users about the inherent limitations of de-identification when compared to formal privacy methods, such as differential privacy.
We encourage you to use this
comment template to document your comments on the draft.
NOTE: A call for patent claims is included on page ii of this draft. For additional information, see the Information Technology Laboratory (ITL) Patent Policy – Inclusion of Patents in ITL Publications.
De-identification is a process that is applied to a dataset with the goal of preventing or limiting informational risks to individuals, protected groups, and establishments while still allowing for meaningful statistical analysis. Government agencies can use de-identification to reduce the privacy risk associated with collecting, processing, archiving, distributing, or publishing government data. Previously, NISTIR 8053, De-Identification of Personal Information, provided a survey of de-identification and re-identification techniques. This document provides specific guidance to government agencies that wish to use de-identification. Before using de-identification, agencies should evaluate their goals for using de-identification and the potential risks that de-identification might create. Agencies should decide upon a de-identification release model, such as publishing de-identified data, publishing synthetic data based on identified data, or providing a query interface that incorporates de-identification. Agencies can create a Disclosure Review Board to oversee the process of de-identification. They can also adopt a de-identification standard with measurable performance levels and perform re-identification studies to gauge the risk associated with de-identification. Several specific techniques for de-identification are available, including de-identification by removing identifiers and transforming quasi-identifiers and the use of formal privacy models. People performing de-identification generally use special-purpose software tools to perform the data manipulation and calculate the likely risk of re-identification. However, not all tools that merely mask personal information provide sufficient functionality for performing de-identification. This document also includes an extensive list of references, a glossary, and a list of specific de-identification tools, which is only included to convey the range of tools currently available and is not intended to imply a recommendation or endorsement by NIST.
De-identification is a process that is applied to a dataset with the goal of preventing or limiting informational risks to individuals, protected groups, and establishments while still allowing for meaningful statistical analysis. Government agencies can use de-identification to reduce the privacy...
See full abstract
De-identification is a process that is applied to a dataset with the goal of preventing or limiting informational risks to individuals, protected groups, and establishments while still allowing for meaningful statistical analysis. Government agencies can use de-identification to reduce the privacy risk associated with collecting, processing, archiving, distributing, or publishing government data. Previously, NISTIR 8053, De-Identification of Personal Information, provided a survey of de-identification and re-identification techniques. This document provides specific guidance to government agencies that wish to use de-identification. Before using de-identification, agencies should evaluate their goals for using de-identification and the potential risks that de-identification might create. Agencies should decide upon a de-identification release model, such as publishing de-identified data, publishing synthetic data based on identified data, or providing a query interface that incorporates de-identification. Agencies can create a Disclosure Review Board to oversee the process of de-identification. They can also adopt a de-identification standard with measurable performance levels and perform re-identification studies to gauge the risk associated with de-identification. Several specific techniques for de-identification are available, including de-identification by removing identifiers and transforming quasi-identifiers and the use of formal privacy models. People performing de-identification generally use special-purpose software tools to perform the data manipulation and calculate the likely risk of re-identification. However, not all tools that merely mask personal information provide sufficient functionality for performing de-identification. This document also includes an extensive list of references, a glossary, and a list of specific de-identification tools, which is only included to convey the range of tools currently available and is not intended to imply a recommendation or endorsement by NIST.
Hide full abstract
Keywords
data life cycle; de-identification; differential privacy; direct identifiers; Disclosure Review Board; the five safes; k-anonymity; privacy; pseudonymization; quasi-identifiers; re-identification; synthetic data
Control Families
Program Management; Risk Assessment; System and Communications Protection