De-Identifying Government Datasets (2nd Draft)

Garfinkel, Simson

SP 800-188 (Draft)

Date Published: December 2016
Comments Due: December 31, 2016 (public comment period is CLOSED)
Email Questions to: sp800-188-draft@nist.gov

Author(s)

Simson Garfinkel (NIST)

Announcement

De-identification removes identifying information from a dataset so that the remaining data cannot be linked with specific individuals. Government agencies can use de-identification to reduce the privacy risk associated with collecting, processing, archiving, distributing or publishing government data. Previously NIST published NISTIR 8053, De-Identification of Personal Information, which provided a survey of de-identification and re-identification techniques. This document provides specific guidance to government agencies that wish to use de-identification.

In developing the draft Privacy Risk Management Framework, NIST sought the perspectives and experiences of de-identification experts both inside and outside the US Government.

Future areas of work will focus on developing metrics and tests for de-identification software, as well as working with industry and academia to make algorithms that incorporate formal privacy guarantees usable for government de-identification activities. Collected input will be used to correct technical errors and expand areas that are unclear.

Abstract

De-identification is a process that is applied to a dataset to reduce the risk of linking information revealed in the dataset to specific individuals. Government agencies can use de-identification to reduce the privacy risk associated with collecting, processing, archiving, distributing or publishing government data. Previously NIST published NISTIR 8053, De-Identification of Personal Information, which provided a survey of de-identification and re-identification techniques. This document provides specific guidance to government agencies that wish to use de-identification. Before using de-identification, agencies should evaluate their goals in using it and the potential risks that it might create. Agencies should decide upon a de-identification release model, such as publishing de-identified data, publishing synthetic data based on identified data, or providing a query interface that incorporates de-identification of the identified data. Agencies can create a Disclosure Review Board to oversee the process of de-identification; they can also adopt a de-identification standard with measurable performance levels. Several specific techniques for de-identification are available, including removing identifiers and transforming quasi-identifiers, and using formal privacy models. People performing de-identification generally use special-purpose software tools to perform the data manipulation and calculate the likely risk of re-identification. However, tools that merely mask personal information may not provide sufficient functionality for performing de-identification. This document also includes an extensive list of references, a glossary, and a list of specific de-identification tools. These tools are mentioned only to convey the range of tools currently available; their mention is not intended to imply recommendation or endorsement by NIST.
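As a rough illustration of the technique the abstract describes (removing identifiers, transforming quasi-identifiers, and estimating re-identification risk), the following minimal Python sketch may help. It is not taken from SP 800-188: the records, field names, generalization rules, and the helper functions `generalize` and `k_anonymity` are all hypothetical, and the k-anonymity value is only one crude indicator of risk.

```python
from collections import Counter

# Hypothetical toy records; field names and values are illustrative only.
RECORDS = [
    {"name": "A. Smith", "zip": "20899", "age": 34, "diagnosis": "flu"},
    {"name": "B. Jones", "zip": "20895", "age": 37, "diagnosis": "asthma"},
    {"name": "C. Lee", "zip": "20814", "age": 52, "diagnosis": "flu"},
    {"name": "D. Patel", "zip": "20817", "age": 58, "diagnosis": "diabetes"},
]

# Assumed classification of fields for this toy dataset.
DIRECT_IDENTIFIERS = {"name"}
QUASI_IDENTIFIERS = ("zip", "age")


def generalize(record):
    """Drop direct identifiers and coarsen the quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["zip"] = out["zip"][:3] + "**"        # keep only the 3-digit ZIP prefix
    decade = (out["age"] // 10) * 10
    out["age"] = f"{decade}-{decade + 9}"     # replace age with a 10-year band
    return out


def k_anonymity(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return the size of the smallest group sharing quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


deidentified = [generalize(r) for r in RECORDS]
print("k before:", k_anonymity(RECORDS))
print("k after: ", k_anonymity(deidentified))
```

In this toy dataset every original record is unique on its quasi-identifiers, while after generalization each record shares its ZIP prefix and age band with at least one other record. A real release decision would rest on a fuller risk assessment than this single number.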

Keywords

privacy; de-identification; re-identification; Disclosure Review Board; data life cycle; the five safes; k-anonymity; differential privacy; pseudonymization; direct identifiers; quasi-identifiers; synthetic data

Control Families

Program Management; Risk Assessment; System and Communications Protection

Documentation

Publication:
Draft SP 800-188

Supplemental Material:
Comment Template (word)

Related NIST Publications:
NISTIR 8053

Document History:
08/25/16: SP 800-188 (Draft)
12/15/16: SP 800-188 (Draft)

Topics

Security and Privacy
privacy

Laws and Regulations
E-Government Act

Information Technology Laboratory

Computer Security Resource Center
