De-Identifying Government Datasets

Garfinkel, Simson

SP 800-188 (Draft)

Obsoleted on December 15, 2016 by SP 800-188 (2nd Draft).


Date Published: August 2016
Comments Due: September 26, 2016 (the public comment period is closed)
Email Questions to: sp800-188-draft@nist.gov

Author(s)

Simson Garfinkel (NIST)

Announcement

NIST Requests Comments on a Draft Special Publication regarding the De-Identification of Government Datasets

In developing the draft Privacy Risk Management Framework, NIST sought the perspectives and experiences of de-identification experts both inside and outside the US Government.

Future areas of work will focus on developing metrics and tests for de-identification software, as well as working with industry and academia to make algorithms that incorporate formal privacy guarantees usable for government de-identification activities.

Abstract

De-identification removes identifying information from a dataset so that the remaining data cannot be linked with specific individuals. Government agencies can use de-identification to reduce the privacy risk associated with collecting, processing, archiving, distributing, or publishing government data. NIST previously published NISTIR 8053, “De-Identifying Personal Data,” which surveyed de-identification and re-identification techniques. This document provides specific guidance to government agencies that wish to use de-identification. Before using de-identification, agencies should evaluate their goals for it and the potential risks it might create. Agencies should decide upon a de-identification release model, such as publishing de-identified data, publishing synthetic data based on identified data, or providing a query interface to identified data that incorporates de-identification. Agencies can use a Disclosure Review Board to oversee the de-identification process; they can also adopt a de-identification standard with measurable performance levels. Several specific de-identification techniques are available, including removing identifiers and transforming quasi-identifiers, and using formal de-identification models that rely upon differential privacy. De-identification is typically performed with software tools that may have multiple features; however, not all tools that mask personal information provide sufficient functionality for performing de-identification. This document also includes an extensive list of references, a glossary, and a list of specific de-identification tools. These tools are mentioned only to convey the range of tools currently available; their mention is not intended to imply recommendation or endorsement by NIST.
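The abstract mentions de-identification by removing identifiers and transforming quasi-identifiers, and a de-identification standard with measurable performance levels. The minimal Python sketch below illustrates one common way to combine these ideas: it drops direct identifiers, generalizes two quasi-identifiers, and measures the result with k-anonymity (the size of the smallest group of records sharing the same quasi-identifier values). The field names, generalization rules, and sample records are hypothetical and chosen for illustration; this is not the procedure specified in SP 800-188.

```python
from collections import Counter

# Hypothetical field classification, for illustration only.
DIRECT_IDENTIFIERS = {"name", "ssn"}
QUASI_IDENTIFIERS = ("age", "zip")


def generalize(record):
    """Drop direct identifiers and coarsen quasi-identifiers (illustrative rules)."""
    # Build a new dict so the original record is left unchanged.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Replace an exact age with a 10-year band, e.g. 37 -> "30-39".
    low = (out["age"] // 10) * 10
    out["age"] = f"{low}-{low + 9}"
    # Keep only the first three digits of the ZIP code, e.g. "20899" -> "208**".
    out["zip"] = out["zip"][:3] + "**"
    return out


def k_anonymity(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return k: the size of the smallest group sharing identical quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


if __name__ == "__main__":
    # Made-up records; the SSNs use the never-issued 000 area number.
    raw = [
        {"name": "A. Smith", "ssn": "000-00-0001", "age": 34, "zip": "20899", "diagnosis": "flu"},
        {"name": "B. Jones", "ssn": "000-00-0002", "age": 37, "zip": "20878", "diagnosis": "asthma"},
        {"name": "C. Lee", "ssn": "000-00-0003", "age": 52, "zip": "20901", "diagnosis": "flu"},
        {"name": "D. Diaz", "ssn": "000-00-0004", "age": 58, "zip": "20910", "diagnosis": "diabetes"},
    ]
    released = [generalize(r) for r in raw]
    print(k_anonymity(raw))       # every (age, zip) pair is unique -> 1
    print(k_anonymity(released))  # two groups of two records -> 2
```

In this toy example, generalization raises k from 1 to 2, so each released record matches at least one other record on age band and ZIP prefix. Practical releases usually require a larger k and must also consider attribute disclosure (for example, a group in which every record has the same diagnosis), which k-anonymity alone does not prevent.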

Keywords

de-identification; re-identification; Disclosure Review Board; data life cycle; the five safes; k-anonymity; differential privacy; pseudonymization; direct identifiers; quasi-identifiers; privacy; synthetic data

Control Families

Program Management; Risk Assessment; System and Communications Protection

Documentation

Publication:
Draft (1st) SP 800-188

Supplemental Material:
None available

Related NIST Publications:
NISTIR 8053

Document History:
08/25/16: SP 800-188 (1st Draft)
12/15/16: SP 800-188 (2nd Draft)

Topics

Security and Privacy
privacy

Laws and Regulations
E-Government Act
