Writing an NIH Data Management & Sharing Plan

Guidance, considerations, tips, and resources to write a competitive DMS Plan for your NIH proposal

Common Terms

Confidentiality is the protection of data from unauthorized access and disclosure. (source: ICPSR)

Data catalogs act as an index of datasets, using metadata to describe the content and location of data. As opposed to a repository that stores the data in one place, a data catalog describes data wherever it is stored. Data catalogs may also describe restricted data, outlining how a person can apply for access to data that is restricted due to ethical or legal reasons. (source: NNLM Data Glossary)

Data curation is the process of organizing, describing, cleaning, enhancing, and preserving data to make the data accessible to the public now and in the future. (source: ICPSR)

Data Use Agreement (DUA): A contractual agreement that defines how accessed and/or exchanged data may be used.

De-identified data (according to HIPAA): Data are considered de-identified if the covered entity removes 18 specified personal identifiers from the data. (source: FDP DTUA Glossary)

Disclosure Risk is the degree of risk that a data record from a study could be linked to a specific person or organization, thereby revealing information that otherwise would not be known or known with as much certainty. (source: ICPSR)

Limited Dataset: Protected Health Information that excludes the following direct identifiers of the patient or of relatives, employers, or household members of the patient: Names; Postal address information, other than town or city, State, and zip code; Telephone numbers; Fax numbers; Electronic mail addresses; Social security numbers; Medical record numbers; Health plan beneficiary numbers; Account numbers; Certificate/license numbers; Vehicle identifiers and serial numbers, including license plate numbers; Device identifiers and serial numbers; Web Universal Resource Locators (URLs); Internet Protocol (IP) address numbers; Biometric identifiers, including finger and voice prints; Full face photographic images and any comparable images; and any other unique identifying number, characteristic, or code except as specifically permitted by HIPAA. (source: FDP DTUA Glossary)
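
As a rough illustration of the definition above, the following Python sketch drops direct-identifier fields from a single record while keeping town or city, state, and zip code. The field names and the sample record are hypothetical and made up for this example; a real dataset will use its own column names. A fixed list of field names cannot capture the "any other unique identifying number, characteristic, or code" item or identifiers buried in free-text fields, so treat this as a starting point and not as a complete procedure.

```python
# Hypothetical field names chosen for illustration only.
DIRECT_IDENTIFIER_FIELDS = {
    "name",
    "street_address",  # town/city, state, and zip code may remain
    "phone",
    "fax",
    "email",
    "ssn",
    "medical_record_number",
    "health_plan_beneficiary_number",
    "account_number",
    "certificate_license_number",
    "vehicle_identifier",
    "device_identifier",
    "url",
    "ip_address",
    "biometric_identifier",
    "full_face_photo",
}


def drop_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record without the fields listed above."""
    return {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIER_FIELDS
    }


# Made-up record; none of these values describe a real person.
record = {
    "name": "Jane Doe",
    "street_address": "1 Example St",
    "city": "Springfield",
    "state": "IL",
    "zip": "62701",
    "email": "jane@example.org",
    "diagnosis_code": "E11.9",
}

limited = drop_direct_identifiers(record)
print(limited)
```

In this sketch the name, street address, and email are removed, while the city, state, zip code, and diagnosis code remain.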

Metadata is structured data that provides information about data, conveying information necessary to ensure data are discoverable, accessible, and usable. (source: ICPSR)
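
To make the idea of structured metadata concrete, here is a minimal Python sketch of a metadata record for one dataset. The field names (`title`, `description`, `creators`, `keywords`, `location`, `access`) are assumptions chosen for illustration and do not come from any particular metadata standard; the values, including the URL, are made up. Repositories usually specify their own required fields.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class DatasetMetadata:
    """Hypothetical, minimal description of a dataset."""
    title: str                 # helps others discover the dataset
    description: str           # what the data contain
    creators: list[str]        # who produced the data
    keywords: list[str] = field(default_factory=list)
    location: str = ""         # where the data are stored
    access: str = "open"       # e.g. "open" or "restricted"


# Made-up values used only to show the structure.
record = DatasetMetadata(
    title="Example survey responses",
    description="Illustrative record; not a real dataset.",
    creators=["Example Lab"],
    keywords=["survey", "example"],
    location="https://repository.example.org/datasets/123",
    access="restricted",
)

# Serialize to JSON so the record can be shared or indexed.
metadata_json = json.dumps(asdict(record), indent=2)
print(metadata_json)
```

A record like this describes the data without containing the data themselves.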

Open Data refers to data that are freely available for reuse or secondary analysis. Openness falls along a spectrum, ranging from highly restricted data (as in the case of personal health information) to data that are freely available and usable. In general, open data follow the FAIR principles: Findable, Accessible, Interoperable, and Reusable. (source: NNLM Data Glossary)

Personally Identifiable Information (PII): Any information maintained by an agency, including: (1) any information that can be used to distinguish or trace an individual’s identity, such as name, social security number, date and place of birth, mother’s maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information. (source: FDP DTUA Glossary)

Public-use data: Data in the public domain; no regulatory prescription for its use. (source: FDP DTUA Glossary)

(data) Repository is infrastructure that collects, manages, and stores data for analysis, sharing, and reporting. (source: ICPSR, Brook, 2018) There are several types of repositories, including but not limited to institutional repositories, generalist repositories, and subject-specific repositories. Institutional repositories are generally managed and financed by an academic institution for use by its researchers and their colleagues. Subject-specific repositories are designed to fit the needs of specific research fields and the data they produce. Generalist repositories are intended to store, and make findable, data from any field. (source: NNLM Data Glossary)