The value of any data at NCI is considered at the Data Collection and Data Subcollection level, including the funding arrangements for the storage allocation of each underlying Data Subcollection.
Data Quality Strategy
To ensure that data is managed for all these different uses, NCI has a Data Quality Strategy. It is described in "Data Quality Strategy to Enable FAIR, Programmatic Access across Large, Diverse Data Collections for High Performance Data Analysis" (link to Informatics - Open Access Journal), in "Implementing a Data Quality Strategy to Simplify Access to Data" (link to AGU 2016 abstract), and in AGU-DQS-2016-Druken_v2-1.pdf (PDF).
This process ensures that datasets comply with known standards, can be delivered using data services and their specialised capabilities, are represented correctly through the data portals identified as needed by the community, and can be used programmatically for high performance simulation and data analysis.
Data Hierarchy Definitions
|Dataset||A Dataset is a compilation of data that constitutes a programmable data unit and has been collected and organised using a single process. For this purpose it must have a named Data Owner, a single license, one set of semantics, ontologies, and vocabularies, and a single data format and internal data convention. A Dataset must include its version.|
|Data Subcollection||A Data Subcollection is an exclusive grouping of Datasets (i.e., each Dataset belongs to only one Subcollection) where the constituent Datasets are tightly managed. Responsibility for the underlying management of its constituent Datasets must lie within one organisation. A Data Subcollection constitutes a strong connection between the component Datasets, and is organised coherently around a single scientific element (e.g., model, instrument). A Subcollection must have compatible licenses such that constituent Datasets do not need different access arrangements.|
|Data Collection||A Data Collection is the highest level in the hierarchy of data groupings at NCI. It comprises either an exclusive grouping of Data Subcollections or a tiered structure with an exclusive grouping of lower-tier Data Collections, where the lowest-tier Data Collection contains only Data Subcollections.|
|Dataset Granule||The term Dataset Granule is used in some scientific domains, particularly Satellite Earth Observation. There it refers to the smallest aggregation of data that can be independently described, inventoried, and retrieved (https://earthdata.nasa.gov/user-resources/glossary#ed-glossary-g). Dataset Granules have their own metadata and support values associated with the additional attributes defined by parent Datasets.|
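
The hierarchy defined above can be sketched as a small data model. The following Python example is illustrative only and is not an NCI tool or schema: the class names, field names, and the `validate` function are hypothetical, and it assumes that a Subcollection is identified by its name and a Dataset by its name and version. It checks three of the rules from the definitions: each Dataset carries an owner, license, format, and version; Datasets and Subcollections are exclusive groupings; and a Data Collection holds either Data Subcollections or lower-tier Data Collections, not both. License compatibility within a Subcollection is not checked, since the definitions do not say how compatibility is decided.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    # Hypothetical fields mirroring the Dataset definition.
    name: str
    version: str
    owner: str
    license: str
    data_format: str


@dataclass
class Subcollection:
    name: str
    organisation: str  # the single organisation responsible for management
    datasets: list[Dataset] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    # A Collection holds either Subcollections or lower-tier Collections.
    subcollections: list[Subcollection] = field(default_factory=list)
    collections: list[Collection] = field(default_factory=list)


def validate(root: Collection) -> list[str]:
    """Return readable rule violations for the hierarchy (empty if none)."""
    problems: list[str] = []
    seen_subs: set[str] = set()
    seen_datasets: set[tuple[str, str]] = set()

    def visit(coll: Collection) -> None:
        if coll.subcollections and coll.collections:
            problems.append(
                f"{coll.name}: mixes Subcollections and lower-tier Collections"
            )
        if not coll.subcollections and not coll.collections:
            problems.append(f"{coll.name}: contains no Subcollections")
        for sub in coll.subcollections:
            if sub.name in seen_subs:
                problems.append(
                    f"Subcollection {sub.name} is in more than one Collection"
                )
                continue  # its Datasets were already checked
            seen_subs.add(sub.name)
            for ds in sub.datasets:
                key = (ds.name, ds.version)
                if key in seen_datasets:
                    problems.append(
                        f"Dataset {ds.name} v{ds.version} is in more than one Subcollection"
                    )
                seen_datasets.add(key)
                for attr in ("version", "owner", "license", "data_format"):
                    if not getattr(ds, attr):
                        problems.append(f"Dataset {ds.name}: missing {attr}")
        for child in coll.collections:
            visit(child)

    visit(root)
    return problems
```

A tiered Data Collection is expressed by nesting `Collection` objects, with Subcollections attached only at the lowest tier. Dataset Granules are left out of the sketch because they apply only in some domains.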