Welcome to NCI's Data Collections information space, where you can find out more about the National Reference Data Collections managed at NCI, the data services we offer, and the data publishing process at NCI.
National Reference Datasets
The NCI National Research Data Collection is Australia’s largest collection of climate, weather, Earth systems, environmental, satellite, and geophysics research datasets. NCI also hosts many other specialised domain datasets, such as optical astronomy and genomic data. The collection is a mix of nationally generated datasets and replicated international datasets that need to be hosted at NCI. NCI manages these collections according to the following principles:
- FAIR data principles (Findable, Accessible, Interoperable, Reusable) for its major data collections
- Programmatic and high-performance access
- As open as possible, as closed as necessary
- Use data standards wherever possible
- Transdisciplinary access
Finding and Accessing the datasets published by NCI
You can discover the datasets published and available at NCI using the NCI GeoNetwork catalogue, which holds ISO 19115-compliant data records. As well as the general data catalogue, there are specialist domain services such as the Coupled Model Intercomparison Project (CMIP) service and the Australasia Regional Copernicus Hub. Each collection and constituent dataset is described by a catalogue record in the NCI GeoNetwork. The data can be accessed through:
- NCI Lustre filesystems /g/data[1a,1b,2,..]/<NCI code>, which are available on NCI's Raijin or VDI systems
- NCI THREDDS data service (https://dapds00.nci.org.au), primarily using Open Geospatial Consortium (OGC) services and DAP protocols (e.g., for subsetting and aggregation)
- GSKY data service (https://gsky.nci.org.au) using OGC data protocols (WMS, WCS and WPS) for very large datasets (e.g., satellite imagery)
- NCI Terria server (https://terria.nci.org.au) using the TerriaJS software for accessing GSKY and OGC services (e.g., our THREDDS server)
- Earth System Grid Federation (https://esgf.nci.org.au) using DAP protocols
- Sentinel Data service (https://copernicus.nci.org.au/sara.client/#/home)
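As a rough illustration of the DAP subsetting and OGC requests mentioned above, the sketch below only builds request URLs; it does not contact any server. The host names come from the list above, but the `/thredds/dodsC` path prefix, the GSKY `/ows` endpoint path, and the dataset path `example/tas_day.nc` with variable `tas` are assumptions for illustration, not documented NCI paths.

```python
from urllib.parse import urlencode

# ASSUMPTION: host names come from the service list above, but these path
# prefixes are illustrative guesses, not documented NCI endpoints.
THREDDS_DAP_BASE = "https://dapds00.nci.org.au/thredds/dodsC"
GSKY_OWS_BASE = "https://gsky.nci.org.au/ows"


def dap_subset_url(dataset_path, variable, slices):
    """Build a DAP2 URL requesting an ASCII hyperslab of one variable.

    Each slice is a (start, stride, stop) tuple; DAP2 constraint expressions
    use the form var[start:stride:stop], where stop is inclusive.
    """
    parts = []
    for start, stride, stop in slices:
        if stride < 1 or start > stop:
            raise ValueError(f"invalid slice {(start, stride, stop)}")
        parts.append(f"[{start}:{stride}:{stop}]")
    return f"{THREDDS_DAP_BASE}/{dataset_path}.ascii?{variable}{''.join(parts)}"


def wms_capabilities_url(base=GSKY_OWS_BASE, version="1.3.0"):
    """Build an OGC WMS GetCapabilities request, which lists available layers."""
    query = urlencode({"service": "WMS", "request": "GetCapabilities", "version": version})
    return f"{base}?{query}"


if __name__ == "__main__":
    # Hypothetical dataset path and variable name, for illustration only.
    print(dap_subset_url("example/tas_day.nc", "tas", [(0, 1, 9), (100, 2, 120)]))
    print(wms_capabilities_url())
```

Requesting only the slices you need keeps transfers small, which is the main benefit of server-side subsetting for large collections. Some command-line tools treat square brackets specially (for example, curl needs `-g` to disable URL globbing), so the DAP URL may need escaping when used outside a DAP client.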
NCI tracks usage statistics for all access to its datasets, both through the open data services (broken down by access protocol) and in situ within the NCI computing systems. These statistics help plan for and measure demand for existing datasets, and help assess the impact of dataset upgrades and decommissioning.
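The kind of aggregation this tracking supports can be sketched in a few lines. The record format, channel labels, dataset names and demand threshold below are all hypothetical; this is not NCI's actual logging or reporting pipeline.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    """One hypothetical access event: which dataset, and through which channel."""
    dataset: str  # hypothetical dataset identifier
    channel: str  # e.g. "THREDDS/DAP", "GSKY/WMS" or "in-situ" (filesystem)


def usage_by_dataset_and_channel(records):
    """Count accesses per (dataset, channel) pair."""
    return Counter((r.dataset, r.channel) for r in records)


def total_by_dataset(records):
    """Count all accesses per dataset, regardless of channel."""
    return Counter(r.dataset for r in records)


def low_demand(records, all_datasets, threshold):
    """List datasets (sorted) whose total access count is below threshold.

    Datasets with no recorded access count as zero, because a Counter
    returns 0 for missing keys.
    """
    totals = total_by_dataset(records)
    return sorted(d for d in all_datasets if totals[d] < threshold)


if __name__ == "__main__":
    records = [
        AccessRecord("cmip6", "THREDDS/DAP"),
        AccessRecord("cmip6", "in-situ"),
        AccessRecord("cmip6", "in-situ"),
        AccessRecord("sentinel2", "GSKY/WMS"),
    ]
    print(usage_by_dataset_and_channel(records))
    print(low_demand(records, ["cmip6", "sentinel2", "old_dataset"], threshold=2))
```

Counting per channel as well as per dataset keeps in-situ (filesystem) use visible alongside open-service use, so a dataset that is rarely downloaded but heavily used on the compute systems is not mistaken for a low-demand one.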
NCI has a team of expert data managers who organise and curate the datasets for optimal accessibility, analysis and data publication.
These definitions are described in more detail in a peer-reviewed paper on our approach to Quality Data Management. NCI mostly focuses on datasets, since a dataset is a more tightly defined data product, and uses subcollections and collections to organise datasets for both data management and licensing requirements.
Preparing and organising datasets
To provide a data publication and sharing service, NCI has a dedicated data management team. This team works closely with data depositors to develop a Data Management Plan, which informs how datasets are catalogued and published, how much storage capacity they require, and how they are managed over time at NCI. The plan also records how the datasets are supported and paid for (e.g., through NCRIS, agency or university funding).
NCI’s Data Management Tool portal is the access point for providing this information. NCI adheres to the data quality strategies described in Implementing a Data Quality Strategy to Simplify Access to Data (link to AGU 2016 abstract) and in our paper Data Quality Strategy to Enable FAIR, Programmatic Access across Large, Diverse Data Collections for High Performance Data Analysis (link to Informatics - Open Access Journal).
Help with accessing or managing datasets
If you represent a university, a federal or state government science agency or institution, or an NCRIS capability that generates, owns or requires access to big data, contact us at email@example.com to find out how we can help you.