...
Intake is a Python package for data access and management. It provides a simple interface for loading and exploring data from different sources in both local and remote storage systems. Intake lets users define and share data catalogs, which are collections of metadata describing the data sources. It also supports lazy loading, so users can work with extremely large datasets (more than a petabyte) whose indexes alone exceed the capacity of available memory.
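The idea of a catalog combined with lazy loading can be illustrated without Intake itself. The following is a minimal plain-Python sketch, not Intake's API: the class `LazySource`, the `catalog` dictionary and the sample data are hypothetical names made up for illustration. It only shows that metadata can be browsed cheaply while the data is read on demand.

```python
import csv
import io


class LazySource:
    """Hypothetical catalog entry: holds metadata, reads data only on demand."""

    def __init__(self, description, opener):
        self.description = description  # cheap metadata, always available
        self._opener = opener           # callable returning a file-like object
        self.loaded = False

    def read(self):
        # The data is only touched here, when the user explicitly asks for it.
        self.loaded = True
        with self._opener() as handle:
            return list(csv.DictReader(handle))


def open_sample():
    # Stand-in for a local or remote file; a real catalog would point at storage.
    return io.StringIO("station,temp_c\nA,21.5\nB,19.0\n")


catalog = {"sample_temps": LazySource("Toy station temperatures", open_sample)}

# Browsing the catalog reads metadata only; nothing is loaded yet.
for name, source in catalog.items():
    print(name, "-", source.description, "- loaded:", source.loaded)

rows = catalog["sample_temps"].read()  # the data is loaded at this point
print(rows[0]["station"], rows[0]["temp_c"])
```

In Intake, the same separation applies at a much larger scale: catalog entries describe where and how to load the data, and the data is read only when requested.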
NCI provides two dataset indexing schemes, listed below, each based on a different Intake technique:
- Intake-spark Scheme (for expert users): For every data collection hosted by NCI, we generate Intake data source files in Parquet format that include all file attributes as metadata. These files can be manipulated using the intake-spark package.
- Intake-ESM Scheme (for climate users): For certain data collections, we additionally create lightweight data source files in CSV format that contain selected metadata. These files work with the NCI intake-esm indexes and can be handled with the associated Intake software.
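
To make the CSV-based scheme more concrete, the sketch below mimics a search over a small metadata index using only the Python standard library. It is not the intake-esm API, and the column names (`path`, `variable`, `frequency`) and file paths are invented for illustration; the actual columns in the NCI indexes may differ.

```python
import csv
import io

# Hypothetical index content; real NCI index files and columns may differ.
INDEX_CSV = """path,variable,frequency
/data/example/tas_mon_2000.nc,tas,mon
/data/example/tas_day_2000.nc,tas,day
/data/example/pr_mon_2000.nc,pr,mon
"""


def search(index_text, **criteria):
    """Return index rows whose columns match every given key=value pair."""
    reader = csv.DictReader(io.StringIO(index_text))
    return [
        row for row in reader
        if all(row.get(key) == value for key, value in criteria.items())
    ]


# Select monthly near-surface temperature files by metadata alone.
matches = search(INDEX_CSV, variable="tas", frequency="mon")
paths = [row["path"] for row in matches]
print(paths)
```

Only the small index is read during the search; the data files themselves are not opened. With intake-esm installed, a comparable query would typically go through the datastore's `search` method, after which the matching files can be opened lazily.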
...