We’re hosting a tutorial to introduce the NCI data catalog and its two indexing schemes: Intake-ESM and Intake-Spark. A data catalog helps users discover and access datasets through structured metadata, while indexing improves performance by enabling fast, targeted searches. Built on the Python Intake package, these tools support scalable, memory-efficient access to large datasets. At NCI, Intake-Spark uses Parquet-based indexes for high-performance querying with Spark, while Intake-ESM uses lightweight CSV-based indexes ideal for climate data workflows. This session will include hands-on Jupyter Notebook examples showing how to use the catalog in data analysis and machine learning workflows. You’ll learn how to search, load, and filter datasets efficiently from the /g/data collections. The tutorial is ideal for researchers working with large-scale data or looking to streamline their pipelines. Stay tuned for scheduling details—everyone is welcome!
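To illustrate the idea of a lightweight CSV-based index, here is a minimal sketch using only the Python standard library. It is not the Intake-ESM API or the actual NCI catalog schema: the column names, the file paths, and the `search` helper are hypothetical placeholders chosen for illustration. It shows how a search can filter a small metadata table and return only the matching file paths, without opening any data files.

```python
import csv
import io

# Hypothetical index: one row per file, metadata columns plus a file path.
# Column names and paths are placeholders, not the real NCI catalog layout.
INDEX_CSV = """variable_id,experiment_id,frequency,path
tas,historical,mon,/path/to/tas_historical_mon.nc
tas,ssp585,mon,/path/to/tas_ssp585_mon.nc
pr,historical,day,/path/to/pr_historical_day.nc
"""


def search(index_text, **criteria):
    """Return the index rows whose columns match every keyword criterion."""
    reader = csv.DictReader(io.StringIO(index_text))
    return [
        row
        for row in reader
        if all(row[key] == value for key, value in criteria.items())
    ]


# Targeted search: only the metadata table is read, never the data files.
matches = search(INDEX_CSV, variable_id="tas", experiment_id="historical")
paths = [row["path"] for row in matches]
print(paths)
```

In a real workflow the resulting paths would then be passed to a loader such as xarray; the tutorial notebooks cover the actual catalog calls.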
To participate in the hands-on practice sessions, please complete the prerequisites below well in advance of the workshop.
Prerequisites
Join the following projects before the tutorial:
dk92: NCI-data-analysis virtual environment group
wb00: AI/DL reference data collection
oi10: ESGF CMIP6 Replication Data
Agenda
Exploring the NCI Data Catalog with Intake-Spark and Intake-ESM
Date: 11 June 2025
Time: 2:00 PM – 4:00 PM
Time | Speaker | Session |
---|---|---|
2:00 – 2:15 PM | Dr. Rui Yang | Welcome and Set up ARE JupyterLab Session |
2:15 – 2:30 PM | Dr. Hannes Hollmann | Overview of NCI’s Data Services |
2:30 – 2:45 PM | Dr. Rui Yang | Introduction to NCI’s Data Catalog and Indexing Schemes |
2:45 – 3:00 PM | Dr. Rui Yang | Hands-on Practice: Working with the Intake-ESM Indexing Scheme |
3:00 – 3:15 PM | Dr. Rui Yang | Hands-on Practice: Applying the Intake-ESM Scheme in AI/ML Workflows |
3:15 – 3:20 PM | Short break | |
3:20 – 3:45 PM | Dr. Rui Yang | Hands-on Practice: Using the Intake-Spark Indexing Scheme |
3:45 – 4:00 PM | All | Open Discussion: Q&A, Support Needs, and Feedback |