I have installed and played around with pyesgf http://esgf-pyclient.readthedocs.org/en/latest/index.html a python module that allows to interact with the ESGF API using python.
Basically you can build a search passing on constraints as you can do online and get back the search results.
The advantage of using this is that you can decide how you want these results to be retrieved, for example currently Lawson scripts and also my own python scripts where retrieving a wget file which then was parsed to extract the information on available files such as checksum, download url etc.
With pyesgf you have different options to retrieve the search results:
wget
opendap aggregation
a "list of datasets" object DatasetResult and each of these has a "list of files" object FileResult
Following are some of the properties that can be retrieve for each of these
DatasetResult:
methods: 'context', 'dataset_id', 'download_url', 'file_context', 'index_node', 'json', 'las_url', 'number_of_files', 'opendap_url', 'urls'
Where all the info is hidden is the "json" attribute (a python dictionary), that's the actually entire response from the server, so you get from it much more information including the version. The dict keys are:
[u'index_node', u'version', u'dataset_id_template_', u'cf_standard_name', u'number_of_aggregations', u'south_degrees', u'drs_id', u'cmor_table', u'replica', u'west_degrees', u'master_id', u'height_top', u'id', u'datetime_stop', u'height_bottom', u'access', u'realm', u'data_node', u'title', u'description', u'instance_id', u'height_units', u'variable_long_name', u'metadata_format', u'variable_units', u'score', u'datetime_start', u'number_of_files', u'_version_', u'size', u'type', u'ensemble', u'product', u'experiment', u'format', u'timestamp', u'time_frequency', u'_timestamp', u'variable', u'east_degrees', u'experiment_family', u'forcing', u'institute', u'north_degrees', u'project', u'url', u'model', u'latest']
FileResult:
methods: 'checksum', 'checksum_type', 'context', 'download_url', 'file_id', 'filename', 'index_node', 'json', 'las_url', 'opendap_url', 'size', 'tracking_id', 'urls'
json dictionary keys:
[u'index_node', u'version', u'dataset_id_template_', u'cf_standard_name', u'cmor_table', u'replica', u'master_id', u'dataset_id', u'id', u'size', u'instance_id', u'realm', u'data_node', u'title', u'description', u'drs_id', u'variable_long_name', u'metadata_format', u'variable_units', u'score', u'_version_', u'forcing', u'type', u'ensemble', u'product', u'experiment', u'format', u'timestamp', u'time_frequency', u'variable', u'_timestamp', u'institute', u'checksum', u'experiment_family', u'project', u'url', u'tracking_id', u'checksum_type', u'model', u'latest']
ESGF search records contain various ids. These include:
The dataset_id listed above is the first, but we could potentially retrieve any of them, Examples are
id - cmip5.output1.MIROC.MIROC5.historical.mon.atmos.Amon.r5i1p1.v20120710|aims3.llnl.gov which is the dataset_id property
drs_id - cmip5.output1.MIROC.MIROC5.historical.mon.atmos.Amon.r5i1p1
master_id - cmip5.output1.MIROC.MIROC5.historical.mon.atmos.Amon.r5i1p1
instance_id - cmip5.output1.MIROC.MIROC5.historical.mon.atmos.Amon.r5i1p1.v20120710