...

Show the unique values of a single column, first "attributes.experiment_id" and then "attributes.institution_id":


Code Block
In[3]: df.select("attributes.experiment_id")\
         .distinct().show()
+--------------+
| experiment_id|
+--------------+
|   ssp534-over|
|piClim-histall|
| esm-piControl|
|        ssp585|
|     piControl|
|        ssp460|
|piClim-control|
|        ssp370|
|        ssp126|
|        ssp119|
|        ssp245|
|       1pctCO2|
|  abrupt-4xCO2|
|          amip|
|    historical|
|piClim-histghg|
|        ssp434|
|      esm-hist|
|piClim-histaer|
|      hist-aer|
+--------------+
only showing top 20 rows

In[3]: df.select("attributes.institution_id")\
         .distinct().show()
+-----------------+
|institution_id   |
+-----------------+
|UA               |
|IPSL             |
|E3SM-Project     |
|CCCma            |
|CAS              |
|MIROC            |
|NASA-GISS        |
|MRI              |
|NCC              |
|HAMMOZ-Consortium|
|NCAR             |
|NUIST            |
|NOAA-GFDL        |
|NIMS-KMA         |
|BCC              |
|CNRM-CERFACS     |
|AWI              |
|MPI-M            |
|KIOST            |
|MOHC             |
+-----------------+
only showing top 20 rows

Code Block
In[3]: df.select("attributes.institution_id",\
          "attributes.experiment_id")\
          .distinct().show(40, truncate=False)
+--------------+--------------+
|institution_id|experiment_id |
+--------------+--------------+
|NCC           |ssp585        |
|NCAR          |esm-piControl |
|MRI           |esm-hist      |
|NOAA-GFDL     |ssp370        |
|E3SM-Project  |ssp585        |
|UA            |ssp245        |
|NUIST         |ssp245        |
|UA            |ssp370        |
|CNRM-CERFACS  |ssp245        |
|NOAA-GFDL     |ssp585        |
|CMCC          |abrupt-4xCO2  |
|MIROC         |ssp534-over   |
|CAMS          |1pctCO2       |
|CNRM-CERFACS  |ssp126        |
|NASA-GISS     |piClim-histall|
|CMCC          |piControl     |
|MRI           |historical    |
|BCC           |piControl     |
|MPI-M         |esm-hist      |
|NCAR          |historical    |
|MIROC         |ssp126        |
|MOHC          |piClim-histall|
|CAMS          |piControl     |
|MIROC         |ssp370        |
|MOHC          |ssp534-over   |
|MPI-M         |amip          |
|NOAA-GFDL     |ssp126        |
|CCCma         |piClim-histall|
|CAMS          |abrupt-4xCO2  |
|CAMS          |amip          |
|CNRM-CERFACS  |ssp370        |
|KIOST         |ssp126        |
|CNRM-CERFACS  |ssp585        |
|NCAR          |piClim-control|
|UA            |ssp126        |
|MOHC          |ssp119        |
|CMCC          |amip          |
|MRI           |abrupt-4xCO2  |
|NUIST         |ssp126        |
|NASA-GISS     |piClim-control|
+--------------+--------------+
only showing top 40 rows


By default, show() prints only the first 20 rows and cuts off values longer than 20 characters. To display more rows, pass a larger number to show(); to print every value at its full length, set "truncate=False".

...