The following tutorial demonstrates how to access magnetotelluric (MT) time series data stored as NetCDF4 files from the National Computational Infrastructure (NCI) THREDDS data server.
This tutorial is written as a Jupyter notebook, which can be accessed here. To run this notebook on NCI's ARE VDI app, you can use the dk92 NCI-data-analysis environment.
If you have access to NCI, to run the example in our notebooks:
- Login to the ARE service
- Launch a Virtual Desktop Interface (VDI) instance.
- Open up a terminal
- cp /g/data/dk92/notebooks/geophysics-mt/* <your local working directory>
- cd <your local working directory>
- module use /g/data/dk92/apps/Modules/modulefiles
- module avail NCI-data-analysis
- module load NCI-data-analysis_2021.06
- To launch the Jupyter notebook from the terminal, run:
- jupyter notebook MT_02_accessing_and_processing_time_series_data_from_THREDDS_using_OPeNDAP.ipynb
Describing the example
In this example, we analyse MT time series data directly from the THREDDS server API as a network service, rather than downloading and storing a local copy for processing. The example makes use of the IOOS THREDDS crawler, as well as the Python3 library urllib.request.
Additionally, the following Python3 libraries are used by this notebook, for which we have included links to their reference documentation:
netcdf4-python
matplotlib.pyplot
numpy
os
The following information is also provided in the notebook itself, but may be easier to read in a web browser than in the notebook.
Import libraries
The following code block demonstrates how to import the various libraries.
```python
from thredds_crawler.crawl import Crawl
import urllib.request
import urllib
from netCDF4 import Dataset
import matplotlib.pyplot as plt
import numpy as np
import os

%matplotlib inline
```
Crawl the NCI THREDDS Data Server AusLAMP Musgraves MT time series endpoint xml
For this example, we will first search the NCI Geonetwork Catalogue for the South Australian sites of the AusLAMP Level 1 Musgraves MT time series data. Let's navigate to THREDDS by clicking on "Open link" for the SA sites:
This will bring you to the following page:
To get the XML endpoint used for our IOOS crawler, simply replace "catalog.html" with "catalog.xml" in the THREDDS URL above:
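This substitution can also be done programmatically. A minimal sketch, using the catalog URL from this example:

```python
# Derive the XML endpoint for the crawler from a THREDDS catalog page URL
# by swapping the "catalog.html" suffix for "catalog.xml".
catalog_html = ('http://dapds00.nci.org.au/thredds/catalog/my80/AusLAMP/'
                'SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/'
                'Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/catalog.html')
catalog_xml = catalog_html.replace('catalog.html', 'catalog.xml')
print(catalog_xml)
```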
Now let's run the IOOS THREDDS crawler for all NetCDF files from this XML endpoint:
```python
c = Crawl('http://dapds00.nci.org.au/thredds/catalog/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/catalog.xml',
          select=['.*nc'])
```
Note that if the crawl is taking a long time (more than 30 seconds), just stop and re-run the Jupyter notebook cell. Once the crawl is complete, we can use OPeNDAP to view the MT time series metadata and data. To do this, we first need to define all the OPeNDAP URLs:
```python
urls_opendap = [s.get("url") for d in c.datasets
                for s in d.services if s.get("service").lower() == "opendap"]
```
Let's view the first three OPeNDAP endpoints:
```python
print(urls_opendap[:3])
```

Output:

```
['http://dapds00.nci.org.au/thredds/dodsC/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/SA225-2_138_173.nc',
 'http://dapds00.nci.org.au/thredds/dodsC/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/SA227_130_133.nc',
 'http://dapds00.nci.org.au/thredds/dodsC/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/SA242_304_322.nc']
```
To see how many OPeNDAP endpoints our crawl picked up:
```python
print(len(urls_opendap))
```

Output:

```
47
```
Let's define two example sites (SA271 and SA272) that we can take a look at:
```python
SA271 = [i for i in urls_opendap if i.split('/')[-1].startswith('SA271')]
SA272 = [i for i in urls_opendap if i.split('/')[-1].startswith('SA272')]
print(SA271)
print(SA272)
```

Output:

```
['http://dapds00.nci.org.au/thredds/dodsC/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/SA271_134_172.nc']
['http://dapds00.nci.org.au/thredds/dodsC/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/SA272_134_172.nc']
```
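The same filtering idea generalizes to every site at once. A sketch (assuming the filenames keep the `<site>_<...>.nc` pattern seen above; `group_by_site` is our own helper, not part of any library) that groups the OPeNDAP URLs by the site code before the first underscore:

```python
def group_by_site(urls):
    """Map each site code (filename text before the first '_') to its URLs."""
    sites = {}
    for url in urls:
        site = url.split('/')[-1].split('_')[0]
        sites.setdefault(site, []).append(url)
    return sites

# Hypothetical example URLs following the naming pattern above:
example = [
    'http://example/thredds/dodsC/SA271_134_172.nc',
    'http://example/thredds/dodsC/SA272_134_172.nc',
]
sites = group_by_site(example)
```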
We can browse some of the metadata from the two example Level 1 Musgraves time series NetCDF files:
```python
for opendap_example in SA271:
    print(opendap_example)
    f = Dataset(opendap_example, 'r')
    print(f.dimensions.keys())
    print('site name is:', f.site_name)
    print('field recorded latitude is:', f.field_GPS_latitude_dec_deg)
    print('field recorded longitude is:', f.field_GPS_longitude_dec_deg)
    print('MT box case number is:', f.MT_box_case_number)
    print('magnetometer_type_model is', f.magnetometer_type_model)
    print('power_source_type is', f.power_source_type_or_model)
    print('')

for opendap_example in SA272:
    print(opendap_example)
    f = Dataset(opendap_example, 'r')
    print(f.dimensions.keys())
    print('site name is:', f.site_name)
    print('field recorded latitude is:', f.field_GPS_latitude_dec_deg)
    print('field recorded longitude is:', f.field_GPS_longitude_dec_deg)
    print('MT box case number is:', f.MT_box_case_number)
    print('magnetometer_type_model is', f.magnetometer_type_model)
    print('power_source_type is', f.power_source_type_or_model)
    print('')
```

Output:

```
http://dapds00.nci.org.au/thredds/dodsC/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/SA271_134_172.nc
odict_keys(['bx', 'by', 'bz', 'ex', 'ey', 'time'])
site name is: GVD
field recorded latitude is: -27.49422
field recorded longitude is: 132.00921
MT box case number is: 18
magnetometer_type_model is Bartington, Model - Mag03MS70 (70 nano Tesla), Type - Fluxgate, 3 Component, Frequency response: 0 - 1kHz, Bandwidth: 0 - 3kHz
power_source_type is 12Volt 72 Amp/Hr Battery, Power Supply Charging - Solar Panel, 60Watt

http://dapds00.nci.org.au/thredds/dodsC/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/SA272_134_172.nc
odict_keys(['bx', 'by', 'bz', 'ex', 'ey', 'time'])
site name is: South of Illllinna
field recorded latitude is: -27.49907
field recorded longitude is: 131.50538
MT box case number is: 5
magnetometer_type_model is Bartington, Model - Mag03MS70 (70 nano Tesla), Type - Fluxgate, 3 Component, Frequency response: 0 - 1kHz, Bandwidth: 0 - 3kHz
power_source_type is 12Volt 72 Amp/Hr Battery, Power Supply Charging - Solar Panel, 60Watt
```
Let's ensure that the electromagnetic variables have the same shape:
```python
f = Dataset(SA271[0], 'r')
vars = f.variables.keys()
for item in vars:
    print('Variable: \t', item)
    print('Dimensions: \t', f[item].dimensions)
    print('Shape: \t', f[item].shape, '\n')
```

Output:

```
Variable:    ex
Dimensions:  ('ex',)
Shape:       (3369600,)

Variable:    ey
Dimensions:  ('ey',)
Shape:       (3369600,)

Variable:    bx
Dimensions:  ('bx',)
Shape:       (3369600,)

Variable:    by
Dimensions:  ('by',)
Shape:       (3369600,)

Variable:    bz
Dimensions:  ('bz',)
Shape:       (3369600,)

Variable:    time
Dimensions:  ('time',)
Shape:       (3369600,)
```
```python
g = Dataset(SA272[0], 'r')
vars = g.variables.keys()
for item in vars:
    print('Variable: \t', item)
    print('Dimensions: \t', g[item].dimensions)
    print('Shape: \t', g[item].shape, '\n')
```

Output:

```
Variable:    ex
Dimensions:  ('ex',)
Shape:       (3369600,)

Variable:    ey
Dimensions:  ('ey',)
Shape:       (3369600,)

Variable:    bx
Dimensions:  ('bx',)
Shape:       (3369600,)

Variable:    by
Dimensions:  ('by',)
Shape:       (3369600,)

Variable:    bz
Dimensions:  ('bz',)
Shape:       (3369600,)

Variable:    time
Dimensions:  ('time',)
Shape:       (3369600,)
```
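Rather than eyeballing the printouts, the consistency check can be automated. A minimal sketch that works on a plain mapping of variable name to shape (`shapes_consistent` is our own helper):

```python
def shapes_consistent(shapes):
    """Return True if every variable shape in the mapping is identical."""
    unique = set(shapes.values())
    return len(unique) <= 1

# With a netCDF4 Dataset f, this would be called as:
#   shapes_consistent({name: f[name].shape for name in f.variables})
shapes = {'ex': (3369600,), 'ey': (3369600,), 'bx': (3369600,),
          'by': (3369600,), 'bz': (3369600,), 'time': (3369600,)}
```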
Let's start visualising some of the time series data from SA271 and SA272:
```python
ex_271 = f['ex']
ey_271 = f['ey']
bx_271 = f['bx']
by_271 = f['by']
time_271 = f['time']

ex_272 = g['ex']
ey_272 = g['ey']
bx_272 = g['bx']
by_272 = g['by']
time_272 = g['time']

### checking 50000 points for ex and ey to see if the time series data looks reasonable
a = 100000
b = 150000

plt.figure(figsize=(12, 12))
plt.title("ex,ey time series for SA271", fontsize=18)
plt.plot(time_271[a:b], ex_271[a:b] - np.mean(ex_271[a:b]), label=ex_271.long_name)
plt.plot(time_271[a:b], ey_271[a:b] - np.mean(ey_271[a:b]), label=ey_271.long_name)
plt.legend(loc='upper right', prop={'size': 16})
plt.xlabel(time_271.long_name + ' (' + time_271.units + ') ', fontsize=16)
plt.ylabel(ex_271.units, fontsize=16)

plt.figure(figsize=(12, 12))
plt.title("ex,ey time series for SA272", fontsize=18)
plt.plot(time_272[a:b], ex_272[a:b] - np.mean(ex_272[a:b]), label=ex_272.long_name)
plt.plot(time_272[a:b], ey_272[a:b] - np.mean(ey_272[a:b]), label=ey_272.long_name)
plt.legend(loc='upper right', prop={'size': 16})
plt.xlabel(time_272.long_name + ' (' + time_272.units + ') ', fontsize=16)
plt.ylabel(ex_272.units, fontsize=16)
```
Let's check 50000 points of bx and by to see if the data looks reasonable:
```python
plt.figure(figsize=(12, 12))
plt.title("bx,by time series SA271", fontsize=18)
plt.plot(time_271[a:b], bx_271[a:b] - np.mean(bx_271[a:b]), label=bx_271.long_name)
plt.plot(time_271[a:b], by_271[a:b] - np.mean(by_271[a:b]), label=by_271.long_name)
plt.legend(loc='upper right', prop={'size': 16})
plt.xlabel(time_271.long_name + ' (' + time_271.units + ') ', fontsize=16)
plt.ylabel(bx_271.units, fontsize=16)

plt.figure(figsize=(12, 12))
plt.title("bx,by time series SA272", fontsize=18)
plt.plot(time_272[a:b], bx_272[a:b] - np.mean(bx_272[a:b]), label=bx_272.long_name)
plt.plot(time_272[a:b], by_272[a:b] - np.mean(by_272[a:b]), label=by_272.long_name)
plt.legend(loc='upper right', prop={'size': 16})
plt.xlabel(time_272.long_name + ' (' + time_272.units + ') ', fontsize=16)
plt.ylabel(bx_272.units, fontsize=16)
```
Checking 1000 points of ex and ey from SA271 and SA272:
```python
a = 150000
b = 151000

plt.figure(figsize=(12, 12))
plt.title("ex,ey time series SA271", fontsize=18)
plt.plot(time_271[a:b], ex_271[a:b] - np.mean(ex_271[a:b]), label=ex_271.long_name)
plt.plot(time_271[a:b], ey_271[a:b] - np.mean(ey_271[a:b]), label=ey_271.long_name)
plt.legend(loc='upper right', prop={'size': 16})
plt.xlabel(time_271.long_name + ' (' + time_271.units + ') ', fontsize=16)
plt.ylabel(ex_271.units, fontsize=16)

plt.figure(figsize=(12, 12))
plt.title("ex,ey time series SA272", fontsize=18)
plt.plot(time_272[a:b], ex_272[a:b] - np.mean(ex_272[a:b]), label=ex_272.long_name)
plt.plot(time_272[a:b], ey_272[a:b] - np.mean(ey_272[a:b]), label=ey_272.long_name)
plt.legend(loc='upper right', prop={'size': 16})
plt.xlabel(time_272.long_name + ' (' + time_272.units + ') ', fontsize=16)
plt.ylabel(ex_272.units, fontsize=16)
```
Now we can compare ex, ey, bx, and by for stations SA271 and SA272:
```python
a = 150000
b = 160000

fig, axs = plt.subplots(2, 2, figsize=(15, 15))
axs[0, 0].plot(time_271[a:b], ex_271[a:b] - np.mean(ex_271[a:b]), 'tab:orange')
axs[0, 0].plot(time_272[a:b], ex_272[a:b] - np.mean(ex_272[a:b]), 'tab:blue')
axs[0, 0].set_title('ex time series SA271(orange) vs SA272(blue)')
axs[0, 1].plot(time_271[a:b], ey_271[a:b] - np.mean(ey_271[a:b]), 'tab:orange')
axs[0, 1].plot(time_272[a:b], ey_272[a:b] - np.mean(ey_272[a:b]), 'tab:blue')
axs[0, 1].set_title('ey time series SA271(orange) vs SA272(blue)')
axs[1, 0].plot(time_271[a:b], bx_271[a:b] - np.mean(bx_271[a:b]), 'tab:orange')
axs[1, 0].plot(time_272[a:b], bx_272[a:b] - np.mean(bx_272[a:b]), 'tab:blue')
axs[1, 0].set_title('bx time series SA271(orange) vs SA272(blue)')
axs[1, 1].plot(time_271[a:b], by_271[a:b] - np.mean(by_271[a:b]), 'tab:orange')
axs[1, 1].plot(time_272[a:b], by_272[a:b] - np.mean(by_272[a:b]), 'tab:blue')
axs[1, 1].set_title('by time series SA271(orange) vs SA272(blue)')

for ax in axs.flat:
    ax.set(xlabel='time', ylabel='counts')

# Hide x labels and tick labels for top plots and y ticks for right plots.
for ax in axs.flat:
    ax.label_outer()
```
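The demeaning pattern repeated in the plotting calls above can be factored into a small helper; a minimal sketch with NumPy (`demean` is our own name, not part of any library):

```python
import numpy as np

def demean(x):
    """Remove the mean so traces from different channels share a baseline."""
    x = np.asarray(x)
    return x - x.mean()

# e.g. plt.plot(time_271[a:b], demean(ex_271[a:b]), label=ex_271.long_name)
```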
To close our plots:
```python
plt.close('all')
```
Let's try and process some time series in the frequency domain using the Bounded Influence, Remote Reference Processing (BIRRP) code.
Note that for this example, we are using an NCI modified parallelised version of BIRRP that allows for electromagnetic field inputs from a NetCDF4 file.
```python
### set this to the location of the NetCDF/mpi modified version of the BIRRP code
birrp_location = '/g/data/up99/mt_apps/birrp_mpi/birrp_mpi'

### set this to a working directory where you would like to run the BIRRP example
working_directory = '/g/data/up99/sandbox/birrp_test/thredds_opendap'

### let's change into our working directory
os.chdir(working_directory)
```
Let's run BIRRP on 2 million data points for this example. First we can define SA271 as the local site and SA272 as the remote reference:
```python
start_point = 10
end_point = 2000010

EX = ex_271[start_point:end_point]
EY = ey_271[start_point:end_point]
BX = bx_271[start_point:end_point]
BY = by_271[start_point:end_point]
BXRR = bx_272[start_point:end_point]
BYRR = by_272[start_point:end_point]
```
Now we can create a NetCDF file with the BIRRP variable inputs:
```python
dataset_P = Dataset('SA271_rr_SA272.nc', 'w', format='NETCDF4')

### define netCDF dimensions
ex = dataset_P.createDimension('ex', len(EX))
ey = dataset_P.createDimension('ey', len(EY))
bx = dataset_P.createDimension('bx', len(BX))
by = dataset_P.createDimension('by', len(BY))
bxrr = dataset_P.createDimension('bxrr', len(BXRR))
byrr = dataset_P.createDimension('byrr', len(BYRR))

### define netCDF variables
ex = dataset_P.createVariable('ex', np.float32, ('ex',))
ex[:] = EX
ey = dataset_P.createVariable('ey', np.float32, ('ey',))
ey[:] = EY
bx = dataset_P.createVariable('bx', np.float32, ('bx',))
bx[:] = BX
by = dataset_P.createVariable('by', np.float32, ('by',))
by[:] = BY
bxrr = dataset_P.createVariable('bxrr', np.float32, ('bxrr',))
bxrr[:] = BXRR
byrr = dataset_P.createVariable('byrr', np.float32, ('byrr',))
byrr[:] = BYRR

dataset_P.close()
```
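Remote-reference processing assumes the local and remote channels cover the same samples, so it is worth validating the array lengths before writing the NetCDF file. A sketch under that assumption (`validate_channel_lengths` is a hypothetical helper of our own):

```python
import numpy as np

def validate_channel_lengths(channels):
    """Raise ValueError unless every channel array has the same length."""
    lengths = {name: len(arr) for name, arr in channels.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f'channel lengths differ: {lengths}')
    return next(iter(lengths.values()), 0)

# In the example above this would be:
#   validate_channel_lengths({'ex': EX, 'ey': EY, 'bx': BX,
#                             'by': BY, 'bxrr': BXRR, 'byrr': BYRR})
n = validate_channel_lengths({'a': np.zeros(5), 'b': np.ones(5)})
```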
Let's also define our BIRRP processing parameters (see the BIRRP manual for more information):
```python
birrp_processing_parameters = """1
2
2
2
1
3
1
32768,2,9
3,1,3
y
2
0,0.0003,0.9999
0.7
0.0
0.003,1
SA271_rrSA272
0
0
1
10
0
0
2000000
0
ex@SA271_rr_SA272.nc
0
0
ey@SA271_rr_SA272.nc
0
0
bx@SA271_rr_SA272.nc
0
0
by@SA271_rr_SA272.nc
0
0
bxrr@SA271_rr_SA272.nc
0
0
byrr@SA271_rr_SA272.nc
0
180,90,180
0,90,0
0,90,0
"""
```
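Because BIRRP reads these parameters positionally from standard input, one value per line, a small helper can make that structure explicit in code. A hypothetical sketch (`birrp_params` is our own name, not part of BIRRP):

```python
def birrp_params(lines):
    """Join an ordered list of BIRRP input values into newline-delimited text."""
    return '\n'.join(str(line) for line in lines) + '\n'

# First few inputs from the parameter block above:
header = birrp_params([1, 2, 2, 2, 1, 3, 1, '32768,2,9'])
```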
We can write these BIRRP parameters to an input file for BIRRP:
```python
birrp_input = open("birrp_processing_parameters", "wt")
n = birrp_input.write(birrp_processing_parameters)
birrp_input.close()
```
Next we need to make a Gadi PBS job script to run BIRRP on our example file:
```python
runjob = """#!/bin/bash
#PBS -q normal
#PBS -P up99
#PBS -l walltime=00:01:00
#PBS -l ncpus=8
#PBS -l mem=2GB
#PBS -l jobfs=1GB
#PBS -l wd
#PBS -N birrp_271_272
#PBS -l storage=gdata/up99+gdata/my80

module load openmpi/3.1.4

mpirun -np 8 --mca pml ob1 /g/data/up99/mt_apps/birrp_mpi/birrp_mpi < birrp_processing_parameters
"""

### write PBS job script file
runjob_input = open("runjob.ob1.sh", "wt")
runjob_input.write(runjob)
runjob_input.close()
```
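If you process several station pairs, the job script can be templated rather than hand-edited. A sketch whose directives mirror the script above (`make_pbs_script` is a hypothetical helper of our own):

```python
def make_pbs_script(job_name, ncpus, birrp_exe, params_file):
    """Render a minimal Gadi PBS script mirroring the directives used above."""
    return f"""#!/bin/bash
#PBS -q normal
#PBS -P up99
#PBS -l walltime=00:01:00
#PBS -l ncpus={ncpus}
#PBS -l mem=2GB
#PBS -l jobfs=1GB
#PBS -l wd
#PBS -N {job_name}
#PBS -l storage=gdata/up99+gdata/my80

module load openmpi/3.1.4

mpirun -np {ncpus} --mca pml ob1 {birrp_exe} < {params_file}
"""

script = make_pbs_script('birrp_271_272', 8,
                         '/g/data/up99/mt_apps/birrp_mpi/birrp_mpi',
                         'birrp_processing_parameters')
```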
To submit our job to the Gadi PBS queue, first launch the OOD Gadi Terminal app. Once logged in, navigate to the working directory where we created our job script.
Now we can submit our PBS job:
```shell
$ qsub runjob.ob1.sh
```
We can monitor our PBS job:
```shell
$ qstat
```

Output:

```
Job id                 Name             User             Time Use S Queue
---------------------  ---------------- ---------------- -------- - ------------
5843475.gadi-pbs       birrp_271_272    abc123           0        Q normal-exec
```
Once the job has completed, we can check that our BIRRP output files have been created:
```shell
$ ls
```

Output:

```
birrp_271_272.e5843475       SA271_rrSA272.1r.2c2  SA271_rrSA272.2r.2c2
birrp_271_272.o5843475       SA271_rrSA272.1r2.rf  SA271_rrSA272.2r2.rf
birrp_processing_parameters  SA271_rrSA272.1r2.rp  SA271_rrSA272.2r2.rp
runjob.ob1.sh                SA271_rrSA272.1r2.tf  SA271_rrSA272.2r2.tf
SA271_rrSA272.1n.1c2         SA271_rrSA272.2n.1c2  SA271_rrSA272.diag.000
SA271_rrSA272.1n1.rf         SA271_rrSA272.2n1.rf  SA271_rrSA272.diag.001
SA271_rrSA272.1n1.rp         SA271_rrSA272.2n1.rp  SA271_rrSA272.diag.002
SA271_rrSA272.1n1.tf         SA271_rrSA272.2n1.tf  SA271_rrSA272.diag.003
SA271_rrSA272.1n.2c2         SA271_rrSA272.2n.2c2  SA271_rrSA272.diag.004
SA271_rrSA272.1n2.rf         SA271_rrSA272.2n2.rf  SA271_rrSA272.diag.005
SA271_rrSA272.1n2.rp         SA271_rrSA272.2n2.rp  SA271_rrSA272.diag.006
SA271_rrSA272.1n2.tf         SA271_rrSA272.2n2.tf  SA271_rrSA272.diag.007
SA271_rrSA272.1r.1c2         SA271_rrSA272.2r.1c2  SA271_rrSA272.j
SA271_rrSA272.1r1.rf         SA271_rrSA272.2r1.rf  SA271_rr_SA272.nc
SA271_rrSA272.1r1.rp         SA271_rrSA272.2r1.rp  SA271_rrSA272.r.cov
SA271_rrSA272.1r1.tf         SA271_rrSA272.2r1.tf
```
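A quick programmatic check that the main outputs exist can replace eyeballing the `ls` listing. A sketch using `fnmatch` on a directory listing (the patterns reflect the filenames shown above; `count_birrp_outputs` is our own helper):

```python
import fnmatch

def count_birrp_outputs(filenames, stem='SA271_rrSA272'):
    """Count files in a listing that match the main BIRRP output patterns."""
    patterns = [f'{stem}.j', f'{stem}.*.rf', f'{stem}.*.rp',
                f'{stem}.*.tf', f'{stem}.diag.*']
    return {p: len(fnmatch.filter(filenames, p)) for p in patterns}

# In the working directory this would be:
#   counts = count_birrp_outputs(os.listdir('.'))
counts = count_birrp_outputs(['SA271_rrSA272.j', 'SA271_rrSA272.1n1.rf',
                              'SA271_rrSA272.diag.000', 'runjob.ob1.sh'])
```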