The following tutorial demonstrates how to access magnetotelluric (MT) time series data stored as NetCDF4 files from the National Computational Infrastructure (NCI) THREDDS data server.
This tutorial is written as a Jupyter notebook, which can be accessed here. To run this notebook on NCI's ARE VDI app, you can use the dk92 NCI-data-analysis environment.
If you have access to NCI, to run the example in our notebooks:
- Login to the ARE service
- Launch a Virtual Desktop Interface (VDI) instance.
- Open up a terminal
- cp /g/data/dk92/notebooks/geophysics-mt/* <your local working directory>
- cd <your local working directory>
- module use /g/data/dk92/apps/Modules/modulefiles
- module avail NCI-data-analysis
- module load NCI-data-analysis_2021.06
- To launch the Jupyter notebook from the terminal, run:
- jupyter notebook MT_02_accessing_and_processing_time_series_data_from_THREDDS_using_OPeNDAP.ipynb
Describing the example
In this example, we analyse MT time series data directly from the THREDDS server API as a network service, rather than downloading and storing a local copy for processing. The example makes use of the IOOS THREDDS crawler, as well as the Python3 library urllib.request.
Additionally, the following Python3 libraries are used by this notebook, for which we have included links to their reference documentation:
netcdf4-python
matplotlib.pyplot
numpy
os
The following information is also provided in the notebook itself, but may be easier to read in a web browser than in the notebook.
Import libraries
The following code block demonstrates how to import the various libraries.
```python
from thredds_crawler.crawl import Crawl
import urllib.request
import urllib
from netCDF4 import Dataset
import matplotlib.pyplot as plt
import numpy as np
import os

%matplotlib inline
```
Crawl the NCI THREDDS Data Server AusLAMP Musgraves MT time series endpoint xml
For this example, we will first search the NCI Geonetwork Catalogue for the South Australian sites of the AusLAMP Level 1 Musgraves MT time series data. Let's navigate to THREDDS by clicking on "Open link" for the SA sites:
This will bring you to the following page:
To get the XML endpoint used for our IOOS crawler, simply replace "catalog.html" with "catalog.xml" in the THREDDS URL above:
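This substitution can also be done programmatically. A minimal sketch, using the catalog URL from this example:

```python
# Derive the XML endpoint for the crawler from a THREDDS catalog page URL
# by swapping the "catalog.html" suffix for "catalog.xml".
catalog_html = ('http://dapds00.nci.org.au/thredds/catalog/my80/AusLAMP/'
                'SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/'
                'Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/catalog.html')
catalog_xml = catalog_html.replace('catalog.html', 'catalog.xml')
print(catalog_xml)
```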
Now let's run the IOOS THREDDS crawler for all NetCDF files from this XML endpoint:
```python
c = Crawl('http://dapds00.nci.org.au/thredds/catalog/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/catalog.xml',
          select=['.*nc'])
```
Note that if the crawl is taking a long time (more than 30 seconds), just stop and re-run the Jupyter notebook cell. Once the crawl is complete, we can use OPeNDAP to view the MT time series metadata and data. To do this, we first need to define all the OPeNDAP URLs:
```python
urls_opendap = [s.get("url") for d in c.datasets
                for s in d.services if s.get("service").lower() == "opendap"]
```
Let's view the first three OPeNDAP endpoints:
```python
print(urls_opendap[:3])
```

Output:

```
['http://dapds00.nci.org.au/thredds/dodsC/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/SA225-2_138_173.nc',
 'http://dapds00.nci.org.au/thredds/dodsC/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/SA227_130_133.nc',
 'http://dapds00.nci.org.au/thredds/dodsC/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/SA242_304_322.nc']
```
To see how many OPeNDAP endpoints our crawl picked up:
```python
print(len(urls_opendap))
```

Output:

```
47
```
Let's define two example sites (SA271 and SA272) that we can take a look at:
```python
SA271 = [i for i in urls_opendap if i.split('/')[-1].startswith('SA271')]
SA272 = [i for i in urls_opendap if i.split('/')[-1].startswith('SA272')]
print(SA271)
print(SA272)
```

Output:

```
['http://dapds00.nci.org.au/thredds/dodsC/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/SA271_134_172.nc']
['http://dapds00.nci.org.au/thredds/dodsC/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/SA272_134_172.nc']
```
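The same filtering idea generalizes to every site at once. A sketch (assuming the filenames keep the `<site>_<...>.nc` pattern seen above; `group_by_site` is our own helper, not part of any library) that groups the OPeNDAP URLs by the site code before the first underscore:

```python
def group_by_site(urls):
    """Map each site code (filename text before the first '_') to its URLs."""
    sites = {}
    for url in urls:
        site = url.split('/')[-1].split('_')[0]
        sites.setdefault(site, []).append(url)
    return sites

# Hypothetical example URLs following the naming pattern above:
example = [
    'http://example/thredds/dodsC/SA271_134_172.nc',
    'http://example/thredds/dodsC/SA272_134_172.nc',
]
sites = group_by_site(example)
```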
We can browse some of the metadata from the two example Level 1 Musgraves time series NetCDF files:
```python
for opendap_example in SA271:
    print(opendap_example)
    f = Dataset(opendap_example, 'r')
    print(f.dimensions.keys())
    print('site name is:', f.site_name)
    print('field recorded latitude is:', f.field_GPS_latitude_dec_deg)
    print('field recorded longitude is:', f.field_GPS_longitude_dec_deg)
    print('MT box case number is:', f.MT_box_case_number)
    print('magnetometer_type_model is', f.magnetometer_type_model)
    print('power_source_type is', f.power_source_type_or_model)
    print('')

for opendap_example in SA272:
    print(opendap_example)
    f = Dataset(opendap_example, 'r')
    print(f.dimensions.keys())
    print('site name is:', f.site_name)
    print('field recorded latitude is:', f.field_GPS_latitude_dec_deg)
    print('field recorded longitude is:', f.field_GPS_longitude_dec_deg)
    print('MT box case number is:', f.MT_box_case_number)
    print('magnetometer_type_model is', f.magnetometer_type_model)
    print('power_source_type is', f.power_source_type_or_model)
    print('')
```

Output:

```
http://dapds00.nci.org.au/thredds/dodsC/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/SA271_134_172.nc
odict_keys(['bx', 'by', 'bz', 'ex', 'ey', 'time'])
site name is: GVD
field recorded latitude is: -27.49422
field recorded longitude is: 132.00921
MT box case number is: 18
magnetometer_type_model is Bartington, Model - Mag03MS70 (70 nano Tesla), Type - Fluxgate, 3 Component, Frequency response: 0 - 1kHz, Bandwidth: 0 - 3kHz
power_source_type is 12Volt 72 Amp/Hr Battery, Power Supply Charging - Solar Panel, 60Watt

http://dapds00.nci.org.au/thredds/dodsC/my80/AusLAMP/SA_AusLAMP_MT_Survey_Musgraves_APY_2016_to_2018/SA/Level_1_Concatenated_Resampled_Rotated_Time_Series_NetCDF/SA272_134_172.nc
odict_keys(['bx', 'by', 'bz', 'ex', 'ey', 'time'])
site name is: South of Illllinna
field recorded latitude is: -27.49907
field recorded longitude is: 131.50538
MT box case number is: 5
magnetometer_type_model is Bartington, Model - Mag03MS70 (70 nano Tesla), Type - Fluxgate, 3 Component, Frequency response: 0 - 1kHz, Bandwidth: 0 - 3kHz
power_source_type is 12Volt 72 Amp/Hr Battery, Power Supply Charging - Solar Panel, 60Watt
```
Let's ensure that the electromagnetic variables have the same shape:
```python
f = Dataset(SA271[0], 'r')
vars = f.variables.keys()
for item in vars:
    print('Variable: \t', item)
    print('Dimensions: \t', f[item].dimensions)
    print('Shape: \t', f[item].shape, '\n')
```

Output:

```
Variable:    ex
Dimensions:  ('ex',)
Shape:       (3369600,)

Variable:    ey
Dimensions:  ('ey',)
Shape:       (3369600,)

Variable:    bx
Dimensions:  ('bx',)
Shape:       (3369600,)

Variable:    by
Dimensions:  ('by',)
Shape:       (3369600,)

Variable:    bz
Dimensions:  ('bz',)
Shape:       (3369600,)

Variable:    time
Dimensions:  ('time',)
Shape:       (3369600,)
```
```python
g = Dataset(SA272[0], 'r')
vars = g.variables.keys()
for item in vars:
    print('Variable: \t', item)
    print('Dimensions: \t', g[item].dimensions)
    print('Shape: \t', g[item].shape, '\n')
```

Output:

```
Variable:    ex
Dimensions:  ('ex',)
Shape:       (3369600,)

Variable:    ey
Dimensions:  ('ey',)
Shape:       (3369600,)

Variable:    bx
Dimensions:  ('bx',)
Shape:       (3369600,)

Variable:    by
Dimensions:  ('by',)
Shape:       (3369600,)

Variable:    bz
Dimensions:  ('bz',)
Shape:       (3369600,)

Variable:    time
Dimensions:  ('time',)
Shape:       (3369600,)
```
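Rather than eyeballing the printouts, the consistency check can be automated. A minimal sketch that works on a plain mapping of variable name to shape (`shapes_consistent` is our own helper):

```python
def shapes_consistent(shapes):
    """Return True if every variable shape in the mapping is identical."""
    unique = set(shapes.values())
    return len(unique) <= 1

# With a netCDF4 Dataset f, this would be called as:
#   shapes_consistent({name: f[name].shape for name in f.variables})
shapes = {'ex': (3369600,), 'ey': (3369600,), 'bx': (3369600,),
          'by': (3369600,), 'bz': (3369600,), 'time': (3369600,)}
```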
Let's start visualising some of the time series data from SA271 and SA272:
```python
ex_271 = f['ex']
ey_271 = f['ey']
bx_271 = f['bx']
by_271 = f['by']
time_271 = f['time']

ex_272 = g['ex']
ey_272 = g['ey']
bx_272 = g['bx']
by_272 = g['by']
time_272 = g['time']

### checking 50000 points for ex and ey to see if the time series data looks reasonable
a = 100000
b = 150000

plt.figure(figsize=(12, 12))
plt.title("ex,ey time series for SA271", fontsize=18)
plt.plot(time_271[a:b], ex_271[a:b] - np.mean(ex_271[a:b]), label=ex_271.long_name)
plt.plot(time_271[a:b], ey_271[a:b] - np.mean(ey_271[a:b]), label=ey_271.long_name)
plt.legend(loc='upper right', prop={'size': 16})
plt.xlabel(time_271.long_name + ' (' + time_271.units + ') ', fontsize=16)
plt.ylabel(ex_271.units, fontsize=16)

plt.figure(figsize=(12, 12))
plt.title("ex,ey time series for SA272", fontsize=18)
plt.plot(time_272[a:b], ex_272[a:b] - np.mean(ex_272[a:b]), label=ex_272.long_name)
plt.plot(time_272[a:b], ey_272[a:b] - np.mean(ey_272[a:b]), label=ey_272.long_name)
plt.legend(loc='upper right', prop={'size': 16})
plt.xlabel(time_272.long_name + ' (' + time_272.units + ') ', fontsize=16)
plt.ylabel(ex_272.units, fontsize=16)
```
Let's check 50000 points of bx and by to see if the data looks reasonable:
```python
plt.figure(figsize=(12, 12))
plt.title("bx,by time series SA271", fontsize=18)
plt.plot(time_271[a:b], bx_271[a:b] - np.mean(bx_271[a:b]), label=bx_271.long_name)
plt.plot(time_271[a:b], by_271[a:b] - np.mean(by_271[a:b]), label=by_271.long_name)
plt.legend(loc='upper right', prop={'size': 16})
plt.xlabel(time_271.long_name + ' (' + time_271.units + ') ', fontsize=16)
plt.ylabel(bx_271.units, fontsize=16)

plt.figure(figsize=(12, 12))
plt.title("bx,by time series SA272", fontsize=18)
plt.plot(time_272[a:b], bx_272[a:b] - np.mean(bx_272[a:b]), label=bx_272.long_name)
plt.plot(time_272[a:b], by_272[a:b] - np.mean(by_272[a:b]), label=by_272.long_name)
plt.legend(loc='upper right', prop={'size': 16})
plt.xlabel(time_272.long_name + ' (' + time_272.units + ') ', fontsize=16)
plt.ylabel(bx_272.units, fontsize=16)
```
Checking 1000 points of ex and ey from SA271 and SA272:
```python
a = 150000
b = 151000

plt.figure(figsize=(12, 12))
plt.title("ex,ey time series SA271", fontsize=18)
plt.plot(time_271[a:b], ex_271[a:b] - np.mean(ex_271[a:b]), label=ex_271.long_name)
plt.plot(time_271[a:b], ey_271[a:b] - np.mean(ey_271[a:b]), label=ey_271.long_name)
plt.legend(loc='upper right', prop={'size': 16})
plt.xlabel(time_271.long_name + ' (' + time_271.units + ') ', fontsize=16)
plt.ylabel(ex_271.units, fontsize=16)

plt.figure(figsize=(12, 12))
plt.title("ex,ey time series SA272", fontsize=18)
plt.plot(time_272[a:b], ex_272[a:b] - np.mean(ex_272[a:b]), label=ex_272.long_name)
plt.plot(time_272[a:b], ey_272[a:b] - np.mean(ey_272[a:b]), label=ey_272.long_name)
plt.legend(loc='upper right', prop={'size': 16})
plt.xlabel(time_272.long_name + ' (' + time_272.units + ') ', fontsize=16)
plt.ylabel(ex_272.units, fontsize=16)
```
Now we can compare ex, ey, bx, and by for stations SA271 and SA272:
```python
a = 150000
b = 160000

fig, axs = plt.subplots(2, 2, figsize=(15, 15))
axs[0, 0].plot(time_271[a:b], ex_271[a:b] - np.mean(ex_271[a:b]), 'tab:orange')
axs[0, 0].plot(time_272[a:b], ex_272[a:b] - np.mean(ex_272[a:b]), 'tab:blue')
axs[0, 0].set_title('ex time series SA271(orange) vs SA272(blue)')
axs[0, 1].plot(time_271[a:b], ey_271[a:b] - np.mean(ey_271[a:b]), 'tab:orange')
axs[0, 1].plot(time_272[a:b], ey_272[a:b] - np.mean(ey_272[a:b]), 'tab:blue')
axs[0, 1].set_title('ey time series SA271(orange) vs SA272(blue)')
axs[1, 0].plot(time_271[a:b], bx_271[a:b] - np.mean(bx_271[a:b]), 'tab:orange')
axs[1, 0].plot(time_272[a:b], bx_272[a:b] - np.mean(bx_272[a:b]), 'tab:blue')
axs[1, 0].set_title('bx time series SA271(orange) vs SA272(blue)')
axs[1, 1].plot(time_271[a:b], by_271[a:b] - np.mean(by_271[a:b]), 'tab:orange')
axs[1, 1].plot(time_272[a:b], by_272[a:b] - np.mean(by_272[a:b]), 'tab:blue')
axs[1, 1].set_title('by time series SA271(orange) vs SA272(blue)')

for ax in axs.flat:
    ax.set(xlabel='time', ylabel='counts')

# Hide x labels and tick labels for top plots and y ticks for right plots.
for ax in axs.flat:
    ax.label_outer()
```
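The demeaning pattern repeated in the plotting calls above can be factored into a small helper; a minimal sketch with NumPy (`demean` is our own name, not part of any library):

```python
import numpy as np

def demean(x):
    """Remove the mean so traces from different channels share a baseline."""
    x = np.asarray(x)
    return x - x.mean()

# e.g. plt.plot(time_271[a:b], demean(ex_271[a:b]), label=ex_271.long_name)
```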
To close our plots:
```python
plt.close('all')
```
Let's try and process some time series in the frequency domain using the Bounded Influence, Remote Reference Processing (BIRRP) code.
Note that for this example, we are using an NCI modified parallelised version of BIRRP that allows for electromagnetic field inputs from a NetCDF4 file.
```python
### set this to the location of the NetCDF/mpi modified version of the BIRRP code
birrp_location = '/g/data/up99/mt_apps/birrp_mpi/birrp_mpi'

### set this to a working directory where you would like to run the BIRRP example
working_directory = '/g/data/up99/sandbox/birrp_test/thredds_opendap'

### let's change into our working directory
os.chdir(working_directory)
```
Let's run BIRRP on 2 million data points for this example. First we can define SA271 as the local site and SA272 as the remote reference:
```python
start_point = 10
end_point = 2000010

EX = ex_271[start_point:end_point]
EY = ey_271[start_point:end_point]
BX = bx_271[start_point:end_point]
BY = by_271[start_point:end_point]
BXRR = bx_272[start_point:end_point]
BYRR = by_272[start_point:end_point]
```
Now we can create a NetCDF file with the BIRRP variable inputs:
```python
dataset_P = Dataset('SA271_rr_SA272.nc', 'w', format='NETCDF4')

### define netCDF dimensions
ex = dataset_P.createDimension('ex', len(EX))
ey = dataset_P.createDimension('ey', len(EY))
bx = dataset_P.createDimension('bx', len(BX))
by = dataset_P.createDimension('by', len(BY))
bxrr = dataset_P.createDimension('bxrr', len(BXRR))
byrr = dataset_P.createDimension('byrr', len(BYRR))

### define netCDF variables
ex = dataset_P.createVariable('ex', np.float32, ('ex',))
ex[:] = EX
ey = dataset_P.createVariable('ey', np.float32, ('ey',))
ey[:] = EY
bx = dataset_P.createVariable('bx', np.float32, ('bx',))
bx[:] = BX
by = dataset_P.createVariable('by', np.float32, ('by',))
by[:] = BY
bxrr = dataset_P.createVariable('bxrr', np.float32, ('bxrr',))
bxrr[:] = BXRR
byrr = dataset_P.createVariable('byrr', np.float32, ('byrr',))
byrr[:] = BYRR

dataset_P.close()
```
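Remote-reference processing assumes the local and remote channels cover the same samples, so it is worth validating the array lengths before writing the NetCDF file. A sketch under that assumption (`validate_channel_lengths` is a hypothetical helper of our own):

```python
import numpy as np

def validate_channel_lengths(channels):
    """Raise ValueError unless every channel array has the same length."""
    lengths = {name: len(arr) for name, arr in channels.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f'channel lengths differ: {lengths}')
    return next(iter(lengths.values()), 0)

# In the example above this would be:
#   validate_channel_lengths({'ex': EX, 'ey': EY, 'bx': BX,
#                             'by': BY, 'bxrr': BXRR, 'byrr': BYRR})
n = validate_channel_lengths({'a': np.zeros(5), 'b': np.ones(5)})
```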
Let's also define our BIRRP processing parameters (see the BIRRP manual for more information):
```python
birrp_processing_parameters = """1
2
2
2
1
3
1
32768,2,9
3,1,3
y
2
0,0.0003,0.9999
0.7
0.0
0.003,1
SA271_rrSA272
0
0
1
10
0
0
2000000
0
ex@SA271_rr_SA272.nc
0
0
ey@SA271_rr_SA272.nc
0
0
bx@SA271_rr_SA272.nc
0
0
by@SA271_rr_SA272.nc
0
0
bxrr@SA271_rr_SA272.nc
0
0
byrr@SA271_rr_SA272.nc
0
180,90,180
0,90,0
0,90,0
"""
```
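Because BIRRP reads these parameters positionally from standard input, one value per line, a small helper can make that structure explicit in code. A hypothetical sketch (`birrp_params` is our own name, not part of BIRRP):

```python
def birrp_params(lines):
    """Join an ordered list of BIRRP input values into newline-delimited text."""
    return '\n'.join(str(line) for line in lines) + '\n'

# First few inputs from the parameter block above:
header = birrp_params([1, 2, 2, 2, 1, 3, 1, '32768,2,9'])
```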
We can write these BIRRP parameters to an input file for BIRRP:
```python
birrp_input = open("birrp_processing_parameters", "wt")
n = birrp_input.write(birrp_processing_parameters)
birrp_input.close()
```
Next we need to make a Gadi PBS job script to run BIRRP on our example file:
```python
runjob = """#!/bin/bash
#PBS -q normal
#PBS -P up99
#PBS -l walltime=00:01:00
#PBS -l ncpus=8
#PBS -l mem=2GB
#PBS -l jobfs=1GB
#PBS -l wd
#PBS -N birrp_271_272
#PBS -l storage=gdata/up99+gdata/my80

module load openmpi/3.1.4

mpirun -np 8 --mca pml ob1 /g/data/up99/mt_apps/birrp_mpi/birrp_mpi < birrp_processing_parameters
"""

### write PBS job script file
runjob_input = open("runjob.ob1.sh", "wt")
runjob_input.write(runjob)
runjob_input.close()
```
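If you process several station pairs, the job script can be templated rather than hand-edited. A sketch whose directives mirror the script above (`make_pbs_script` is a hypothetical helper of our own):

```python
def make_pbs_script(job_name, ncpus, birrp_exe, params_file):
    """Render a minimal Gadi PBS script mirroring the directives used above."""
    return f"""#!/bin/bash
#PBS -q normal
#PBS -P up99
#PBS -l walltime=00:01:00
#PBS -l ncpus={ncpus}
#PBS -l mem=2GB
#PBS -l jobfs=1GB
#PBS -l wd
#PBS -N {job_name}
#PBS -l storage=gdata/up99+gdata/my80

module load openmpi/3.1.4

mpirun -np {ncpus} --mca pml ob1 {birrp_exe} < {params_file}
"""

script = make_pbs_script('birrp_271_272', 8,
                         '/g/data/up99/mt_apps/birrp_mpi/birrp_mpi',
                         'birrp_processing_parameters')
```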
To submit our job to the Gadi PBS queue, first launch the OOD Gadi Terminal app. Once logged in, navigate to the working directory where we created our job script.
Now we can submit our PBS job:
```shell
$ qsub runjob.ob1.sh
```
We can monitor our PBS job:
```shell
$ qstat
```

Output:

```
Job id                 Name             User             Time Use S Queue
---------------------  ---------------- ---------------- -------- - ------------
5843475.gadi-pbs       birrp_271_272    abc123           0        Q normal-exec
```
Once the job has completed, we can check that our BIRRP output files have been created:
```shell
$ ls
```

Output:

```
birrp_271_272.e5843475       SA271_rrSA272.1r.2c2  SA271_rrSA272.2r.2c2
birrp_271_272.o5843475       SA271_rrSA272.1r2.rf  SA271_rrSA272.2r2.rf
birrp_processing_parameters  SA271_rrSA272.1r2.rp  SA271_rrSA272.2r2.rp
runjob.ob1.sh                SA271_rrSA272.1r2.tf  SA271_rrSA272.2r2.tf
SA271_rrSA272.1n.1c2         SA271_rrSA272.2n.1c2  SA271_rrSA272.diag.000
SA271_rrSA272.1n1.rf         SA271_rrSA272.2n1.rf  SA271_rrSA272.diag.001
SA271_rrSA272.1n1.rp         SA271_rrSA272.2n1.rp  SA271_rrSA272.diag.002
SA271_rrSA272.1n1.tf         SA271_rrSA272.2n1.tf  SA271_rrSA272.diag.003
SA271_rrSA272.1n.2c2         SA271_rrSA272.2n.2c2  SA271_rrSA272.diag.004
SA271_rrSA272.1n2.rf         SA271_rrSA272.2n2.rf  SA271_rrSA272.diag.005
SA271_rrSA272.1n2.rp         SA271_rrSA272.2n2.rp  SA271_rrSA272.diag.006
SA271_rrSA272.1n2.tf         SA271_rrSA272.2n2.tf  SA271_rrSA272.diag.007
SA271_rrSA272.1r.1c2         SA271_rrSA272.2r.1c2  SA271_rrSA272.j
SA271_rrSA272.1r1.rf         SA271_rrSA272.2r1.rf  SA271_rr_SA272.nc
SA271_rrSA272.1r1.rp         SA271_rrSA272.2r1.rp  SA271_rrSA272.r.cov
SA271_rrSA272.1r1.tf         SA271_rrSA272.2r1.tf
```
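A quick programmatic check that the main outputs exist can replace eyeballing the `ls` listing. A sketch using `fnmatch` on a directory listing (the patterns reflect the filenames shown above; `count_birrp_outputs` is our own helper):

```python
import fnmatch

def count_birrp_outputs(filenames, stem='SA271_rrSA272'):
    """Count files in a listing that match the main BIRRP output patterns."""
    patterns = [f'{stem}.j', f'{stem}.*.rf', f'{stem}.*.rp',
                f'{stem}.*.tf', f'{stem}.diag.*']
    return {p: len(fnmatch.filter(filenames, p)) for p in patterns}

# In the working directory this would be:
#   counts = count_birrp_outputs(os.listdir('.'))
counts = count_birrp_outputs(['SA271_rrSA272.j', 'SA271_rrSA272.1n1.rf',
                              'SA271_rrSA272.diag.000', 'runjob.ob1.sh'])
```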