Blog

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

echo "   if [ \"x${md5}\" != \"x\" ]" >> $oFile
echo "   then" >> $oFile
echo "      case \$chksum_type in" >> $oFile
echo "         md5)" >> $oFile
echo "             newMd5=\`md5sum ${lfn} | cut -d ' ' -f1\`;;" >> $oFile
echo "         sha256)" >> $oFile
echo "             newMd5=\`sha256sum ${lfn} | cut -d ' ' -f1\`;;" >> $oFile
echo "      esac" >> $oFile
echo "    " >> $oFile
echo "   if [ \"\${newMd5}\" != \"${md5}\" ]" >> $oFile
echo "   then" >> $oFile
.....

Changes in servers' names

My version of the bash scripts now works again fine and can download files, the problem is that some of the servers have changed name and consequently the url in which we relied to identify if the file was already in the tree area. Initially I've modified my scripts to make sure for example that if the server was aims3.llnl.gov the script will also check files in pcmdi3{7/9}.llnl.gov . Then I noticed that dkrz changed server name too and handling all these exceptions it's too messy.

Not to say that PCMDI is not using the proper ensemble version but directories named 1/2/3 etc instead. Since the use of elasticsearch has been momentarily suspended, I'm looking again at using the sqlite database to handle this.

So I'm starting today to adapt my python script to download the search for data, update information on database including new fields for md5 and sha256 checksum and finally download the data if it isn't there.

Clearly adding md5/sha256 info means that we need to have a row for each file and so probably I'll have to split the database in different "experiments" or try to use postgres or mysql. Any suggestion and or comment on this is very welcome.