MINOS computing on the OSG

Computing in the MINOS/MINOS+ experiment has evolved greatly in the eleven years since it started taking data in the NuMI beam in April 2005. Its scale has grown from the 50-core FNALU batch system to the 15,000 cores of Fermigrid/GPGrid.

As MINOS prepares to stop taking data at the 2016 Fermilab summer shutdown, another change is under way: using the Fermibatch jobsub tools to run opportunistically on Open Science Grid resources offsite.

Use of remote resources is not new to MINOS. Monte Carlo data has always been generated by collaborating institutions outside Fermilab, with eight sites participating over the years. Tarfiles of code were copied to each site, and jobs were submitted locally there.

In 2009, MINOS used the TACC facility at U.T. Austin to double its reconstruction capacity for a special analysis. Again, this was a somewhat manual process, with much of the effort going into setting up the TACC resources and moving data.

The latest change has been the use of the Fermibatch jobsub submission and monitoring tools to access the OSG, and of ifdhc to access data. This follows the pioneering examples of the Mu2e and NOvA experiments, which ran on OSG sites in 2015. In May 2016, MINOS started running at the Caltech and Michigan sites, with more sites to follow soon, including an OSG gateway to XSEDE resources at TACC.

Key steps in getting ready to run on the OSG have been:

  • Running from code in /cvmfs/minos.opensciencegrid.org
  • Removing all dependencies on /grid/data, /grid/fermiapp, /minos/data and /minos/app
  • Using ifdh cp to move data to and from the worker node
  • Keeping CPU efficiency high (historically around 90% for MINOS, both on- and offsite)
  • Keeping memory use under 2 GBytes, for access to more workers
  • Keeping runtime under 8 hours, for access to more workers
  • Bundling with the MINOS libraries a set of system shared libraries that are often missing on worker nodes. MINOS already had most of these available, since it builds on SLF5 and runs on SLF6 on Fermigrid; a few more were needed at other sites.
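The resource limits in this checklist lend themselves to a simple pre-submission check. The sketch below is a hypothetical Python helper, not part of the MINOS, jobsub or ifdhc tooling: it takes an assumed summary of a test job (the `JobProfile` fields and the example paths are made up for illustration) and flags peak memory at or above 2 GB, runtime at or above 8 hours, any path under the four Fermilab-local areas listed above, and CPU efficiency below 90%, a threshold assumed here from the historical MINOS figure.

```python
from dataclasses import dataclass, field

# Limits from the checklist: memory under 2 GB, runtime under 8 hours.
MAX_MEMORY_GB = 2.0
MAX_RUNTIME_HOURS = 8.0
# Assumed target, based on the ~90% CPU efficiency MINOS has historically seen.
MIN_CPU_EFFICIENCY = 0.90
# Fermilab-local areas that offsite jobs should not depend on.
LOCAL_ONLY_AREAS = ("/grid/data", "/grid/fermiapp", "/minos/data", "/minos/app")


@dataclass
class JobProfile:
    """Hypothetical summary of one test job's resource usage."""
    cpu_seconds: float
    wall_seconds: float
    peak_memory_gb: float
    paths_used: list[str] = field(default_factory=list)


def cpu_efficiency(job: JobProfile) -> float:
    """Fraction of wall-clock time the job spent using the CPU."""
    if job.wall_seconds <= 0:
        return 0.0
    return job.cpu_seconds / job.wall_seconds


def uses_local_area(path: str) -> bool:
    """True if path is one of the local-only areas or lies beneath one."""
    return any(path == area or path.startswith(area + "/") for area in LOCAL_ONLY_AREAS)


def osg_readiness_problems(job: JobProfile) -> list[str]:
    """List the reasons a job may be a poor fit for opportunistic OSG slots."""
    problems = []
    if job.peak_memory_gb >= MAX_MEMORY_GB:
        problems.append(f"peak memory {job.peak_memory_gb:.1f} GB is not under {MAX_MEMORY_GB} GB")
    runtime_hours = job.wall_seconds / 3600
    if runtime_hours >= MAX_RUNTIME_HOURS:
        problems.append(f"runtime {runtime_hours:.1f} h is not under {MAX_RUNTIME_HOURS} h")
    efficiency = cpu_efficiency(job)
    if efficiency < MIN_CPU_EFFICIENCY:
        problems.append(f"CPU efficiency {efficiency:.0%} is below {MIN_CPU_EFFICIENCY:.0%}")
    for path in job.paths_used:
        if uses_local_area(path):
            problems.append(f"depends on Fermilab-local path {path}")
    return problems


if __name__ == "__main__":
    # Hypothetical test job: 6.5 h of CPU in 7 h of wall time, 1.6 GB peak memory.
    job = JobProfile(cpu_seconds=23400, wall_seconds=25200, peak_memory_gb=1.6,
                     paths_used=["/cvmfs/minos.opensciencegrid.org/setup.sh",
                                 "/minos/data/reco/run1.root"])
    print(osg_readiness_problems(job))
    # ['depends on Fermilab-local path /minos/data/reco/run1.root']
```

The path check compares whole path components, so a directory such as /minos/data2 is not mistaken for /minos/data.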

All of these practices are valuable locally on Fermigrid as well. The additional resources available through the OSG provided the incentive to do the restructuring a bit earlier.

Thanks for all the cores, OSG!

— Arthur Kreymer