Ways to improve your life: POMS updates

With increasing demand from the production groups, the Production Operations Management System (POMS) is being extended to meet the Intensity Frontier (IF) experiments’ requirements for large-scale production and distributed analysis processing.

Fig. 1: The day-by-day spreadsheet for a specific campaign gives the experiment production group an at-a-glance view of the number of jobs submitted each day, the input files requested and the output files delivered for those jobs. Other useful information is also included, such as the efficiency and the job exit codes, which are particularly important for debugging.

Several experiments are using or have expressed interest in using POMS. NOvA is extensively tracking their entire production. LArIAT and MicroBooNE have adopted POMS for some of their data processing. g-2 just started with Monte Carlo tests; Adam Lyon, quadrant head of the Scientific Computing Division and senior scientist of the g-2 collaboration, says: “Muon g-2 is excited to be about to launch a major simulation generation effort with POMS. Its tracking, bookkeeping and interface are very appealing to us. We look forward to gaining more experience with it and enjoying its benefits for production campaigns.”

POMS assists the production processing of experiments, starting from grid job submission and proceeding through monitoring, automatic resubmission, failure triage and bookkeeping, thanks to its interoperation with other systems such as SAM and Fifemon.

Among many other features, POMS guides the user through the submission of jobs to the grid via a web interface. It allows users to run arbitrary executables, scripts and workflows while keeping track of the configuration used. This information can be used, for example, to recover from grid failures (Fig. 1).

Furthermore, the system is designed so that some parameters can be overridden without changing the configuration scripts. POMS also allows job submissions to be scheduled at specific dates and times through a crontab. This feature is particularly important for the daily processing of data collected by the detector during data-taking, and it reduces the scheduling overhead of the experiment production groups (Fig. 2).

Fig. 2: The launch template module of POMS helps the user build the crontab commands that run a production campaign automatically at a specified time and date.
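
As a rough illustration of what such a crontab command looks like, the following minimal Python sketch assembles a standard crontab entry that runs a command once a day. The function name `daily_cron_line` and the launch script path are hypothetical and are not part of POMS; only the five-field crontab time format is standard.

```python
def daily_cron_line(hour: int, minute: int, command: str) -> str:
    """Return a crontab entry that runs `command` every day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Standard crontab field order:
    # minute hour day-of-month month day-of-week command
    return f"{minute} {hour} * * * {command}"


# Hypothetical launch script path, used for illustration only.
line = daily_cron_line(2, 30, "/path/to/launch_campaign.sh")
print(line)  # 30 2 * * * /path/to/launch_campaign.sh
```

An entry of this form would trigger the launch script every day at 02:30, which is the kind of recurring submission that suits daily processing of newly collected data.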

Monitoring is another important aspect, as it gives users information on the progress of the grid jobs and the status of the data files. The display of output logs helps users understand failures that can occur during grid submission and data processing; diagnosing them can otherwise be very time-consuming and become a critical bottleneck for small collaborations with limited manpower (Fig. 3).

Fig. 3: This table shows the status of the grid jobs and the output data for a specific campaign of an experiment.

We are working to develop new features required by future experiments such as DUNE, Fermilab’s flagship experiment. A test of MC production processing for ProtoDUNE has already started, and production will continue in order to provide data for analysis and for presentation at conferences and collaboration meetings. ProtoDUNE is a crucial milestone for the DUNE experiment because it will test and validate the design and technologies of the far detector; it has a very demanding schedule, so production processing must be efficient.

New features include improvements to the web interface and to monitoring through integration with systems like Fifemon that monitor the HTCondor pools, data handling and storage systems, and other related systems.

An ongoing effort involves data management: POMS will support the model of sending data to jobs and the pre-staging of input datasets at different local caches in order to make efficient use of the grid. Furthermore, large datasets will be automatically split into subsets to avoid overloading grid resources.
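
The idea of splitting a large dataset into subsets can be sketched in a few lines of Python. This is only an illustration of fixed-size chunking; the article does not describe how POMS actually performs the split, and the function name `split_dataset`, the file names and the subset size are all assumptions.

```python
from typing import Iterator, List, Sequence


def split_dataset(files: Sequence[str], max_files: int) -> Iterator[List[str]]:
    """Yield consecutive subsets of `files`, each with at most `max_files` entries."""
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    for start in range(0, len(files), max_files):
        yield list(files[start:start + max_files])


# Hypothetical file names, used for illustration only.
files = [f"file_{i:03d}.root" for i in range(7)]
subsets = list(split_dataset(files, 3))
print([len(s) for s in subsets])  # [3, 3, 1]
```

Each subset could then be submitted as its own batch of jobs, so that no single submission requests the whole dataset at once.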

The first major POMS release for the ProtoDUNE test beam experiment was deployed in January this year, and another major release, focused on data management, is planned for this summer. Major features of this next release are included in the roadmap, which can be found here.

–Anna Mazzacane