Best practices for experiment database access

A well-designed database can be a strong workhorse for an experiment.  However, if not built for the future that workhorse will age and become a detriment to analysis. As experiments scale up production during their lifecycles, adding more and faster CPUs, they require the same level of performance from the database. But that database can only carry so large a load after which performance drops and drops fast.  Planning must be done at the start on how to maintain the performance of the database throughout its life. Here are a few basic ways to help achieve this.

Database indexes are a primary way to increase performance. An index is a data structure that improves the speed of data retrieval operations on a database table. Every table should be properly indexed; this is more than just an index on the primary key. You must consider the patterns of expected queries and index accordingly. Remember, all queries not using indexes require every record to be pulled from the disk and be examined. That is very expensive. Use indexes; save a hard drive.

Provide database access through a server; do not give every client direct access to your database. Every client database connection creates a process on the server, and submitting a few thousand jobs to Jobsub with every job connecting to the database is a lovely way to sink a database. Wrap the database with a server to protect your database from exuberant users.

Use a cache. Structure your queries to be cacheable. For example, IFBeam normalizes queries on start and end times. The results are cached and all further identical requests are delivered from the cache. Caching those results has reduced database access thereby increasing the databases responsiveness.

Consider using a cluster of mirrored databases. With multiple identical databases available the load can be spread. Multiple servers can also be added as the load grows.

Finally, Scientific Computing Services provides knowledgeable personnel who are glad to assist in the design and access of databases.

–Stephen P. White