How to locate and manage large files on the IBM i

Formulating a strategy for reorganizing physical files and performing purges is a must. Database management is often overlooked but is an integral part of maintaining the IBM i.

There are many factors that need to be considered when fine-tuning a system. The purpose of this article does not include documenting all of these concerns. However, proper use of reorganizations and purges can help form a solid foundation for effective data management.

Certain physical files, by definition, will have a lot of deleted records. For example, open records are often moved to history files after they are processed. Often sales orders, purchase orders, GL batches, and many other situations require records to be deleted on an ongoing basis. In a typical IBM i application physical files rarely have the 'reuse deleted records' flag set. Records are not deleted but only removed from an index. The DASD allotted is not freed-up until a "reorganize physical file member" (RGZPFM) is performed.

First there is a need to locate and identify the largest files. There are a couple of fairly easy ways to do this. One way is to display object descriptions to an outfile and then run a query over that outfile selecting objects based on an arbitrary size.

DSPOBJD OBJ(MYLIB/*ALL) OBJTYPE(*FILE) OUTPUT(*OUTFILE) OUTFILE(MYLIB/FILESIZES)

In this example the outfile "filesizes" will be created automatically. Note the object type is *file but there may be other large objects with different object types. Using OBJTYPE(*ALL) will catch everything if other object types are a concern.

RUNQRY *N MYLIB/FILESIZES RCDSLT(*YES)

With the above query select ODOBSZ greater than whatever size limit you wish, 10,000,000 roughly 10MB, or whatever.

Another method is to use PDM work with objects, or WRKOBJPDM and select F17=Subset. The subset screen allows selection by object size. This method is preferable because no outfile is required. You can confirm the size of an individual object by using the display file description command and reviewing the last page which shows size and member information.



DSPFD file-name

Beware that some files are "memberized" (having multiple members) but usually this is not the case. Normally most physical files will have only one member, which is commonly referred to as the first member or *first. If multiple members exist, old obsolete members can be removed with the command RMVM which accepts wild-cards. For example, to remove all the members from file-name that being with the letter W;

RMVM FILE(file-name) MBR(W*)

Custom purge programs are fairly easy to create. Commonly purges are written using a cut-off date based on how many years of data the organization is required to keep. Most managers will happily agree to remove data that is over ten years old. Occasionally SQL can be used for purging but always create a backup copy prior to purging.

Depending on the record length, system activity, and the number of deleted records purges and reorgs can take quite a while to run and RGZPFM requires exclusive use of the object. Preferable a plan is put into place where certain files are purged and/or reorganized after hours possibly from the job scheduler.

RGZPFM file-name

Ignore this subject at the peril of the entire system! This is an issue that will not go away if left unattended and may only be compounded over time, sort of a "pay me now or pay me later" situation. DASD can be recovered and applications can run more efficiently by having a strategy to manage mainly just the largest of files (for example, the five or ten largest files in each library). The medium size and smaller files tend not to be much of a concern.