Interfacing Hierarchical Storage Management Systems

Overview

Using OS level HSM tools

Though DCM4CHEE can interface with HSM systems using OS-level tools out of the box, you will need to change some configuration parameters to adapt it to your environment and get it running.

This document assumes that you already have DCM4CHEE running, and you can send DICOM images to DCM4CHEE and query-retrieve them.

First you'll have to define a separate directory to hold the files that the HSM tools will migrate. If you plan to handle a large amount of DICOM data, it is advisable to use a separate partition on a physically separate disk or RAID system for HSM migrations. This reduces I/O contention on the disks and increases the throughput of the system.

I will use UNIX notation here to keep things clear and consistent with DCM4CHEE's internal representation. Let's say you have a partition mounted as /hsm that will be used for HSM migrations.

Add this partition as a file system to DCM4CHEE. In the JMX console, open the FileSystemMgt MBean view. Scroll down to the addFileSystem() method and invoke it with the parameters: dirPath => tar:/hsm, aet => YOUR_DCM4CHEE_INSTALLATION_AET, availability => NEARLINE, status => RO, user info => SOME_INFO_FOR_YOUR_REFERENCE. Here dirPath must have the prefix tar: to match the DestinationFileSystem parameter in the FileCopy service. This prefix tells the FileCopy service to pack files into a tar file before copying.
At the top of the FileSystemMgt view, in the configuration parameters section, adjust the clean-up rules: set DeleteLocalStudiesCopyAvailable to true, ValidFileStatus to ARCHIVED, and StudyAgeForDeletion to something like 52w. Here w means weeks; you can also use h for hours and d for days.
FileSystemMgt cleans up the main file system at the interval given by the FreeDiskSpaceInterval parameter. The default is 5m; change it to a longer period to reduce contention on the database and disks. Files are copied by the FileCopy service. After a file has been copied to the HSM partition, the FileCopy service doesn't touch the original file entry in the database but adds a new entry for the copy. This entry has a different status (see below) and a different file_path, something like <SOME_PATH>.tar!<PATH_TO_FILE_IN_TAR>. The FileSystemMgt service relies on this entry: during a clean-up run it looks for files older than the given age, and if it finds the corresponding copy entry in the database with the status ARCHIVED, it deletes the original file and its original database entry.
In the FileCopy MBean view, adjust the following parameters: set DestinationFileSystem to tar:/hsm, FileStatus to TO_ARCHIVE, and VerifyCopy to true. This tells the FileCopy service to pack files into a tar archive, verify the MD5 sums of the copied files, save the tar file under /hsm, add a copy DB entry for each file in the tar archive, and set the status of that entry to TO_ARCHIVE. At this point the HSM tools step in. Depending on your environment, you might have a transparent HSM migration tool or a command line tool that migrates a given file to tape or other long-term storage. If you have a transparent HSM migration agent, configure it to migrate all files under /hsm to your long-term storage. If you don't have a transparent migration tool, use the TarCopyCommand and TarOutgoingDirectory config parameters of the FileCopy service to invoke an HSM migration command after the files have been packed into a tar file.
The FileCopy service reschedules file copy orders that fail for any reason. The number and interval of retries can be changed in the RetryIntervals parameter.
Next, change the SyncFileStatus MBean configuration. MonitoredFileSystem is your original main file system path, where all files sent to your server are kept. Set Command according to the HSM query tools provided by your environment. Pattern is the regular expression used to check the output of Command; it also depends on your environment, namely on what your HSM query tool prints. Change TaskInterval to adjust when and how often file status checks run.
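
The status check configured in the last step can be pictured with a short Python sketch. This is not DCM4CHEE code. The function name is_archived, the "{path}" placeholder convention and the stand-in query command are all assumptions made for illustration. The sketch only shows the general idea behind Command and Pattern: run a query command for one tar file and test its output against a regular expression.

```python
import re
import subprocess
import sys


def is_archived(command_template, pattern, tar_path):
    """Run a query command for tar_path and match its output against pattern.

    command_template is a list of arguments in which the hypothetical
    placeholder "{path}" is replaced by tar_path.
    """
    argv = [arg.replace("{path}", tar_path) for arg in command_template]
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        # Assumption: a failed query means "not archived yet".
        return False
    # This sketch requires the whole (stripped) output to match.
    return re.fullmatch(pattern, result.stdout.strip(), re.DOTALL) is not None


# Stand-in for a real HSM query tool: prints "migrated <path>".
FAKE_QUERY = [
    sys.executable, "-c",
    "import sys; print('migrated', sys.argv[1])",
    "{path}",
]

if __name__ == "__main__":
    print(is_archived(FAKE_QUERY, r"migrated .*", "/hsm/example.tar"))
```

With a real HSM you would replace FAKE_QUERY with your query tool and choose the pattern from what that tool prints for a successfully migrated file. Whether your setup needs a full or a partial match is something to verify against your own environment.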

If everything is done correctly, then as soon as your server receives files it schedules a file copy order. The FileCopy service puts them into a tar archive and triggers an HSM migration. At this moment you have a copy of the files in long-term storage as well as in your online storage. You also have duplicate DB entries for each file, each with a different file_path and status. Depending on the configured intervals, file status checks are invoked, and the files whose check succeeds are marked as ARCHIVED. FileSystemMgt takes care of cleaning up your online storage and deletes old, successfully archived files. When somebody tries to access the archived files by issuing a C-MOVE request, QueryRetrieveScpService uses TarRetriever to retrieve them. If you have a transparent HSM, the only thing to do is to tell TarRetriever which directory to use as CacheRoot and leave TarFetchCommand as NONE. The rest is transparent: your HSM brings the files back from long-term storage when TarRetriever tries to access them. If your HSM is not transparent, use TarFetchCommand to invoke an HSM retrieve command.
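
To make the retrieval step more concrete, here is a small Python sketch of what resolving a file_path of the form <SOME_PATH>.tar!<PATH_TO_FILE_IN_TAR> involves. It is not TarRetriever's actual implementation. The function names, the hsm_root argument and the cache layout are assumptions for illustration only. The sketch splits the path at the "!" separator, opens the tar file and copies the requested member into a cache directory.

```python
import os
import tarfile


def split_tar_path(file_path):
    """Split 'dir/name.tar!member/path' into (tar_path, member_path)."""
    tar_path, sep, member = file_path.partition("!")
    if not sep or not tar_path.endswith(".tar") or not member:
        raise ValueError("not a tar file path: %r" % file_path)
    # Refuse member names that could escape the cache directory.
    if member.startswith("/") or ".." in member.split("/"):
        raise ValueError("unsafe member name: %r" % member)
    return tar_path, member


def fetch_from_tar(hsm_root, file_path, cache_root):
    """Extract one member of a tar file under hsm_root into cache_root."""
    tar_rel, member = split_tar_path(file_path)
    with tarfile.open(os.path.join(hsm_root, tar_rel), "r") as tar:
        source = tar.extractfile(member)  # KeyError if the member is missing
        if source is None:
            raise FileNotFoundError(member)  # member is not a regular file
        target = os.path.join(cache_root, member)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with source, open(target, "wb") as out:
            out.write(source.read())
    return target
```

With a transparent HSM, opening the tar file is the access that would prompt the HSM to bring it back from long-term storage. With a non-transparent HSM, a fetch command would have to run before the tarfile.open call.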

Configuration

HSM Service Configuration