Overview

...

Using OS level HSM tools

Though DCM4CHEE can interface with HSM systems using OS level tools out of the box, you will need to change some configuration parameters to adapt it to your environment and get it running.

This document assumes that you already have DCM4CHEE running, and you can send DICOM images to DCM4CHEE and query-retrieve them.

First you'll have to define a separate directory for the files that the HSM tools will migrate. If you plan to handle a large amount of DICOM data, it is advisable to put HSM migrations on a separate partition on a physically separate disk or RAID system. This reduces I/O contention on the disks and increases the throughput of the system.

I will use UNIX notation here to make things clearer and to be consistent with DCM4CHEE's internal representation. Let's say you have a partition mounted as /hsm that will be used for HSM migrations.

  1. Add this partition as a file system to DCM4CHEE. In the JMX console open the FileSystemMgt MBean view. Scroll down to the addFileSystem() method and invoke it with the parameters: dirPath => tar:/hsm, aet => YOUR_DCM4CHEE_INSTALLATION_AET, availability => NEARLINE, status => RO, user info => SOME_INFO_FOR_YOUR_REFERENCE. The tar: prefix on the directory path is needed because of a bug in the FileCopy service, which, I assume, will be fixed soon. For the moment, work around it by prefixing the directory path as shown above.
  2. At the top of the FileSystemMgt view, in the configuration parameters part, adjust the clean-up rules: set DeleteLocalStudiesCopyAvailable to true, ValidFileStatus to ARCHIVED, and StudyAgeForDeletion to something like 52w. Here w means weeks; you can use h for hours and d for days as well.
    FileSystemMgt cleans up the main file system at the interval given by the FreeDiskSpaceInterval parameter. By default it is 5m; change it to a longer period to reduce contention on the DB and disks. Files are copied by the FileCopy service. After a file has been copied to the HSM partition, the FileCopy service doesn't touch the original file entry in the database, but adds a new entry for the copy. This entry will have a different status (see below) and a different file_path, something like <SOME_PATH>.tar!<PATH_TO_FILE_IN_TAR>. Don't forget to add the "tar:" prefix to your destination file system (tar:/hsm) in FileCopyService, otherwise files won't be packed into a tar file.
  3. The HSM software migrates the tar file to tape (or whatever you have it set up to do).
  4. At this stage SyncFileStatus comes into play. It checks the files in your original file system (/archive) by querying the HSM and marks them as ARCHIVED - 2 if the migration was successful. The query is run using the "Command" parameter in the SyncFileStatus configuration, and its result is checked against the regular expression given in the "Pattern" field of the same configuration. Both are environment specific, and you'll have to change them according to the tools you use for HSM.
  5. The FileSystemMgt service checks the file system at the intervals given in its configuration and deletes files according to the rules you set there. In your case you'll need to change DeleteLocalStudiesCopyAvailable to "true", ValidFileStatus to "ARCHIVED" and the study age for deletion in StudyAgeForDeletion to something like "52w" (meaning 52 weeks; you can use h for hours and d for days as well). It checks all files older than the given age, and if it can find a copy entry in the DB with the status ARCHIVED, it deletes the original file and the DB entry for the original file. At this stage the DB holds only one entry for the file, with a file_path something like <SOME_PATH>.tar!<PATH_TO_FILE_IN_TAR>.
  6. Then QueryRetrieveScpService will use TarRetriever to retrieve files on demand. TarRetriever uses the TarFetchCommand config parameter to retrieve files from an external system (tapes), but, I guess, in your case your HSM agent will retrieve the tar files when TarRetriever tries to access them. If so, leave TarFetchCommand as NONE.
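The clean-up rule from the steps above can be sketched in a few lines of Python. This is only an illustration of the logic, not DCM4CHEE's code: the function names are made up, and reading "m" as minutes is my assumption based on the 5m default of FreeDiskSpaceInterval.

```python
import re
from datetime import datetime, timedelta

# Unit letters from the text above. "m" = minutes is an assumption
# derived from the 5m default of FreeDiskSpaceInterval.
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_interval(text):
    """Turn a string such as '52w' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhdw])", text.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def may_delete_original(received, copy_statuses, now,
                        study_age="52w", valid_status="ARCHIVED"):
    """Hypothetical rule: the original may go only if it is older than
    study_age AND some copy entry already has the valid status."""
    old_enough = now - received > parse_interval(study_age)
    return old_enough and valid_status in copy_statuses
```

For example, a file received two years ago whose copy is still TO_ARCHIVE is kept, while the same file with an ARCHIVED copy becomes eligible for deletion.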

...

  1. . This is used by the FileSystemMgt service. During the clean-up session it looks for files older than the given age, and if it can find the mentioned copy entry in the database with the status ARCHIVED, it deletes the original file and its original entry from the database.
  2. In the FileCopy MBean view you'll have to adjust the following parameters: set DestinationFileSystem to tar:/hsm, FileStatus to TO_ARCHIVE, and VerifyCopy to true. This tells the FileCopy service to pack files into a tar archive, verify the MD5 sums of the copied files, save the tar file under /hsm, add a copy DB entry for each file in the tar archive and set its status to TO_ARCHIVE. At this point the HSM tools step in. Depending on your environment you might have a transparent HSM migration tool or a command line tool that migrates a given file to tape or other long term storage. If you have a transparent HSM migration agent, configure it to migrate all files under /hsm to your long term storage. If you don't have a transparent migration tool, use the TarCopyCommand and TarOutgoingDirectory config parameters of the FileCopy service to invoke an HSM migration command after files have been packed into a tar file.
    The FileCopy service will reschedule file copy orders if they fail for some reason. The number and the interval of retries can be changed in the RetryIntervals parameter.
  3. Next you'll have to change the SyncFileStatus MBean configuration. MonitoredFileSystem is your original main file system path, where all files sent to your server land. Change Command according to the HSM query tools provided by your environment. Pattern is the regular expression used to check the output of the Command. It also depends on your environment, namely on the response of the HSM query. Change TaskInterval to adjust when and how often you'd like to run file status checks.
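To get a feeling for how Command and Pattern work together, here is a rough Python sketch of such a check: run a query command for one tar file and test its output against a regular expression. It is not how SyncFileStatus is implemented. The {path} placeholder, the function name and the "state=MIGRATED" output are all invented for the example, and the fake query command merely stands in for whatever your HSM vendor provides.

```python
import re
import subprocess
import sys


def is_archived(command, pattern, tar_path):
    """Run the query command for one tar file and match its output."""
    # "{path}" is a made-up placeholder for this sketch only.
    argv = [part.replace("{path}", tar_path) for part in command]
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.returncode == 0 and re.search(pattern, result.stdout) is not None


# Stand-in for a real HSM query tool: a Python one-liner that pretends
# every file is already on tape. Replace it with your site's command.
FAKE_QUERY = [
    sys.executable, "-c",
    "import sys; print(sys.argv[1], 'state=MIGRATED')",
    "{path}",
]

if __name__ == "__main__":
    print(is_archived(FAKE_QUERY, r"state=MIGRATED", "/hsm/example.tar"))
```

Trying your real query tool and regular expression this way on a few known files, before putting them into the MBean, can save some guesswork.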

If everything is done correctly, then as soon as your server receives files it will schedule a file copy order. The FileCopy service will put them into a tar archive and trigger an HSM migration. At this moment you'll have a copy of the files in the long term storage as well as in your online storage. You'll also have duplicate DB entries for each file, each with a different file_path and status. File status checks will be invoked according to the configured intervals, and the files that were migrated successfully will be marked as ARCHIVED. FileSystemMgt will take care of cleaning up your online storage and will delete old, successfully archived files. When somebody tries to access the archived files by invoking a C-MOVE request, QueryRetrieveScpService will use TarRetriever to retrieve them. If you have a transparent HSM, the only thing left to do is to tell TarRetriever which directory to use as CacheRoot and to leave TarFetchCommand as NONE. The rest will be transparent: your HSM will bring files back from the long term storage when TarRetriever tries to access them. If your HSM is not transparent, use TarFetchCommand to invoke an HSM retrieve command.
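The <SOME_PATH>.tar!<PATH_TO_FILE_IN_TAR> form of file_path is easy to take apart by hand if you ever need to check an archived file yourself. The following Python sketch splits such a path at the "!" and reads the member out of the tar file. It is not TarRetriever's code; the function names are my own, and it assumes that the first "!" in the path is the separator.

```python
import tarfile


def split_tar_path(file_path):
    """Split '<tar file>!<member>' into its two parts."""
    tar_name, sep, member = file_path.partition("!")
    if not sep or not tar_name.endswith(".tar") or not member:
        raise ValueError(f"not a tar-style file_path: {file_path!r}")
    return tar_name, member


def read_member(tar_file, member):
    """Return the bytes of one member stored inside the tar file."""
    with tarfile.open(tar_file) as archive:
        extracted = archive.extractfile(member)
        if extracted is None:
            raise ValueError(f"{member!r} is not a regular file")
        return extracted.read()
```

With a transparent HSM, opening the tar file in read_member() is the moment at which the HSM agent would have to bring it back from long term storage.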

Configuration

HSM Service Configuration