Interfacing Hierarchical Storage Management Systems
Overview
Using OS level HSM tools
Although DCM4CHEE can interface with HSM systems through OS-level tools out of the box, you will need to change some configuration parameters to adapt it to your environment and get it running.
This document assumes that you already have DCM4CHEE running, and you can send DICOM images to DCM4CHEE and query-retrieve them.
First you'll have to define a separate directory for the files that the HSM tools will migrate. If you plan to handle a large amount of DICOM data, it is advisable to use a separate partition on a physically separate disk or RAID system for HSM migrations. This reduces I/O contention on the disks and increases the throughput of the system.
I will use UNIX notation here to make things clearer and to be consistent with DCM4CHEE's internal representation. Let's say you have a partition mounted as `/hsm` that will be used for HSM migrations.
- Add this partition as a file system to DCM4CHEE. In the JMX console, open the `FileSystemMgt` MBean view. Scroll down to the `addFileSystem()` method and invoke it with the parameters `dirPath => tar:/hsm`, `aet => YOUR_DCM4CHEE_INSTALLATION_AET`, `availability => NEARLINE`, `status => RO`, `user info => SOME_INFO_FOR_YOUR_REFERENCE`. Here `dirPath` has to have the prefix `tar:` to match the `DestinationFileSystem` parameter in the `FileCopy` service. This prefix tells the `FileCopy` service to pack files into a tar file before copying.
- At the top of the
`FileSystemMgt` view, in the configuration parameters section, adjust the clean-up rules: set `DeleteLocalStudiesCopyAvailable` to `true`, `ValidFileStatus` to `ARCHIVED`, and `StudyAgeForDeletion` to something like `52w`. Here `w` means weeks; you can use `h` for hours and `d` for days as well. `FileSystemMgt` cleans up the main file system at the interval given in the `FreeDiskSpaceInterval` parameter. By default it is `5m`; change it to a longer period to reduce contention on the DB and disks. Files are copied by the `FileCopy` service. After a file has been copied to the HSM partition, the `FileCopy` service doesn't touch the original file entry in the database, but adds a new entry for it. This entry will have a different `status` (see below) and a different `file_path`, something like `<SOME_PATH>.tar!<PATH_TO_FILE_IN_TAR>`. This is used by the `FileSystemMgt` service. During a clean-up session it looks for files older than the given age, and if it finds the corresponding copy entry in the database with the status `ARCHIVED`, it deletes the original file and its original entry from the database.
- In the
`FileCopy` MBean view, you'll have to adjust the following parameters: set `DestinationFileSystem` to `tar:/hsm`, `FileStatus` to `TO_ARCHIVE`, and `VerifyCopy` to `true`. This tells the `FileCopy` service to pack files into a tar archive, verify the MD5 sums of the copied files, save the tar file under `/hsm`, add a copy DB entry for each file in the tar archive, and set its status to `TO_ARCHIVE`. At this point the HSM tools step in. Depending on your environment, you might have a transparent HSM migration tool or a command-line tool that migrates a given file to tape or other long-term storage. If you have a transparent HSM migration agent, configure it to migrate all files under `/hsm` to your long-term storage. If you don't have a transparent migration tool, use the `TarCopyCommand` and `TarOutgoingDirectory` config parameters of the `FileCopy` service to invoke an HSM migration command after the files have been packed into a tar file. The `FileCopy` service will reschedule file copy orders if they fail for some reason. The number and interval of retries can be changed in the `RetryIntervals` parameter.
- Next, you'll have to change the
`SyncFileStatus` MBean configuration. `MonitoredFileSystem` is your original main file system path, where all files sent to your server are kept. Change `Command` according to the HSM query tools provided by your environment. `Pattern` is the regular expression used to check the output of the `Command`. It also depends on your environment, specifically on the response of the HSM query. Change `TaskInterval` to adjust when and how often you'd like to run file status checks.
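To make the `FileCopy` step in the list above more concrete, here is a minimal Python sketch of the same idea: pack files into a tar archive, verify each packed entry against the MD5 sum of its source file, and derive a `<tar>!<entry>` path for the copy entry. This is not DCM4CHEE's implementation. The helper names `md5_of` and `pack_and_verify`, and the choice to store each file under its bare name, are assumptions made purely for illustration.

```python
import hashlib
import tarfile
from pathlib import Path


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def pack_and_verify(files, tar_path):
    """Pack files into tar_path, then verify every entry by MD5.

    Returns a list of '<tar>!<entry>' strings that mimic the shape of the
    file_path described above. Hypothetical helper, for illustration only.
    """
    tar_path = Path(tar_path)
    with tarfile.open(tar_path, "w") as tar:
        for f in files:
            # Assumption: each file is stored under its bare file name.
            tar.add(f, arcname=Path(f).name)

    copy_paths = []
    with tarfile.open(tar_path, "r") as tar:
        for f in files:
            entry = Path(f).name
            packed = tar.extractfile(entry).read()
            # Compare the packed bytes with the source file's bytes.
            if md5_of(packed) != md5_of(Path(f).read_bytes()):
                raise IOError(f"MD5 mismatch for {entry}")
            copy_paths.append(f"{tar_path}!{entry}")
    return copy_paths
```

Calling `pack_and_verify(["/data/a.dcm"], "/hsm/study.tar")` would return `["/hsm/study.tar!a.dcm"]` if the check passes; both paths here are placeholders.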
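The `Command` and `Pattern` pair of `SyncFileStatus` can be pictured as "run a query command, then match its output against a regular expression". The Python sketch below shows only that general idea. The status text `status=archived`, the stand-in query command, and the function name `is_archived` are invented for illustration; a real HSM query tool has its own output format, and whether the service expects a full or a partial match is not covered here.

```python
import re
import subprocess
import sys


def is_archived(command, pattern):
    """Run command (an argument list) and report whether its output matches pattern."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    # Assumption: a partial match anywhere in the output counts as success.
    return re.search(pattern, result.stdout) is not None


# Stand-in for a real HSM query tool: a Python one-liner that prints an
# invented status line. Replace it with your environment's query command.
fake_query = [sys.executable, "-c", "print('study.tar status=archived')"]
```

With this stand-in, `is_archived(fake_query, r"status=archived")` is true, while a pattern such as `r"status=resident"` does not match.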
If everything is done correctly, then as soon as your server receives files it will schedule a file copy order. The `FileCopy` service will put them into a tar archive and trigger an HSM migration. At this point you'll have a copy of the files in long-term storage as well as in your online storage. You'll also have two DB entries for each file, with different `file_path` and `status` values. Depending on the configured intervals, file status checks will be invoked, and files whose check succeeds will be marked as `ARCHIVED`. `FileSystemMgt` will take care of cleaning up your online storage and will delete old, successfully archived files.
When somebody tries to access the archived files by invoking a `C-MOVE` request, `QueryRetrieveScpService` will use `TarRetriever` to retrieve the files. If you have a transparent HSM, all you need to do is tell `TarRetriever` which directory to use as the `CacheRoot` and leave `TarFetchCommand` as `NONE`. The rest is transparent: your HSM will bring the files back from long-term storage when `TarRetriever` tries to access them. If your HSM is not transparent, use `TarFetchCommand` to invoke an HSM retrieve command.
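Retrieval can be pictured as the reverse of packing: split the stored path at the `!`, open the tar file, and extract the requested entry into a cache directory. The following Python sketch assumes the `<tar>!<entry>` path shape mentioned earlier. The function names `split_tar_path` and `fetch_from_tar` and the cache layout are hypothetical and do not describe how `TarRetriever` is actually implemented.

```python
import tarfile
from pathlib import Path


def split_tar_path(file_path):
    """Split '<tar>!<entry>' into (tar file path, entry name)."""
    tar_name, sep, entry = file_path.partition("!")
    if not sep or not entry:
        raise ValueError(f"not a tar entry path: {file_path}")
    return tar_name, entry


def fetch_from_tar(file_path, cache_root):
    """Extract one entry into cache_root and return the extracted file's path.

    Opening the tar file is the point at which a transparent HSM would
    recall it from long-term storage.
    """
    tar_name, entry = split_tar_path(file_path)
    # Assumption: one cache sub-directory per archive, named after the tar file.
    target_dir = Path(cache_root) / Path(tar_name).name
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_name, "r") as tar:
        member = tar.getmember(entry)
        if not member.isfile():
            raise ValueError(f"{entry} is not a regular file")
        data = tar.extractfile(member).read()
    target = target_dir / Path(entry).name
    target.write_bytes(data)
    return target
```

Reading the entry's bytes and writing them out explicitly, instead of extracting by the archive's own path, keeps every extracted file inside the cache directory.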