Home

The CERN Advanced STORage manager (CASTOR) is a hierarchical storage management system, combining disk and tape, developed at CERN for archiving physics data at very large volumes (see the plot on the right). Files can be stored, listed, retrieved and accessed remotely using the CASTOR command-line tools or user applications built on the CASTOR API. CASTOR provides a set of access protocols such as XROOT (the main and recommended protocol) and GridFTP; RFIO (Remote File IO) was supported until 2016.
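As a minimal sketch of remote access over XROOT, the snippet below copies a file out of CASTOR with the standard xrdcp client from the XRootD distribution, driven from Python. The endpoint castorpublic.cern.ch and the /castor/cern.ch path are placeholders, not guaranteed live values.

    import subprocess

    # Placeholder CASTOR endpoint and name-space path; substitute real values.
    SOURCE = "root://castorpublic.cern.ch//castor/cern.ch/user/j/jdoe/run2012/data.root"
    DEST = "/tmp/data.root"

    # xrdcp is the XRootD copy client; it speaks the XROOT protocol
    # that CASTOR recommends for file access.
    result = subprocess.run(["xrdcp", SOURCE, DEST], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError("transfer failed: " + result.stderr.strip())
    print("copied", SOURCE, "->", DEST)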

CASTOR is the successor of SHIFT, the Scalable Heterogeneous Integrated FaciliTy for HEP computing, which was developed and operated in the 1990s. On June 29th 2020, the CERN Tape Archive (CTA) entered production as the successor of CASTOR and gradually replaced it. The evolution of the total data on tape at CERN since 2001 is displayed on the right, including statistics gathered from CASTOR 1 (1998-2007), CASTOR 2 (2005-2022), and CTA (2020 onwards).

Design

The design is based on a component architecture (see the architecture diagram) built around a central database that records the state changes of the CASTOR components. Access to the disk pools is controlled by the Stager, the directory structure is kept by the Name Server, and tape access (writes and recalls) is controlled by the Tape Infrastructure.
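To make the role of the central database concrete, here is a deliberately simplified sketch, with SQLite standing in for CASTOR's actual central database and an invented table and invented state names: every component persists its state transitions, so a half-finished request survives a component restart.

    import sqlite3

    # Conceptual model only: the schema and states are invented for illustration.
    db = sqlite3.connect("castor_state.db")
    db.execute("CREATE TABLE IF NOT EXISTS request"
               " (id INTEGER PRIMARY KEY, path TEXT, state TEXT)")

    def advance(request_id: int, new_state: str) -> None:
        # A component records each state change centrally, so another
        # instance can resume the request if this one fails.
        db.execute("UPDATE request SET state = ? WHERE id = ?", (new_state, request_id))
        db.commit()

    db.execute("INSERT INTO request (path, state) VALUES (?, ?)",
               ("/castor/cern.ch/user/j/jdoe/file", "QUEUED"))
    db.commit()
    advance(1, "STAGING")  # the Stager allocated disk pool space
    advance(1, "ON_DISK")  # the file is now available in the disk pool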

The five major functional modules are:
  1. Stager - this disk pool manager allocates and reclaims space; it also controls client access to the disk pools and maintains the local disk pool catalogue
  2. Name Server - maintains the CASTOR name space (files and directories) together with the corresponding file metadata: size, dates, checksum, ownership, ACLs (Access Control Lists) and tape copy information. Command-line tools modelled on their Unix counterparts enable manipulation of the name space (e.g. nsls corresponds to ls); a short example follows after this list.
  3. Tape Infrastructure - CASTOR migrates files to tape in order to provide data safety and to manage data volumes larger than the available disk space. At CERN, the high-capacity tape units in use are Oracle StorageTek T10000C (5 TB) and IBM TS1140 (4 TB). Cartridges are housed in tape libraries, and access to them is fully automated. The libraries used by CASTOR in production are 4 x Oracle SL8500 and 3 x IBM TS3500. The total tape archive capacity is ~100 PB (as of January 2013).

    The CASTOR Volume Manager database contains information about each tape's characteristics, capacity and status. The Name Server database contains information about the files (sometimes referred to as segments) on a tape:

    • ownership
    • permission details
    • file offset location on tape

    User commands are available to display information in both the Name Server and Volume Manager databases; see the sketch after this list.

    The mounting and dismounting of cartridges on tape drives is managed by the Volume and Drive Queue Manager (VDQM), in conjunction with library control software specific to each model of tape library.

    The cost of storage per terabyte on tape is far lower than on hard disk, and tape has the advantage of consuming no electricity when it is not being accessed. However, access times on tape are longer, on the order of minutes rather than seconds.

  4. Client - allows the user to upload, download, access and manage CASTOR data; see the staging example after this list
  5. Storage Resource Management - allows for data access in a computing Grid via the SRM protocol. It interacts with CASTOR on behalf of a user or of other services (such as FTS, the File Transfer Service used by the LHC community to export data); an SRM example closes the sketches below.
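The name-space example promised in item 2: a minimal sketch that drives the Name Server command-line tools from Python. It assumes the standard CASTOR client tools nsmkdir and nsls are installed, and the path shown is a placeholder.

    import subprocess

    # Placeholder name-space path under the CERN instance prefix.
    BASE = "/castor/cern.ch/user/j/jdoe"

    # nsmkdir and nsls are the Name Server counterparts of mkdir and ls.
    subprocess.run(["nsmkdir", BASE + "/analysis"], check=True)
    listing = subprocess.run(["nsls", "-l", BASE],
                             capture_output=True, text=True, check=True)
    print(listing.stdout)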
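The sketch promised in item 3, displaying information from both databases. It assumes that nsls -T (tape segment listing) and vmgrlisttape behave as described in the CASTOR manual pages; the file name and volume id are placeholders.

    import subprocess

    FILE = "/castor/cern.ch/user/j/jdoe/run2012/data.root"  # placeholder file
    VID = "T12345"                                          # placeholder volume id

    # nsls -T queries the Name Server for a file's tape segments
    # (including the file's offset location on tape).
    segments = subprocess.run(["nsls", "-T", FILE],
                              capture_output=True, text=True, check=True)
    print(segments.stdout)

    # vmgrlisttape queries the Volume Manager for a tape's
    # characteristics, capacity and status.
    tape = subprocess.run(["vmgrlisttape", "-V", VID],
                          capture_output=True, text=True, check=True)
    print(tape.stdout)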
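The staging example promised in item 4: before reading, a client typically asks the Stager to make a disk copy available, recalling the file from tape if necessary. This sketch assumes the stager_get and stager_qry client tools and polls until the file reports the STAGED state; the path is a placeholder.

    import subprocess
    import time

    FILE = "/castor/cern.ch/user/j/jdoe/run2012/data.root"  # placeholder

    # stager_get asks the Stager to make the file available on disk,
    # triggering a tape recall if no disk copy exists.
    subprocess.run(["stager_get", "-M", FILE], check=True)

    # stager_qry polls the request until the file is staged on disk.
    while True:
        status = subprocess.run(["stager_qry", "-M", FILE],
                                capture_output=True, text=True, check=True)
        if "STAGED" in status.stdout:
            break
        time.sleep(30)
    print("file is on disk and can now be read, e.g. with xrdcp")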
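Finally, the SRM example for item 5: Grid clients reach CASTOR through an SRM endpoint rather than talking to the Stager directly. The sketch assumes the gfal-copy tool from gfal2-util and a hypothetical SRM endpoint; real endpoints and paths depend on the site.

    import subprocess

    # Hypothetical SRM endpoint and paths; substitute site-specific values.
    SRC = "srm://srm-public.cern.ch/castor/cern.ch/user/j/jdoe/run2012/data.root"
    DEST = "file:///tmp/data.root"

    # gfal-copy negotiates the transfer via the SRM protocol; FTS drives
    # the same SRM interface when moving data between Grid sites.
    subprocess.run(["gfal-copy", SRC, DEST], check=True)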