areaDetector Plugin NDFileHDF5

March 16, 2015

Ulrik Pedersen (Diamond Light Source), Arthur Glowacki (Argonne National Laboratory), Alan Greer (Observatory Sciences), Mark Rivers (University of Chicago)

NDFileHDF5 inherits from NDPluginFile. This plugin uses the HDF5 libraries to store data. HDF5 is a self-describing binary file format supported by The HDF Group.

The plugin supports all NDArray datatypes and any number of NDArray dimensions (tested up to 3). It supports storing multiple NDArrays in a single file (in stream or capture modes) where each NDArray gets appended to an extra dimension.

NDArray attributes are stored in the HDF5 file. In multi-frame files the attributes are stored in 1D datasets (arrays).
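The sketch below is a rough Python model of how per-frame attribute values accumulate into one 1D array per attribute in a multi-frame file. The function `collect_attribute_columns` and the dict-per-frame input are hypothetical illustrations, not part of the plugin or its API.

```python
from collections import defaultdict


def collect_attribute_columns(frames):
    """Group per-frame NDAttribute values into one list per attribute name.

    `frames` is a list of dicts (one per NDArray written to the file) mapping
    attribute name -> value; this representation is hypothetical. Each
    resulting list stands for one 1D dataset in a multi-frame file.
    """
    columns = defaultdict(list)
    for frame_attrs in frames:
        for name, value in frame_attrs.items():
            columns[name].append(value)
    return dict(columns)


frames = [
    {"ColorMode": 0, "Temperature": 21.5},
    {"ColorMode": 0, "Temperature": 21.7},
    {"ColorMode": 0, "Temperature": 21.6},
]
columns = collect_attribute_columns(frames)
print(columns["Temperature"])  # [21.5, 21.7, 21.6]
```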

The NDFileHDF5 plugin is created with the NDFileHDF5Configure command, either from C/C++ or from the EPICS IOC shell.

NDFileHDF5Configure (const char *portName, int queueSize, int blockingCallbacks, 
                     const char *NDArrayPort, int NDArrayAddr, size_t maxMemory, 
                     int priority, int stackSize)
  

For details on the meaning of the parameters to this function, refer to the documentation of the NDFileHDF5Configure function in NDFileHDF5.cpp and of the constructor of the NDFileHDF5 class.

File Structure Layout

An HDF5 file comprises a hierarchical data structure, similar to a file system, with directories (groups) and files (datasets) [ref]

XML Defined File Structure Layout

It is possible to define the layout of the data structures in an XML definition file. The XML file allows defining the location of HDF5 groups, datasets and attributes based on detector data (NDArrays), metadata (NDAttributes) and constants (for instance NeXus tags).

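The XML definition is built from the following main elements: group, dataset, attribute, and hardlink. These terms refer to the HDF5 elements of the same names. A global element is also available for plugin-wide settings; its attributes are listed in the table below together with those of the other elements.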

XML attribute | Required | Description | Value

XML "attribute" element attributes
name | yes | Name (key) of the HDF5 attribute (key, value) pair | string
source | yes | Definition of where the attribute gets its value from | enum string: "constant", "ndattribute"
when | optional | Event when the attribute data is updated | enum string: "OnFileOpen", "OnFileClose", "OnFileWrite"
value | required only if source="constant" | The constant value to give the attribute | string (possibly containing an int or float)
type | optional; use if source="constant" | The constant datatype | enum string: "int", "float", "string"
ndattribute | required only if source="ndattribute" | Name of the areaDetector NDAttribute which is the source of this HDF5 attribute's data value | string containing the name of an NDAttribute

XML "group" element attributes
name | yes | The (relative) name of the HDF5 group | string
ndattr_default | optional | Flags a group as being the 'default' container for NDAttributes which have not been defined to be stored elsewhere. If there is no group defined with ndattr_default="true", and if the root group does not have auto_ndattr_default="false", then the 'default' container will be the root group. | boolean (default=false)
auto_ndattr_default | optional | If this attribute is present for the root group and is set to false, then NDAttributes which have not been defined to be stored elsewhere will not be stored at all. | boolean (default=true)

XML "dataset" element attributes
name | yes | Name of the HDF5 dataset | string
source | yes | Definition of where the dataset gets its data values from | enum string: "detector", "ndattribute", "constant"
value | required only if source="constant" | Constant value to write directly into the HDF5 dataset | string, possibly containing int or float values; arrays of int and float values can be given as a comma-separated string
ndattribute | required only if source="ndattribute" | The name of the areaDetector NDAttribute to use as a data source for this HDF5 dataset | string containing the name of the NDAttribute
det_default | optional | Flags this HDF5 dataset as the default dataset for the detector to write NDArrays into. Only sensible to set true if source="detector". | boolean (default=false)

XML "global" element attributes
name | yes | Name of the global functionality or parameter to set (currently only one parameter is supported) | enum string: "detector_data_destination"
ndattribute | required when name="detector_data_destination" | Name of the NDAttribute which defines the name of the dataset where incoming NDArrays are to be stored | string containing the name of an NDAttribute

XML "hardlink" element attributes
name | yes | Name of the hardlink being created | string
target | yes | Name of the existing target object in the HDF5 file being linked to | string containing the name of the target object
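As an illustration of how these elements nest, the following Python sketch uses the standard-library `xml.etree.ElementTree` module to generate a small layout definition. The root element name `hdf5_layout`, the group and dataset names, and the `Temperature` NDAttribute are assumptions made for the example; compare the output against the schema and the example layout file before relying on it.

```python
import xml.etree.ElementTree as ET


def build_layout():
    """Build a small XML layout from the elements in the table above.

    The root element name "hdf5_layout" is an assumption; check it against
    hdf5_xml_layout_schema.xsd.
    """
    root = ET.Element("hdf5_layout")
    entry = ET.SubElement(root, "group", name="entry")
    # A constant HDF5 attribute, e.g. a NeXus class tag.
    ET.SubElement(entry, "attribute", name="NX_class", source="constant",
                  value="NXentry", type="string")
    detector = ET.SubElement(entry, "group", name="detector")
    # Incoming NDArrays are written into this dataset by default.
    ET.SubElement(detector, "dataset", name="data", source="detector",
                  det_default="true")
    # One NDAttribute (hypothetical name) stored under a user-chosen name.
    ET.SubElement(detector, "dataset", name="temperature",
                  source="ndattribute", ndattribute="Temperature")
    # All NDAttributes not placed elsewhere end up in this group.
    ET.SubElement(entry, "group", name="NDAttributes", ndattr_default="true")
    data = ET.SubElement(entry, "group", name="data")
    # A second name for the detector dataset; no data is copied.
    ET.SubElement(data, "hardlink", name="data",
                  target="/entry/detector/data")
    return root


xml_text = ET.tostring(build_layout(), encoding="unicode")
print(xml_text)
```

The generated string can be saved to a file and checked with the xmllint command shown below.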

An example XML layout file is provided in "ADExample/iocs/simDetectorIOC/iocBoot/iocSimDetector/hdf5_layout_demo.xml".

An XML schema is provided in "ADCore/iocBoot/hdf5_xml_layout_schema.xsd". The schema defines the syntax that is allowed in the user's XML definition. It can also be used with the 'xmllint' command to validate a user's XML definition:

xmllint --noout --schema ADCore/iocBoot/hdf5_xml_layout_schema.xsd /path/to/users/layout.xml
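When xmllint is not at hand, a lightweight pre-check can catch some mistakes early. The Python sketch below only tests well-formedness and a few of the rules from the element table (required name and source, a value for source="constant", an ndattribute for source="ndattribute", a target for hardlinks); it is not a substitute for validating against the schema with xmllint. The function name `check_layout` and the `hdf5_layout` root element in the examples are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_layout(xml_text):
    """Return a list of problems found in an XML layout string (empty if none).

    Lightweight pre-check only: well-formedness plus a few rules from the
    element table. It does not replace schema validation with xmllint.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []
    for elem in root.iter():
        name = elem.get("name")
        if elem.tag in ("group", "attribute", "dataset", "hardlink") and name is None:
            problems.append(f"<{elem.tag}> has no name")
        if elem.tag in ("attribute", "dataset"):
            source = elem.get("source")
            if source is None:
                problems.append(f"{name!r}: missing source")
            elif source == "constant" and elem.get("value") is None:
                problems.append(f"{name!r}: source='constant' needs a value")
            elif source == "ndattribute" and elem.get("ndattribute") is None:
                problems.append(f"{name!r}: source='ndattribute' needs an ndattribute")
        if elem.tag == "hardlink" and elem.get("target") is None:
            problems.append(f"{name!r}: hardlink needs a target")
    return problems


good = ('<hdf5_layout><group name="entry">'
        '<attribute name="NX_class" source="constant" value="NXentry" type="string"/>'
        '</group></hdf5_layout>')
bad = '<hdf5_layout><dataset name="temp" source="ndattribute"/></hdf5_layout>'
print(check_layout(good))  # []
print(check_layout(bad))   # ["'temp': source='ndattribute' needs an ndattribute"]
```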

Default File Structure Layout

If no XML layout definition file is loaded, the plugin reverts to its default file structure layout. The default layout is compatible with the plugin's original, hard-coded layout. This layout is actually defined by an XML layout string in the source code file NDFileHDF5LayoutXML.cpp.

This default layout is compatible with the NeXus file format. This is achieved by defining a specific hierarchical structure of groups and datasets and by tagging elements in the hierarchy with certain "NX_class=nnn" attributes. Although the NeXus libraries are not used to write the data to disk, this file structure allows NeXus-aware readers to open and read the content of these HDF5 files. This has been tested with the NeXus reader in the GDA application. Hardlinks in the HDF5 file can be used to make the same dataset appear in more than one location. This can be useful for defining a layout that is NeXus compatible while also conforming to some other desired layout.

NDArray attributes

The attributes from NDArrays are stored in the HDF5 files. The list of attributes is loaded when a file is opened, so the NDArray attributes XML file should not be reloaded while a file is being written in Stream mode.

If the dataset is defined in the XML layout file, the user-specified name is used for the dataset. If the dataset is not defined in the XML layout file, the dataset name will be the NDArray attribute name. The NDArray attribute datasets automatically have 4 HDF5 attributes to indicate their source type and origin. These are:

NDArray attributes will be stored as 1D datasets. If the location of an NDArray attribute dataset is not defined in the XML layout file then the dataset will appear in the group that has the property ndattr_default="true". If there is no group with that property then the dataset will appear in the root group. If the root group has the property auto_ndattr_default="false" then datasets that are not explicitly defined in the XML layout file will not appear in the HDF5 file at all.
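The placement rules above can be summarised as a small decision function. This is a hypothetical Python sketch that follows the order of the rules as written; the input representation is an assumption, and if more than one group is flagged ndattr_default="true" it simply picks the first, which may not match what the plugin does.

```python
def attribute_dataset_group(attr_name, explicit_locations, default_groups,
                            root_auto_default=True):
    """Return the group where an NDAttribute dataset appears, or None.

    explicit_locations: {NDAttribute name: group path} for attributes placed
        explicitly in the XML layout (hypothetical representation).
    default_groups: paths of groups flagged ndattr_default="true".
    root_auto_default: False if the root group has auto_ndattr_default="false".
    """
    if attr_name in explicit_locations:
        return explicit_locations[attr_name]  # user-defined location
    if default_groups:
        return default_groups[0]              # first ndattr_default group
    if root_auto_default:
        return "/"                            # fall back to the root group
    return None                               # not written to the file at all


print(attribute_dataset_group("Temperature", {}, ["/entry/instrument/NDAttributes"]))
# /entry/instrument/NDAttributes
```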

There are 4 "virtual" attributes that are automatically created. These are:

These virtual attributes are added to the attribute list and will be written to the HDF5 file following the same rules as the actual NDArray attributes described above.

It is possible to validate the syntax of an NDArray attributes XML file. For example, to validate the syntax of the iocBoot/iocSimDetector/simDetectorAttributes.xml file, run the following command from the iocBoot directory:

xmllint --noout --schema ./attributes.xsd iocSimDetector/simDetectorAttributes.xml

Default tree structure

The default group/dataset structure of the HDF5 files generated by this plugin is:

entry                   <-- NX_class=NXentry
|
+--instrument           <-- NX_class=NXinstrument
|  |
|  +--detector          <-- NX_class=NXdetector
|  |  |
|  |  +--data           <-- NX_class=SDS, signal=1
|  |  |
|  |  +--NDAttributes
|  |     |
|  |     +--ColorMode
|  |
|  +--NDAttributes      <-- NX_class=NXcollection, ndattr_default="true"
|  |  |
|  |  +---              <-- Any number of NDAttributes from the NDArrays as individual 1D datasets
|  |
|  +--performance       <-- Performance of the file writing
|     |
|     +--timestamp      <-- A 2D dataset of different timing measurements taken during file writing
|
+--data                 <-- NX_class=NXdata
   |
   +--data              <-- Hardlink to /entry/instrument/detector/data

HDF5 File Viewers

Note that with the current (1.8) release series of the HDF5 libraries it is not possible to do live "monitoring" of a file which is being written by another process. The file viewers mentioned in this section can only be used to open, browse and read HDF5 files after the NDFileHDF5 plugin has finished writing and closed the file. This limitation will be lifted in the 1.10 release of the HDF5 library, which will include the Single Writer Multiple Reader (SWMR) feature.

HDFView is a simple GUI tool for viewing and browsing HDF files. It has some limited support for viewing images, plotting graphs and displaying data tables.

The HDF5 library also ships with a number of command-line tools for browsing and dumping data, such as h5ls and h5dump.

The screenshot below shows the HDFView application with a data file generated by the plugin open. A number of elements are visible:

HDFView-screenshot.png

Multiple Dimensions

Both areaDetector and the HDF5 file format support multidimensional datasets. The dimensions from the NDArray are preserved when writing to the HDF5 file. In multi-frame files (plugin in Stream or Capture mode) an additional dimension is added to the dataset to hold the array of frames.
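A sketch of the resulting dataset shape, assuming NDArray dimensions are listed fastest-varying first (as in `NDArray.dims`) while HDF5 shapes are listed slowest-varying first, so the NDArray dimensions appear reversed with the frame dimension in front. The function `multiframe_shape` is hypothetical; inspect a written file with a viewer if the exact ordering matters for your analysis.

```python
def multiframe_shape(ndarray_dims, num_frames):
    """HDF5 dataset shape for num_frames NDArrays of size ndarray_dims.

    Assumes ndarray_dims is ordered fastest-varying first (like NDArray.dims,
    e.g. [sizeX, sizeY]) and returns a shape ordered slowest-varying first,
    with the frame dimension prepended.
    """
    return [num_frames] + list(reversed(ndarray_dims))


print(multiframe_shape([1024, 512], 100))   # [100, 512, 1024]
print(multiframe_shape([3, 640, 480], 10))  # [10, 480, 640, 3]
```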

In addition to the dimensions of the NDArray, it is also possible to specify up to 2 extra "virtual" dimensions to store datasets in the file. This is to support applications where a sample is scanned in up to two dimensions, say X and Y. For each scan point a dataset comprising multiple frames can be stored. The length of (i.e. number of points in) each of the two virtual dimensions has to be specified before the plugin opens the file for writing. This feature is only supported in the Stream and Capture modes.

This feature allows creating very large sets of scan data, matching the dimensions of the performed scan, in one data file. Depending on the application this can be a benefit in post-processing.

The figure below illustrates the use of the two extra "virtual" dimensions in a 2D (X,Y) raster scan with N frames per point:

HDFmultiple-dimensions.png

Prior to starting a scan like this the user will need to configure the number of virtual dimensions to use (none, 1 or 2); the number of frames per point; and the length of each of the virtual dimensions (4 x 2 in the example figure). It is not possible to change the number or size of dimensions while the file is open.
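For the 4 x 2 example with N frames per point, the plugin receives N x 4 x 2 frames in total. The Python sketch below maps a running frame counter to its (Y, X, N) position, assuming frames fill the N dimension first, then X, then Y (a plain raster, not a snake scan). That ordering and the function name `frame_position` are assumptions for illustration.

```python
def frame_position(frame_index, n_per_point, size_x, size_y):
    """Map a running frame counter to its (y, x, n) position.

    Assumes frames fill N first, then X, then Y (plain raster ordering).
    """
    total = n_per_point * size_x * size_y
    if not 0 <= frame_index < total:
        raise ValueError(f"frame {frame_index} outside 0..{total - 1}")
    n = frame_index % n_per_point
    x = (frame_index // n_per_point) % size_x
    y = frame_index // (n_per_point * size_x)
    return y, x, n


# 4 x 2 scan (X=4, Y=2) with 3 frames per point: 24 frames in total.
print(frame_position(5, 3, 4, 2))   # (0, 1, 2)
print(frame_position(23, 3, 4, 2))  # (1, 3, 2)
```

Because the sizes must be fixed before the file is opened, a frame counter beyond the configured total is treated here as an error rather than growing the dataset.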

For 2D image (greyscale) formats the dimensions in the multiframe HDF5 file are organised as follows:

Chunking

This plugin uses HDF5 chunking to store the raw image data. The chunk size (the size of each I/O block) can be configured for the frame X and Y dimensions as well as for the frame number N (a chunk spanning several frames essentially implies caching frames in memory before writing to disk). Configuring chunking correctly for a given application is a complex matter: both the write performance and the read performance of the intended post-processing application have to be evaluated. As a basic starting point, setting the nColChunks and nRowChunks parameters to the X and Y frame size respectively should give a decent result. In fact, if these parameters are set to the special value 0, they default to the dimensions of the incoming frames. Further explanation and documentation of the HDF5 chunking feature are available in the HDF5 documentation:
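The chunking defaults described above can be sketched in a few lines of Python. The hypothetical helpers below assume a chunk shape ordered [frames, rows, cols] and map the NumFramesChunks, NumRowChunks and NumColChunks settings to the `n_frames_chunks`, `n_row_chunks` and `n_col_chunks` arguments; the byte count helps judge how much data one chunk holds in memory before it is written.

```python
def chunk_dims(frame_rows, frame_cols, n_row_chunks=0, n_col_chunks=0,
               n_frames_chunks=1):
    """Return an assumed chunk shape [frames, rows, cols] for 2D images.

    A row or column setting of 0 falls back to the full frame dimension,
    as described above. n_frames_chunks > 1 means several frames are
    buffered in memory before a full chunk can be written.
    """
    rows = n_row_chunks or frame_rows
    cols = n_col_chunks or frame_cols
    return [n_frames_chunks, rows, cols]


def chunk_bytes(chunk, bytes_per_element):
    """Size in bytes of one uncompressed chunk."""
    size = bytes_per_element
    for dim in chunk:
        size *= dim
    return size


chunk = chunk_dims(2048, 2048, n_frames_chunks=4)
print(chunk, chunk_bytes(chunk, 2))  # [4, 2048, 2048] 33554432
```

With 16-bit 2048 x 2048 frames and four frames per chunk, each chunk holds 32 MiB of uncompressed data.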

Compression

The HDF5 library supports a number of compression algorithms. When using the HDF5 libraries to write and read files, compression is seamless: it only needs to be switched on when writing, and HDF5-enabled applications can read the files without any additional configuration. Only one compression filter can be applied at a time.
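HDF5's zlib filter (called "deflate" in HDF5) is lossless, and its compression level trades CPU time for file size. The Python sketch below uses the standard `zlib` module directly, not the HDF5 library, on a synthetic frame to show that every level round-trips exactly; the frame contents are made up, and real compression ratios depend entirely on the data.

```python
import zlib


def compressed_sizes(data, levels=(1, 6, 9)):
    """Compress data at each zlib level and return {level: compressed size}."""
    sizes = {}
    for level in levels:
        packed = zlib.compress(data, level)
        # Lossless: decompressing must restore every byte exactly.
        if zlib.decompress(packed) != data:
            raise RuntimeError(f"round trip failed at level {level}")
        sizes[level] = len(packed)
    return sizes


# Synthetic 512 x 512 frame of 16-bit pixels: mostly zeros plus a short ramp.
frame = bytearray(2 * 512 * 512)
frame[1000:2024] = bytes(range(256)) * 4
frame = bytes(frame)

print(len(frame), compressed_sizes(frame))
```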

The following compression filters are supported in the NDFileHDF5 plugin:

Parameters and Records

Parameter Definitions and EPICS Record Definitions in NDFileHDF5.template

Parameter index variable | asyn interface | Access | Description | drvInfo string | EPICS record name | EPICS record type

HDF5 XML Layout Definition
XMLFileName | asynOctet | r/w | XML filename, pointing to an XML HDF5 Layout Definition. This waveform also supports loading raw XML code directly, up to a maximum of 1MB (NELM=1MB). | HDF5_layoutFilename | $(P)$(R)XMLFileName, $(P)$(R)XMLFileName_RBV | waveform
layoutValid | asynInt32 | r/o | Flag reporting the validity (XML syntax only) of the loaded XML. Updated when XMLFileName is set to a new filename and when the XML file is read at HDF5 file creation. | HDF5_layoutValid | $(P)$(R)XMLValid_RBV | bi
layoutErrorMsg | asynOctet | r/o | XML parser error message | HDF5_layoutErrorMsg | $(P)$(R)XMLErrorMsg_RBV | waveform

HDF5 Chunk Configuration
nRowChunks | asynInt32 | r/w | Configure HDF5 "chunking" to a size appropriate for the file system: sets the number of rows to use per chunk | HDF5_nRowChunks | $(P)$(R)NumRowChunks, $(P)$(R)NumRowChunks_RBV | longout, longin
nColChunks | asynInt32 | r/w | Configure HDF5 "chunking" to a size appropriate for the file system: sets the number of columns to use per chunk | HDF5_nColChunks | $(P)$(R)NumColChunks, $(P)$(R)NumColChunks_RBV | longout, longin
nFramesChunks | asynInt32 | r/w | Configure HDF5 "chunking" to a size appropriate for the file system: sets the number of frames to use per chunk. For a 2D image, setting this parameter > 1 essentially implies using an in-memory cache, as HDF5 only writes full chunks to disk. | HDF5_nFramesChunks | $(P)$(R)NumFramesChunks, $(P)$(R)NumFramesChunks_RBV | longout, longin

Disk Boundary Alignment
chunkBoundaryAlign | asynInt32 | r/w | Set the disk boundary alignment in bytes. This parameter can be used to optimise file I/O performance on some file systems, for instance on the Lustre file system, where it is optimal to align data to the 'stripe size' (default 1MB). This parameter applies to all datasets in the file. Setting this parameter to 0 disables the use of disk boundary alignment. Warning: setting this parameter to a larger size than the size of a single chunk will cause data files to grow larger than the actual contained data. | HDF5_chunkBoundaryAlign | $(P)$(R)BoundaryAlign, $(P)$(R)BoundaryAlign_RBV | longout, longin
chunkBoundaryThreshold | asynInt32 | r/w | Set a minimum size (bytes) of chunk or dataset where boundary alignment is to be applied. This can be used to exclude small datasets, such as NDAttributes, from boundary alignment, which could otherwise blow up the file size. Setting this parameter to 0 disables the use of boundary alignment. | HDF5_chunkBoundaryThreshold | $(P)$(R)BoundaryThreshold, $(P)$(R)BoundaryThreshold_RBV | longout, longin

Metadata
storeAttributes | asynInt32 | r/w | Enable or disable storing NDArray attributes in the file | HDF5_storeAttributes | $(P)$(R)StoreAttr, $(P)$(R)StoreAttr_RBV | bo, bi
storePerformance | asynInt32 | r/w | Enable or disable storing file I/O timing measurements in the file | HDF5_storePerformance | $(P)$(R)StorePerform, $(P)$(R)StorePerform_RBV | bo, bi
flushNthFrame | asynInt32 | r/w | Flush the file metadata to disk every N'th frame. Image data is written to disk on every write operation, but HDF5 internal metadata describing the data layout and indices is normally only written at close time. | HDF5_flushNthFrame | $(P)$(R)NumFramesFlush, $(P)$(R)NumFramesFlush_RBV | longout, longin

Additional Virtual Dimensions
nExtraDims | asynInt32 | r/w | Number of extra dimensions [0..2] | HDF5_nExtraDims | $(P)$(R)NumExtraDims, $(P)$(R)NumExtraDims_RBV | mbbo, mbbi
extraDimSizeN | asynInt32 | r/w | Size of extra dimension N (no. of frames per point) | HDF5_extraDimSizeN | $(P)$(R)ExtraDimSizeN, $(P)$(R)ExtraDimSizeN_RBV | longout, longin
extraDimSizeX | asynInt32 | r/w | Size of extra dimension X | HDF5_extraDimSizeX | $(P)$(R)ExtraDimSizeX, $(P)$(R)ExtraDimSizeX_RBV | longout, longin
extraDimSizeY | asynInt32 | r/w | Size of extra dimension Y | HDF5_extraDimSizeY | $(P)$(R)ExtraDimSizeY, $(P)$(R)ExtraDimSizeY_RBV | longout, longin

Runtime Statistics
totalRuntime | asynFloat64 | r/o | Total runtime in seconds from first frame to file closed | HDF5_totalRuntime | $(P)$(R)Runtime | ai
totalIoSpeed | asynFloat64 | r/o | Overall I/O write speed in megabits per second from first frame to file closed | HDF5_totalIoSpeed | $(P)$(R)IOSpeed | ai

Compression Filters
compressionType | asynInt32 | r/w | Select or switch off the compression filter | HDF5_compressionType | $(P)$(R)Compression, $(P)$(R)Compression_RBV | mbbo, mbbi
nbitsPrecision | asynInt32 | r/w | N-bit compression filter: number of data bits per pixel | HDF5_nbitsPrecision | $(P)$(R)NumDataBits, $(P)$(R)NumDataBits_RBV | longout, longin
nbitsOffset | asynInt32 | r/w | N-bit compression filter: data word bit offset in pixel | HDF5_nbitsOffset | $(P)$(R)DataBitsOffset, $(P)$(R)DataBitsOffset_RBV | longout, longin
szipNumPixels | asynInt32 | r/w | szip compression filter: number of pixels in filter [1..32] | HDF5_szipNumPixels | $(P)$(R)SZipNumPixels, $(P)$(R)SZipNumPixels_RBV | longout, longin
zCompressLevel | asynInt32 | r/w | zlib compression filter: compression level [1..9] | HDF5_zCompressLevel | $(P)$(R)ZLevel, $(P)$(R)ZLevel_RBV | longout, longin
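The warning on BoundaryAlign can be quantified with a simplified model in which every chunk at least BoundaryThreshold bytes long occupies a whole number of alignment blocks. The Python sketch below follows that model and the table's description that a value of 0 for either parameter disables alignment; it ignores HDF5 metadata and is an approximation, not the library's allocation algorithm. The function name `aligned_footprint` is hypothetical.

```python
def aligned_footprint(chunk_size, n_chunks, alignment, threshold):
    """Approximate bytes occupied by n_chunks chunks of chunk_size bytes.

    Simplified model: if alignment and threshold are both non-zero and a
    chunk is at least `threshold` bytes, it is padded up to a multiple of
    `alignment`. A value of 0 for either parameter disables alignment, as
    described in the table. HDF5 metadata is ignored.
    """
    if alignment <= 0 or threshold <= 0 or chunk_size < threshold:
        return chunk_size * n_chunks
    blocks = (chunk_size + alignment - 1) // alignment  # round up
    return blocks * alignment * n_chunks


MiB = 1024 * 1024
print(aligned_footprint(256 * 1024, 100, MiB, 64 * 1024) // MiB)  # 100
print(aligned_footprint(4 * MiB, 10, MiB, 64 * 1024) // MiB)       # 40
print(aligned_footprint(800, 1, MiB, 64 * 1024))                   # 800
```

In this model, with a 1MB alignment, 100 chunks of 256 KiB occupy about 100 MiB instead of 25 MiB, while chunks that are exact multiples of the alignment incur no padding and small datasets below the threshold are left unaligned.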

NDFileHDF5.adl

NDFileHDF5.png