areaDetector Plugin NDFileHDF5

March 16, 2015

Ulrik Pedersen (Diamond Light Source), Arthur Glowacki (Argonne National Laboratory), Alan Greer (Observatory Sciences), Mark Rivers (University of Chicago)

NDFileHDF5 inherits from NDPluginFile. This plugin uses the HDF5 libraries to store data. The HDF5 file format is a self-describing binary format supported by The HDF Group.

The plugin supports all NDArray datatypes and any number of NDArray dimensions (tested up to 3). It supports storing multiple NDArrays in a single file (in Stream or Capture modes), where each NDArray is appended along an extra dimension.

NDArray attributes are stored in the HDF5 file. For multi-frame files the attributes are stored in 1D datasets (arrays).

The NDFileHDF5 plugin is created with the NDFileHDF5Configure command, either from C/C++ or from the EPICS IOC shell.

NDFileHDF5Configure (const char *portName, int queueSize, int blockingCallbacks, 
                     const char *NDArrayPort, int NDArrayAddr, size_t maxMemory, 
                     int priority, int stackSize)

For details on the meaning of these parameters, refer to the documentation of the NDFileHDF5Configure function in NDFileHDF5.cpp and of the constructor of the NDFileHDF5 class.

File Structure Layout

An HDF5 file comprises a hierarchical data structure, similar to a file system with directories (groups) and files (datasets) [ref]

XML Defined File Structure Layout

It is possible to define the layout of the data structures in an XML definition file. The XML file allows the user to define the location of HDF5 groups, datasets and attributes based on detector data (NDArrays), metadata (NDAttributes) and constants (for instance NeXus tags).

The XML definition uses four main elements, group, dataset, attribute and hardlink, which refer to the HDF5 objects of the same names, plus a fifth element, global:

attribute: represents an HDF5 attribute (key, value metadata) and is assigned to groups and datasets
dataset: represents an HDF5 dataset (N-dimensional array). It can contain attributes and is assigned to a group
group: represents an HDF5 group. Groups can contain datasets, attributes and other groups (recursively)
hardlink: represents an HDF5 hard link. Hard links can point to other elements in the file, such as datasets. They are analogous to hard links in the Linux file system.
global: rather than representing an HDF5 object, this element represents a global functionality definition

XML attribute	Required	Description	Value
XML "attribute" element attributes
name	yes	Name (key) of the HDF attribute (key, value) pair	string
source	yes	Definition of where the attribute gets its value from	Enum string: "constant", "ndattribute"
when	optional	Event when the attribute data is updated	Enum string: "OnFileOpen", "OnFileClose", "OnFileWrite"
value	Required only if source="constant"	The constant value to give the attribute	string (possibly containing an int or float)
type	optional - use if source="constant"	The constant datatype	Enum string: "int", "float", "string"
ndattribute	Required only if source="ndattribute"	Name of the areaDetector NDAttribute which is the source of this HDF5 attribute's data value	string containing the name of an NDAttribute
XML "group" element attributes
name	yes	The (relative) name of the HDF5 group	string
ndattr_default	optional	This attribute flags a group as being a 'default' container for NDAttributes which have not been defined to be stored elsewhere. If there is no group defined with ndattr_default=true, and if the root group does not have auto_ndattr_default=false, then the 'default' container will be the root group.	boolean (default=false)
auto_ndattr_default	optional	If this attribute is present on the root group and is set to false, then NDAttributes which have not been defined to be stored elsewhere will not be stored at all	boolean (default=true)
XML "dataset" element attributes
name	yes	Name of the HDF5 dataset	string
source	yes	Definition of where the dataset gets its data values from	string enum: "detector", "ndattribute", "constant"
value	Required only if source="constant"	Constant value to write directly into the HDF5 dataset	String - possibly containing int or float values. Arrays of int and float values can also be represented in a comma-separated string
ndattribute	Required only if source="ndattribute"	The name of the areaDetector NDAttribute to use as a data source for this HDF5 dataset	string containing the name of the NDAttribute
det_default	optional	Flag to indicate that this HDF5 dataset is the default dataset for the detector to write NDArrays into. Only sensible to set true if source="detector"	boolean (default=false)
XML "global" element attributes
name	yes	Name of the global functionality or parameter to set	enum string: "detector_data_destination" (currently only one supported parameter)
ndattribute	Required when name="detector_data_destination"	Name of the NDAttribute which defines the name of the dataset where incoming NDArrays are to be stored	string containing the name of an NDAttribute
XML "hardlink" element attributes
name	yes	Name of the link	string containing the name of the hardlink being created
target	yes	Name of the existing target object in the HDF5 file being linked to.	string containing the name of the target object being linked to

An example XML layout file is provided in "ADExample/iocs/simDetectorIOC/iocBoot/iocSimDetector/hdf5_layout_demo.xml".
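For orientation, here is a much smaller layout written as a C++ raw string literal, in the same spirit as the default layout string kept in NDFileHDF5LayoutXML.cpp. It uses only the elements and XML attributes from the table above. The root element name `hdf5_layout`, the variable `kMinimalLayout` and the helper `layoutContains` are assumptions of this sketch, so validate the XML against the schema with xmllint before relying on it.

```cpp
#include <cassert>
#include <string>

// Minimal layout sketch (assumed root element name: hdf5_layout).
// Detector frames go to /entry/detector/data, NDAttributes without an explicit
// location go to /entry/NDAttributes, and /entry/data/data is a hard link to
// the detector dataset.
static const char* kMinimalLayout = R"xml(<?xml version="1.0" standalone="no" ?>
<hdf5_layout>
  <group name="entry">
    <attribute name="NX_class" source="constant" value="NXentry" type="string"/>
    <group name="detector">
      <attribute name="NX_class" source="constant" value="NXdetector" type="string"/>
      <dataset name="data" source="detector" det_default="true">
        <attribute name="signal" source="constant" value="1" type="int"/>
      </dataset>
    </group>
    <group name="NDAttributes" ndattr_default="true"/>
    <group name="data">
      <attribute name="NX_class" source="constant" value="NXdata" type="string"/>
      <hardlink name="data" target="/entry/detector/data"/>
    </group>
  </group>
</hdf5_layout>
)xml";

// Convenience check for when the string is edited: does the layout contain `needle`?
bool layoutContains(const std::string& needle) {
    return std::string(kMinimalLayout).find(needle) != std::string::npos;
}
```

Keeping the layout in a raw string avoids escaping every quote, which is why it is a convenient form for a built-in default; for day-to-day use the same XML would normally live in its own file referenced by XMLFileName.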

An XML schema is provided in "ADCore/iocBoot/hdf5_xml_layout_schema.xsd". The schema defines the syntax that is allowed in the user's XML definition. It can also be used with the 'xmllint' command to validate a user's XML definition:

xmllint --noout --schema ADCore/iocBoot/hdf5_xml_layout_schema.xsd /path/to/users/layout.xml

Default File Structure Layout

If no XML layout definition file is loaded, the plugin reverts to its default file structure layout. The default layout is compatible with the plugin's original, hard-coded layout. It is defined by an XML layout string in the source file NDFileHDF5LayoutXML.cpp.

This default layout is compatible with the NeXus file format. This is achieved by defining a specific hierarchical structure of groups and datasets and by tagging elements in the hierarchy with "NX_class=nnn" attributes. Although the NeXus libraries are not used to write the data to disk, this file structure allows NeXus-aware readers to open and read the content of these HDF5 files. This has been tested with the NeXus reader in the GDA application. Hardlinks in the HDF5 file can be used to make the same dataset appear in more than one location. This can be useful for defining a layout that is NeXus-compatible while also conforming to some other desired layout.

NDArray attributes

The attributes from NDArrays are stored in the HDF5 files. The list of attributes is loaded when a file is opened, so the NDArray attributes XML file should not be reloaded while a file is being written in Stream mode.

If the dataset is defined in the XML layout file then the user-specified name is used for the dataset. Otherwise the dataset name will be the NDArray attribute name. The NDArray attribute datasets automatically have 4 HDF5 attributes to indicate their source type and origin. These are:

NDAttrName: The name of the NDArray attribute.
NDAttrDescrption: The description of the NDArray attribute.
NDAttrSourceType: The source type of the NDArray attribute: NDAttrSourceDriver, NDAttrSourceParam, NDAttrSourceEPICSPV, or NDAttrSourceFunct.
NDAttrSource: The source of the NDArray attribute data, i.e. the name of the EPICS PV, the drvInfo string for the parameter, or the name of the attribute function.

NDArray attributes will be stored as 1D datasets. If the location of an NDArray attribute dataset is not defined in the XML layout file then the dataset will appear in the group that has the property ndattr_default="true". If there is no group with that property then the dataset will appear in the root group. If the root group has the property auto_ndattr_default="false" then datasets that are not explicitly defined in the XML layout file will not appear in the HDF5 file at all.
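The placement rules above can be summarised as a small decision function. This is a sketch only: `ndAttributeGroup` and its parameters are hypothetical and do not correspond to the plugin's internal code. It reads auto_ndattr_default="false" on the root group as taking precedence over a group marked ndattr_default="true", which is this sketch's interpretation of the text.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Returns the HDF5 group path where an NDAttribute dataset would be written,
// or std::nullopt if it would not be written at all.
//   explicitGroup:   group assigned to this NDAttribute in the XML layout ("" if none)
//   defaultGroup:    path of the group with ndattr_default="true" ("" if none)
//   rootAutoDefault: value of auto_ndattr_default on the root group
std::optional<std::string> ndAttributeGroup(const std::string& explicitGroup,
                                            const std::string& defaultGroup,
                                            bool rootAutoDefault) {
    if (!explicitGroup.empty()) return explicitGroup;  // placed explicitly by the layout
    if (!rootAutoDefault) return std::nullopt;          // root has auto_ndattr_default="false"
    if (!defaultGroup.empty()) return defaultGroup;     // group with ndattr_default="true"
    return std::string("/");                            // fall back to the root group
}
```

For the default layout, an attribute such as ColorMode is placed explicitly, while any other attribute falls through to the NDAttributes group marked ndattr_default="true".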

There are 4 "virtual" attributes that are automatically created. These are:

NDArrayUniqueId: The NDArray.uniqueId value.
NDArrayTimeStamp: The NDArray.timeStamp value.
NDArrayEpicsTSSec: The NDArray.epicsTS.secPastEpoch value.
NDArrayEpicsTSnSec: The NDArray.epicsTS.nsec value.

These virtual attributes are added to the attribute list and are written to the HDF5 file following the same rules as the actual NDArray attributes described above.
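As an illustration of where the four virtual attribute values come from, the sketch below builds them from one frame's metadata. `FrameMeta` is a hypothetical stand-in for the NDArray fields listed above, not the real ADCore NDArray class, and `virtualAttributes` is likewise a name invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the NDArray fields named above.
struct FrameMeta {
    int uniqueId;                // NDArray.uniqueId
    double timeStamp;            // NDArray.timeStamp
    std::uint32_t secPastEpoch;  // NDArray.epicsTS.secPastEpoch
    std::uint32_t nsec;          // NDArray.epicsTS.nsec
};

// Builds the four virtual attributes for one frame. Every value is stored as a
// double to keep the sketch short; int and 32-bit unsigned values fit exactly.
std::map<std::string, double> virtualAttributes(const FrameMeta& f) {
    return {
        {"NDArrayUniqueId", static_cast<double>(f.uniqueId)},
        {"NDArrayTimeStamp", f.timeStamp},
        {"NDArrayEpicsTSSec", static_cast<double>(f.secPastEpoch)},
        {"NDArrayEpicsTSnSec", static_cast<double>(f.nsec)},
    };
}
```

In a multi-frame file each of these names becomes a 1D dataset with one entry appended per frame, just like any other NDArray attribute.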

It is possible to validate the syntax of an NDArray attributes XML file. For example, the command (starting from the iocBoot directory) to validate the syntax of the iocBoot/iocSimDetector/simDetectorAttributes.xml file is:

xmllint --noout --schema ./attributes.xsd iocSimDetector/simDetectorAttributes.xml

Default tree structure

The default group/dataset structure of the HDF5 files generated by this plugin is:

entry                   <-- NX_class=NXentry
|
+--instrument           <-- NX_class=NXinstrument
|  |
|  +--detector          <-- NX_class=NXdetector
|  |  |
|  |  +--data           <-- NX_class=SDS, signal=1
|  |  |
|  |  +--NDAttributes
|  |     |
|  |     +--ColorMode
|  |
|  +--NDAttributes      <-- NX_class=NXCollection, ndattr_default="true"
|  |  |
|  |  +---              <-- Any number of NDAttributes from the NDArrays as individual 1D datasets
|  |
|  +--performance       <-- Performance of the file writing
|     |
|     +--timestamp      <-- A 2D dataset of different timing measurements taken during file writing
|
+--data                 <-- NX_class=NXdata
   |
   +--data              <-- Hardlink to /entry/instrument/detector/data

HDF5 File Viewers

Note that with the current (1.8) release series of the HDF5 library it is not possible to "monitor" a file live while it is being written by another process. The viewers mentioned in this section can only be used to open, browse and read HDF5 files after the NDFileHDF5 plugin has finished writing and closed the file. This limitation will be lifted in the 1.10 release of the HDF5 library, which will include the Single Writer Multiple Reader (SWMR) feature.

HDFView is a simple GUI tool for viewing and browsing HDF files. It has some limited support for viewing images, plotting graphs and displaying data tables.

The HDF5 library also ships with a number of command-line tools, such as h5ls and h5dump, for browsing and dumping data.

The screenshot below shows HDFView displaying a data file generated by the plugin. A number of elements are visible:

The NDArray NDAttributes appear as 1D datasets in the group "/entry/instrument/NDAttributes/"
The image data is in the dataset "/entry/instrument/detector/data". The metadata for that dataset (known in HDF5 as "attributes") indicate 8-bit unsigned char data, 10 frames of 60x40 pixels
Image and table view of the first frame data is open

Multiple Dimensions

Both areaDetector and the HDF5 file format support multidimensional datasets. The dimensions of the NDArray are preserved when writing to the HDF5 file. In multi-frame files (plugin in Stream or Capture mode) an additional dimension is added to the dataset to hold the array of frames.

In addition to the dimensions of the NDArray it is also possible to specify up to 2 extra "virtual" dimensions for storing datasets in the file. This supports applications where a sample is scanned in up to two dimensions, say X and Y. For each scan point a dataset comprising multiple frames can be stored. The length of (i.e. number of points in) each virtual dimension has to be specified before the plugin opens the file for writing. This feature is only supported in the Stream and Capture modes.

This feature allows a very large set of scan data to be stored in one data file, with dimensions that match those of the performed scan. Depending on the application, this can be a benefit in post-processing.

The figure below illustrates the use of the two extra "virtual" dimensions in a 2D (X,Y) raster scan with N frames per point:

Prior to starting a scan like this the user will need to configure the number of virtual dimensions to use (none, 1 or 2); the number of frames per point; and the length of each of the virtual dimensions (4 x 2 in the example figure). It is not possible to change the number or size of dimensions while the file is open.
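To make the bookkeeping concrete, the sketch below maps the k-th incoming frame to its scan position. It assumes frames arrive in order with the frame-per-point index varying fastest, then X, then Y, which is consistent with the {Y, X, Nth frame, ...} ordering listed in this section. `ScanPos`, `scanPosition` and `totalFrames` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>

struct ScanPos {
    std::size_t y, x, n;  // Y point, X point, frame within the point
};

// k: zero-based count of frames received so far
// framesPerPoint: N; sizeX: length of the X virtual dimension
ScanPos scanPosition(std::size_t k, std::size_t framesPerPoint, std::size_t sizeX) {
    assert(framesPerPoint > 0 && sizeX > 0);
    const std::size_t point = k / framesPerPoint;  // index of the scan point
    return {point / sizeX, point % sizeX, k % framesPerPoint};
}

// Number of frames the file expects for an X by Y scan with N frames per point.
std::size_t totalFrames(std::size_t framesPerPoint, std::size_t sizeX, std::size_t sizeY) {
    return framesPerPoint * sizeX * sizeY;
}
```

For the 4 x 2 example with, say, 3 frames per point, the file holds 24 frames, and frame 13 belongs to point 4, which is X=0 on the second Y row (Y=1), as its second frame (n=1).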

For 2D image (greyscale) formats the dimensions in the multiframe HDF5 file are organised as follows:

For a multiframe file with no use of "virtual" dimension the order is: {Nth frame, width, height}
For a multiframe file using 1 "virtual" dimension (X) the order is: {X, Nth frame, width, height}
For a multiframe file using 2 "virtual" dimensions (X,Y) the order is: {Y, X, Nth frame, width, height}
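The three orderings above follow one rule: the virtual dimensions (Y, then X) come first, followed by the frame index, followed by the per-frame dimensions. The sketch below builds the dimension list that way. `datasetDims` is a hypothetical helper, and it passes the frame dimensions through in whatever order the caller supplies them rather than deciding the width/height order itself.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// nFrames is the total frame count when nExtraDims == 0, and the number of
// frames per scan point (N) when virtual dimensions are in use.
std::vector<std::size_t> datasetDims(const std::vector<std::size_t>& frameDims,
                                     int nExtraDims, std::size_t nFrames,
                                     std::size_t sizeX, std::size_t sizeY) {
    if (nExtraDims < 0 || nExtraDims > 2)
        throw std::invalid_argument("nExtraDims must be 0, 1 or 2");
    std::vector<std::size_t> dims;
    if (nExtraDims == 2) dims.push_back(sizeY);  // outermost: Y
    if (nExtraDims >= 1) dims.push_back(sizeX);  // then X
    dims.push_back(nFrames);                     // then the frame index
    dims.insert(dims.end(), frameDims.begin(), frameDims.end());
    return dims;
}
```

For example, 60x40 frames in a 4 x 2 scan with 3 frames per point give the dimension list {2, 4, 3, 60, 40}.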

Chunking

This plugin uses HDF5 chunking to store the raw image data. The chunk size (the size of each I/O block) can be configured for the frame X and Y dimensions as well as for the number of frames (which essentially implies memory caching before writing to disk). Configuring chunking correctly for a given application is a complex matter: both the write performance and the read performance of the intended post-processing application have to be evaluated. As a basic starting point, setting the nColChunks and nRowChunks parameters to the X and Y frame size respectively should give a decent result. If these parameters are set to the special value 0, they default to the dimensions of the incoming frames. Further explanation and documentation of the HDF5 chunking feature is available in the HDF5 documentation:

HDF5 documentation advanced topics: Chunking in HDF5
HDF5 User guide: 14.3 Data Chunking
hdfgroup presentation: HDF5 Advanced Topics - Chunking in HDF5
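As a rough aid for choosing values, the sketch below resolves the chunk shape from nFramesChunks, nRowChunks and nColChunks, applying the documented rule that 0 for rows or columns means "use the frame size", and computes the uncompressed size of one chunk. `ChunkDims`, `resolveChunks` and `chunkBytes` are hypothetical names; the sketch does not apply any special handling to nFramesChunks, since the text only describes the 0 default for rows and columns.

```cpp
#include <cassert>
#include <cstddef>

struct ChunkDims {
    std::size_t frames, rows, cols;
};

// rows correspond to the frame's Y size, columns to its X size.
ChunkDims resolveChunks(std::size_t nFramesChunks, std::size_t nRowChunks,
                        std::size_t nColChunks, std::size_t frameRows,
                        std::size_t frameCols) {
    return {nFramesChunks,
            nRowChunks == 0 ? frameRows : nRowChunks,
            nColChunks == 0 ? frameCols : nColChunks};
}

// Uncompressed bytes in one chunk: roughly how much data accumulates in memory
// before a complete chunk can be written out.
std::size_t chunkBytes(const ChunkDims& c, std::size_t bytesPerPixel) {
    return c.frames * c.rows * c.cols * bytesPerPixel;
}
```

With 2048x1024 frames of 16-bit pixels, nRowChunks = nColChunks = 0 and nFramesChunks = 4, each chunk is 16 MiB, which shows how quickly a multi-frame chunk turns into a sizeable in-memory cache.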

Compression

The HDF5 library supports a number of compression algorithms. When using the HDF5 library to write and read files, compression is seamless: it only needs to be switched on when writing, and HDF5-enabled applications can read the files without any additional configuration. Only one compression filter can be applied at a time.

The following compression filters are supported in the NDFileHDF5 plugin:

SZIP: lossless compression using a separate library from The HDF Group. NOTE: The szip library contains the following in its COPYING license agreement file:
Revocable (in the event of breach by the user or if required by law), royalty-free, nonexclusive sublicense to use SZIP compression software routines and underlying patents for non-commercial, scientific use only is hereby granted by ICs, LLC, to users of and in conjunction with HDF data storage and retrieval file format and software library products.
This means that szip compression should not be used by commercial users without first obtaining a license.
zlib: lossless compression using the external libz library.
N-bit: a bit-packing scheme for detectors that provide fewer data bits than the standard 8-, 16- or 32-bit words. The data width and the offset within the word are user-configurable.
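The idea behind N-bit compression can be shown with a few lines of bit arithmetic: only `precision` bits starting `offset` bits above the least significant bit carry data, so only those bits need storing. This is a hand-written illustration of the concept, not the HDF5 N-bit filter's implementation; `significantBits` and `packedBytes` are hypothetical names, and the size estimate ignores any filter overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Extracts the data bits of one pixel word: `precision` bits (NumDataBits)
// starting `offset` bits (DataBitsOffset) above the least significant bit.
std::uint32_t significantBits(std::uint32_t word, unsigned precision, unsigned offset) {
    assert(precision >= 1 && precision <= 32 && offset + precision <= 32);
    const std::uint32_t mask = precision == 32 ? 0xFFFFFFFFu : ((1u << precision) - 1u);
    return (word >> offset) & mask;
}

// Ideal packed size when every pixel keeps only `precision` bits,
// rounded up to whole bytes.
std::size_t packedBytes(std::size_t nPixels, unsigned precision) {
    return (nPixels * precision + 7) / 8;
}
```

For a 12-bit detector delivering 16-bit words, 1000 pixels occupy 2000 bytes unpacked but only 1500 bytes when packed, a 25% saving before any other compression.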

Parameters and Records

Parameter index variable	asyn interface	Access	Description	drvInfo string	EPICS record name	EPICS record type
Parameter Definitions and EPICS Record Definitions in NDFileHDF5.template
HDF5 XML Layout Definition
XMLFileName	asynOctet	r/w	XML filename, pointing to an XML HDF5 Layout Definition. This waveform also supports loading raw XML code directly, up to a maximum of 1MB (NELM=1MB)	HDF5_layoutFilename	$(P)$(R)XMLFileName $(P)$(R)XMLFileName_RBV	waveform
layoutValid	asynInt32	r/o	Flag to report the validity (xml syntax only) of the loaded XML. Updated when the XMLFileName is updated with a new filename and when the XML file is read at HDF5 file creation	HDF5_layoutValid	$(P)$(R)XMLValid_RBV	bi
layoutErrorMsg	asynOctet	r/o	XML parser error message	HDF5_layoutErrorMsg	$(P)$(R)XMLErrorMsg_RBV	waveform
HDF5 Chunk Configuration
nRowChunks	asynInt32	r/w	Configure HDF5 "chunking" to an appropriate size for the filesystem: sets number of rows to use per chunk	HDF5_nRowChunks	$(P)$(R)NumRowChunks $(P)$(R)NumRowChunks_RBV	longout longin
nColChunks	asynInt32	r/w	Configure HDF5 "chunking" to an appropriate size for the filesystem: sets number of columns to use per chunk	HDF5_nColChunks	$(P)$(R)NumColChunks $(P)$(R)NumColChunks_RBV	longout longin
nFramesChunks	asynInt32	r/w	Configure HDF5 "chunking" to an appropriate size for the filesystem: sets number of frames to use per chunk. For a 2D image, setting this parameter > 1 essentially implies using an in-memory cache, as HDF5 only writes full chunks to disk.	HDF5_nFramesChunks	$(P)$(R)NumFramesChunks $(P)$(R)NumFramesChunks_RBV	longout longin
Disk Boundary Alignment
chunkBoundaryAlign	asynInt32	r/w	Set the disk boundary alignment in bytes. This parameter can be used to optimise file I/O performance on some file systems, for instance the Lustre file system, where it is optimal to align data to the 'stripe size' (default 1MB). This parameter applies to all datasets in the file. Setting this parameter to 0 disables disk boundary alignment. Warning: setting this parameter larger than the size of a single chunk will cause data files to grow larger than the actual contained data.	HDF5_chunkBoundaryAlign	$(P)$(R)BoundaryAlign $(P)$(R)BoundaryAlign_RBV	longout longin
chunkBoundaryThreshold	asynInt32	r/w	Set a minimum size (bytes) of chunk or dataset where boundary alignment is to be applied. This can be used to filter out small datasets like NDAttributes from the boundary alignment as it could blow up the file size. Setting this parameter to 0 will disable the use of boundary alignment	HDF5_chunkBoundaryThreshold	$(P)$(R)BoundaryThreshold $(P)$(R)BoundaryThreshold_RBV	longout longin
Metadata
storeAttributes	asynInt32	r/w	Enable or disable support for storing NDArray attributes in file	HDF5_storeAttributes	$(P)$(R)StoreAttr $(P)$(R)StoreAttr_RBV	bo bi
storePerformance	asynInt32	r/w	Enable or disable support for storing file IO timing measurements in file	HDF5_storePerformance	$(P)$(R)StorePerform $(P)$(R)StorePerform_RBV	bo bi
flushNthFrame	asynInt32	r/w	Flush the file metadata to disk every N'th frame. Image data is written to disk on every write operation, but HDF5 internal metadata to describe the data layout and indices is normally only written at close time.	HDF5_flushNthFrame	$(P)$(R)NumFramesFlush $(P)$(R)NumFramesFlush_RBV	longout longin
Additional Virtual Dimensions
nExtraDims	asynInt32	r/w	Number of extra dimensions [0..2]	HDF5_nExtraDims	$(P)$(R)NumExtraDims $(P)$(R)NumExtraDims_RBV	mbbo mbbi
extraDimSizeN	asynInt32	r/w	Size of extra dimension N (no. of frames per point)	HDF5_extraDimSizeN	$(P)$(R)ExtraDimSizeN $(P)$(R)ExtraDimSizeN_RBV	longout longin
extraDimSizeX	asynInt32	r/w	Size of extra dimension X	HDF5_extraDimSizeX	$(P)$(R)ExtraDimSizeX $(P)$(R)ExtraDimSizeX_RBV	longout longin
extraDimSizeY	asynInt32	r/w	Size of extra dimension Y	HDF5_extraDimSizeY	$(P)$(R)ExtraDimSizeY $(P)$(R)ExtraDimSizeY_RBV	longout longin
Runtime Statistics
totalRuntime	asynFloat64	r/o	Total runtime in seconds from first frame to file closed	HDF5_totalRuntime	$(P)$(R)Runtime	ai
totalIoSpeed	asynFloat64	r/o	Overall IO write speed in megabit per second from first frame to file closed	HDF5_totalIoSpeed	$(P)$(R)IOSpeed	ai
Compression Filters
compressionType	asynInt32	r/w	Select or switch off compression filter	HDF5_compressionType	$(P)$(R)Compression $(P)$(R)Compression_RBV	mbbo mbbi
nbitsPrecision	asynInt32	r/w	N-bit compression filter: number of data bits per pixel	HDF5_nbitsPrecision	$(P)$(R)NumDataBits $(P)$(R)NumDataBits_RBV	longout longin
nbitsOffset	asynInt32	r/w	N-bit compression filter: dataword bit-offset in pixel	HDF5_nbitsOffset	$(P)$(R)DataBitsOffset $(P)$(R)DataBitsOffset_RBV	longout longin
szipNumPixels	asynInt32	r/w	szip compression filter: number of pixels in filter [1..32]	HDF5_szipNumPixels	$(P)$(R)SZipNumPixels $(P)$(R)SZipNumPixels_RBV	longout longin
zCompressLevel	asynInt32	r/w	zlib compression filter: compression level [1..9]	HDF5_zCompressLevel	$(P)$(R)ZLevel $(P)$(R)ZLevel_RBV	longout longin
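The warning on chunkBoundaryAlign can be made concrete with a simplified placement model. The sketch follows the parameter descriptions above: a value of 0 for either chunkBoundaryAlign or chunkBoundaryThreshold disables alignment, and objects smaller than the threshold are not aligned. It is not HDF5's actual file-space allocator, it ignores file metadata, and `placeObject` and `fileEnd` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// File offset at which an object of `size` bytes is placed when the next free
// offset is `offset`.
std::uint64_t placeObject(std::uint64_t offset, std::uint64_t size,
                          std::uint64_t align, std::uint64_t threshold) {
    if (align == 0 || threshold == 0 || size < threshold) return offset;
    return (offset + align - 1) / align * align;  // round up to the next boundary
}

// End-of-file offset after writing nChunks chunks of chunkSize bytes back to
// back, starting from offset 0.
std::uint64_t fileEnd(std::uint64_t nChunks, std::uint64_t chunkSize,
                      std::uint64_t align, std::uint64_t threshold) {
    std::uint64_t end = 0;
    for (std::uint64_t i = 0; i < nChunks; ++i)
        end = placeObject(end, chunkSize, align, threshold) + chunkSize;
    return end;
}
```

With 1 MiB alignment, four 1 MiB chunks fill exactly 4 MiB, but four 256 KiB chunks (1 MiB of data) stretch the file to about 3.25 MiB because each chunk starts on a new 1 MiB boundary. This is the growth the warning describes, and it is why small datasets such as NDAttributes are best kept below chunkBoundaryThreshold.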