1. Introduction

This is the documentation for the Python interface for loading and analyzing FACET datasets. It is designed for an analysis environment consisting of IPython, NumPy, SciPy, Matplotlib, and PyQt4.

1.1. How Datasets are Saved: DAQ

At FACET, datasets are collected using the FACET data acquisition software, typically referred to as “The DAQ.” The DAQ software is written in Matlab and triggers and collects data via EPICS. Data is stored on network-attached storage (NAS) in Sector 20 at FACET because the network does not currently support transmitting the data away from Sector 20.

1.1.1. BSA Data

There are two categories of underlying data at FACET: beam-synchronous-acquired (BSA) data and non-beam-synchronous-acquired (non-BSA) data. BSA data is correlated by shots; each shot should have a unique data point for each BSA data type. Typical BSA data include, for instance:

  • Images
  • Beam position monitor (BPM) data
  • Pyro data
  • Toroid data
  • Etc.
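
The shot-by-shot correlation described above can be sketched in plain Python. This is only an illustration, not the E200 implementation: the function name match_by_uid, the variable names, and all values are hypothetical, and it assumes each BSA quantity arrives as a list of shot identifiers paired with a list of values.

```python
def match_by_uid(uids_a, values_a, uids_b, values_b):
    """Pair up values from two BSA quantities that share a shot UID.

    Returns a list of (uid, value_a, value_b) tuples sorted by UID.
    Shots missing from either quantity are dropped.
    """
    # Map each UID to its value; each shot should appear at most once.
    lookup_a = dict(zip(uids_a, values_a))
    lookup_b = dict(zip(uids_b, values_b))

    common = sorted(lookup_a.keys() & lookup_b.keys())
    return [(uid, lookup_a[uid], lookup_b[uid]) for uid in common]


# Hypothetical data: the BPM missed shot 103, the pyro missed shot 101.
bpm_uids, bpm_x = [101, 102, 104], [0.12, 0.15, 0.11]
pyro_uids, pyro_signal = [102, 103, 104], [830.0, 790.0, 845.0]

matched = match_by_uid(bpm_uids, bpm_x, pyro_uids, pyro_signal)
for uid, x, signal in matched:
    print(uid, x, signal)
```

Only shots present in both quantities survive, which is the property that makes per-shot comparisons between different BSA data types meaningful.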

Background data is typically acquired for images at the beginning of a dataset and can be loaded alongside images.
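
One common use of a background frame is to subtract it from each image. The sketch below assumes that use and that both frames are available as NumPy arrays of the same shape; the function name subtract_background and the pixel values are hypothetical and are not part of the E200 API.

```python
import numpy as np


def subtract_background(image, background):
    """Return image minus background, clipped at zero.

    Both inputs are converted to a signed type first so that
    subtracting unsigned camera counts cannot wrap around.
    """
    image = np.asarray(image, dtype=np.int64)
    background = np.asarray(background, dtype=np.int64)
    if image.shape != background.shape:
        raise ValueError("image and background must have the same shape")
    return np.clip(image - background, 0, None)


# Hypothetical 2x3 frames of unsigned 16-bit camera counts.
raw = np.array([[10, 12, 200], [11, 150, 9]], dtype=np.uint16)
bg = np.array([[10, 11, 12], [12, 10, 10]], dtype=np.uint16)

corrected = subtract_background(raw, bg)
print(corrected)
```

The conversion to a signed integer type matters: subtracting two unsigned arrays directly would wrap negative results around to large positive counts.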

1.1.2. Non-BSA Data

Non-BSA data is assumed to vary slowly enough that it stays effectively constant over the course of a dataset, so it is recorded only once, at the beginning of the set. This assumption is not tested by the software: it is theoretically possible to record a dataset for hours, and the non-BSA values will not update to reflect changing conditions. Typical non-BSA data include, for instance:

  • Magnet power supply current values
  • Etc.

1.2. Underlying Dataset File Types

FACET saves several data types, which are linked together in a master file that represents a dataset. Depending on the data type, the master file contains either the data itself or a reference to where the data is stored. Rather than have the user piece these disparate data together in an analysis, this package has the capability of loading and relating data as needed. This package was started in 2014 and can only load datasets taken since then.

Perhaps the most important linked data are images. Cameras record images to directories in the NAS. How they record images (what cameras are used, what file format is used, where they are stored in the NAS, etc.) has changed over time and may continue to change as technical needs evolve. The E200 package is designed to account for these changes so that the scientist can assume the images it returns need no further adjustment.

BSA and non-BSA data are saved in the master file itself. This has remained fairly stable. BSA and non-BSA data tend to be scalar in nature: a single number is saved for each shot or for each dataset, respectively. There has been no need for higher-dimensional data thus far, although the DAQ has anticipated this possible eventual need.
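
The difference between per-shot and per-dataset scalars can be illustrated with a small sketch. It assumes BSA values are held as one list per quantity and non-BSA values as one number per quantity; the function per_shot_rows and the names toroid_charge, bpm_x, and quad_current are hypothetical and do not reflect how the master file is actually laid out.

```python
def per_shot_rows(bsa, non_bsa):
    """Build one record per shot from per-shot and per-dataset scalars.

    bsa:     dict mapping a name to a list with one value per shot.
    non_bsa: dict mapping a name to a single value for the whole dataset.
    """
    lengths = {len(values) for values in bsa.values()}
    if len(lengths) > 1:
        raise ValueError("all BSA quantities must cover the same shots")
    n_shots = lengths.pop() if lengths else 0

    rows = []
    for i in range(n_shots):
        row = {name: values[i] for name, values in bsa.items()}
        # The non-BSA value was recorded once, so it repeats for every shot.
        row.update(non_bsa)
        rows.append(row)
    return rows


# Hypothetical names and values.
bsa = {"toroid_charge": [3.1, 3.0, 3.2], "bpm_x": [0.12, 0.15, 0.11]}
non_bsa = {"quad_current": 125.4}

rows = per_shot_rows(bsa, non_bsa)
print(rows[0])
```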

1.3. Master File Type

Matlab can write HDF5 files, but the files it saves by default may not always conform to the HDF5 standard. Forcing Matlab to write HDF5 directly is possible, but in certain scenarios saving these files takes long enough to cause problems. Therefore, files are saved using Matlab’s default save() function, which chooses an optimal file type. To read these Matlab files in Python, they must first be translated by Matlab into HDF5, which Python can then load.

The data files follow a hierarchical structure, which is ideal for HDF5. The de facto Python package for reading and writing HDF5 files is h5py. Unfortunately, h5py returns nested dictionary-like objects, which, while dictionaries are a standard data type in Python, are awkward to access in any sort of automated fashion when developing in IPython. The E200 package attempts to address this deficit through custom nested classes that should make traversing the data structure simpler. The E200 package also preserves the original nested dictionaries.
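
The idea of wrapping nested dictionaries in classes can be sketched as follows. This is not the E200 implementation: the class name AttrView is hypothetical, and plain dictionaries with made-up values stand in for the dictionary-like groups that h5py provides. The sketch assumes that the awkwardness in question is key-by-key indexing, and that exposing keys as attributes (and through dir()) is an acceptable remedy.

```python
class AttrView:
    """Read-only attribute-style view of a nested mapping."""

    def __init__(self, mapping):
        self._mapping = mapping

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. Private names
        # are rejected so a missing _mapping cannot cause recursion.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            value = self._mapping[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested mappings so that chained access keeps working.
        if hasattr(value, "keys"):
            return AttrView(value)
        return value

    def __dir__(self):
        # Tools that rely on dir(), such as interactive completion,
        # will list the mapping's keys.
        return list(self._mapping.keys())


# Hypothetical nested structure and values.
nested = {"data": {"raw": {"images": {"CMOS_ELAN": {"UID": [101, 102]}}}}}
view = AttrView(nested)

print(nested["data"]["raw"]["images"]["CMOS_ELAN"]["UID"])  # dictionary style
print(view.data.raw.images.CMOS_ELAN.UID)                   # attribute style
print(dir(view.data.raw.images))
```

Because the view only wraps the original mapping, the nested dictionaries remain available unchanged alongside the attribute-style access.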

1.4. Prerequisites

1.4.1. Python 3

E200 works with Python 3 and up, which can be installed via apt-get on *nix or MacPorts on Apple machines, or downloaded from https://www.python.org/downloads/.

1.4.2. Matlab

E200 requires Matlab in order to translate Matlab files to HDF5. Currently this is accomplished through the E200_data software.

  1. Install Matlab
  2. Download E200_data software
  3. Add downloaded software to the Matlab path

1.4.3. HDF5 and h5py

Strictly speaking, E200 depends only on h5py, but h5py in turn depends on HDF5. h5py can be installed via pip:

pip install h5py

However, HDF5 may need to be installed at the system level via apt-get or MacPorts.

It is also possible to download or compile HDF5 and h5py from source:

* https://www.hdfgroup.org/HDF5/release/obtain5.html
* https://pypi.python.org/pypi/h5py

1.4.4. NumPy and SciPy

E200 depends on NumPy and SciPy to manipulate data.

NumPy has dependencies such as BLAS, LAPACK, and ATLAS, which make building from source difficult. Installation via apt-get or MacPorts is highly recommended in order to handle these dependencies. It is also possible to download NumPy or to build it from source.

SciPy is similar to NumPy, and it is usually easiest to install the two at the same time: they have nearly identical install methods, to the point that the two can be difficult to tell apart. It is easiest to follow the install instructions.

1.4.5. PyQt4

E200 requires PyQt4, which has dependencies of its own. Installation via apt-get or MacPorts is highly recommended in order to handle these dependencies.

It is possible, although difficult, to install from source, including dependencies. This was necessary on facet-srv20; talk to Joel Frederico if you would like more information on running on facet-srv20. If you would like to run on a *nix machine without access to apt-get, you may have to compile from source. It is possible to build against Qt 5.

1.4.6. SciSalt

E200 makes use of many helper functions from SciSalt. If installing E200 from PyPI, this dependency will be taken care of automatically. Otherwise, SciSalt can be installed via pip:

pip install SciSalt
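
Before installing E200, it can be useful to check which of the Python prerequisites above are already importable. The following is a minimal sketch using only the standard library; the import names in REQUIRED_MODULES are assumptions based on the package names in this section (in particular, that SciSalt is imported as scisalt), and the helper missing_modules is hypothetical.

```python
import importlib.util

# Import names assumed for the prerequisites listed in this section.
REQUIRED_MODULES = ["h5py", "numpy", "scipy", "PyQt4", "scisalt"]


def missing_modules(names):
    """Return the subset of names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("All prerequisites found.")
```

This only checks that each module can be located; it does not import the modules or verify their versions.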

1.5. Installation

There are a few ways to install E200. If you are unsure which to choose, or want the more reliable (though less frequently updated) option, install from PyPI.

1.5.1. From PyPI

You can install the most recent E200 version using pip:

sudo pip install E200

This will install E200 in your Python installation’s site-packages directory.

The PyPI site is found at https://pypi.python.org/pypi/E200.

1.5.2. From the tarball release

  1. Download the most recent tarball from the download page
  2. Unpack the tarball
  3. sudo python setup.py install

Note that you have to have setuptools installed.

This will install E200 into your Python installation’s site-packages directory.

1.5.3. Installing the development version

  1. Install git (also available through apt-get on Linux and through MacPorts)
  2. git clone git@github.com:joelfrederico/E200.git
  3. cd E200
  4. python setup.py develop

Note that you have to have setuptools installed.

1.6. Basic Usage

This section gives you a brief introduction to loading datasets.

1.6.1. Access Data

First, a nas directory must be accessible. There are several ways to do this.

1.6.1.1. Copy data from Unix (rsync)

The simplest method is to copy data from a source. The most universal method is to copy data from SLAC’s Unix area, which is frequently synced from NAS:

mkdir -p ~/FACET_data/<dataset>
rsync -avPh <username>@noric.slac.stanford.edu:/nfs/slac/g/facetdata/<dataset>/ ~/FACET_data/<dataset>/

Generally, <dataset> has the form nas/nas-li20-pm00/E200/2015/20150602/E200_17712.

The nas folder can now be found in ~/FACET_data.
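
For scripted transfers, the copy can also be assembled in Python. The sketch below only builds the rsync argument list and does not run it. The host and remote root are the ones given above; the function name rsync_command, the placeholder username jdoe, and the choice to end both paths with a slash (so that the dataset's contents land directly in ~/FACET_data/<dataset>) are assumptions made for this example.

```python
import shlex
from pathlib import Path

REMOTE_HOST = "noric.slac.stanford.edu"
REMOTE_ROOT = "/nfs/slac/g/facetdata"


def rsync_command(username, dataset, local_root="~/FACET_data"):
    """Return (argument list, local destination) for copying one dataset."""
    destination = Path(local_root).expanduser() / dataset
    # A trailing slash on the source copies the directory's contents
    # into the destination rather than nesting the directory inside it.
    source = f"{username}@{REMOTE_HOST}:{REMOTE_ROOT}/{dataset}/"
    args = ["rsync", "-avPh", source, f"{destination}/"]
    return args, destination


# "jdoe" is a placeholder username; the dataset path is the example above.
dataset = "nas/nas-li20-pm00/E200/2015/20150602/E200_17712"
args, destination = rsync_command("jdoe", dataset)
print(shlex.join(args))

# To actually run the copy, create the destination first:
#   destination.mkdir(parents=True, exist_ok=True)
#   subprocess.run(args, check=True)   # requires `import subprocess`
```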

1.6.1.2. Mount Local Copy (OS X Only)

If you are using an Apple machine on the SLAC network, it is possible to mount the drive directly on your system:

  1. Open the Finder and press Command + k
  2. Enter the server address as: afp://pwfa.slac.stanford.edu
  3. Double-click on PWFA_5big to mount the drive

The nas folder can now be found in /Volumes/PWFA_5big.
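
Either method leaves a nas folder in a known location. A small helper can check which one is present before loading data. This is a sketch under the assumption that the two locations named above are the only candidates; the function find_nas is hypothetical and is not part of E200.

```python
from pathlib import Path

# Parent directories of the nas folder named in the two methods above.
CANDIDATE_ROOTS = ["~/FACET_data", "/Volumes/PWFA_5big"]


def find_nas(roots=CANDIDATE_ROOTS):
    """Return the first existing <root>/nas directory, or None."""
    for root in roots:
        nas = Path(root).expanduser() / "nas"
        if nas.is_dir():
            return nas
    return None


nas = find_nas()
print(nas if nas is not None else "No nas folder found")
```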

1.6.2. Load Data

The simplest and most accessible way to load datasets is to use E200_load_data_gui():

>>> import E200
>>> data = E200.E200_load_data_gui()

If you have not loaded data before, you may be prompted to locate the nas folder.

Once loaded, a dataset can be displayed:

>>> import matplotlib.pyplot as plt
>>> camera = data.rdrill.data.raw.images.CMOS_ELAN
>>> uid = camera.UID[0]
>>> images = E200.E200_load_images(camera, uid)
>>> plt.imshow(images.images[0])
<matplotlib.image.AxesImage object at 0x105e9c080>
>>> plt.show()

(Note that this assumes the dataset recorded camera CMOS_ELAN.) In OS X this opens a window:

[Figure: _images/basic_usage.png (screenshot of the resulting Matplotlib window)]