2. API

The E200 package makes it easy to analyze datasets saved at FACET. In order to do analysis, some understanding of how data can be correlated in practice is necessary. For more information, see the Introduction.

2.1. Loading Data

Before anything can be done, a dataset must be loaded. When loading data, only data contained in the master file is loaded immediately. Other data, such as image data, must be loaded later. This is a practical consideration in order to load files quickly and avoid filling up memory, as there can be anywhere from one image to terabytes of images.

Data can be loaded in two main ways. The most accessible way is through E200.E200_load_data_gui(), which presents the user with a graphical file picker:

import E200
data = E200.E200_load_data_gui()

The file path picked is then passed through to E200.E200_load_data(). This function can, of course, be loaded directly instead of accessed through E200.E200_load_data_gui(). The function E200.E200_load_data() loads a file from a string:

import E200
filepath = 'nas/nas-li20-pm00/E217/2015/20150606/E217_17990/E217_17990.mat'
data = E200.E200_load_data(filepath)

2.2. Drill Data Class

It becomes immediately obvious that loaded data is returned in the form of the class E200.Drill. The dataset’s nested dictionary as returned from h5py is given by:

data.read_file

It is cumbersome to find all of the nested dictionaries, as tab completion does not work for dictionaries in the Python interpreter. Each nested level must be explored individually:

>>> list(data.read_file['data'].keys())
['VersionInfo', 'processed', 'raw', 'user']

2.2.1. Top Data Level

The E200.Drill class anticipates this problem: it is far simpler to enter:

>>> data.rdrill.data
<E200.E200_load_data.Drill with keys:
_hdf5
VersionInfo
processed
raw
user
>

One can immediately see that this class includes useful members. Ignoring _hdf5, which includes the dictionary for each level, the keys are:

  • VersionInfo: Information about the DAQ version used to collect data
  • raw: Data and references to data
  • processed: Data post-processed by other routines
  • user: Space to hold individuals’ calculations

2.2.2. Raw Data Level

Of these, only raw is expected to hold data. Exploring raw reveals its own levels:

>>> data.rdrill.data.raw
<E200.E200_load_data.Drill with keys:
_hdf5
arrays
images
metadata
scalars
vectors
>
  • images: Cameras with images
  • scalars: Scalar BSA data and data indicating DAQ settings
  • metadata: Data about the dataset collected
  • arrays: Multi-dimensional BSA data (not used)
  • vectors: List of BSA data (not used)

2.2.3. Tree Tip Level

At the tips of the nested data class are actual data. For instance, scalars.step_num (which is a record of the step in a scan the dataset was on) shows:

>>> data.rdrill.data.raw.scalars.step_num
<E200.E200_load_data.Drill with keys:
_hdf5
IDtype
UID
dat
desc
>

Of these, UID, dat, and desc are interesting:

  • UID: An array of the UIDs available
  • dat: An array of the data available
  • desc: A description of the data in step_num

This holds true across all tree tips, except for in images, where dat is a file path to the data. While the file path is relative to the top of the directory holding all of the datasets, we have a way of loading images automatically, and this is not needed by the average analyst.

2.3. UID

While it’s possible to do a lot of statistical analysis off of a single measurement source, the real power of datasets is in correlating pieces of data. In order to do this, every single shot at FACET is designed to have a unique identification number or UID. Every piece of BSA data, whether it is an image or a number, is correlated to a UID.

2.3.1. Limitations

There are a few problems that may arise when trying to collect BSA data. The biggest is that, technically, it is not necessarily possible to start all of the data collection simultaneously. In this case, the UIDs for each shot should be collected correctly. But it is important to not assume that the beginnings or the ends line up. In fact, it is a near certainty that the UIDs of images and data will NOT line up.

2.3.2. Selecting Data by UID

The most obvious way to use UIDs is to create a master index of the desired UIDs. For example, it is possible to take only the first 10 UIDs common to step_num and step_value and use E200.E200_api_getdat() to load the corresponding data from any dataset:

>>> import numpy as np
>>> step_num = data.rdrill.data.raw.scalars.step_num
>>> step_value = data.rdrill.data.raw.scalars.step_value
>>> wanted_uids = np.intersect1d(step_num.UID, step_value.UID)[0:10]
>>> dat_step_num = E200.E200_api_getdat(step_num, wanted_uids)
>>> dat_step_val = E200.E200_api_getdat(step_value, wanted_uids)
>>> dat_step_val.dat
array([-2., -2., -2., -2., -2., -2., -2., -2., -2., -2.])
>>> dat_step_num.dat
array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

Note: it’s important to take the intersection. Without intersecting, it is impossible to be sure that the UIDs will exist for step_num and step_value. The API function E200.E200_api_getdat() is designed to mitigate this problem by returning an instance of the class E200.E200_Dat. This class has members:

2.4. Selecting Data by Value

Another likely scenario is to select all of the shots that correspond to a value. For instance, it may be desirable to select only the second step of a scan. This can be done using E200.E200_api_getUID() to get the uids where step_num equals 2:

>>> uids_step_num_2 = E200.E200_api_getUID(step_num, 2)
>>> uids_step_num_2
array([  1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12])
>>> uids_step_num_2.shape
(60,)

In this case, there are 60 UIDs for step 2.

2.5. Loading Images

Loading images is special, as images aren’t stored directly in the master file. In this case, use E200.E200_load_images():

>>> camera = data.rdrill.data.raw.images.CMOS_ELAN
>>> uid = camera.UID[0]
>>> images = E200.E200_load_images(camera, uid)
>>> images
<E200.classes.E200_Image at 0x11ae082e8>
>>> plt.imshow(images.images[0])
<matplotlib.image.AxesImage object at 0x105e9c080>
>>> plt.show()
_images/basic_usage.png

The class E200.E200_Image has several members. The one of interest are:

  • UID or uid: An array of the UIDs available
  • images: An array of images
  • imgs_subbed: An array of images with background subtraction (experimental)
  • image_backgrounds: The background image for the image data

2.5.1. For Loops and Images

If you want to iterate over images, it is advisable to use E200.E200_Image_Iter:

>>> for image in E200.E200_Image_Iter(camera, uids_step_num_2):
>>>     (do something)