2. API¶

The E200 package makes it easy to analyze datasets saved at FACET. In order to do analysis, some understanding of how data can be correlated in practice is necessary. For more information, see the Introduction.

2.1. Loading Data¶

Before anything can be done, a dataset must be loaded. When loading data, only data contained in the master file is loaded immediately. Other data, such as image data, must be loaded later. This is a practical consideration in order to load files quickly and avoid filling up memory, as there can be anywhere from one image to terabytes of images.

Data can be loaded in two main ways. The most accessible way is through E200.E200_load_data_gui(), which presents the user with a graphical file picker:

import E200
data = E200.E200_load_data_gui()

The file path picked is then passed through to E200.E200_load_data(). This function can, of course, be loaded directly instead of accessed through E200.E200_load_data_gui(). The function E200.E200_load_data() loads a file from a string:

import E200
filepath = 'nas/nas-li20-pm00/E217/2015/20150606/E217_17990/E217_17990.mat'
data = E200.E200_load_data(filepath)

2.2. Drill Data Class¶

It becomes immediately obvious that loaded data is returned in the form of the class E200.Drill. The dataset’s nested dictionary as returned from h5py is given by:

data.read_file

It is cumbersome to find all of the nested dictionaries, as tab completion does not work for dictionaries in the Python interpreter. Each nested level must be explored individually:

>>> list(data.read_file['data'].keys())
['VersionInfo', 'processed', 'raw', 'user']

2.2.1. Top Data Level¶

The E200.Drill class anticipates this problem: it is far simpler to enter:

>>> data.rdrill.data
<E200.E200_load_data.Drill with keys:
_hdf5
VersionInfo
processed
raw
user
>

One can immediately see that this class includes useful members. Ignoring _hdf5, which includes the dictionary for each level, the keys are:

VersionInfo: Information about the DAQ version used to collect data
raw: Data and references to data
processed: Data post-processed by other routines
user: Space to hold individuals’ calculations

2.2.2. Raw Data Level¶

Of these, only raw is expected to hold data. Exploring raw reveals its own levels:

>>> data.rdrill.data.raw
<E200.E200_load_data.Drill with keys:
_hdf5
arrays
images
metadata
scalars
vectors
>

images: Cameras with images
scalars: Scalar BSA data and data indicating DAQ settings
metadata: Data about the dataset collected
arrays: Multi-dimensional BSA data (not used)
vectors: List of BSA data (not used)

2.2.3. Tree Tip Level¶

At the tips of the nested data class are actual data. For instance, scalars.step_num (which is a record of the step in a scan the dataset was on) shows:

>>> data.rdrill.data.raw.scalars.step_num
<E200.E200_load_data.Drill with keys:
_hdf5
IDtype
UID
dat
desc
>

Of these, UID, dat, and desc are interesting:

UID: An array of the UIDs available
dat: An array of the data available
desc: A description of the data in step_num

This holds true across all tree tips, except for in images, where dat is a file path to the data. While the file path is relative to the top of the directory holding all of the datasets, we have a way of loading images automatically, and this is not needed by the average analyst.

2.3. UID¶

While it’s possible to do a lot of statistical analysis off of a single measurement source, the real power of datasets is in correlating pieces of data. In order to do this, every single shot at FACET is designed to have a unique identification number or UID. Every piece of BSA data, whether it is an image or a number, is correlated to a UID.

2.3.1. Limitations¶

There are a few problems that may arise when trying to collect BSA data. The biggest is that, technically, it is not necessarily possible to start all of the data collection simultaneously. In this case, the UIDs for each shot should be collected correctly. But it is important to not assume that the beginnings or the ends line up. In fact, it is a near certainty that the UIDs of images and data will NOT line up.

2.3.2. Selecting Data by UID¶

The most obvious way to use UIDs is to create a master index of the desired UIDs. For example, it is possible to take only the first 10 UIDs common to step_num and step_value and use E200.E200_api_getdat() to load the corresponding data from any dataset:

>>> import numpy as np
>>> step_num = data.rdrill.data.raw.scalars.step_num
>>> step_value = data.rdrill.data.raw.scalars.step_value
>>> wanted_uids = np.intersect1d(step_num.UID, step_value.UID)[0:10]
>>> dat_step_num = E200.E200_api_getdat(step_num, wanted_uids)
>>> dat_step_val = E200.E200_api_getdat(step_value, wanted_uids)
>>> dat_step_val.dat
array([-2., -2., -2., -2., -2., -2., -2., -2., -2., -2.])
>>> dat_step_num.dat
array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

Note: it’s important to take the intersection. Without intersecting, it is impossible to be sure that the UIDs will exist for step_num and step_value. The API function E200.E200_api_getdat() is designed to mitigate this problem by returning an instance of the class E200.E200_Dat. This class has members:

E200.E200_Dat.field: The field retrieved (usually dat)
E200.E200_Dat.uid: The UIDs retrieved
E200.E200_Dat.dat: The data retrieved, correlated by position to UID.

2.4. Selecting Data by Value¶

Another likely scenario is to select all of the shots that correspond to a value. For instance, it may be desirable to select only the second step of a scan. This can be done using E200.E200_api_getUID() to get the uids where step_num equals 2:

>>> uids_step_num_2 = E200.E200_api_getUID(step_num, 2)
>>> uids_step_num_2
array([  1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12,
         1.79900002e+12,   1.79900002e+12,   1.79900002e+12])
>>> uids_step_num_2.shape
(60,)

In this case, there are 60 UIDs for step 2.

2.5. Loading Images¶

Loading images is special, as images aren’t stored directly in the master file. In this case, use E200.E200_load_images():

>>> camera = data.rdrill.data.raw.images.CMOS_ELAN
>>> uid = camera.UID[0]
>>> images = E200.E200_load_images(camera, uid)
>>> images
<E200.classes.E200_Image at 0x11ae082e8>
>>> plt.imshow(images.images[0])
<matplotlib.image.AxesImage object at 0x105e9c080>
>>> plt.show()

The class E200.E200_Image has several members. The one of interest are:

UID or uid: An array of the UIDs available
images: An array of images
imgs_subbed: An array of images with background subtraction (experimental)
image_backgrounds: The background image for the image data

2.5.1. For Loops and Images¶

If you want to iterate over images, it is advisable to use E200.E200_Image_Iter:

>>> for image in E200.E200_Image_Iter(camera, uids_step_num_2):
>>>     (do something)

2. API¶

2.1. Loading Data¶

2.2. Drill Data Class¶

2.2.1. Top Data Level¶

2.2.2. Raw Data Level¶

2.2.3. Tree Tip Level¶

2.3. UID¶

2.3.1. Limitations¶

2.3.2. Selecting Data by UID¶

2.4. Selecting Data by Value¶

2.5. Loading Images¶

2.5.1. For Loops and Images¶

Table Of Contents

Previous topic

Next topic

This Page

Navigation

2. API¶

2.1. Loading Data¶

2.2. Drill Data Class¶

2.2.1. Top Data Level¶

2.2.2. Raw Data Level¶

2.2.3. Tree Tip Level¶

2.3. UID¶

2.3.1. Limitations¶

2.3.2. Selecting Data by UID¶

2.4. Selecting Data by Value¶

2.5. Loading Images¶

2.5.1. For Loops and Images¶

Navigation