T4: Automating the analysis of large datasets (CLI)

Introduction

This tutorial demonstrates how the command-line parameters --recursive and --profile (see Advanced usage) can be used to automate data analysis, using the analysis pipeline of tutorial 2 as an example. After this tutorial, you will be able to apply multiple analysis pipelines, including additional custom analysis steps, to large datasets automatically.

Prerequisites

For this tutorial, you need:

Set up example measurements

To simulate the presence of several datasets, create a few copies of DHM_HL60_cells.zip in a designated folder (here C:\Path\to\data). You may also create subfolders and put further copies of the dataset there. For example, the folder could contain three copies:

  • DHM_HL60_cells_01.zip

  • DHM_HL60_cells_02.zip

  • DHM_HL60_cells_03.zip
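
Creating three copies by hand is quick, but for a larger test set a short Python sketch can do it for you. The helper name make_copies and the _01, _02, ... naming scheme (taken from the listing above) are illustrative choices, and the sketch assumes that DHM_HL60_cells.zip is in the current working directory.

```python
import pathlib
import shutil


def make_copies(source, target_dir, n=3, subfolder=None):
    """Copy `source` n times into `target_dir` (optionally a subfolder of it).

    The copies are named <stem>_01<suffix>, <stem>_02<suffix>, ...
    """
    source = pathlib.Path(source)
    target_dir = pathlib.Path(target_dir)
    if subfolder is not None:
        target_dir = target_dir / subfolder
    # create the (sub)folder if it does not exist yet
    target_dir.mkdir(parents=True, exist_ok=True)
    copies = []
    for ii in range(1, n + 1):
        dest = target_dir / f"{source.stem}_{ii:02d}{source.suffix}"
        shutil.copy2(source, dest)  # copy2 also keeps the file timestamps
        copies.append(dest)
    return copies


# Example (adjust the paths to your setup):
# make_copies("DHM_HL60_cells.zip", r"C:\Path\to\data")
# make_copies("DHM_HL60_cells.zip", r"C:\Path\to\data", n=2, subfolder="day2")
```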

Create and import analysis profiles

In tutorial 2, the analysis pipeline is executed four times, each time with a different [sphere] section. For each of these runs, we create a separate profile and import it into the local library of DryMass so that we can use it later on. The contents of the first profile are:

[bg]
phase border px = 30
phase profile = poly2o

[meta]
medium index = 1.335
pixel size um = 0.107
wavelength nm = 633.0

[roi]
pad border px = 80
size variation = 0.2
exclude overlap px = 100
ignore data = 8.4, 15.2, 18.2, 18.3, 35.2

[specimen]
size um = 13

[sphere]
method = edge
model = projection

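The other profiles are identical to the first profile, except for the [sphere] section. First, download all four profiles (t04_profile_edge.cfg, t04_profile_proj.cfg, t04_profile_rytov.cfg, and t04_profile_rytov-sc.cfg) and then import them into the local library via: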

dm_profile add t4edge t04_profile_edge.cfg
dm_profile add t4proj t04_profile_proj.cfg
dm_profile add t4rytov t04_profile_rytov.cfg
dm_profile add t4rytov-sc t04_profile_rytov-sc.cfg
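
Instead of downloading the profiles, you could also generate the four files yourself before running the dm_profile add commands above. The sketch below uses Python's configparser and assumes that DryMass accepts the INI-style files it writes. The [sphere] settings of t4proj, t4rytov, and t4rytov-sc (method image with the models projection, rytov, and rytov-sc) are inferred from the method comparison of tutorial 2, not copied from the downloadable files, so compare them with the originals before use. write_profiles is a hypothetical helper name.

```python
import configparser
import pathlib

# Sections shared by all four profiles (copied from the first profile).
COMMON = {
    "bg": {"phase border px": "30", "phase profile": "poly2o"},
    "meta": {"medium index": "1.335", "pixel size um": "0.107",
             "wavelength nm": "633.0"},
    "roi": {"pad border px": "80", "size variation": "0.2",
            "exclude overlap px": "100",
            "ignore data": "8.4, 15.2, 18.2, 18.3, 35.2"},
    "specimen": {"size um": "13"},
}

# ASSUMPTION: [sphere] settings inferred from the method/model
# combinations compared in tutorial 2.
SPHERE = {
    "edge": {"method": "edge", "model": "projection"},
    "proj": {"method": "image", "model": "projection"},
    "rytov": {"method": "image", "model": "rytov"},
    "rytov-sc": {"method": "image", "model": "rytov-sc"},
}


def write_profiles(out_dir="."):
    """Write t04_profile_<name>.cfg for every entry in SPHERE."""
    out_dir = pathlib.Path(out_dir)
    paths = []
    for name, sphere in SPHERE.items():
        cfg = configparser.ConfigParser()
        cfg.read_dict(COMMON)      # shared sections
        cfg["sphere"] = sphere     # the only section that differs
        path = out_dir / f"t04_profile_{name}.cfg"
        with path.open("w") as fd:
            cfg.write(fd)
        paths.append(path)
    return paths
```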

Once imported into the local library, the downloaded profiles may safely be removed. You can list the available profiles with the command dm_profile list, which should yield an output similar to this:

Available profiles:
 - t4edge: C:\Users\Something\drymass\profile_t4edge.cfg
 - t4proj: C:\Users\Something\drymass\profile_t4proj.cfg
 - t4rytov-sc: C:\Users\Something\drymass\profile_t4rytov-sc.cfg
 - t4rytov: C:\Users\Something\drymass\profile_t4rytov.cfg
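
If you want to check the library from a script, the listing can be parsed line by line. This sketch assumes the output format shown above (" - name: path"). parse_profile_list and missing_profiles are hypothetical helpers, not part of DryMass.

```python
import re

# Matches lines like " - t4edge: C:\Users\Something\drymass\profile_t4edge.cfg"
PROFILE_LINE = re.compile(r"^\s*-\s+([^:\s]+):\s+(.+)$")


def parse_profile_list(text):
    """Return {profile name: path string} from `dm_profile list` output."""
    profiles = {}
    for line in text.splitlines():
        match = PROFILE_LINE.match(line)
        if match:  # header lines such as "Available profiles:" are skipped
            profiles[match.group(1)] = match.group(2).strip()
    return profiles


def missing_profiles(text,
                     required=("t4edge", "t4proj", "t4rytov", "t4rytov-sc")):
    """Names from `required` that do not appear in the listing."""
    available = parse_profile_list(text)
    return [name for name in required if name not in available]
```

For example, you could pass it the text returned by subprocess.run(["dm_profile", "list"], capture_output=True, text=True).stdout, assuming that dm_profile writes the listing to standard output.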

Test the analysis pipeline

Before automating further, let us first test the analysis pipeline with a single profile:

dm_analyze_sphere --recursive --profile t4edge "C:\Path\to\data"

Here, C:\Path\to\data is the folder containing the experimental data, which is searched recursively (--recursive), and t4edge is the name of the profile that uses the edge-detection approach to determine the radius and the refractive index of the cells. Now verify that all datasets were detected and that the analysis results are identical to those of tutorial 2. The output of the above command should be similar to:

DryMass version 0.8.0
Recursing into directory tree... Done.
Input 1/3: C:\Path\to\data\DHM_HL60_cells_01.zip
Input 2/3: C:\Path\to\data\DHM_HL60_cells_02.zip
Input 3/3: C:\Path\to\data\DHM_HL60_cells_03.zip
Analyzing dataset 1/3.
Converting input data... Done.
Extracting ROIs... Done.
Plotting detected ROIs... Done
Performing sphere analysis... Done.
Plotting sphere images... Done
Analyzing dataset 2/3.
Converting input data... Done.
Extracting ROIs... Done.
Plotting detected ROIs... Done
Performing sphere analysis... Done.
Plotting sphere images... Done
Analyzing dataset 3/3.
Converting input data... Done.
Extracting ROIs... Done.
Plotting detected ROIs... Done
Performing sphere analysis... Done.
Plotting sphere images... Done
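
To check that all datasets were detected, you can compare the "Input i/n: path" lines of the log with the files you expect the recursive search to find. The sketch below assumes that all datasets are .zip files (DryMass may pick up other formats as well). It compares file names only, because the same folder can be written with different path separators. reported_names and zip_names are hypothetical helpers.

```python
import pathlib
import re

# Matches log lines like "Input 1/3: C:\Path\to\data\DHM_HL60_cells_01.zip"
INPUT_LINE = re.compile(r"^Input (\d+)/(\d+): (.+)$")


def reported_names(log_text):
    """File names of all inputs listed in a dm_analyze_sphere log."""
    names = []
    for line in log_text.splitlines():
        match = INPUT_LINE.match(line.strip())
        if match:
            # split on both separators so Windows and POSIX paths work
            names.append(re.split(r"[\\/]", match.group(3))[-1])
    return names


def zip_names(folder):
    """Sorted names of all .zip files below `folder` (recursive search)."""
    return sorted(p.name for p in pathlib.Path(folder).rglob("*.zip"))
```

With the log saved as text, sorted(reported_names(log)) == zip_names(r"C:\Path\to\data") should then hold.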

Further automation

In principle, we could now run all commands in succession to obtain the fitting results for all model functions:

dm_analyze_sphere --recursive --profile t4edge "C:\Path\to\data"
dm_analyze_sphere --recursive --profile t4proj "C:\Path\to\data"
dm_analyze_sphere --recursive --profile t4rytov "C:\Path\to\data"
dm_analyze_sphere --recursive --profile t4rytov-sc "C:\Path\to\data"

However, since each of these commands takes a comparatively long time to run, it makes sense to write a script that runs them automatically for a given path.

Windows users can create a command-script, i.e. a text file with the .cmd extension (e.g. analysis.cmd), with the following content (%1 stands for the first command-line argument):

dm_analyze_sphere --recursive --profile t4edge %1
dm_analyze_sphere --recursive --profile t4proj %1
dm_analyze_sphere --recursive --profile t4rytov %1
dm_analyze_sphere --recursive --profile t4rytov-sc %1

Linux and macOS users can create a bash script (analysis.sh) with the following content ("$1" is quoted so that paths containing spaces are passed as a single argument):

#!/bin/bash
dm_analyze_sphere --recursive --profile t4edge "$1"
dm_analyze_sphere --recursive --profile t4proj "$1"
dm_analyze_sphere --recursive --profile t4rytov "$1"
dm_analyze_sphere --recursive --profile t4rytov-sc "$1"
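
Alternatively, the four commands can be run from Python, which works unchanged on Windows, Linux, and macOS. In this sketch (build_commands and run_all are hypothetical names), the arguments are passed to subprocess.run as a list. No shell is involved, so paths with spaces need no extra quoting, and check=True stops the sequence as soon as one analysis exits with an error.

```python
import subprocess

# Profile names imported with dm_profile add earlier in this tutorial.
PROFILES = ["t4edge", "t4proj", "t4rytov", "t4rytov-sc"]


def build_commands(data_path, profiles=PROFILES):
    """One dm_analyze_sphere call (as an argument list) per profile."""
    return [["dm_analyze_sphere", "--recursive", "--profile", name,
             str(data_path)]
            for name in profiles]


def run_all(data_path, profiles=PROFILES):
    """Run the analyses in sequence; stop at the first failing one."""
    for cmd in build_commands(data_path, profiles):
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(cmd, check=True)
```

To use it like the scripts above, save it (e.g. as analysis.py, a hypothetical name) and call run_all(sys.argv[1]) inside an if __name__ == "__main__": block.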

To run the full analysis, you now only need to execute a single command:

# Windows users:
cd "C:\Path\to\script"
.\analysis.cmd "C:\Path\to\data"

# Linux/macOS users:
cd "/path/to/script"
bash analysis.sh "/path/to/data"

Plotting the method comparison automatically

We would also like to plot the comparison between the methods automatically, as in tutorial 2. To achieve this, we modify the original Python script so that it accepts a path as a command-line argument and stores the comparison plot as comparison.png in each results directory:

import pathlib
import sys

import matplotlib.pyplot as plt
import numpy as np


def dot_boxplot(ax, data, colors, labels, **kwargs):
    """Combined box and scatter plot"""
    box_list = []

    for ii in range(len(data)):
        # same random state for every scatter plot (reproducible jitter)
        rs = np.random.RandomState(42)
        y = data[ii]
        # horizontal jitter around the box position ii+1
        x = rs.normal(ii+1, 0.15, len(y))
        ax.plot(x, y, 'o', alpha=0.5, color=colors[ii])
        box_list.append(y)

    ax.boxplot(box_list,
               sym="",
               medianprops={"color": "black", "linestyle": "solid"},
               widths=0.3,
               **kwargs)
    # set the tick labels separately, because the boxplot keyword
    # `labels` was renamed to `tick_labels` in matplotlib 3.9
    ax.set_xticklabels(labels)
    ax.grid(axis="y")


def plot_comparison(path):
    """Comparison plot of analysis results"""
    ri_data = [
        np.loadtxt(path / "sphere_image_rytov-sc_statistics.txt",
                   usecols=(1,)),
        np.loadtxt(path / "sphere_image_rytov_statistics.txt",
                   usecols=(1,)),
        np.loadtxt(path / "sphere_image_projection_statistics.txt",
                   usecols=(1,)),
        np.loadtxt(path / "sphere_edge_projection_statistics.txt",
                   usecols=(1,)),
    ]
    colors = ["#E48620", "#DE2400", "#6e559d", "#048E00"]
    labels = ["image rytov-sc", "image rytov",
              "image projection", "edge projection"]
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.set_title("HL60 (DHM)")
    ax.set_ylabel("refractive index")
    dot_boxplot(ax=ax, data=ri_data, colors=colors, labels=labels)
    fig.tight_layout()
    fig.savefig(path / "comparison.png", dpi=300)
    # close the figure; otherwise memory grows with every results directory
    plt.close(fig)


if __name__ == "__main__":
    path = pathlib.Path(sys.argv[-1])
    # recursive search for results directories
    for rp in path.rglob("*_dm"):
        if rp.is_dir():
            plot_comparison(path=rp)
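
If you also want the numbers behind the plot, a small helper can compute the median refractive index and the interquartile range for each method. This sketch reuses the file names of plot_comparison and assumes, like that function, that column 1 of each statistics file holds the refractive index. summarize is a hypothetical name.

```python
import pathlib

import numpy as np

# Same statistics files and labels as in plot_comparison.
FILES = {
    "image rytov-sc": "sphere_image_rytov-sc_statistics.txt",
    "image rytov": "sphere_image_rytov_statistics.txt",
    "image projection": "sphere_image_projection_statistics.txt",
    "edge projection": "sphere_edge_projection_statistics.txt",
}


def summarize(path):
    """Return {label: (median, interquartile range)} of column 1."""
    path = pathlib.Path(path)
    summary = {}
    for label, name in FILES.items():
        # atleast_1d: loadtxt returns a scalar array for a single row
        ri = np.atleast_1d(np.loadtxt(path / name, usecols=(1,)))
        q1, med, q3 = np.percentile(ri, [25, 50, 75])
        summary[label] = (float(med), float(q3 - q1))
    return summary
```

For example, print(summarize(rp)) could be added next to plot_comparison(path=rp) in the main loop.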

Save this Python script as t04_method_comparison.py in the same folder as your analysis script and append the following line to the analysis script (in the Windows command-script, write %1 instead of "$1"):

python t04_method_comparison.py "$1"

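The final analysis script should now look like this (shown for bash; in the Windows command-script, omit the first line and use %1 instead of "$1"):

#!/bin/bash
dm_analyze_sphere --recursive --profile t4edge "$1"
dm_analyze_sphere --recursive --profile t4proj "$1"
dm_analyze_sphere --recursive --profile t4rytov "$1"
dm_analyze_sphere --recursive --profile t4rytov-sc "$1"
python t04_method_comparison.py "$1"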

This script automates the entire analysis, from loading the raw data to generating the comparison plots.
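
To confirm afterwards that every results directory received a comparison plot, you can run a quick check that follows the main loop of t04_method_comparison.py. missing_comparisons is a hypothetical helper. It returns the *_dm directories that lack comparison.png.

```python
import pathlib


def missing_comparisons(data_path):
    """Results directories (*_dm) below data_path without comparison.png."""
    return sorted(str(rp) for rp in pathlib.Path(data_path).rglob("*_dm")
                  if rp.is_dir() and not (rp / "comparison.png").exists())
```

An empty list, e.g. from missing_comparisons(r"C:\Path\to\data"), means that a plot was written for every results directory found.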