T4: Automating the analysis of large datasets (CLI)
Introduction
This tutorial showcases the command-line parameters --recursive and --profile (see Advanced usage) for automating data analysis, using the analysis pipeline of tutorial 2 as an example.
After this tutorial, you will be able to automatically apply multiple analysis pipelines, including additional custom analysis steps, to large datasets.
Prerequisites
For this tutorial, you need:
Python 3.9 or above and DryMass version 0.11.0 or above (see Installing DryMass)
Experimental dataset: DHM_HL60_cells.zip [MSGG19]
Set up example measurements
To simulate the presence of several datasets, create a few copies of DHM_HL60_cells.zip in a designated folder (here C:\Path\to\data\).
You may also create subfolders and put copies of the dataset there.
For example, the folder could contain three copies:
DHM_HL60_cells_01.zip
DHM_HL60_cells_02.zip
DHM_HL60_cells_03.zip
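If you prefer to script this step, the following minimal Python sketch creates the numbered copies. The function names `copy_names` and `make_copies` are hypothetical helpers for this illustration and are not part of DryMass.

```python
import pathlib
import shutil


def copy_names(source_name, n_copies):
    """Names of the numbered copies, e.g. DHM_HL60_cells_01.zip."""
    source = pathlib.PurePath(source_name)
    return [f"{source.stem}_{ii:02d}{source.suffix}"
            for ii in range(1, n_copies + 1)]


def make_copies(source, target_dir, n_copies=3):
    """Copy `source` `n_copies` times into `target_dir` (created if needed)."""
    source = pathlib.Path(source)
    target_dir = pathlib.Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copies = []
    for name in copy_names(source.name, n_copies):
        # copy2 also preserves the file metadata (e.g. modification time)
        copies.append(shutil.copy2(source, target_dir / name))
    return copies
```

For example, make_copies(r"C:\Path\to\DHM_HL60_cells.zip", r"C:\Path\to\data") creates the three files listed above; passing a subfolder of the data folder as target_dir places additional copies there.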
Create and import analysis profiles
In tutorial 2, the analysis pipeline is executed four times, each time with a different [sphere] section. For each of these runs, we create a separate profile and import it into the local library of DryMass so that we can use it later on. The contents of the first profile are:
[bg]
phase border px = 30
phase profile = poly2o
[meta]
medium index = 1.335
pixel size um = 0.107
wavelength nm = 633.0
[roi]
pad border px = 80
size variation = 0.2
exclude overlap px = 100
ignore data = 8.4, 15.2, 18.2, 18.3, 35.2
[specimen]
size um = 13
[sphere]
method = edge
model = projection
The other profiles are identical to the first profile, except for the [sphere] section. First, download all profiles:
- edge-projection: t04_profile_edge.cfg
- image-projection: t04_profile_proj.cfg
- image-rytov: t04_profile_rytov.cfg
- image-rytov-sc: t04_profile_rytov-sc.cfg
and then import them into the local library via:
dm_profile add t4edge t04_profile_edge.cfg
dm_profile add t4proj t04_profile_proj.cfg
dm_profile add t4rytov t04_profile_rytov.cfg
dm_profile add t4rytov-sc t04_profile_rytov-sc.cfg
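The three additional profiles could also be derived from the first one instead of being downloaded. The following sketch assumes, based only on the profile names, that the [sphere] sections differ in the method key (edge or image) and the model key (projection, rytov, or rytov-sc), and that DryMass accepts the INI-style output of Python's configparser. Compare the result with the downloaded files before relying on it. The names SPHERE_SETTINGS, derive_profiles, and profile_add_commands are hypothetical.

```python
import configparser
import io

# Assumption: (method, model) pairs inferred from the profile names above.
SPHERE_SETTINGS = {
    "t4edge": ("edge", "projection"),
    "t4proj": ("image", "projection"),
    "t4rytov": ("image", "rytov"),
    "t4rytov-sc": ("image", "rytov-sc"),
}


def derive_profiles(base_text):
    """Return {profile name: profile text} with a modified [sphere] section."""
    profiles = {}
    for name, (method, model) in SPHERE_SETTINGS.items():
        cfg = configparser.ConfigParser()
        cfg.read_string(base_text)
        cfg["sphere"]["method"] = method
        cfg["sphere"]["model"] = model
        buf = io.StringIO()
        cfg.write(buf)  # writes "key = value" lines per section
        profiles[name] = buf.getvalue()
    return profiles


def profile_add_commands(files):
    """Build one `dm_profile add NAME FILE` command per profile file."""
    return [["dm_profile", "add", name, str(path)]
            for name, path in files.items()]
```

Each derived text can be written to a .cfg file, and the resulting commands can be executed with subprocess.run(cmd, check=True).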
Once imported into the local library, the downloaded profiles may safely be removed. You can list the available profiles with the command dm_profile list, which should yield output similar to this:
Available profiles:
- t4edge: C:\Users\Something\drymass\profile_t4edge.cfg
- t4proj: C:\Users\Something\drymass\profile_t4proj.cfg
- t4rytov-sc: C:\Users\Something\drymass\profile_t4rytov-sc.cfg
- t4rytov: C:\Users\Something\drymass\profile_t4rytov.cfg
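To check in a script that all four profiles were imported, the listing can be parsed. This sketch assumes that dm_profile list prints one "- NAME: PATH" line per profile, exactly as in the example output above; the format may differ between DryMass versions. The function names are hypothetical.

```python
import re

# Matches lines such as "- t4edge: C:\Users\...\profile_t4edge.cfg";
# the name stops at the first colon, so the drive letter stays in the path.
PROFILE_LINE = re.compile(r"^\s*-\s*(?P<name>[^:\s]+):\s*(?P<path>.+?)\s*$")

REQUIRED = ("t4edge", "t4proj", "t4rytov", "t4rytov-sc")


def parse_profile_list(output):
    """Return {profile name: path} from the text printed by `dm_profile list`."""
    profiles = {}
    for line in output.splitlines():
        match = PROFILE_LINE.match(line)
        if match:
            profiles[match.group("name")] = match.group("path")
    return profiles


def missing_profiles(output, required=REQUIRED):
    """Sorted list of required profile names that are not in the listing."""
    return sorted(set(required) - set(parse_profile_list(output)))
```

The output itself can be captured with subprocess.run(["dm_profile", "list"], capture_output=True, text=True).stdout.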
Test the analysis pipeline
Before automating further, let us first test the current analysis pipeline:
dm_analyze_sphere --recursive --profile t4edge "C:\Path\to\data"
where C:\Path\to\data is the folder containing the experimental data, which is searched recursively (--recursive), and t4edge is the name of the profile that uses the edge-detection approach to determine the radius and the refractive index of the cells.
Now, verify that all datasets were detected and that the analysis results
are identical to those of tutorial 2. The output of
the above command should be similar to:
DryMass version 0.8.0
Recursing into directory tree... Done.
Input 1/3: C:\Path\to\data\DHM_HL60_cells_01.zip
Input 2/3: C:\Path\to\data\DHM_HL60_cells_02.zip
Input 3/3: C:\Path\to\data\DHM_HL60_cells_03.zip
Analyzing dataset 1/3.
Converting input data... Done.
Extracting ROIs... Done.
Plotting detected ROIs... Done
Performing sphere analysis... Done.
Plotting sphere images... Done
Analyzing dataset 2/3.
Converting input data... Done.
Extracting ROIs... Done.
Plotting detected ROIs... Done
Performing sphere analysis... Done.
Plotting sphere images... Done
Analyzing dataset 3/3.
Converting input data... Done.
Extracting ROIs... Done.
Plotting detected ROIs... Done
Performing sphere analysis... Done.
Plotting sphere images... Done
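Checking this output by hand becomes tedious for many datasets. The following sketch counts the "Input i/N:" and "Analyzing dataset i/N." lines of the log; it assumes the log wording matches the example above (which may differ between DryMass versions) and that the log is captured as text. The function name all_datasets_analyzed is hypothetical.

```python
import re

# Patterns for the log lines shown in the example output above
INPUT_LINE = re.compile(r"^\s*Input (\d+)/(\d+):", re.MULTILINE)
ANALYZED_LINE = re.compile(r"^\s*Analyzing dataset (\d+)/(\d+)\.", re.MULTILINE)


def all_datasets_analyzed(output, expected):
    """True if `expected` inputs were found and each of them was analyzed."""
    inputs = INPUT_LINE.findall(output)
    analyzed = ANALYZED_LINE.findall(output)
    return len(inputs) == expected and len(analyzed) == expected
```

For the three copies created above, one would call all_datasets_analyzed(log_text, expected=3), where log_text is, for instance, the stdout of subprocess.run(..., capture_output=True, text=True).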
Further automation
In principle, we could now run all commands in succession to obtain the fitting results for all model functions:
dm_analyze_sphere --recursive --profile t4edge "C:\Path\to\data"
dm_analyze_sphere --recursive --profile t4proj "C:\Path\to\data"
dm_analyze_sphere --recursive --profile t4rytov "C:\Path\to\data"
dm_analyze_sphere --recursive --profile t4rytov-sc "C:\Path\to\data"
However, since these commands take a comparatively long time to run, it makes sense to write a script that runs them automatically for a given path.
Windows users can create a command script, i.e. a text file with the .cmd extension (e.g. analysis.cmd), with the following content:
dm_analyze_sphere --recursive --profile t4edge %1
dm_analyze_sphere --recursive --profile t4proj %1
dm_analyze_sphere --recursive --profile t4rytov %1
dm_analyze_sphere --recursive --profile t4rytov-sc %1
Linux and MacOS users can create a bash script (analysis.sh) with the following content:
#!/bin/bash
dm_analyze_sphere --recursive --profile t4edge "$1"
dm_analyze_sphere --recursive --profile t4proj "$1"
dm_analyze_sphere --recursive --profile t4rytov "$1"
dm_analyze_sphere --recursive --profile t4rytov-sc "$1"
To run the full analysis, you now only need to execute a single command:
# Windows users:
cd "C:\Path\to\script"
.\analysis.cmd "C:\Path\to\data"
# Linux/MacOS users:
cd "/path/to/script"
bash analysis.sh "/path/to/data"
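As a cross-platform alternative to the two shell scripts, the same four calls can be issued from Python. This is a minimal sketch assuming that dm_analyze_sphere is on the PATH; the names PROFILES, build_commands, and run_all are hypothetical. Passing each command as a list to subprocess.run avoids shell-specific syntax such as %1 versus "$1", and check=True stops the run at the first failing analysis.

```python
import subprocess

PROFILES = ["t4edge", "t4proj", "t4rytov", "t4rytov-sc"]


def build_commands(data_path, profiles=PROFILES):
    """One dm_analyze_sphere call per profile, as in the scripts above."""
    return [["dm_analyze_sphere", "--recursive", "--profile", name,
             str(data_path)]
            for name in profiles]


def run_all(data_path, runner=subprocess.run):
    """Run all analyses in order; raise on the first non-zero exit code."""
    for cmd in build_commands(data_path):
        runner(cmd, check=True)
```

Calling run_all(sys.argv[1]) from a __main__ block turns this into a script that takes the data folder as its only argument.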
Plotting the method comparison automatically
We would also like to plot the comparison between the methods automatically, as in tutorial 2. To achieve this, we modify the original Python script to accept a path as a command-line argument and to store the comparison plot as comparison.png in the results directories:
import pathlib
import sys

import matplotlib.pyplot as plt
import numpy as np


def dot_boxplot(ax, data, colors, labels, **kwargs):
    """Combined box and scatter plot"""
    box_list = []

    for ii in range(len(data)):
        # use the same random state for every scatter plot so that the
        # horizontal jitter of the points is reproducible
        rs = np.random.RandomState(42)
        y = data[ii]
        x = rs.normal(ii + 1, 0.15, len(y))
        ax.plot(x, y, "o", alpha=0.5, color=colors[ii])
        box_list.append(y)

    ax.boxplot(box_list,
               sym="",
               medianprops={"color": "black", "linestyle": "solid"},
               widths=0.3,
               **kwargs)
    # set the tick labels separately, because the `labels` keyword of
    # `boxplot` is deprecated in recent Matplotlib versions
    ax.set_xticklabels(labels)
    ax.grid(axis="y")


def plot_comparison(path):
    """Comparison plot of analysis results"""
    ri_data = [
        np.loadtxt(path / "sphere_image_rytov-sc_statistics.txt",
                   usecols=(1,)),
        np.loadtxt(path / "sphere_image_rytov_statistics.txt",
                   usecols=(1,)),
        np.loadtxt(path / "sphere_image_projection_statistics.txt",
                   usecols=(1,)),
        np.loadtxt(path / "sphere_edge_projection_statistics.txt",
                   usecols=(1,)),
    ]
    colors = ["#E48620", "#DE2400", "#6e559d", "#048E00"]
    labels = ["image rytov-sc", "image rytov",
              "image projection", "edge projection"]
    fig = plt.figure(figsize=(8, 5))
    ax = fig.add_subplot(111, title="HL60 (DHM)")
    ax.set_ylabel("refractive index")
    dot_boxplot(ax=ax, data=ri_data, colors=colors, labels=labels)
    fig.tight_layout()
    fig.savefig(path / "comparison.png", dpi=300)
    # close the figure to free memory when there are many results directories
    plt.close(fig)


if __name__ == "__main__":
    path = pathlib.Path(sys.argv[-1])
    # recursive search for results directories
    for rp in path.rglob("*_dm"):
        if rp.is_dir():
            plot_comparison(path=rp)
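plot_comparison loads all four statistics files, so np.loadtxt raises an error for any results directory in which one of the analyses did not finish. A hypothetical helper that only yields complete results directories could look like this; the file names are those used in plot_comparison, while STAT_FILES, missing_statistics, and complete_results_dirs are assumptions made for this sketch.

```python
import pathlib

# File names read by plot_comparison in the script above
STAT_FILES = [
    "sphere_image_rytov-sc_statistics.txt",
    "sphere_image_rytov_statistics.txt",
    "sphere_image_projection_statistics.txt",
    "sphere_edge_projection_statistics.txt",
]


def missing_statistics(names):
    """Return the statistics files that are not contained in `names`."""
    present = set(names)
    return [name for name in STAT_FILES if name not in present]


def complete_results_dirs(path):
    """Yield `*_dm` directories below `path` that contain all statistics files."""
    for rp in sorted(pathlib.Path(path).rglob("*_dm")):
        if not rp.is_dir():
            continue
        missing = missing_statistics(p.name for p in rp.iterdir())
        if missing:
            print(f"Skipping {rp}: missing {', '.join(missing)}")
        else:
            yield rp
```

In the __main__ block of the comparison script, the loop would then read: for rp in complete_results_dirs(path): plot_comparison(path=rp).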
Put this Python script (t04_method_comparison.py) into the same folder as your analysis script and add the following line to the end of the analysis script (in the Windows command script, write %1 instead of "$1"):
python t04_method_comparison.py "$1"
The final analysis script should now look like this:
- Windows: t04_analysis.cmd
- Linux/MacOS: t04_analysis.sh
This script automates the entire analysis, from loading the raw data to generating the comparison plots.