WildCat Classifier Training and Application

The WildCat weakly supervised learning algorithm is used for tangle (and other object) detection and burden mapping on whole-slide histology images. This document describes how to train and apply WildCat.

Patch download and organization

Overview

Download patches

  1. Clone the repository https://github.com/pyushkevich/tanglethon-ipynb.

  2. Create a root directory for patch download. Here we refer to it as $PATCH_ROOT

  3. The script scripts/clone_samples is used to download samples from PHAS:

     clone_samples: download training samples from a histoannot server
     usage:
       clone_samples.sh [options]
     options:
       -s <url>    Server URL (default: https://histo.itksnap.org)
       -k <file>   Path to JSON file with your API key
       -t <int>    Task ID on the server
       -o <dir>    Output directory for the samples
    
  4. Obtain a key from your PHAS server by navigating to URL https://my.histoannot.url/auth/api/generate_key and save in a .json file. Here we assume you store it in ~/.histoannot_key.json. Note that when you navigate to this URL your earlier keys are invalidated.

  5. Determine which task to get patches from. On PHAS, the task id is a number that is included in the URL when you select a task. Here we let $TID refer to the task id.

  6. Clone the samples into the directory $PATCH_ROOT

     clone_samples.sh -s https://my.histoannot.url -k ~/.histoannot_key.json -t $TID -o $PATCH_ROOT/patches
    
  7. Verify patch organization. You should see a collection of files like these:

     $PATCH_ROOT/patches/all_patches/23409.png
     $PATCH_ROOT/patches/all_patches/23410.png
     ...
    

Create a WildCat experiment

A training task involves classifying a set of foreground objects vs. a set of background objects. For example, tangle-like objects vs. non-tangles. Both foreground and background can include more than one class from PHAS training.

  1. The script scripts/organize_folds is used to organize samples from PHAS into PyTorch-compatible datasets.

     organize_folds: generate train/val/test folds for samples from histoannot
     usage: 
       organize_folds.sh [options] 
     required options:
       -w <dir>    Working directory (created by clone_samples)
       -e <string> Experiment id/name (folder created in working directory)
       -l <file>   JSON file describing the classes and how they map to labels
     optional:
       -n <int>    Maximum number of samples to include (per fold)
       -t <int>    Max number of samples per class to take for train (2000)
       -v <int>    Max number of samples per class to take for val (1000)
       -T <int>    Max number of samples per class to take for test (0, i.e., all)
       -R <int>    Random seed 
       -s <file>   File listing specimens ids to use for testing. These specimens
                   will not be included in training
    
  2. Determine the set of available labels in your training set. Navigate to (replace [proj_id] with actual project ID).

    https://my.histoannot.url/dtrain/[proj_id]/labelsets
    

    _images/phas_label_listing.png

  3. Create a .json file (e.g., $PATCH_ROOT/exp01.json) for your experiment. This file explains how classes on the PHAS server are organized to form foreground/background classes, and which classes on PHAS are ignored. Note the regular expression ".*" is used to select all PHAS classes that have not been matched to either tangle or ignore categories. The order in which categories are specified matters here.

     [
       {
         "classname": "tangle",
         "labels": [ "GM_tangle", "GM_pretangle" ],
         "ignore": 0
       },
       {
         "classname": "ignore",
         "labels": [ "ask_expert", "ambiguous" ],
         "ignore": 1
       },
       {
         "classname": "nontangle",
         "labels": [ ".*" ],
         "ignore": 0
       }
     ]
    
  4. If you wish to split your specimens into train/test, create a text file $PATCH_ROOT/exp01_test.txt listing specimens to be used for testing. You will notice that clone_samples.sh created a specimen listing specimens.txt it its output directory.

  5. Run the organizing script:

    organize_folds.sh -w $PATCH_ROOT/patches -e exp01 -l $PATCH_ROOT/exp01.json -s $PATCH_ROOT/exp01_test.txt
    

    When completed you should have the following directory structure containing symlinks to patch .png files:

    $PATCH_ROOT/patches/exp01/patches/<train|val|test>/<tangle|nontangle>/12345.png
    

Train WildCat classifier in Jupyter

Alternatively, you can train the classifier from the command line, see below

  1. Start the Jupyter notebook app

  2. Duplicate, edit and run the notebook generic-wcu-train.ipynb.

    • At the beginning of the “Model Setup” section, change exp_dir to point to your experiment data directory ($PATCH_ROOT/patches/exp01)

      • You can also change some model training parameters here

    • Run all cells in the notebook. This should train the classifier for 30 epochs and save the model and parameters to the files

      $PATCH_ROOT/exp01/models/wildcat_upsample.dat
      $PATCH_ROOT/exp01/models/config.json
      
    • You should expect to have about 95% validation accuracy during traning. Could be lower or higher, of course depending on the type of inclusion being analyzed.

Evaluate WildCat classifier on test patches

This step is optional but nice to evaluate classifier accuracy and look at the kinds of patches the classifier is failing on.

  1. In Jupyter, shutdown all kernels to clear NVidia GPU memory

  2. Duplicate, edit and run the notebook generic-wcu-examine.ipynb.

    • At the beginning of the “Validation Setup” section, change exp_dir to point to your experiment data directory ($PATCH_ROOT/patches/exp01)

    • Also set the cname_obj and cname_bkg to your object/non-object classes (e.g., “tangle”/”non-tangle”)

    • Run all the cells in the notebook

      • This will generate the confusion matrix and ROC plot

      • It will visualize examples of false positives and false negatives on the test set

      • You will also see sample heat maps generated by WildCat for sample inputs

Evaluate WildCat classifier on whole-slide image

This step is also optional and will take longer. It allows you to explore whole-slide heat maps.

  1. In Jupyter, shutdown all kernels to clear NVidia GPU memory

  2. Download a whole-slide image that you wish to analyze (we will refer to it as $WSI_DIR/test_wsi.tiff).

  3. Duplicate, edit and run the notebook generic-wcu-scanslide.ipynb.

    • At the beginning of the “Experiment Setup” section, change exp_dir to point to your experiment data directory ($PATCH_ROOT/patches/exp01)

    • In the same cell, change wsi_file to point to the whole-slide image $WSI_DIR/test_wsi.tiff

    • Run the sections of the notebook:

      • In the “Load WSI”, the slide will be loaded

      • In the “Generate burden map” section, the slide will be analyzed patch by patch, generating burden map

      • In the “Visualize burden map” section, different parts of the slide can be plotted with the burden map. You will need to specify the regions of interest for zoomed in burden maps.

      _images/wildcat_burden_example.png

Validate WildCat classifier vs. Manual Ratings and Object Counts

Follow these instructions if you would like to compare the quantitative burden measure to manual counts of objects of interest (e.g., tangle counts), and subjective ratings. In the histology annotation platform, set up a new task, as shown below. Large boxes are used to designate areas for validation. They can be assigned labels corresponding to ratings. Small boxes are used to mark all individual inclusions of each type that are being counted.

_images/wildcat_counting_example.png

When you are ready to perform the evaluation:

  1. Clone the samples into the directory $PATCH_ROOT. Set CTID to the counting task id. Note the -X flag, which will download patches in the size drawn as opposed to default 512x512 patches previously cloned.

     clone_samples.sh -s https://my.histoannot.url -k ~/.histoannot_key.json -t $CTID -o $PATCH_ROOT/counting -X
    
  2. Duplicate, edit and run the notebook generic-wcu-counts.ipynb.

  3. You will need to set exp_dir to the directory where you performed WildCat training, and cnt_dir to $PATCH_ROOT/counting. You will also need to set the dictionaries container_labels and objects_of_interest_labels to match the labels used to mark large boxes and individual inclusions that should be included in the counting. Towards the end of the notebook, you will need to set box_labels to indicate your rating categories for the large boxes.

Training WildCat network from the command line

  1. Clone the repo https://github.com/pyushkevich/tangle-cnn-prod

  2. Install required packages

     pip -r requirements.txt
    
  3. Run training script as follows

     # Run this to see all options for training
     python wildcat_main.py train --help 
    
     # Run this to run with the default options
     python wildcat_main.py train \
        --expdir $PATCH_ROOT/patches/exp01
    

    The script will create a directory $PATCH_ROOT/patches/exp01/models containing your trained classifier

Applying WildCat models in batch mode

  1. Clone the repo https://github.com/pyushkevich/tangle-cnn-prod

  2. Prepare your WSI files for reading by OpenSlide, see WSI Processing

  3. Run the following script:

     # Run the code
     python -u scan_tangles.py apply \
       --slide $wsi_file \
       --output wsi_burden.nii.gz \
       --network $path_to_model