In this example we demonstrate image classification of different flowers. The deep learning is done on WML-A using its TensorFlow framework. In short, a sample dataset and sample code are used to train a deep learning model, and you may then upload an image of your own to test it.

The necessary steps are listed in the sections that follow.

First, a note on terminology: the word “Model” covers several conceptually different things, which are described here. (We have not yet checked the IBM documentation to see whether it names these concepts explicitly, or whether it uses the same names.)

  • Dataset is a set of data, usually consisting of both training data and testing data, and sometimes also validation data.
  • DL Model (Deep Learning Model) is the code (program) that carries out a task, for example, image classification.
  • Training Model is a DL Model attached to a Dataset. It can be trained. A Training Model can also be created from a previously trained Training Model, using that as a partial result or starting point.
  • Inference Model is converted from a trained Training Model and can be used for inference, e.g., asking the model to classify a particular image.

The WML-A UI uses the word “Model” in many places; these different meanings should not be mixed up.

Creating Dataset

  1. Log on to the system
  2. Change current working path to the DLI home folder
    • cd /dli/u/$USER
  3. Copy the sample dataset to the home folder
    • cp /dli/ais/samples/flower.tar .
  4. Extract the content of the TAR archive
    • tar xvf flower.tar
    • (The output continues for pages until extraction finishes)
  5. Change the permissions of the folder so that the WML-A system (the egoadmin user) can read its contents
    • setfacl -m u:egoadmin:rwx flower
    • setfacl -m g:egoadmin:rwx flower
  6. The files are now prepared on the server. Next, point your browser to the WML-A web interface and log on
  7. The initial page should look like this
  8. Go to the “Deep Learning” page by selecting from the menu: Workload -> Deep Learning
    • (You will then see something like the page below; names are redacted)
  9. Start preparing a new Dataset: Dataset -> New -> Images for Object Classification
  10. Enter the details for the new dataset. Specifically, make the following changes:
    • Dataset Name: flower-tfrecords-YYYYMMDD
    • Dataset stores image in: TFRecords
    • Generate record by: Class
    • Training Folder: /dli/u/$username/flower/
    • Portion of training images for validation: 10%
    • Portion of training images for testing: 10%
    • Output image color: Color
    • Make sure to uncheck Image Resize
    • Refer to the image below as a check (and for defaults of other parameters). Then click “Create”.
  11. The expected sequence of events is:
    1. The web interface shows the dataset status as “Creating”. Impatient users may click the refresh button at the top to update the status.
    2. The web interface shows the dataset status as “Created”
    3. Curious users may also click the name of the dataset and find that the dataset has been generated into /dli/ais/dli_result_fs/egoadmin/datasets/flower-tfrecords-YYYYMMDD , along with the time needed for the creation. (But this directory is not accessible by users.)
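
Before filling in the dataset form, it can help to check what is inside the extracted folder and roughly how the 10% / 10% portions will divide it. The following is a minimal sketch only. It assumes that the flower folder holds one subfolder per class (which "Generate record by: Class" suggests but this page does not confirm), that images use common suffixes, and that portions are rounded down; the class names daisy and rose are made up for the demonstration, which runs on a temporary folder rather than the real data.

```python
import os
import tempfile

# Assumption: the sample images use one of these common suffixes.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Return {class_name: image_count} for each subfolder of root."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        class_dir = os.path.join(root, entry)
        if not os.path.isdir(class_dir):
            continue  # ignore loose files at the top level
        counts[entry] = sum(
            1
            for name in os.listdir(class_dir)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        )
    return counts


def estimate_split(total, validation_pct=10, testing_pct=10):
    """Estimate (training, validation, testing) sizes.

    Rounding down is an assumption; WML-A may round differently.
    """
    validation = total * validation_pct // 100
    testing = total * testing_pct // 100
    return total - validation - testing, validation, testing


# Demonstration on a throwaway folder with two hypothetical classes.
with tempfile.TemporaryDirectory() as root:
    for class_name, n in (("daisy", 3), ("rose", 2)):
        os.makedirs(os.path.join(root, class_name))
        for i in range(n):
            open(os.path.join(root, class_name, f"{i}.jpg"), "w").close()
    demo_counts = count_images_per_class(root)

print(demo_counts)           # {'daisy': 3, 'rose': 2}
print(estimate_split(1000))  # (800, 100, 100)
```

To inspect the real data, call count_images_per_class on your own copy of the flower folder under the DLI home folder.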

Creating the DL Model

  • Copy the sample DL model to the DLI home folder and extract the contents of that gzip-compressed TAR archive
    • cp /dli/ais/samples/wmla-1.2.2_image-classification.tar.gz .
    • tar xvf wmla-1.2.2_image-classification.tar.gz
  • Change the permissions of the folder so that the WML-A system (the egoadmin user) can read its contents
    • setfacl -m u:egoadmin:rwx image-classification
    • setfacl -m g:egoadmin:rwx image-classification
  • Create the DL Model in the Web UI
    • Models -> New -> Add Location
    • Framework: TensorFlow
    • Version: v2
    • Name: TensorFlow-flower-YYYYMMDD
    • Path: /dli/u/$username/image-classification/
    • Click the “Add” button at the end
    • After adding, you will see these two signs indicating successful creation
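
If adding the location fails, one thing worth checking is whether the setfacl commands took effect. The sketch below parses the text printed by the standard getfacl tool and looks up the entry for egoadmin. The sample output is hypothetical (the owner alice and group users are invented), and the helper names read_acl and acl_permissions are illustrative rather than part of WML-A.

```python
import subprocess


def read_acl(path):
    """Run getfacl on path and return its text output (needs the ACL tools)."""
    result = subprocess.run(
        ["getfacl", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def acl_permissions(getfacl_output, kind, name):
    """Return the permission string (e.g. 'rwx') for entry kind:name, or None."""
    for line in getfacl_output.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments such as "# file: ..."
        if not line:
            continue
        parts = line.split(":")
        if len(parts) >= 3 and parts[0] == kind and parts[1] == name:
            return parts[2]
    return None


# Hypothetical getfacl output for the folder after both setfacl commands.
SAMPLE_OUTPUT = """\
# file: image-classification
# owner: alice
# group: users
user::rwx
user:egoadmin:rwx
group::r-x
group:egoadmin:rwx
mask::rwx
other::r-x
"""

print(acl_permissions(SAMPLE_OUTPUT, "user", "egoadmin"))   # rwx
print(acl_permissions(SAMPLE_OUTPUT, "group", "egoadmin"))  # rwx
print(acl_permissions(SAMPLE_OUTPUT, "user", "nobody"))     # None
```

On the server itself, acl_permissions(read_acl("image-classification"), "user", "egoadmin") would perform the same check against the real folder.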

Instantiating the DL Model into a Training Model

Select the DL Model we have created, and then click Next to add a TensorFlow v2 model.

Parameters to be used:

  • Model Name: TF-flower-distributed
  • Training engine: Distributed training
  • Distribute Strategy: Multi Worker Mirrored Strategy
  • Training dataset: flower-tfrecords-YYYYMMDD
  • API Type: tf.keras
  • Learning rate policy: Fixed
  • Base learning rate: 0.05
  • Optimizer type: Gradient Descent
  • Uncheck “Using hidden state size”
  • Max iterations or epochs: 100
  • Batch size: 10
  • Finally, click the “Add” button at the bottom
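
To give a feel for what some of these parameters conventionally mean, here is a small pure-Python sketch of gradient descent with a fixed learning rate. It is an illustration on a toy function, not the code WML-A or TensorFlow actually runs. Whether the value 100 counts iterations or epochs is not stated on this page, and the example count of 805 is hypothetical.

```python
import math

BASE_LEARNING_RATE = 0.05  # "Base learning rate" from the form
MAX_ITERATIONS = 100       # "Max iterations or epochs" (treated as iterations here)
BATCH_SIZE = 10            # "Batch size" from the form


def steps_per_epoch(num_examples, batch_size=BATCH_SIZE):
    """Number of batches needed to see every training example once."""
    return math.ceil(num_examples / batch_size)


def gradient_descent(gradient, start,
                     learning_rate=BASE_LEARNING_RATE,
                     iterations=MAX_ITERATIONS):
    """Plain gradient descent; the rate never changes (a "Fixed" policy)."""
    w = start
    for _ in range(iterations):
        w -= learning_rate * gradient(w)
    return w


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), start=0.0)
print(round(w_final, 3))     # 3.0
print(steps_per_epoch(805))  # 81
```

With a fixed rate of 0.05 the distance to the minimum shrinks by a constant factor of 0.9 per step on this toy problem, which is why 100 iterations are enough here; real image models behave far less predictably.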

Training the Training Model

  • The previous step ends with the message “Model has been created”. Select the Training Model you just instantiated, and click “Train”
  • Set Number of Worker = 1 and click “Start training”
  • Click the name of your training model to open the training interface
  • Select the training tab to check the progress. Click the Refresh button if necessary.

Convert Training Results into Inference Model

With a successfully trained model (status “Finished”), select it and click “Create Inference Model”:

The system suggests a name for the inference model based on the date and time. Click “Create an inference model” to accept the suggested name. Note that “-Inference” is appended to the name to distinguish it from a Training Model.

After that, go back to the Model page to see your inference model:

Run the Inference Model with new data

  • Go to the Model page, and click on the name of the Inference Model
  • Click Test in the page
  • Click New Test
  • Set Threshold = 0.1
  • Upload a file to test (any flower image from the web will do), and click “Start Test”
  • The system will run the inference job. The status changes from “SUBMITTED” through, in this order, “WAITING”, “RUNNING”, “WAITING” and “FINISHED” (if no error occurs):
  • When finished, you may click the name of this specific inference instance to see the result (similar to this):
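
As a rough idea of what the Threshold setting does, the sketch below keeps only the classes whose score reaches the threshold and lists them from most to least likely. That reading of Threshold is an assumption, since this page does not define it, and the class names and scores are invented for illustration.

```python
def filter_predictions(probabilities, threshold=0.1):
    """Keep (label, probability) pairs at or above threshold, best first."""
    kept = [(label, p) for label, p in probabilities.items() if p >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical scores for one uploaded image.
scores = {"rose": 0.20, "daisy": 0.72, "tulip": 0.05, "sunflower": 0.03}

for label, probability in filter_predictions(scores, threshold=0.1):
    print(f"{label}: {probability:.2f}")
# daisy: 0.72
# rose: 0.20
```

Under this reading, a lower threshold lists more candidate classes for an image, while a higher one reports only the confident ones.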

Congratulations! You have now successfully gone through the example. You may take a look at the structure of the training dataset to see how your own problem could be fitted to the model, and then use the system for your research.