Don't Merge Yet - Add Caffe2, PyTorch and GPU support(using Accelerators) by Tomcli · Pull Request #24 · IBM/FfDL

Tomcli · 2018-02-23T23:31:31Z

Add new community CPU images for Caffe2 and PyTorch
Add sample CPU jobs for Caffe2 and PyTorch
Add other community learner images (including GPU images)
Add sample GPU jobs for TensorFlow and Caffe
We need to rebuild ffdl-lcm in order to reflect the new changes for the FfDL DockerHub images.

For GPU usage, a temporary solution will be available in the gpu-guide.

This PR fixes #16, #17, and #20.

Tomcli · 2018-02-27T18:14:10Z

The new commit allows the LCM to use different learner images (including GPU images). Users can now use the framework version section to select the framework version of their choice.

Please note that the example jobs are configured to pass for every framework on the list, so they might not reflect real-world workloads. (For example, the Caffe2 example does not run any epochs because the latest CPU image does not support that yet.)

To run workloads on GPUs, users must satisfy the following prerequisites. Currently, the Accelerators feature gate is the only option we have.

Please do not merge this PR until we build and upload the new version of the ffdl-lcm image to DockerHub, since the new example jobs are not backward compatible with the old ffdl-lcm image.

animeshsingh

@Tomcli please look at some comments

animeshsingh · 2018-02-24T02:08:43Z

docs/user-guide.md

 * [Caffe](http://caffe.berkeleyvision.org/) version "1.0-py2"
+* [Caffe2](https://caffe2.ai/) version "0.8.1"
+* [PyTorch](http://pytorch.org/) version "0.2"



Should we also test TF 1.5? Anything holding that?

animeshsingh · 2018-02-24T02:09:25Z

etc/examples/caffe2-model/LICENSE

@@ -0,0 +1,21 @@
+The MIT License (MIT)


Are there other MIT License files in the code?

animeshsingh · 2018-02-24T02:09:46Z

etc/examples/caffe2-model/LICENSE

@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) Microsoft Corporation


This says "Copyright (c) Microsoft Corporation" ?

animeshsingh · 2018-02-28T07:40:07Z

etc/examples/caffe-gpu-model/lenet_solver.prototxt

@@ -0,0 +1,25 @@
+# The train/test net protocol buffer definition


Do we need to duplicate this whole file for a minimal change? Can we ask users to change 1/2 params manually - or give a script?
https://github.com/IBM/FfDL/blob/master/etc/examples/caffe-model/lenet_solver.prototxt
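The script option raised in this comment could look roughly like the following. This is a minimal sketch, not part of this PR: it assumes the only CPU/GPU difference in the solver file is Caffe's `solver_mode` field, and the function name `set_solver_mode` is hypothetical.

```python
import re


def set_solver_mode(prototxt_text: str, mode: str = "GPU") -> str:
    """Return the solver prototxt text with solver_mode set to `mode`."""
    if mode not in ("CPU", "GPU"):
        raise ValueError("mode must be 'CPU' or 'GPU'")
    # Match an existing "solver_mode: CPU|GPU" line, keeping its indentation.
    pattern = re.compile(r"^(\s*solver_mode\s*:\s*)(CPU|GPU)\b", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + mode, prototxt_text)
    if count == 0:
        # No solver_mode line found; append one explicitly.
        new_text = prototxt_text.rstrip("\n") + f"\nsolver_mode: {mode}\n"
    return new_text


# Illustrative input only; not the contents of the real lenet_solver.prototxt.
example = "# The train/test net protocol buffer definition\nsolver_mode: CPU\n"
print(set_solver_mode(example, "GPU"))
```

With a helper like this, a single solver file could serve both the CPU and GPU examples, at the cost of one extra step for the user.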

animeshsingh · 2018-02-28T07:40:28Z

etc/examples/caffe-gpu-model/lenet_train_test.prototxt

@@ -0,0 +1,168 @@
+name: "LeNet"


Same as above vis a vis duplication

animeshsingh · 2018-02-28T07:41:38Z

etc/examples/caffe-gpu-model/manifest.yml

@@ -0,0 +1,25 @@
+name: mnist-caffe-gpu-model


Same as above - can't these be instructions, like "change these params in the manifest + proto files, and then run"? This way users also learn.

animeshsingh · 2018-02-28T07:42:36Z

etc/examples/tf-gpu-model/input_data.py

@@ -1,4 +1,3 @@
-
 #!/usr/bin/env python


Same duplication issue...?

Tomcli · 2018-02-28T17:31:24Z

@animeshsingh For the first three comments, I think you are looking at an older commit. Regarding the duplication, I can merge the CPU and GPU examples together and give more detailed instructions in gpu-guide.md.

Tomcli · 2018-02-28T22:19:40Z

I put the CPU and GPU examples in a single example, so for TensorFlow and Caffe there will be an extra gpu-manifest.yml to show the user how to deploy with GPU resources. Detailed instructions on converting CPU jobs to GPU jobs are available in the gpu-guide.
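The CPU-to-GPU conversion mentioned here can be pictured as a small transformation of the job manifest. The sketch below is illustrative only: the field names (`gpus`, `framework`, `version`), the job name, and the placeholder version strings are assumptions about the manifest layout rather than values taken from this PR, and `to_gpu_manifest` is a hypothetical helper. The gpu-guide remains the reference for the actual steps.

```python
import copy


def to_gpu_manifest(cpu_manifest: dict, gpus: int, gpu_framework_version: str) -> dict:
    """Return a copy of a CPU job manifest adjusted to request GPUs."""
    if gpus < 1:
        raise ValueError("a GPU job must request at least one GPU")
    manifest = copy.deepcopy(cpu_manifest)  # leave the caller's manifest untouched
    manifest["gpus"] = gpus  # assumed field name for the GPU count
    # Assumed layout: the framework section carries the learner image version.
    manifest.setdefault("framework", {})["version"] = gpu_framework_version
    return manifest


# Placeholder values for illustration, not real FfDL image tags.
cpu_manifest = {
    "name": "example-job",
    "gpus": 0,
    "framework": {"name": "caffe", "version": "cpu-placeholder"},
}
gpu_manifest = to_gpu_manifest(cpu_manifest, gpus=1, gpu_framework_version="gpu-placeholder")
print(gpu_manifest)
```

Keeping the conversion as a copy rather than an in-place edit mirrors the PR's approach of shipping a separate gpu-manifest.yml alongside the CPU manifest.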

add caffe2 and pytorch cpu support

5995eb7

Tomcli added the enhancement New feature or request label Feb 23, 2018

update LCM, learner config file, and example jobs

80c321c

Tomcli changed the title ~~Add Caffe2 and PyTorch CPU support~~ Add Caffe2 and PyTorch and GPU support(using Accelerators) Feb 27, 2018

Tomcli changed the title ~~Add Caffe2 and PyTorch and GPU support(using Accelerators)~~ Add Caffe2, PyTorch and GPU support(using Accelerators) Feb 27, 2018

animeshsingh changed the title ~~Add Caffe2, PyTorch and GPU support(using Accelerators)~~ Don't Merge Yet - Add Caffe2, PyTorch and GPU support(using Accelerators) Feb 27, 2018

fix pytorch example bug

28a13b2

Tomcli mentioned this pull request Feb 27, 2018

PyTorch code from Object storage download not being defined correctly #25

Closed

animeshsingh added 2 commits February 27, 2018 23:28

Update gpu-guide.md

36ce19e

Update gpu-guide.md

f9e73f2

animeshsingh reviewed Feb 28, 2018

Tomcli added 4 commits February 28, 2018 10:05

merge CPU and GPU examples into a single example

777a0b4

add more tf framework versions

b01b1c8

fix typo

5d49637

add S3 prereq

00525f2

animeshsingh merged commit 80769d0 into master Feb 28, 2018

This was referenced Feb 28, 2018

Add support to switch to GPU learner images seamlessly #20

Closed

Adding support for Caffe2 and PyTorch #17

Closed

Tomcli deleted the new-frameworks branch February 28, 2018 22:27

animeshsingh merged 9 commits into master from new-frameworks
