-
Notifications
You must be signed in to change notification settings - Fork 74
Fine-tuning e2e integration test #372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
f71f441
make test work
tiffzhao5 998ce60
add status checking
tiffzhao5 7382423
fix
tiffzhao5 bb94ae0
test
tiffzhao5 e780b88
wget fix
tiffzhao5 e33c29a
final fixes
tiffzhao5 7ef9723
move namespace
tiffzhao5 9bae3aa
Merge branch 'main' into tiffany/fine-tune-e2e
tiffzhao5 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -121,15 +121,16 @@ jobs: | |
| command: | | ||
| sudo apt-get update && sudo apt-get install -y expect | ||
| pushd $HOME/project/.circleci/resources | ||
| kubectl create namespace model-engine | ||
| kubectl apply -f redis-k8s.yaml | ||
| kubectl apply -f postgres-k8s.yaml | ||
| kubectl create secret generic model-engine-postgres-credentials --from-literal=database_url=postgresql://postgres:circle_test@postgres.default:5432/circle_test | ||
| kubectl create secret generic model-engine-postgres-credentials --from-literal=database_url=postgresql://postgres:circle_test@postgres.default:5432/circle_test -n model-engine | ||
| export ISTIO_VERSION=1.15.0 | ||
| popd | ||
| curl -L https://istio.io/downloadIstio | TARGET_ARCH=x86_64 sh - | ||
| install istio-${ISTIO_VERSION}/bin/istioctl $HOME/bin | ||
| $HOME/bin/istioctl install --set profile=demo -y | ||
| kubectl create namespace model-engine | ||
| kubectl create configmap default-config --from-literal=config="$(cat $HOME/project/.circleci/resources/.minikube-config-map | envsubst)" | ||
| kubectl create configmap default-config --namespace model-engine --from-literal=config="$(cat $HOME/project/.circleci/resources/.minikube-config-map | envsubst)" | ||
| cat $HOME/project/.circleci/resources/.minikube-registry-creds | envsubst | expect | ||
|
|
@@ -142,7 +143,7 @@ jobs: | |
| name: Pre-load integration test images to minikube | ||
| command: | | ||
| docker build -f model-engine/model_engine_server/inference/pytorch_or_tf.base.Dockerfile \ | ||
| --build-arg BASE_IMAGE=pytorch/pytorch:1.7.1-cuda11.0-cudnn8-runtime \ | ||
| --build-arg BASE_IMAGE=python:3.8-slim \ | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. interesting, does it mean we could use this for torch images in docker-compose.yml?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hm I'm not sure actually |
||
| --build-arg REQUIREMENTS_FILE="$CIRCLE_SHA1-base-requirements.txt" \ | ||
| -t temp:1.7.1-cuda11.0-cudnn8-runtime-$CIRCLE_SHA1 . | ||
|
|
||
|
|
@@ -179,7 +180,10 @@ jobs: | |
| command: | | ||
| pushd $HOME/project | ||
| kubectl port-forward svc/model-engine 5001:80 & | ||
| GIT_TAG=$CIRCLE_SHA1 pytest integration_tests | ||
| export AWS_ACCESS_KEY_ID=$CIRCLECI_AWS_ACCESS_KEY | ||
| export AWS_SECRET_ACCESS_KEY=$CIRCLECI_AWS_SECRET_KEY | ||
| export GIT_TAG=$CIRCLE_SHA1 | ||
| pytest integration_tests | ||
|
|
||
| executors: | ||
| ubuntu-large: | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,31 +1,68 @@ | ||
| import pytest | ||
| import json | ||
| import os | ||
| import time | ||
|
|
||
| from .rest_api_utils import ( # CREATE_FINE_TUNE_DI_BATCH_JOB_BUNDLE_REQUEST, CREATE_FINE_TUNE_REQUEST, USER_ID_0, cancel_fine_tune_by_id, create_docker_image_batch_job_bundle, create_fine_tune, get_fine_tune_by_id, | ||
| import boto3 | ||
| import smart_open | ||
|
|
||
| from .rest_api_utils import ( | ||
| CREATE_FINE_TUNE_DI_BATCH_JOB_BUNDLE_REQUEST, | ||
| CREATE_FINE_TUNE_REQUEST, | ||
| USER_ID_0, | ||
| cancel_fine_tune_by_id, | ||
| create_docker_image_batch_job_bundle, | ||
| create_fine_tune, | ||
| get_fine_tune_by_id, | ||
| list_fine_tunes, | ||
| ) | ||
|
|
||
| MAX_RETRIES = 10 | ||
|
|
||
|
|
||
| @pytest.mark.skip(reason="test doesn't currently work, needs to be implemented correctly") | ||
| def test_fine_tunes() -> None: | ||
| # TODO: get this test to work (move LLM fine tune repository to database rather than in S3) | ||
| di_batch_job_id = create_docker_image_batch_job_bundle( | ||
| CREATE_FINE_TUNE_DI_BATCH_JOB_BUNDLE_REQUEST, USER_ID_0 | ||
| )["docker_image_batch_job_bundle_id"] | ||
| data = { | ||
| "test_base_model-lora": { | ||
| "docker_image_batch_job_bundle_id": di_batch_job_id, | ||
| "launch_bundle_config": {}, | ||
| "launch_endpoint_config": {}, | ||
| "default_hparams": {}, | ||
| "required_params": [], | ||
| } | ||
| } | ||
|
|
||
| # di_batch_job_id = create_docker_image_batch_job_bundle( | ||
| # CREATE_FINE_TUNE_DI_BATCH_JOB_BUNDLE_REQUEST, USER_ID_0 | ||
| # )["docker_image_batch_job_bundle_id"] | ||
| if os.getenv("CIRCLECI") == "true": | ||
| session = boto3.Session() | ||
| aws_s3_bucket = os.getenv("CIRCLECI_AWS_S3_BUCKET") | ||
| client = session.client("s3") | ||
| with smart_open.open( | ||
| f"s3://{aws_s3_bucket}/fine_tune_repository", | ||
| "w", | ||
| transport_params={"client": client}, | ||
| ) as f: | ||
| json.dump(data, f) | ||
|
|
||
| # create_response = create_fine_tune(CREATE_FINE_TUNE_REQUEST, USER_ID_0) | ||
| # fine_tune_id = create_response["id"] | ||
| create_response = create_fine_tune(CREATE_FINE_TUNE_REQUEST, USER_ID_0) | ||
| fine_tune_id = create_response["id"] | ||
|
|
||
| # get_response = get_fine_tune_by_id(fine_tune_id, USER_ID_0) | ||
| # assert get_response["id"] == fine_tune_id | ||
| get_response = get_fine_tune_by_id(fine_tune_id, USER_ID_0) | ||
| num_retries = 0 | ||
| while get_response["status"] not in ["SUCCESS", "FAILURE"]: | ||
| if num_retries >= MAX_RETRIES: | ||
| raise Exception("Fine tune job did not complete in time.") | ||
| num_retries += 1 | ||
| get_response = get_fine_tune_by_id(fine_tune_id, USER_ID_0) | ||
| time.sleep(10) | ||
| assert get_response["id"] == fine_tune_id | ||
| assert get_response["status"] == "SUCCESS" | ||
|
|
||
| # list_response_0_before = list_fine_tunes(USER_ID_0) | ||
| # num_jobs = len(list_response_0_before["jobs"]) | ||
| # assert num_jobs >= 1 | ||
| list_response_0_before = list_fine_tunes(USER_ID_0) | ||
| num_jobs = len(list_response_0_before["jobs"]) | ||
| assert num_jobs >= 1 | ||
|
|
||
| list_response_1 = list_fine_tunes(USER_ID_0) | ||
| assert len(list_response_1["jobs"]) == 0 | ||
| cancel_fine_tune_by_id(fine_tune_id, USER_ID_0) | ||
|
|
||
| # list_response_0_after = list_fine_tunes(USER_ID_0) | ||
| # assert len(list_response_0_after["jobs"]) == num_jobs - 1 | ||
| list_response_0_after = list_fine_tunes(USER_ID_0) | ||
| assert len(list_response_0_after["jobs"]) == num_jobs - 1 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is integration test model-engine actually deployed in this namespace?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like from this
llm-engine/.circleci/config.yml
Line 164 in 4e2ea6c
llm-engine/charts/model-engine/values_circleci.yaml
Line 97 in 7ef9723
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
helm install model-engine model-engine --values model-engine/values_circleci_subst.yaml --set tag=$CIRCLE_SHA1 --atomic --debugdoes not specify which namespace to install the chart. it installs a chart defined in./model-enginefolder, with namemodel-engineendpoint and model-engine don't need to be in the same namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you try to remove these if this is actually not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually the namespace
model-engineis used for endpoints and batch jobs which was why I needed to create a new postgres secret thereThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this specific for finetune integration test? i don't think this blocks any other integration tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the postgres secret is specific because we're actually checking if the batch job gets created in the fine-tune integration test. there's one batch job integration test but it doesn't actually check for a successful creation which is why we didn't need this secret before.