Merged
10 changes: 7 additions & 3 deletions .circleci/config.yml
@@ -121,15 +121,16 @@ jobs:
command: |
sudo apt-get update && sudo apt-get install -y expect
pushd $HOME/project/.circleci/resources
kubectl create namespace model-engine
Contributor: Is the integration test model-engine actually deployed in this namespace?

Contributor Author (@tiffzhao5, Nov 15, 2023): It seems from

helm install model-engine model-engine --values model-engine/values_circleci_subst.yaml --set tag=$CIRCLE_SHA1 --atomic --debug

that the helm chart is installed in the default namespace, but from

endpoint_namespace: model-engine

the endpoints are deployed in the model-engine namespace.

Contributor: helm install model-engine model-engine --values model-engine/values_circleci_subst.yaml --set tag=$CIRCLE_SHA1 --atomic --debug does not specify which namespace to install the chart into; it installs the chart defined in the ./model-engine folder, with the release name model-engine.

The endpoints and model-engine don't need to be in the same namespace.
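
For reference, a hypothetical variant of the install command that pins the release to the model-engine namespace explicitly instead of relying on the current kubectl context (which normally resolves to default). This is a sketch, not what the CI config does; it assumes the chart folder layout from this repo and needs a live cluster, so it guards on helm and cluster availability:

```shell
# Hypothetical: install the chart into the model-engine namespace explicitly,
# creating the namespace if it does not exist yet. Without --namespace, helm
# targets the namespace of the current kubectl context (usually "default").
if command -v helm >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
  helm install model-engine model-engine \
    --values model-engine/values_circleci_subst.yaml \
    --set tag="$CIRCLE_SHA1" \
    --namespace model-engine --create-namespace \
    --atomic --debug
else
  echo "helm or cluster unavailable, skipping"
fi
```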

Contributor: Can you try removing these if they are actually not used?

Contributor Author: Actually, the model-engine namespace is used for endpoints and batch jobs, which is why I needed to create a new postgres secret there.

Contributor: Is this specific to the fine-tune integration test? I don't think it blocks any other integration tests.

Contributor Author: The postgres secret is specific to the fine-tune integration test because we actually check whether the batch job gets created there. There is one batch job integration test, but it doesn't check for a successful creation, which is why we didn't need this secret before.

kubectl apply -f redis-k8s.yaml
kubectl apply -f postgres-k8s.yaml
kubectl create secret generic model-engine-postgres-credentials --from-literal=database_url=postgresql://postgres:circle_test@postgres.default:5432/circle_test
kubectl create secret generic model-engine-postgres-credentials --from-literal=database_url=postgresql://postgres:circle_test@postgres.default:5432/circle_test -n model-engine
export ISTIO_VERSION=1.15.0
popd
curl -L https://istio.io/downloadIstio | TARGET_ARCH=x86_64 sh -
install istio-${ISTIO_VERSION}/bin/istioctl $HOME/bin
$HOME/bin/istioctl install --set profile=demo -y
kubectl create namespace model-engine
kubectl create configmap default-config --from-literal=config="$(cat $HOME/project/.circleci/resources/.minikube-config-map | envsubst)"
kubectl create configmap default-config --namespace model-engine --from-literal=config="$(cat $HOME/project/.circleci/resources/.minikube-config-map | envsubst)"
cat $HOME/project/.circleci/resources/.minikube-registry-creds | envsubst | expect
@@ -142,7 +143,7 @@
name: Pre-load integration test images to minikube
command: |
docker build -f model-engine/model_engine_server/inference/pytorch_or_tf.base.Dockerfile \
--build-arg BASE_IMAGE=pytorch/pytorch:1.7.1-cuda11.0-cudnn8-runtime \
--build-arg BASE_IMAGE=python:3.8-slim \
Contributor: Interesting; does this mean we could use this for the torch images in docker-compose.yml?

Contributor Author: Hm, I'm not sure actually.

--build-arg REQUIREMENTS_FILE="$CIRCLE_SHA1-base-requirements.txt" \
-t temp:1.7.1-cuda11.0-cudnn8-runtime-$CIRCLE_SHA1 .

@@ -179,7 +180,10 @@ jobs:
command: |
pushd $HOME/project
kubectl port-forward svc/model-engine 5001:80 &
GIT_TAG=$CIRCLE_SHA1 pytest integration_tests
export AWS_ACCESS_KEY_ID=$CIRCLECI_AWS_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=$CIRCLECI_AWS_SECRET_KEY
export GIT_TAG=$CIRCLE_SHA1
pytest integration_tests

executors:
ubuntu-large:
2 changes: 1 addition & 1 deletion charts/model-engine/values_circleci.yaml
@@ -140,7 +140,7 @@ config:

billing_queue_arn: none
cache_redis_url: redis://redis-message-broker-master.default/15
s3_file_llm_fine_tune_repository: "s3://$CIRCLECI_AWS_S3_BUCKET"
s3_file_llm_fine_tune_repository: "s3://$CIRCLECI_AWS_S3_BUCKET/fine_tune_repository"
dd_trace_enabled: false
istio_enabled: true
tgi_repository: "text-generation-inference"
73 changes: 55 additions & 18 deletions integration_tests/test_fine_tunes.py
@@ -1,31 +1,68 @@
import pytest
import json
import os
import time

from .rest_api_utils import ( # CREATE_FINE_TUNE_DI_BATCH_JOB_BUNDLE_REQUEST, CREATE_FINE_TUNE_REQUEST, USER_ID_0, cancel_fine_tune_by_id, create_docker_image_batch_job_bundle, create_fine_tune, get_fine_tune_by_id,
import boto3
import smart_open

from .rest_api_utils import (
CREATE_FINE_TUNE_DI_BATCH_JOB_BUNDLE_REQUEST,
CREATE_FINE_TUNE_REQUEST,
USER_ID_0,
cancel_fine_tune_by_id,
create_docker_image_batch_job_bundle,
create_fine_tune,
get_fine_tune_by_id,
list_fine_tunes,
)

MAX_RETRIES = 10


@pytest.mark.skip(reason="test doesn't currently work, needs to be implemented correctly")
def test_fine_tunes() -> None:
# TODO: get this test to work (move LLM fine tune repository to database rather than in S3)
di_batch_job_id = create_docker_image_batch_job_bundle(
CREATE_FINE_TUNE_DI_BATCH_JOB_BUNDLE_REQUEST, USER_ID_0
)["docker_image_batch_job_bundle_id"]
data = {
"test_base_model-lora": {
"docker_image_batch_job_bundle_id": di_batch_job_id,
"launch_bundle_config": {},
"launch_endpoint_config": {},
"default_hparams": {},
"required_params": [],
}
}

# di_batch_job_id = create_docker_image_batch_job_bundle(
# CREATE_FINE_TUNE_DI_BATCH_JOB_BUNDLE_REQUEST, USER_ID_0
# )["docker_image_batch_job_bundle_id"]
if os.getenv("CIRCLECI") == "true":
session = boto3.Session()
aws_s3_bucket = os.getenv("CIRCLECI_AWS_S3_BUCKET")
client = session.client("s3")
with smart_open.open(
f"s3://{aws_s3_bucket}/fine_tune_repository",
"w",
transport_params={"client": client},
) as f:
json.dump(data, f)

# create_response = create_fine_tune(CREATE_FINE_TUNE_REQUEST, USER_ID_0)
# fine_tune_id = create_response["id"]
create_response = create_fine_tune(CREATE_FINE_TUNE_REQUEST, USER_ID_0)
fine_tune_id = create_response["id"]

# get_response = get_fine_tune_by_id(fine_tune_id, USER_ID_0)
# assert get_response["id"] == fine_tune_id
get_response = get_fine_tune_by_id(fine_tune_id, USER_ID_0)
num_retries = 0
while get_response["status"] not in ["SUCCESS", "FAILURE"]:
if num_retries >= MAX_RETRIES:
raise Exception("Fine tune job did not complete in time.")
num_retries += 1
get_response = get_fine_tune_by_id(fine_tune_id, USER_ID_0)
time.sleep(10)
assert get_response["id"] == fine_tune_id
assert get_response["status"] == "SUCCESS"

# list_response_0_before = list_fine_tunes(USER_ID_0)
# num_jobs = len(list_response_0_before["jobs"])
# assert num_jobs >= 1
list_response_0_before = list_fine_tunes(USER_ID_0)
num_jobs = len(list_response_0_before["jobs"])
assert num_jobs >= 1

list_response_1 = list_fine_tunes(USER_ID_0)
assert len(list_response_1["jobs"]) == 0
cancel_fine_tune_by_id(fine_tune_id, USER_ID_0)

# list_response_0_after = list_fine_tunes(USER_ID_0)
# assert len(list_response_0_after["jobs"]) == num_jobs - 1
list_response_0_after = list_fine_tunes(USER_ID_0)
assert len(list_response_0_after["jobs"]) == num_jobs - 1
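
The bounded polling added in the test above follows a simple retry pattern: fetch status, stop on a terminal state, give up after MAX_RETRIES attempts. A standalone shell sketch of that pattern, with a stubbed status function standing in for the real fine-tune API call:

```shell
MAX_RETRIES=10

# Stub for get_fine_tune_by_id: reports RUNNING twice, then SUCCESS.
# (In the real test this is an HTTP call to the model-engine service.)
calls=0
get_status() {
  calls=$((calls + 1))
  if [ "$calls" -ge 3 ]; then status=SUCCESS; else status=RUNNING; fi
}

retries=0
get_status
while [ "$status" != "SUCCESS" ] && [ "$status" != "FAILURE" ]; do
  if [ "$retries" -ge "$MAX_RETRIES" ]; then
    echo "Fine tune job did not complete in time." >&2
    exit 1
  fi
  retries=$((retries + 1))
  get_status
done
echo "final status: $status"
```

Note the status is carried in a variable set by the function rather than captured with command substitution, since `status=$(get_status)` would run the stub in a subshell and lose the call counter.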
@@ -27,7 +27,7 @@ RUN apt-get update && apt-get install -y \
&& rm -rf /var/lib/apt/lists/*

# Apparently wget has a vulnerability so we remove it here
RUN apt-get remove wget -y
RUN dpkg -l | grep wget && apt-get remove wget -y || echo "wget not installed, skipping removal"

# Create a virtualenv for python so we install our packages in the right place
# Not sure how useful the existing contents of the pytorch image are anymore :/ Maybe it's used for cuda/cudnn installs
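
The guarded removal in the Dockerfile change above is a general shell pattern: probe first, act only on a hit, and succeed either way so the build never fails when the package is absent. A minimal sketch of the same pattern using command -v as the probe (the Dockerfile uses dpkg -l, which consults the package database rather than PATH):

```shell
# Succeeds whether or not wget exists; only reports removal when present.
# In a Dockerfile RUN step, the else branch keeps the build from failing
# on a base image that never shipped wget.
if command -v wget >/dev/null 2>&1; then
  result="wget present, removal would run"
else
  result="wget not installed, skipping removal"
fi
echo "$result"
```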