WIP: Retrieval Augmented Diffusion Models #1846

Closed
isamu-isozaki wants to merge 96 commits into huggingface:main from isamu-isozaki:rdm_retrieval

@isamu-isozaki
Contributor

@isamu-isozaki isamu-isozaki commented Dec 27, 2022

Pulled code from patil's branch to start making the retriever class and training script for rdm. I'll base this code on

https://github.com/afiaka87/retrieval-augmented-diffusion

and

CompVis/latent-diffusion#111

@isamu-isozaki
Contributor Author

I found that the Hugging Face datasets library already has faiss integration. Trying to figure out how to combine it with CLIPVisionModel now.

@isamu-isozaki
Contributor Author

@patil-suraj Hi! I moved to this branch. I started working on the retrieval class. Will keep working/testing on it tomorrow

@isamu-isozaki
Contributor Author

I think I almost got the retrieval class working. Will prob finish today and push then work on the training script tomorrow. Will also post example results

@isamu-isozaki
Contributor Author

Success! I'll update scripts and push

@isamu-isozaki
Contributor Author

example
This is an example! Currently I embed the Oxford pet dataset into a faiss index, and I can use kNN to get the 10 nearest neighbors of a CLIP text embedding.

The code I used for this example can be found in examples/rdm in a Jupyter notebook. I'll remove the notebook later to keep the code clean.
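
The retrieval step described above can be sketched in a few lines. This is only a minimal illustration, not the code from the notebook: brute-force cosine similarity in plain Python stands in for the faiss index, and the tiny hand-written vectors stand in for CLIP image and text embeddings. The names `cosine_similarity` and `knn` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn(query, embeddings, k=10):
    """Return the indices of the k embeddings most similar to the query."""
    scored = [(cosine_similarity(query, emb), idx) for idx, emb in enumerate(embeddings)]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy stand-ins for image embeddings (assumption: real ones would come from CLIP).
image_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Toy stand-in for the embedding of a text prompt.
text_embedding = [0.8, 0.2, 0.0]

neighbors = knn(text_embedding, image_embeddings, k=2)
print(neighbors)
```

A real index would avoid scoring every stored vector for every query, which is the main reason to use faiss once the dataset grows.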

@isamu-isozaki
Contributor Author

Next I'll start making a training script

@isamu-isozaki
Contributor Author

I'll also try using LAION's clip-retrieval tool as an option for training. I'll double-check the paper on the implementation details, then go ahead and train.

@neverix
Contributor

neverix commented Dec 31, 2022

I get

ValueError: The component <class 'transformers.models.clip.image_processing_clip.CLIPImageProcessor'>
of <class 'diffusers.pipelines.rdm.pipeline_rdm.RDMPipeline'> cannot be loaded as it does not seem to have
any of the loading methods defined in {'ModelMixin': ['save_pretrained', 'from_pretrained'], 'SchedulerMixin': 
['save_config', 'from_config'], 'DiffusionPipeline': ['save_pretrained', 'from_pretrained'], 'OnnxRuntimeModel': 
['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizer': ['save_pretrained', 'from_pretrained'], 
'PreTrainedTokenizerFast': ['save_pretrained', 'from_pretrained'], 'PreTrainedModel': 
['save_pretrained', 'from_pretrained'], 'FeatureExtractionMixin': ['save_pretrained', 'from_pretrained']}.

with fusing/rdm. What could cause this?

@isamu-isozaki
Contributor Author

@neverix
Is this the error you get when doing

from diffusers import RDMPipeline

pipe = RDMPipeline.from_pretrained("fusing/rdm")
pipe.to("cuda")

prompt = "a happy pineapple" 
images = pipe(prompt).images

or

retrieved_images = [...]  # a list of PIL images
images = pipe(prompt, retrieved_images=retrieved_images).images

?

@isamu-isozaki
Contributor Author

I'll try to reproduce the problem and let you know. It's pretty weird: the ImageProcessor does not have a load-from-pretrained method, so it's odd that the pipeline is trying to load it.
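
The error message suggests that the pipeline loader looks each component's class up in a table of known base classes and their save/load method names, and raises when no base class matches. A rough sketch of that kind of check follows. It is an illustration only, not the actual diffusers code: all class names here are local stand-ins, and `find_loading_methods` is a hypothetical helper.

```python
# Hypothetical stand-ins for library base classes (not the real ones).
class ModelMixin:
    pass


class SchedulerMixin:
    pass


class ImageProcessor:
    """A component whose class is not derived from any registered base class."""


class UNet(ModelMixin):
    """A component that inherits from a registered base class."""


# Known base classes and the [save, load] method names they provide,
# mirroring the shape of the dictionary printed in the error message.
LOADABLE_CLASSES = {
    ModelMixin: ["save_pretrained", "from_pretrained"],
    SchedulerMixin: ["save_config", "from_config"],
}


def find_loading_methods(component_cls):
    """Return the [save, load] method names for a component class, or raise."""
    for base_cls, methods in LOADABLE_CLASSES.items():
        if issubclass(component_cls, base_cls):
            return methods
    raise ValueError(
        f"The component {component_cls} cannot be loaded as it does not seem "
        "to have any of the loading methods defined in the table."
    )


print(find_loading_methods(UNet))
```

Under this reading, a component whose class is missing from the table fails at pipeline creation, before any prompt is processed, which would match the report that the error appears in the model creation step.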

@neverix
Contributor

neverix commented Dec 31, 2022

@neverix Is this the error you get when doing

from diffusers import RDMPipeline

pipe = RDMPipeline.from_pretrained("fusing/rdm")
pipe.to("cuda")

prompt = "a happy pineapple" 
images = pipe(prompt).images

or

retrieved_images = [...]  # a list of PIL images
images = pipe(prompt, retrieved_images=retrieved_images).images

?

I get it in the model creation step, so there's not much of a difference

@isamu-isozaki
Contributor Author

@neverix Thanks, I was able to reproduce it. I'll try figuring out a fix tomorrow

@isamu-isozaki
Contributor Author

Interesting, I got the same result when I did

pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

@isamu-isozaki
Contributor Author

OK! I made some changes so the retriever can index with a general model passed as an argument (MoCo, SimCLR, iBOT, etc.). Next, I will wrap up the training/inference
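
One way to let a retriever index with an arbitrary model is to have it accept any callable that maps an item to an embedding vector. The sketch below is an assumption about the design, not the PR's actual Retriever class: `SimpleRetriever`, `embed_fn`, and the toy embedding function are all hypothetical.

```python
from typing import Callable, List, Sequence


class SimpleRetriever:
    """Indexes items with whatever embedding function it is given."""

    def __init__(self, embed_fn: Callable[[object], Sequence[float]]):
        # embed_fn could wrap CLIP, MoCo, SimCLR, iBOT, or any other encoder.
        self.embed_fn = embed_fn
        self.items: List[object] = []
        self.vectors: List[Sequence[float]] = []

    def index(self, items):
        """Embed each item once and store the item with its vector."""
        for item in items:
            self.items.append(item)
            self.vectors.append(self.embed_fn(item))

    def retrieve(self, query, k=1):
        """Return the k stored items closest to the query (squared L2 distance)."""
        q = self.embed_fn(query)
        order = sorted(
            range(len(self.items)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(q, self.vectors[i])),
        )
        return [self.items[i] for i in order[:k]]


def toy_embed(text):
    """Toy embedding for illustration: (length, number of vowels)."""
    return [float(len(text)), float(sum(ch in "aeiou" for ch in text))]


retriever = SimpleRetriever(embed_fn=toy_embed)
retriever.index(["cat", "tiger", "elephant"])
result = retriever.retrieve("dog", k=2)
print(result)
```

Because the retriever only depends on the callable's input and output, swapping the encoder does not require changes to the indexing or search code.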

@isamu-isozaki
Contributor Author

once that's done I'll fix the checks!

@isamu-isozaki
Contributor Author

This is for my personal research, but I will also try adding support for iBOT embeddings. Honestly I doubt anything will need changing

@isamu-isozaki
Contributor Author

I think I'll remove clip-retrieval from the script for now since

  1. Training with this script will end up making a lot of requests to their web service, which is probably not their intended use case
  2. The code will be way cleaner without supporting both clip-retrieval and Retriever

Let me know if anyone needs clip-retrieval in the script and I'll add it back in!

@isamu-isozaki
Contributor Author

Cleaned up the inference script some more. I think a lot of the common funcs I'll abstract away into some common files like datasets just to avoid copying code wrongly.

@isamu-isozaki
Contributor Author

tomorrow I'll hopefully finish cleaning up the training scripts and might ask for a review again!

@isamu-isozaki
Contributor Author

Not finished yet but some notes

  1. I can offload a lot of the work in the dataset by adding a precomputation step where we compute the top-k nearest neighbors beforehand and store them in a column. That way, we save computation on the retrieval during training
  2. The same can be done for a lot of the processing in diffusers, not just here: for example, precomputing text embeddings, VAE embeddings, etc.
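
The first note can be sketched as a one-off pass that stores each example's nearest-neighbor indices in a new column, so the training loop only does a lookup. This is a minimal sketch under assumptions: the dataset is a plain list of dicts, the `embedding` and `neighbors` column names are made up for illustration, and brute-force search stands in for the faiss index.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def add_neighbor_column(rows, k):
    """Store the indices of each row's k nearest other rows in row['neighbors']."""
    for i, row in enumerate(rows):
        # Exclude the row itself so an example never retrieves its own image.
        candidates = [j for j in range(len(rows)) if j != i]
        candidates.sort(
            key=lambda j: squared_distance(row["embedding"], rows[j]["embedding"])
        )
        row["neighbors"] = candidates[:k]
    return rows


# Toy dataset: two tight clusters of two points each.
dataset = [
    {"embedding": [0.0, 0.0]},
    {"embedding": [0.1, 0.0]},
    {"embedding": [1.0, 1.0]},
    {"embedding": [0.9, 1.0]},
]
add_neighbor_column(dataset, k=1)
print([row["neighbors"] for row in dataset])
```

The same pattern, compute once and store in a column, would apply to the cached text or VAE embeddings mentioned in the second note.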

@isamu-isozaki
Contributor Author

Anyways stopping a bit here for now but will resume tomorrow!

@isamu-isozaki
Contributor Author

isamu-isozaki commented Mar 2, 2023

@patrickvonplaten @patil-suraj Hi! I think the training scripts might take a while, so I can move them to a separate PR for an easier review!

@isamu-isozaki
Contributor Author

For now, will be cleaning anyway!

@williamberman
Contributor

hey @isamu-isozaki, if we could isolate the PR to just the pipeline and remove the ColossalAI pipeline and the training scripts, that would be helpful for getting the PR merged

@isamu-isozaki
Contributor Author

@williamberman Got it! Sounds good. Will do tomorrow

@isamu-isozaki
Contributor Author

@williamberman Hi! Just confirming but do you think I should keep the inference scripts? Can remove them too!

@isamu-isozaki
Contributor Author

Let me remove it for now.

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot added the stale (Issues that haven't received updates) label Apr 15, 2023

@isamu-isozaki
Contributor Author

Ah, let me clean up a bit more

@isamu-isozaki
Contributor Author

Sorry, got a bit preoccupied. Let me close this PR and reopen it once I clean up some things

@un1tz3r0

Hi, so I believe I implemented something similar using clip-retrieval back in the days of disco-diffusion (we've come such a long way since then).

I took care to implement an asyncio-based, high-performance parallel downloader; it can grab thousands of images from the URLs returned by clip-retrieval in pretty reasonable time.
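
For readers unfamiliar with the pattern, a bounded-concurrency downloader built on asyncio can be sketched as below. This is not the code from the linked repo: `download_all` and `fake_fetch` are hypothetical names, and `fake_fetch` only simulates a network request so the example runs offline. A real version would swap in an HTTP client call.

```python
import asyncio


async def fake_fetch(url):
    """Stand-in for an HTTP request: waits briefly, then returns fake bytes."""
    await asyncio.sleep(0.01)
    return f"bytes-of-{url}".encode()


async def download_all(urls, fetch, max_concurrency=8):
    """Fetch all URLs concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        async with semaphore:
            try:
                return url, await fetch(url)
            except Exception:
                # A failed download is recorded as None instead of aborting the batch.
                return url, None

    pairs = await asyncio.gather(*(bounded_fetch(url) for url in urls))
    return dict(pairs)


urls = [f"https://example.com/img_{i}.jpg" for i in range(20)]
results = asyncio.run(download_all(urls, fake_fetch, max_concurrency=5))
print(len(results))
```

The semaphore keeps the number of simultaneous requests bounded, which matters when the URL list runs to thousands of entries.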

The repo is here un1tz3r0/anythingdiffusion

@isamu-isozaki
Contributor Author

isamu-isozaki commented Apr 27, 2023

@un1tz3r0 nice! Looks awesome thanks
