WIP: Retrieval Augmented Diffusion Models #1846
isamu-isozaki wants to merge 96 commits into huggingface:main from isamu-isozaki:rdm_retrieval
|
I found that huggingface datasets library already has faiss integration. Trying to figure out how to combine with CLIPVisionModel now |
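For readers unfamiliar with what an embedding index does at query time, here is a minimal sketch. It is not the datasets/faiss API and not the code from this PR: it is a pure-Python stand-in for a flat inner-product search over normalized embeddings. The function names, the file names, and the 3-d vectors are all made up for illustration; in the real setup the embeddings would come from an image encoder such as CLIPVisionModel.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def nearest_examples(index, query, k=2):
    """Return the k (score, key) pairs with the highest cosine similarity.

    `index` maps a hypothetical item key to its embedding. This brute-force
    scan is what a flat index computes conceptually; faiss does it much faster.
    """
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(normalize(emb), q)), key)
        for key, emb in index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-d "embeddings" (assumption): real CLIP embeddings have hundreds of dimensions.
toy_index = {
    "cat.png": [0.9, 0.1, 0.0],
    "dog.png": [0.8, 0.3, 0.1],
    "car.png": [0.0, 0.2, 0.9],
}
top = nearest_examples(toy_index, [1.0, 0.0, 0.0], k=2)
print([key for _, key in top])
```

The retrieved neighbors are what would then be passed to the diffusion model as conditioning.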
|
@patil-suraj Hi! I moved to this branch. I started working on the retrieval class. Will keep working/testing on it tomorrow |
|
I think I almost got the retrieval class working. Will prob finish today and push then work on the training script tomorrow. Will also post example results |
|
Success! I'll update scripts and push |
|
Next I'll start making a training script |
|
I'll try using the clip-retrieval tool by LAION too as an option for training. I'll double-check the paper for implementation details, then I'll go ahead and train |
|
I get ValueError: The component <class 'transformers.models.clip.image_processing_clip.CLIPImageProcessor'>
of <class 'diffusers.pipelines.rdm.pipeline_rdm.RDMPipeline'> cannot be loaded as it does not seem to have
any of the loading methods defined in {'ModelMixin': ['save_pretrained', 'from_pretrained'], 'SchedulerMixin':
['save_config', 'from_config'], 'DiffusionPipeline': ['save_pretrained', 'from_pretrained'], 'OnnxRuntimeModel':
['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizer': ['save_pretrained', 'from_pretrained'],
'PreTrainedTokenizerFast': ['save_pretrained', 'from_pretrained'], 'PreTrainedModel':
['save_pretrained', 'from_pretrained'], 'FeatureExtractionMixin': ['save_pretrained', 'from_pretrained']}.with |
|
@neverix or ? |
|
I'll try to reproduce the problem and let you know. It is pretty weird: the ImageProcessor does not have a load-from-pretrained method, so it's odd that the pipeline is trying to load it. |
I get it in the model creation step, so there's not much of a difference |
|
@neverix tnx was able to reproduce. I'll try figuring out a fix tomorrow |
|
Interesting I got the same result when I did |
|
ok! I made some changes so the retriever can index with a general model if given as an argument (moco, simclr, ibot, etc.). Next, I will wrap up the training/inference |
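One way to picture "index with a general model given as an argument" is a name-to-encoder registry. The sketch below is an assumption about the shape of such a design, not the retriever class in this PR: `ENCODERS`, `register_encoder`, `Retriever`, and the toy encoder functions are all hypothetical, and the encoders are trivial stand-ins rather than real CLIP or MoCo networks.

```python
from typing import Callable, Dict, List

Encoder = Callable[[List[float]], List[float]]

# Hypothetical registry: the names mirror models mentioned in this thread,
# but the functions below are toy stand-ins, not the real networks.
ENCODERS: Dict[str, Encoder] = {}


def register_encoder(name: str):
    """Decorator that stores an encoder function under the given name."""
    def decorator(fn: Encoder) -> Encoder:
        ENCODERS[name] = fn
        return fn
    return decorator


@register_encoder("clip")
def toy_clip(pixels: List[float]) -> List[float]:
    # Stand-in embedding: (mean, max) of the pixel values.
    return [sum(pixels) / len(pixels), max(pixels)]


@register_encoder("moco")
def toy_moco(pixels: List[float]) -> List[float]:
    # Stand-in embedding with a different dimensionality: (min, max, count).
    return [min(pixels), max(pixels), float(len(pixels))]


class Retriever:
    """Builds an embedding index with whichever encoder is named at construction."""

    def __init__(self, model_name: str = "clip"):
        if model_name not in ENCODERS:
            raise ValueError(
                f"unknown model {model_name!r}; choose from {sorted(ENCODERS)}"
            )
        self.encode = ENCODERS[model_name]
        self.embeddings: List[List[float]] = []

    def index(self, images: List[List[float]]) -> List[List[float]]:
        """Embed every image with the selected encoder and keep the result."""
        self.embeddings = [self.encode(img) for img in images]
        return self.embeddings


images = [[0.0, 0.5, 1.0], [0.2, 0.2, 0.2]]
clip_index = Retriever("clip").index(images)
moco_index = Retriever("moco").index(images)
print(len(clip_index[0]), len(moco_index[0]))
```

With this layout, adding another encoder only requires registering one more function; the indexing code does not change.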
|
once that's done I'll fix the checks! |
|
This is for my personal research but I also will try adding ibot embeddings support too. Honestly I doubt anything will need changing |
|
I think I'll remove clip-retrieval from the script for now since
|
|
Cleaned up the inference script some more. I think a lot of the common funcs I'll abstract away into some common files like datasets just to avoid copying code wrongly. |
|
tomorrow I'll hopefully finish cleaning up the training scripts and might ask for a review again! |
|
Not finished yet but some notes
|
|
Anyways stopping a bit here for now but will resume tomorrow! |
|
@patrickvonplaten @patil-suraj Hi! I think the training scripts might take a while so I can move them to a separate pr for an easier review! |
|
For now, will be cleaning anyway! |
|
hey @isamu-isozaki if we could isolate the PR to just the pipeline and remove the ColossalAI pipeline and the training scripts, that would be helpful for getting the PR merged |
|
@williamberman Got it! Sounds good. Will do tomorrow |
|
@williamberman Hi! Just confirming but do you think I should keep the inference scripts? Can remove them too! |
|
Let me remove it for now. |
|
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
|
Ah, let me clean up a bit more |
|
sorry got a bit preoccupied. Let me close this pr and open it once I clean up some things |
|
Hi, I believe I implemented something similar using clip-retrieval back in the days of disco-diffusion (we've come such a long way since then). I took care to implement an asyncio-based, high-performance parallel downloader; it can grab thousands of images from the URLs returned by clip-retrieval in pretty reasonable time. The repo is here: un1tz3r0/anythingdiffusion |
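As a rough illustration of the asyncio-based downloader idea, the sketch below bounds concurrency with a semaphore and tolerates individual failures. It is not the code from the linked repo: `fake_fetch`, `download_all`, and the example.invalid URLs are hypothetical, and the fetch function only simulates network latency so the snippet runs without network access. A real version would swap in an actual HTTP client.

```python
import asyncio


async def fake_fetch(url: str) -> bytes:
    """Hypothetical stand-in for an HTTP GET; replace with a real client."""
    await asyncio.sleep(0.01)  # simulate network latency
    if "bad" in url:
        raise IOError(f"failed to fetch {url}")
    return f"image-bytes:{url}".encode()


async def download_all(urls, fetch=fake_fetch, max_concurrency=8):
    """Fetch all URLs concurrently, with at most max_concurrency in flight.

    Returns a dict mapping each url to its bytes, or to None if the fetch failed.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url):
        async with semaphore:
            try:
                return url, await fetch(url)
            except Exception:
                # One broken link should not abort the whole batch.
                return url, None

    pairs = await asyncio.gather(*(fetch_one(u) for u in urls))
    return dict(pairs)


urls = [f"https://example.invalid/img{i}.jpg" for i in range(20)]
urls.append("https://example.invalid/bad.jpg")
results = asyncio.run(download_all(urls, max_concurrency=5))
print(sum(v is not None for v in results.values()), "of", len(results), "succeeded")
```

The semaphore limit is the main tuning knob: too low wastes bandwidth, too high risks overwhelming the remote hosts.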
|
@un1tz3r0 nice! Looks awesome thanks |
Pulled code from patil's branch to start making the retriever class and training script for rdm. I'll base this code on
https://github.com/afiaka87/retrieval-augmented-diffusion
and
CompVis/latent-diffusion#111