
Add vit attribution hugo #154

Open
HugoDeBosschere wants to merge 25 commits into add_vit_attribution_dev from add_vit_attribution_hugo


@HugoDeBosschere

Collaborator

Description

Added a small MVP for ViT attribution. There is no visualization yet, but there is a Saliency method that runs.
I did not modify any pre-existing code; everything is additive.

Creation of:

  1. ImageGranularity, mirroring Granularity, with PATCH (default) and PIXEL granularities in interpreto/commons/image_granularity.py
  2. ImageClassificationInferenceWrapper in interpreto/model_wrapping/image_classification_inference_wrapper.py.
    It returns flattened 2D tensors so that the methods developed for text can be reused for free. Three methods are overridden in this class:
    • __init__
    • _prepare_inputs
    • _compute_gradients
  3. ImageAttributionOutput and ImageAttributionExplainer in interpreto/attributions/image_base.py. ImageAttributionOutput has the same fields as AttributionOutput, except that granularity is an ImageGranularity and the raw image is stored in raw_image for visualization purposes. ImageAttributionExplainer does almost exactly the same things as AttributionExplainer, with some renaming to adapt to image terminology, and its explain function stores the raw image in the raw_image field of ImageAttributionOutput.
  4. ImagePerturbator in interpreto/attributions/perturbations/image_base.py. Only the base class exists for the MVP, so the perturbator is a no-op: since we don't need to check for model_inputs or model_embeds, it just returns the inputs. This allows ImageSaliency to be instantiated to check that the pipeline runs.
  5. ImageSaliency in interpreto/attributions/methods/image_saliency.py. Same as Saliency, but tokenizer -> image_processor.
  6. first_tests/first_test_image.py to test the whole pipeline with a cat image. The cat image is not provided (sorry).
  • 🚀 New feature (non-breaking change which adds functionality)

Checklist

  • [x] I've read the CODE_OF_CONDUCT.md document.
  • [x] I've read the CONTRIBUTING.md guide.
  • [ ] I've successfully run the style checks using make lint.
  • [ ] I've written tests for all new methods and classes that I created and successfully ran make test.
  • [ ] I've written the docstring in Google format for all the methods and classes that I used.

HugoDeBosschere and others added 15 commits May 26, 2026 17:40
Additive image-modality extension on top of the merged attr-inference-refacto:

- ImageClassificationInferenceWrapper (ClassificationInferenceWrapper subclass):
  __init__ drops pad_token_id lookup; _prepare_inputs stacks pixel tensors
  without padding; _compute_gradients differentiates w.r.t. pixel_values and
  collapses channels via .abs().mean(dim=1).flatten() to fit the 1D `l` contract
  (l = H*W). Runtime assert on (3, H, W) channel dim in _prepare_inputs.

- ImageGranularity (standalone Enum, can't subclass Granularity): PIXEL/PATCH
  with DEFAULT=PATCH; duck-typed get_indices, get_association_matrix,
  granularity_score_aggregation (no generation branch), and get_decomposition
  returning (row, col) int tuples instead of strings. Generation/text-only
  branches are stripped; PATCH aggregation asserts >=2 pixels per unit.

- ImageAttributionOutput + ImageClassificationAttributionExplainer in a new
  attributions/image_base.py: AttributionOutput mirror with ImageGranularity
  default and tuple-coordinate elements; explainer subclasses
  ClassificationAttributionExplainer, swaps tokenizer for image_processor,
  drops the text-side setup_token_ids call (no pad/mask tokens for ViT),
  adds a preprocess flag, accepts PIL/numpy/torch.Tensor/BatchFeature in
  process_model_inputs, and rewrites explain() with patch_size in place of
  tokenizer and ImageAttributionOutput as the output type.

No tests, no perturbator, no visualization yet — gradient-only MVP.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…on method)

- ImagePerturbator: no-op image-side perturbator. Subclass of Perturbator that
  returns (model_inputs, None), replacing the text-keyed default that would
  KeyError on a ViT BatchFeature.
- ImageSaliency: thin subclass of ImageClassificationAttributionExplainer with
  use_gradient=True, input_x_gradient=True; no MultitaskExplainerMixin
  (classification-only MVP). Defaults to ImagePerturbator + default Aggregator.
- Wire ImagePerturbator as the default fallback in ImageClassificationAttributionExplainer.
- Re-exports through perturbations/, methods/, attributions/, and top-level
  interpreto/ (alongside ImageGranularity).
- Sanity-runs end-to-end against hf-internal-testing/tiny-random-vit: returns
  (1, 225) attributions matching the model's 15x15 patch grid.
- first_tests/first_test_image.py: ad-hoc sanity script (not a pytest test).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ded the first_test_image.py file to check that it works
…near_interpolation_image_perturbation and the downstream methods that depend on it: ImageSmoothGrad, ImageIntergratedGradients and ImageGradientShap. I also added the tests of the methods in first_tests/first_test_image.py and plotted the results of the methods on a similar graph
…s, i.e. the counterpart of IdsPerturbator for images. Added the perturbator-specific logic and the explainer methods for sobol, lime, occlusion, var_grad and square_grad. Also added kernel_shap, but modified compared to the text version, because the weighted sampling of the text version does not correspond to the actual Shapley kernel. Modified first_tests/ to check with a tiny random ViT that it prints something. Also modified the relevant __init__.py files to add the newly defined techniques.
…default number of perturbations to 10 for easier testing. For image_base.py, the source of truth for patch_size is in the attribution explainer.
…the same plot and decided to leave the rescaling to the imshow function rather than to _prepare_heatmap, so that we can plot a colorbar next to each image that corresponds to the unnormalized scores. Adapted the actual_vit.py example to work with the new function that plots several different techniques on the same plot.
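For reference on the kernel_shap change: the standard Shapley kernel weights a coalition z keeping |z| of M units (patches or pixels) as `pi(z) = (M - 1) / (C(M, |z|) * |z| * (M - |z|))`. The sketch below (an illustration, not the PR's code; both function names are hypothetical) computes it, together with the distribution over coalition sizes it induces when sampling, which is proportional to `C(M, s) * pi = (M - 1) / (s * (M - s))`:

```python
from math import comb


def shapley_kernel_weight(n_units: int, coalition_size: int) -> float:
    """Shapley kernel weight of one coalition keeping `coalition_size` of `n_units` units."""
    m, s = n_units, coalition_size
    if s == 0 or s == m:
        # Infinite weight: KernelSHAP enforces these two coalitions as constraints instead.
        return float("inf")
    return (m - 1) / (comb(m, s) * s * (m - s))


def coalition_size_distribution(n_units: int) -> list[float]:
    """Probability of sampling a coalition of size s, for s = 1 .. n_units - 1.

    Summing the kernel over the C(M, s) coalitions of size s gives (M - 1) / (s * (M - s)).
    """
    m = n_units
    mass = [comb(m, s) * shapley_kernel_weight(m, s) for s in range(1, m)]
    total = sum(mass)
    return [x / total for x in mass]
```

Per coalition, the weight is smallest for balanced coalitions; summed per size, the most mass goes to sizes 1 and M - 1.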

@AntoninPoche AntoninPoche left a comment

Collaborator

The commit messages are clear, so I can review them commit by commit. It's nice! I guess thanks, Claude ^^

Overall, it is clear, and it is nice that you already managed to have something running. My main comment is that ImageGranularity is far too complex for what you need for images. But you will really know after trying perturbations at both granularities.

In any case, you can discuss all comments; they are my opinions, not based on having tried the thing myself.

Comment thread interpreto/attributions/image_base.py Outdated
inference_mode: Callable[[torch.Tensor], torch.Tensor] = InferenceModes.LOGITS,
use_gradient: bool = False,
input_x_gradient: bool = True,
preprocess: bool = True,

Collaborator

Are you sure preprocessing by default is the way to go?

Collaborator Author

I would have said yes, because in my head the typical use case is running this on a PIL image, which requires preprocessing. Of course, maybe I'm wrong and it will more generally be used on already preprocessed tensors, in which case it may be better to set preprocess = False.
I think this has to be settled by how it will most often be used, which I have little visibility on.

Collaborator

What happens if we process already processed inputs? Does it do something weird?

If so, it might be tricky; otherwise, preprocessing by default is the way to go.

Collaborator Author

Indeed, in most cases it will do something weird. But if the user does not preprocess the inputs, it also does something weird (the ViT will not behave as expected). So I don't know which one is best.

Collaborator

The current behavior looks good to me. It is just that I was not familiar with how things work in ViTs, and I wanted the default choice to preprocess or not to be made consciously.
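To make the "something weird" concrete: ViT image processors typically rescale pixels to [0, 1] and then normalize each channel as `(x - mean) / std`. Assuming mean = std = 0.5 (common for ViT checkpoints, but check the processor config), running the normalization a second time on already-normalized values shifts and stretches them out of the range the model was trained on. A toy sketch of that single step:

```python
def normalize(values: list[float], mean: float = 0.5, std: float = 0.5) -> list[float]:
    """Per-channel normalization step of a typical image processor: (x - mean) / std."""
    return [(v - mean) / std for v in values]


pixels = [0.0, 0.5, 1.0]  # already rescaled to [0, 1]
once = normalize(pixels)  # what the model expects: values in [-1, 1]
twice = normalize(once)  # preprocessing an already preprocessed input
print(once, twice)  # [-1.0, 0.0, 1.0] [-3.0, -1.0, 1.0]
```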

Comment thread interpreto/attributions/image_base.py Outdated
inference_mode: Callable[[torch.Tensor], torch.Tensor] = InferenceModes.LOGITS


class ImageClassificationAttributionExplainer(ClassificationAttributionExplainer):

Collaborator

You cannot inherit directly from the ClassificationAttributionExplainer class, in particular if your goal is just to reuse its target preprocessing.

The goal is to find a way to use or adapt the MultitaskExplainerMixin, which kind of serves as a factory: when you instantiate Saliency, it checks the model class and picks the base task-explainer, so that the resulting class inherits from both and has what you need.

The thing is that this hinders clarity. IMO, they should all be at the same level.

Collaborator

But for now, it can be kept like this if you want easy tests.
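A minimal sketch of the factory idea, assuming a mixin whose `__new__` picks the task-specific base from the model and builds a class inheriting from both. All names here (`TASK_BASES`, the string `model_kind` standing in for a real model, the toy explainers) are hypothetical; this is not interpreto's `MultitaskExplainerMixin`:

```python
class ClassificationExplainer:
    task = "classification"


class GenerationExplainer:
    task = "generation"


# Hypothetical dispatch table: in practice the check would be on the model class.
TASK_BASES = {"classifier": ClassificationExplainer, "generator": GenerationExplainer}
_COMBINED: dict[tuple[type, type], type] = {}


class MultitaskMixin:
    def __new__(cls, model_kind: str, *args, **kwargs):
        base = TASK_BASES[model_kind]
        key = (cls, base)
        if key not in _COMBINED:  # build each (method, task) class only once
            name = base.__name__.removesuffix("Explainer") + cls.__name__
            _COMBINED[key] = type(name, (cls, base), {})
        # Python then calls __init__, because the instance is still an instance of `cls`.
        return super().__new__(_COMBINED[key])


class Saliency(MultitaskMixin):
    def __init__(self, model_kind: str):
        self.model_kind = model_kind
```

The trade-off is the one raised above: the object you get back has a class (`ClassificationSaliency`) created at runtime, which makes the hierarchy harder to follow.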

Comment thread interpreto/attributions/image_base.py Outdated
Comment on lines +300 to +317
# Coordinate labels per granularity unit (replaces text's decoded-token strings).
# All samples share the same H, W after image_processor normalization, so the
# decomposition is identical across samples — compute it once on the first sample
# and share the reference. Text iterates per-sample because each sample can have
# a different sequence length; for image that variation doesn't exist.
# TODO: revisit — see project_vit_explainer_decomposition_refactor in auto-memory.
# `get_decomposition` already replicates internally based on the input's batch dim
# (`pixel_values.shape[0]`), but our per-sample BatchFeatures all have batch=1, so
# the internal replication is a no-op and we redo it here. Cleaner long-term: either
# strip the internal replication (return one decomposition) or concat samples into a
# single batched BatchFeature upstream and let `get_decomposition` produce the full
# `n_samples` list directly.
shared_coords: list[tuple[int, int]] = self.granularity.get_decomposition(
    model_inputs_to_explain[0],
    patch_size=self.patch_size,
    return_coordinates=True,
)[0]  # type: ignore
granular_inputs_coords: list[list[tuple[int, int]]] = [shared_coords for _ in model_inputs_to_explain]

Collaborator

Not sure what you use this for.

This was used in the text to know to which words or tokens attributions correspond.

For images, you can just store the images in their original (h, w) and the attributions in the granularity (h, w). Then, for visualizations, we resize (which does nothing for pixels and something for patches). (Note that the way of resizing can be a visualization parameter.)

In summary, if you already store the image, I would set None for the elements or omit it.
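A sketch of the resize-at-visualization-time idea, assuming attributions are stored as an (H / p, W / p) grid and using nearest-neighbour upsampling (the helper name is hypothetical; a smoother interpolation could be the visualization parameter mentioned above). With p = 1, i.e. pixel granularity, the resize is the identity:

```python
import numpy as np


def upsample_patch_scores(patch_scores: np.ndarray, patch_size: int) -> np.ndarray:
    """Nearest-neighbour resize of (H / p, W / p) attributions to the (H, W) image size.

    Each patch score is copied to the p x p pixels it covers; patch_size=1 is a no-op.
    """
    return np.repeat(np.repeat(patch_scores, patch_size, axis=0), patch_size, axis=1)


patch_scores = np.array([[1.0, 2.0], [3.0, 4.0]])  # a 2 x 2 patch grid
pixel_scores = upsample_patch_scores(patch_scores, patch_size=2)  # shape (4, 4)
```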

Collaborator

We discussed it, and this is now outdated.

Comment thread interpreto/attributions/image_base.py Outdated
sanitized_targets,
strict=True,
):
model_task, clean_contribution = self.post_processing(contribution)

Collaborator

You do not implement self.post_processing. However, the parent class just does return ModelTask.CLASSIFICATION, contribution.

I do not think you should use the same ModelTask, at least for now.

So you can remove this line and just create an ImageClassification task to pass to your ImageAttributionOutput. Or just omit it.

If you omit everything you do not need, it might be easier to identify what's common and not for later merging.

Collaborator Author

Okay I removed it.

Comment on lines +84 to +92
for row_start in range(0, h, patch_size):
    for col_start in range(0, w, patch_size):
        patch = []
        for i in range(patch_size):
            for j in range(patch_size):
                # pixel positions in this patch, starting from row_start and col_start and moving by patch_size pixels
                patch.append((i + row_start) * w + j + col_start)

        per_sample.append(patch)

Collaborator

There should be a simpler solution than 4 nested for loops.

In any case, my first intuition is that you do not need get_indices, at least for the two granularities you have here.

The granularity is used at two steps:

  • For perturbations: either you perturb the pixel_values or the patches, so there is no need for granularity.
  • For aggregation, if your attributions are not already at the correct granularity. For now, the only such case is when you request the patch granularity for gradients, where you need to downscale the attributions from pixel-wise to patch-wise.

So IMO, you do not need most of the granularity elements for images. They are useful for text because each sample has a different size, with words and sentences distributed across the whole sample. Images are much more consistent.

Well, the above comments do not hold if you have special patches/tokens which do not correspond to pixels but to constants (to my understanding, these are used in ViTs and such).
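For the four nested loops above, a vectorized NumPy equivalent (a sketch with a hypothetical function name; it assumes H and W are multiples of patch_size, which the loops also implicitly rely on): lay out the flat pixel indices as (H / p, p, W / p, p), move the two patch-grid axes first, and flatten each patch.

```python
import numpy as np


def patch_pixel_indices(h: int, w: int, patch_size: int) -> np.ndarray:
    """Flat (row-major) pixel indices of each patch, as an (n_patches, p * p) array.

    Patches are ordered row by row, and pixels inside a patch row by row,
    matching the nested-loop version.
    """
    p = patch_size
    if h % p or w % p:
        raise ValueError("image height and width must be multiples of patch_size")
    # grid[a, i, b, j] == (a * p + i) * w + (b * p + j)
    grid = np.arange(h * w).reshape(h // p, p, w // p, p)
    return grid.transpose(0, 2, 1, 3).reshape(-1, p * p)


print(patch_pixel_indices(4, 4, 2).tolist())
# [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
```

The same reshape trick also pools an (H, W) score map into patch scores directly, without building index lists at all.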

Collaborator

Partially outdated

# text does `.abs().mean(dim=-1)` to collapse the d=768/4096 width dim for memory;
# here we collapse channels for the 1D l contract (not memory) and flatten H,W to l=H*W
target_wise_mean_grads: Float[torch.Tensor, f"{c} l"] = (
target_wise_grads.abs().mean(dim=1).flatten(start_dim=1)

Collaborator

Why flatten? I know it breaks the shape from text, but it makes much more sense to have an (H,W) shape IMO.

This depends on how the rest is done.

Collaborator

You said it was for aggregation, right?

Collaborator Author

This allows us to reuse the aggregator entirely without changing anything. It also allowed me to copy, with minimal changes, many of the functions already written for text (such as the perturbators or the granularity), so that we can refactor more easily later.

Collaborator

Then it is a good choice!
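For reference, the channel collapse and flatten discussed in this thread, written with NumPy instead of torch (a sketch; the function names are hypothetical). Flattening is row-major, so the flat index is `l = row * W + col`, and reshaping back recovers the (H, W) layout without losing anything:

```python
import numpy as np


def collapse_and_flatten(grads: np.ndarray) -> np.ndarray:
    """(n, C, H, W) gradients -> (n, H * W) scores.

    Mirrors `grads.abs().mean(dim=1).flatten(start_dim=1)` in the PR.
    """
    return np.abs(grads).mean(axis=1).reshape(grads.shape[0], -1)


def unflatten(scores: np.ndarray, h: int, w: int) -> np.ndarray:
    """(n, H * W) -> (n, H, W), e.g. right before visualization."""
    return scores.reshape(scores.shape[0], h, w)


grads = np.random.default_rng(0).normal(size=(2, 3, 4, 5))  # n=2, C=3, H=4, W=5
flat = collapse_and_flatten(grads)  # shape (2, 20): the text-like (n, l) contract
maps = unflatten(flat, 4, 5)  # shape (2, 4, 5)
```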

image_processor: BaseImageProcessor,
batch_size: int = 4,
granularity: ImageGranularity = ImageGranularity.DEFAULT,
granularity_aggregation_strategy: GranularityAggregationStrategy = GranularityAggregationStrategy.MEAN,

Collaborator

I imagine that for images, this should correspond to the interpolation mode.

Collaborator Author

I'm not sure I correctly understand the comment, but granularity_aggregation_strategy is the strategy used when use_gradient = True to go from pixel scores to granularity scores.

If the interpolation mode you are referring to is the one in the aggregator then, from what I understand, it aggregates over the perturbations, not over the granularity.

So I'm not sure how to respond.

Collaborator

I am talking about the granularity of the aggregation here.

After gradients on the pixel_values, if you want to go to the patch granularity, you need to resize the explanations.

The image resizing operation takes an interpolation parameter. This parameter is a kind of choice we make with the GranularityAggregationStrategy. Well, you could also call this pooling, but resize allows you to think at the image level.

Collaborator Author

Okay, I understand better.
Interpolation and pooling both work, but I have a preference for pooling, since this looks a lot like a typical max-pooling operation (when the strategy is MAX), and those terms are commonly used with CNNs (here it would be a patch_size * patch_size pooling operation). If you think interpolation is better, I can use interpolation.

Collaborator

I think resizing is indeed better suited, because it considers the image as a whole, while pooling makes small windows and extracts a value from each of them.

Here, taking the maximum, mean, or any other value does not really make sense. Well, it kind of does the job, but resizing is a much better way to change an image's size. (Which also applies to attributions with shape (h, w)).

Some methods, like the CAM family or Rise, provide smooth, human-friendly attributions as a result.

Collaborator Author

Ok, I'll look into replacing GranularityAggregationStrategy with an InterpolationStrategy and defining several interpolation strategies.
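To compare the two views on an (H, W) pixel-attribution map: with an integer factor p, AREA resizing to (H / p, W / p) is exactly mean pooling over non-overlapping p x p windows (PyTorch's `F.interpolate(..., mode="area")` is computed with adaptive average pooling), whereas MAX has no resize counterpart, and BILINEAR/BICUBIC weight neighbouring pixels instead of averaging whole windows. A NumPy sketch of the pooling side (the helper name is hypothetical):

```python
import numpy as np


def pool_to_patches(pixel_scores: np.ndarray, patch_size: int, reduce=np.mean) -> np.ndarray:
    """Reduce (H, W) pixel scores to (H / p, W / p) patch scores over p x p windows.

    `reduce` is any NumPy reduction accepting `axis=`, e.g. np.mean (same as AREA
    resizing for an integer factor) or np.max (max pooling).
    """
    h, w = pixel_scores.shape
    p = patch_size
    if h % p or w % p:
        raise ValueError("image height and width must be multiples of patch_size")
    windows = pixel_scores.reshape(h // p, p, w // p, p)  # windows[a, i, b, j]
    return reduce(windows, axis=(1, 3))


scores = np.arange(16, dtype=float).reshape(4, 4)
print(pool_to_patches(scores, 2).tolist())  # [[2.5, 4.5], [10.5, 12.5]]
print(pool_to_patches(scores, 2, reduce=np.max).tolist())  # [[5.0, 7.0], [13.0, 15.0]]
```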

Comment thread pyproject.toml
"nvidia-cusparse-cu11>=11.7.4.91; sys_platform=='Linux'",
"nvidia-nccl-cu11>=2.14.3; sys_platform=='Linux'",
"nvidia-nvtx-cu11>=11.7.91; sys_platform=='Linux'",
"torchvision>=0.27.0",

Collaborator

We should discuss with @fanny-jourdan if we want torchvision to be a main dependency or in something like pip install interpreto[vision].
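If torchvision ends up in an optional extra, image-only code paths could import it lazily and fail with an install hint. A minimal sketch; the helper name is hypothetical, and `interpreto[vision]` is only the extra name floated in this comment, not an existing one:

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(module_name: str, extra: str = "vision") -> ModuleType:
    """Import an optional top-level dependency, or fail with an install hint for the extra."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"`{module_name}` is required for this feature. "
            f"Install it with `pip install interpreto[{extra}]`."
        )
    return importlib.import_module(module_name)
```

Usage would be `torchvision = require_optional("torchvision")` inside the image code path, so text-only users never need the package.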

Comment thread interpreto/attributions/image_base.py Outdated
The output of `image_processor(image, return_tensors="pt")` (a `BatchFeature`,
satisfies `TensorMapping`). Holds `pixel_values` of shape `(1, 3, H, W)`.

raw_image (PIL.Image | np.ndarray | torch.Tensor | None):

Collaborator

I think it should be harmonized before being saved. It will be much easier for visualizations.

So it should have a single type.

Collaborator Author

I had kept it this way because I didn't want to plot a modified image for the user (I thought it was best for them to see the same image they had loaded in the first place). I have already written the visualization script, so do I still need to modify this field?
If the concern was that the visualization would be hard, then maybe I can keep it.
If it was simplicity, then I can just use model_inputs_to_explain as the base image on which to display the heatmap.

Collaborator

What you are saying is that once processed, if we plot the image, it looks weird, so we need to keep a raw image, right?

If that is the case, I agree with you.

Collaborator Author

I haven't actually tried to plot it with the processed image. Since your comment had not been submitted yet when I wrote the visualization functions, I used raw_image, and it works. Do you want me to try to make it work with the processed image as well, so that we can drop this field if it is not useful?
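If the field is kept, one way to give it a single type, as suggested above, is to convert every raw input once to an (H, W, C) uint8 NumPy array, which plotting libraries display directly. A sketch under stated assumptions: the helper is hypothetical, float inputs are assumed to lie in [0, 1], and torch tensors are detected by duck typing so that the function does not import torch:

```python
import numpy as np


def to_hwc_uint8(image) -> np.ndarray:
    """Convert a PIL image, NumPy array or torch tensor to an (H, W, C) uint8 array."""
    if hasattr(image, "detach"):  # torch.Tensor (duck-typed)
        image = image.detach().cpu().numpy()
    array = np.asarray(image)  # PIL images expose the array interface
    if array.ndim == 2:  # grayscale -> one channel
        array = array[:, :, None]
    if array.shape[0] in (1, 3) and array.shape[-1] not in (1, 3):
        array = np.transpose(array, (1, 2, 0))  # (C, H, W) -> (H, W, C)
    if np.issubdtype(array.dtype, np.floating):
        array = np.clip(array, 0.0, 1.0) * 255.0  # assumption: floats lie in [0, 1]
        array = np.round(array)
    return np.clip(array, 0, 255).astype(np.uint8)
```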

Comment thread interpreto/attributions/image_base.py Outdated
Comment on lines +278 to +292
# Preserve each user-supplied raw image alongside its sanitized BatchFeature so
# the per-sample ImageAttributionOutput can carry it for visualization. The
# post-normalization pixel_values in model_inputs_to_explain are not directly
# displayable. None for samples that came in as BatchFeature or under preprocess=False.
raw_images: list[PILImage | np.ndarray | torch.Tensor | None]
if isinstance(model_inputs, list):
    raw_images = [
        m if self.preprocess and isinstance(m, (PILImage, np.ndarray, torch.Tensor)) else None
        for m in model_inputs
    ]
elif self.preprocess and isinstance(model_inputs, (PILImage, np.ndarray, torch.Tensor)):
    raw_images = [model_inputs]
else:
    raw_images = [None] * len(model_inputs_to_explain)

Collaborator

I think this should go in process_model_inputs. That way, you know that you return None when the inputs are already processed.
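A sketch of what that could look like, assuming `process_model_inputs` returns the raw images alongside the processed inputs, with None wherever the input was already processed. The function body and signature are hypothetical; the "already processed" check relies on a Hugging Face `BatchFeature` being a `Mapping` (it subclasses `UserDict`):

```python
from collections.abc import Callable, Mapping
from typing import Any


def process_model_inputs(
    model_inputs: Any, image_processor: Callable[[Any], Mapping], preprocess: bool = True
) -> tuple[list[Mapping], list[Any]]:
    """Return (processed_inputs, raw_images); raw_images[i] is None if input i was not preprocessed here."""
    items = model_inputs if isinstance(model_inputs, list) else [model_inputs]
    processed, raw_images = [], []
    for item in items:
        if preprocess and not isinstance(item, Mapping):
            processed.append(image_processor(item))
            raw_images.append(item)  # keep what the user passed, for visualization
        else:  # already processed (e.g. a BatchFeature) or preprocessing disabled
            processed.append(item)
            raw_images.append(None)
    return processed, raw_images
```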

Comment thread interpreto/attributions/perturbations/image_base.py
@AntoninPoche

AntoninPoche commented Jun 4, 2026

Collaborator

Hi @HugoDeBosschere, with @thomas-mullor we discussed how to tackle the perturbators problem.

@fanny-jourdan do not hesitate to give your opinion.

To summarize, the problem was that perturbators depend both on the method and the modality.
But we do not want to implement each combination of method and modality.

So we found a way to dynamically create the necessary perturbator by inheriting from both the method and modality specific classes.

Here is a summary of the modifications:

# attributions/perturbations/base.py

from abc import ABC, abstractmethod

class Perturbator(ABC):
    @abstractmethod
    def perturb(self, inputs):
        pass

class TensorPerturbator(Perturbator):  # new class (just for typing and clarity)
    @abstractmethod
    def perturb_tensor(self, inputs_embeds):  # renaming of `perturb_embeds`
        pass

class TextTensorPerturbator(TensorPerturbator):  # renaming of `EmbeddingsPerturbator`
    def perturb(self, inputs):  # already exists
        # calls `perturb_tensor`
        ...

class ImageTensorPerturbator(TensorPerturbator):  # renaming of your `ImageEmbeddingsPerturbator`
    def perturb(self, inputs):  # already exists but can surely be simplified
        # calls `perturb_tensor`
        ...

class MaskPerturbator(Perturbator):  # new class (just for typing and clarity)
    @abstractmethod
    def get_mask(self, mask_dim):
        pass

class TextMaskPerturbator(MaskPerturbator):  # renaming of `IdsPerturbator`
    def perturb(self, inputs):  # already exists
        # calls `get_mask`
        ...

class ImageMaskPerturbator(MaskPerturbator):  # renaming of your `ImageIdsPerturbator`
    def perturb(self, inputs):  # already exists
        # calls `get_mask`
        ...

# attributions/perturbations/occlusion_perturbation.py

from .base import MaskPerturbator

class OcclusionPerturbator(MaskPerturbator):  # change inheritance, might make the type checker unhappy
    def get_mask(self, mask_dim):
        # should support both `TextMaskPerturbator` and `ImageMaskPerturbator` requirements
        # it might be nothing at first
        ...

# attributions/perturbations/gaussian_noise_perturbation.py

from .base import TensorPerturbator

class GaussianNoisePerturbator(TensorPerturbator):  # change inheritance, might make the type checker unhappy
    def perturb_tensor(self, inputs_embeds):
        # should support both `TextTensorPerturbator` and `ImageTensorPerturbator` requirements
        # therefore, support (1, l, d) and (1, c, h, w) shapes
        # (1, ...) -> (p, ...)
        ...

# attributions/base.py

from .perturbations.base import (
    Perturbator,
    TensorPerturbator,
    TextTensorPerturbator,
    ImageTensorPerturbator,
    MaskPerturbator,
    TextMaskPerturbator,
    ImageMaskPerturbator
)

class AttributionExplainer(ABC):
    associated_inference_wrapper: InferenceWrapper
    base_tensor_perturbator_class: type[TensorPerturbator]
    base_mask_perturbator_class: type[MaskPerturbator]

    # does not impact the methods

class TextClassificationExplainer(AttributionExplainer):
    associated_inference_wrapper: TextClassificationInferenceWrapper
    base_tensor_perturbator_class = TextTensorPerturbator
    base_mask_perturbator_class = TextMaskPerturbator

class TextGenerationExplainer(AttributionExplainer):
    associated_inference_wrapper: TextGenerationInferenceWrapper
    base_tensor_perturbator_class = TextTensorPerturbator
    base_mask_perturbator_class = TextMaskPerturbator

class ImageClassificationExplainer(AttributionExplainer):
    associated_inference_wrapper: ImageClassificationInferenceWrapper
    base_tensor_perturbator_class = ImageTensorPerturbator
    base_mask_perturbator_class = ImageMaskPerturbator

class MultitaskExplainerMixin:
    # no modifications specific to the new perturbator classes
    # still, it should include the ImageClassificationExplainer at some point
    ...

# attributions/methods/occlusion.py

class Occlusion(MultitaskExplainerMixin, AttributionExplainer):
    def __init__(...):
        # create the perturbator dynamically by inheriting from both the method and modality specific classes
        perturbator_class = type(
            "ModalitySpecific" + self.__class__.__name__,  # name
            (OcclusionPerturbator, self.base_mask_perturbator_class,),  # parent classes
            {}
        )
        perturbator = perturbator_class(
            tokenizer=tokenizer,
            granularity=granularity,
            replace_token_id=replace_token_id,
        )
        ...

# attributions/methods/smooth_grad.py

class SmoothGrad(MultitaskExplainerMixin, AttributionExplainer):
    def __init__(...):
        # create the perturbator dynamically by inheriting from both the method and modality specific classes
        perturbator_class = type(
            "ModalitySpecific" + self.__class__.__name__,  # name
            (GaussianNoisePerturbator, self.base_tensor_perturbator_class,),  # parent classes
            {}
        )
        perturbator = perturbator_class(
            inputs_embedder=model.get_input_embeddings(),
            n_perturbations=n_perturbations,
            std=noise_std
        )
        ...
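A self-contained toy version of this design, showing that the method-level class (`OcclusionPerturbator`, which only knows how to build masks) and the modality-level class (`ImageMaskPerturbator`, which only knows how to apply them) combine into a concrete class through `type()`. The classes are simplified stand-ins (an "image" is a flat list of patch values), not interpreto's actual perturbators, and `make_perturbator_class` is a hypothetical helper; caching it avoids creating a new class on every explainer instantiation:

```python
from abc import ABC, abstractmethod
from functools import lru_cache


class Perturbator(ABC):
    @abstractmethod
    def perturb(self, inputs): ...


class MaskPerturbator(Perturbator):
    @abstractmethod
    def get_mask(self, mask_dim: int) -> list[list[int]]: ...


class ImageMaskPerturbator(MaskPerturbator):
    """Modality side: applies masks to an image, here a toy flat list of patch values."""

    def __init__(self, replace_value: float = 0.0):
        self.replace_value = replace_value

    def perturb(self, inputs: list[float]) -> list[list[float]]:
        masks = self.get_mask(len(inputs))  # provided by the method side
        return [[self.replace_value if m else x for x, m in zip(inputs, mask)] for mask in masks]


class OcclusionPerturbator(MaskPerturbator):
    """Method side: occludes one unit per perturbation, whatever the modality."""

    def get_mask(self, mask_dim: int) -> list[list[int]]:
        return [[int(i == j) for j in range(mask_dim)] for i in range(mask_dim)]


@lru_cache(maxsize=None)
def make_perturbator_class(method_cls: type, modality_cls: type) -> type:
    # MRO: method class first, then modality class, then their shared MaskPerturbator base.
    return type("ModalitySpecific" + method_cls.__name__, (method_cls, modality_cls), {})


perturbator = make_perturbator_class(OcclusionPerturbator, ImageMaskPerturbator)(replace_value=0.0)
print(perturbator.perturb([1.0, 2.0, 3.0]))
# [[0.0, 2.0, 3.0], [1.0, 0.0, 3.0], [1.0, 2.0, 0.0]]
```

Neither parent is instantiable on its own (each leaves one abstract method unimplemented), which is what forces the method/modality pairing.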

…ngsPerturbator now becomes TensorPerturbator, and perturb_embeds thus becomes perturb_tensor. IdsPerturbator becomes MaskPerturbator. Both kinds of perturbators have (task, modality) tuple children (image generation does not exist). There are also new ImageInferenceWrapper and TextInferenceWrapper classes that both inherit from the InferenceWrapper class (which may need to be made abstract). AttributionExplainer has been made abstract, and it has the same (task, modality) children as the perturbators. All the other changes just cascade from these modifications (import changes, inheritance changes). I also had to copy process_targets and process_inputs_to_explain_and_targets from TextClassificationAttributionExplainer to ImageClassificationAttributionExplainer, since the latter used to inherit from the former and needed those methods to function correctly.
…he necessary imports into the files that were affected by this change
…of FactoryGeneratedMeta to avoid metaclass conflict. Changed the attributions test and ran them to ensure that nothing was broken by the new modifications
…y and created a new parent class called Granularity to harmonize typing in order to be able to merge the image attribution methods with the text attribution methods. Executed the pytest tests and they still work.
… for images in order to have a more image-oriented point of view on resizing explanations. Added 3 types of interpolation: BILINEAR, BICUBIC and AREA (which is just a mean), all derived from the torch library so that the interpolation can run on GPU (I also changed when contributions are moved to the CPU before being stored in ImageAttributionOutput). GranularityAggregationStrategy and GranularityResizeStrategy now both inherit from GranularityCombinationStrategy, following the same pattern as Granularity. This is done in order to then be able to implement all the methods in a single class. All the Granularity-related classes and methods have been moved to granularity.py, though the image_granularity.py files remain because I have not yet tested the changes.
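For context on the Granularity parent-class pattern: Python's `Enum` forbids subclassing an enum that already has members (which is why ImageGranularity started as a standalone Enum), but it allows subclassing a memberless one. A sketch of the pattern with hypothetical names and members (`TextGranularity`, `describe`, `check_granularity` are illustrative, not the PR's code); the same shape would apply to GranularityCombinationStrategy and its two children:

```python
from enum import Enum


class Granularity(Enum):
    """Memberless parent: a common type plus shared helpers for every modality."""

    def describe(self) -> str:
        return f"{type(self).__name__}.{self.name}"


class TextGranularity(Granularity):  # hypothetical name for the text-side enum
    TOKEN = "token"
    WORD = "word"


class ImageGranularity(Granularity):
    PIXEL = "pixel"
    PATCH = "patch"
    DEFAULT = "patch"  # same value as PATCH, so DEFAULT becomes an alias of PATCH


def check_granularity(granularity: Granularity) -> str:
    # One type annotation now covers text and image granularities.
    if not isinstance(granularity, Granularity):
        raise TypeError(f"expected a Granularity, got {type(granularity).__name__}")
    return granularity.describe()
```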