Describe the bug
While training ViT with torch DistributedDataParallel, torch raises an error during the backward pass and reports:
Parameters which did not receive grad for rank 0: vit.patch_embedding.cls_token
This means that cls_token did not participate in the backward pass.
I checked the implementations of ViT and PatchEmbeddingBlock and found that cls_token is unused in PatchEmbeddingBlock (monai.networks.blocks.patchembedding.py).
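
To illustrate the symptom without a distributed setup, here is a minimal single-process sketch. It assumes a toy module, ToyPatchEmbedding, which is a hypothetical stand-in and not MONAI's actual PatchEmbeddingBlock; the helper params_without_grad is also made up for this example. It registers a cls_token parameter that forward() never touches, then lists the parameters whose .grad is still None after backward().

```python
import torch
import torch.nn as nn


class ToyPatchEmbedding(nn.Module):
    """Hypothetical stand-in for illustration; not MONAI's PatchEmbeddingBlock."""

    def __init__(self, in_dim=8, hidden=4):
        super().__init__()
        self.proj = nn.Linear(in_dim, hidden)
        # Registered as a trainable parameter but never used in forward().
        self.cls_token = nn.Parameter(torch.zeros(1, 1, hidden))

    def forward(self, x):
        # Only the projection contributes to the output.
        return self.proj(x)


def params_without_grad(model):
    """Return names of trainable parameters whose .grad is None after backward()."""
    return [
        name
        for name, p in model.named_parameters()
        if p.requires_grad and p.grad is None
    ]


model = ToyPatchEmbedding()
loss = model(torch.randn(2, 3, 8)).sum()
loss.backward()

unused = params_without_grad(model)
print(unused)
```

In this toy case the only name reported is cls_token, which mirrors the parameter named in the DDP error above. DistributedDataParallel also accepts find_unused_parameters=True to tolerate such parameters, at the cost of some extra overhead per iteration; whether that or a change to the block is the better fix is a separate question.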

To Reproduce
Steps to reproduce the behavior:
- Set the environment variable in the shell: TORCH_DISTRIBUTED_DEBUG=INFO
- Train ViT with torch DistributedDataParallel