Adapt the current [SABlock](https://github.com/Project-MONAI/MONAI/blob/2d0e0214ac80d97e6585edf59ffc6eed96bcfcdb/monai/networks/blocks/selfattention.py#L22) implementation to support a mask, so that it can be used in the autoregressive transformer.
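
For illustration, the kind of masking an autoregressive model needs can be sketched in plain Python. This is a framework-free, single-head sketch, not the SABlock code: the names `causal_mask` and `masked_attention` are hypothetical, and it assumes the mask is causal (lower-triangular), which the description above does not state explicitly.

```python
import math


def causal_mask(seq_len):
    """Lower-triangular mask: position i may attend to positions j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention for one head; q, k, v are lists of vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # Masked (future) positions get -inf so softmax assigns them weight 0.
        scores = [
            sum(a * b for a, b in zip(qi, kj)) * scale if mask[i][j] else float("-inf")
            for j, kj in enumerate(k)
        ]
        top = max(scores)  # finite, because the diagonal is never masked
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        out.append(
            [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(v[0]))]
        )
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = masked_attention(tokens, tokens, tokens, causal_mask(3))
```

In a PyTorch block, the same idea would typically be expressed by building the mask with `torch.tril` and applying `masked_fill` with `-inf` to the attention scores before the softmax; how this is wired into SABlock is left to the actual change.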