Learning representations on graphs is foundational for many downstream tasks, and its synergy with diffusion models has emerged as a promising direction. However, diffusion-based methods for heterogeneous graphs remain underexplored, confronting two principal challenges: (1) The presence of noise and structural heterogeneity in graphs makes it challenging to accurately capture semantic transitions among diverse relation types. (2) The isotropic Gaussian noise used in forward diffusion fails to reflect graphs' inherent semantics and structural anisotropy. To address these, we propose ARDiff, a novel framework that integrates residual diffusion with anisotropic noise for heterogeneous graph learning. Specifically, we propose a semantic residual diffusion mechanism that progressively refines node embeddings by orchestrating transitions from low-semantic (high-noise) to high-semantic (low-noise) relational contexts, thus enabling step-wise distillation of task-relevant information. In addition, to address the limitations of conventional diffusion, we introduce an anisotropic diffusion strategy: in the forward process, noise injection is oriented by structural and semantic priors; in the denoising step, a conditional diffusion mechanism is guided by a random walk encoding, enhancing both topological consistency and semantic alignment. Extensive evaluation on heterogeneous graph datasets demonstrates that ARDiff significantly surpasses current leading methods in link prediction and node classification, setting a new paradigm and benchmark in heterogeneous graph representation learning.
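As a rough illustration of the anisotropic forward process described above, the sketch below samples a noised embedding with a different noise scale per dimension. It is not the code in `Model.py`: the function names, the linear beta schedule, and the `scales` vector (a stand-in for whatever structural and semantic priors the model derives) are all assumptions made for this example.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed schedule)."""
    alpha_bar, prod = [], 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def anisotropic_forward(x0, t, alpha_bar, scales, rng):
    """Sample x_t from x_0 with per-dimension noise scales.

    x0     : list of floats, one node embedding
    scales : list of non-negative floats, same length as x0; hypothetical
             prior-derived weights. All ones recovers isotropic Gaussian noise.
    """
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    return [signal * x + noise * s * rng.gauss(0.0, 1.0) for x, s in zip(x0, scales)]


# Toy usage: the second dimension receives twice the noise of the first.
rng = random.Random(0)
alpha_bar = linear_alpha_bar(100)
x_t = anisotropic_forward([1.0, -0.5], t=50, alpha_bar=alpha_bar, scales=[1.0, 2.0], rng=rng)
```

Setting every entry of `scales` to 1.0 gives the standard isotropic forward step, which makes the two variants easy to compare.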
- PyTorch 2.1.2
- Python 3.10 (Ubuntu 22.04)
- CUDA 11.8
- dgl 2.1.0+cu118
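A quick way to compare a local environment against the versions listed above is sketched here. The distribution names `torch` and `dgl` are the usual PyPI names and are an assumption; `check_versions` is a hypothetical helper, not part of this repository.

```python
from importlib import metadata

# Versions taken from the list above; package names are assumed PyPI names.
EXPECTED = {"torch": "2.1.2", "dgl": "2.1.0"}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, expected_version, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Prefix comparison so local build tags such as "+cu118" still match.
        ok = installed is not None and installed.startswith(wanted)
        report[name] = (installed, wanted, ok)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions().items():
        print(f"{name}: installed={installed} expected={wanted} ok={ok}")
```

The prefix comparison is deliberately loose; a stricter check would parse the version strings.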
The folder ARDiff-LP contains the code and datasets for link prediction, and ARDiff-NC contains those for node classification.
.
├── ARDiff-LP
│   ├── DataHandler.py
│   ├── main.py
│   ├── param.py
│   ├── Utils
│   │   ├── TimeLogger.py
│   │   └── Utils.py
│   └── Model.py
├── ARDiff-NC
│   ├── DataHandler.py
│   ├── main.py
│   ├── param.py
│   ├── Utils
│   │   ├── TimeLogger.py
│   │   └── Utils.py
│   └── Model.py
└── README
We evaluate ARDiff on both the link prediction and node classification tasks. For link prediction, we utilize three publicly available datasets collected from real-world commercial platforms: Tmall, Retailrocket, and IJCAI. For the node classification task, we use two public datasets, DBLP and AMiner, which focus on publications and academic social ties. Below are the detailed descriptions for the experimental datasets.
| Dataset | User # | Item # | Link # | Interaction Types |
|---|---|---|---|---|
| Retailrocket | 2174 | 30113 | 97,381 | View, Cart, Transaction |
| IJCAI | 17435 | 35920 | 799,368 | View, Favorite, Cart, Purchase |
| Tmall | 31882 | 31232 | 1,451,29 | View, Favorite, Cart, Purchase |
| Dataset | Node | Metapath | Dataset | Node | Metapath |
|---|---|---|---|---|---|
| DBLP | Author: 4057 | APA | AMiner | Paper: 6564 | PAP |
| | Paper: 14328 | APCPA | | Author: 13329 | PRP |
| | Conference: 20 | APTPA | | Reference: 35890 | POS |
| | Term: 7723 | | | | |
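To make the metapath notation concrete, the sketch below composes two edge lists into metapath-based neighbor sets, using APA (author, paper, author) as the example. The helper `metapath_neighbors` and the toy IDs are invented for illustration; they are not taken from DBLP or from `DataHandler.py`.

```python
from collections import defaultdict


def metapath_neighbors(first_hop, second_hop):
    """Compose edge lists (a, b) and (b, c) into a mapping a -> set of c."""
    mid_to_end = defaultdict(set)
    for mid, end in second_hop:
        mid_to_end[mid].add(end)
    result = defaultdict(set)
    for start, mid in first_hop:
        result[start] |= mid_to_end.get(mid, set())
    return dict(result)


# Toy author-paper edges (made-up IDs).
author_paper = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2")]
paper_author = [(paper, author) for author, paper in author_paper]

# APA: authors connected through a shared paper (each author is its own neighbor too).
apa = metapath_neighbors(author_paper, paper_author)
```

Longer metapaths such as APCPA can be built the same way by chaining further compositions over the intermediate edge lists.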
