Version 0.1: added a few comments
sybs5968 committed Jan 10, 2024
0 parents commit ea3f852
Showing 55 changed files with 1,125,313 additions and 0 deletions.
8 changes: 8 additions & 0 deletions .gitignore
@@ -0,0 +1,8 @@
.idea/*
__pycache__/*
notebooks/.ipynb_checkpoints/*
log/*
data/node_classification/foodweb/.ipynb_checkpoints/*
data/link_prediction/celegans_small/.ipynb_checkpoints/
models/__pycache__/*
*pyc
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2020 Yanbang Wang

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
146 changes: 146 additions & 0 deletions README.md
@@ -0,0 +1,146 @@
# Distance-encoding for GNN design
This repository is the official PyTorch implementation of the *DEGNN* and *DEAGNN* frameworks reported in the paper: <br>
[*Distance-Encoding -- Design Provably More Powerful GNNs for Structural Representation Learning*](https://arxiv.org/abs/2009.00142), to appear in NeurIPS 2020.

The project's home page is: <http://snap.stanford.edu/distance-encoding/>

## Authors & Contact
Pan Li, Yanbang Wang, Hongwei Wang, Jure Leskovec

Questions on this repo can be emailed to <[email protected]> (Yanbang Wang)

## Installation
Requirements: Python >= 3.5, [Anaconda3](https://www.anaconda.com/)

- Update conda:
```bash
conda update -n base -c defaults conda
```

- Install the basic dependencies into a virtual environment and activate it:
```bash
conda env create -f environment.yml
conda activate degnn-env
```

- Install PyTorch >= 1.4.0 and torch-geometric >= 1.5.0 (please refer to the [PyTorch](https://pytorch.org/) and [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/) official websites for more details). Example commands:
```bash
conda install pytorch=1.4.0 torchvision cudatoolkit=10.1 -c pytorch
pip install torch-scatter==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-sparse==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-cluster==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-spline-conv==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-geometric
```

The latest tested combination is: Python 3.8.2 + PyTorch 1.4.0 + torch-geometric 1.5.0.
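
To confirm that an environment meets the minimum versions listed above, a small check along the following lines may help. This is only a sketch and not part of the repository: the helper names `version_tuple` and `check_requirements` are hypothetical, the distribution names `torch` and `torch-geometric` are assumed to match what pip reports, and `importlib.metadata` requires Python 3.8 or later.

```python
import re
from importlib import metadata

# Minimum versions taken from the installation notes above.
MINIMUMS = {"torch": "1.4.0", "torch-geometric": "1.5.0"}


def version_tuple(version):
    """Turn a string such as '1.4.0+cu101' into (1, 4, 0).

    The local build tag after '+' is ignored, and parsing stops at the
    first dot-separated piece that does not start with a digit.
    """
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Return a {package: status string} report for each required package."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "not installed"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        status = "ok" if ok else "older than " + minimum
        report[package] = f"{installed} ({status})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```

The check only compares version numbers; it does not verify that the CUDA build of each wheel matches the installed toolkit.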

## Quick Start
- To train **DEGNN-SPD** for Task 2 (link prediction) on the [C.elegans dataset](https://snap.stanford.edu/data/C-elegans-frontal.html):
```bash
python main.py --dataset celegans --feature sp --hidden_features 100 --prop_depth 1 --test_ratio 0.1 --epoch 300
```
&nbsp;&nbsp;&nbsp; This uses 100-dimensional hidden features, 80/10/10 split of train/val/test set, and trains for 300 epochs.

- To train **DEAGNN-SPD** for Task 3 (node-triads prediction) on the C.elegans dataset:
```bash
python main.py --dataset celegans_tri --hidden_features 100 --prop_depth 2 --epoch 300 --feature sp --max_sp 5 --l2 1e-3 --test_ratio 0.1 --seed 9
```
&nbsp;&nbsp;&nbsp; This enables 2-hop propagation per layer, truncates distance encoding at 5, and uses random seed 9.

- To train **DEGNN-LP** (i.e. the random walk variant) for Task 1 (node-level prediction) on usa-airports, using average accuracy as the evaluation metric:
```bash
python main.py --dataset usa-airports --metric acc --hidden_features 100 --feature rw --rw_depth 2 --epoch 500 --bs 128 --test_ratio 0.1
```

Note that `test_ratio` currently covers both the validation set and the actual test set; this will be changed so that it covers only the test set.

- To generate **Figure 2 (left)** of the paper (simulation to validate Theorem 3.3):
```bash
python main.py --dataset simulation --max_sp 10
```
&nbsp;&nbsp;&nbsp; The result will be plotted to `./simulation_results.png`.


- All detailed training logs can be found at `<log_dir>/<dataset>/<training-time>.log`. A one-line summary will also be appended to `<log_dir>/result_summary.log` for each training instance.
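
The `--feature sp` and `--max_sp` options used above refer to shortest-path distance encoding. As a rough illustration of the idea, and not the repository's actual implementation, the sketch below computes a truncated shortest-path encoding with plain BFS. Several details are assumptions: the graph is an undirected adjacency dictionary, distances beyond `max_sp` (including unreachable nodes) are clipped to `max_sp`, and the one-hot vectors for the nodes of the target set are summed. The function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def spd_encoding(adj, target_set, max_sp):
    """Sum of one-hot truncated distances from each node to the target set.

    Assumption: distances larger than `max_sp`, and unreachable nodes,
    share the last bucket (index `max_sp`).
    """
    features = {u: [0] * (max_sp + 1) for u in adj}
    for v in target_set:
        dist = bfs_distances(adj, v)
        for u in adj:
            d = min(dist.get(u, max_sp), max_sp)
            features[u][d] += 1
    return features


# Path graph 0 - 1 - 2 - 3 - 4 with the node pair (0, 4) as the target set.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
enc = spd_encoding(adj, target_set=[0, 4], max_sp=2)
print(enc[0])  # [1, 0, 1]: distance 0 to node 0, clipped distance 2 to node 4
print(enc[2])  # [0, 0, 2]: distance 2 to both targets
```

In a link prediction setting the target set would be the candidate node pair, and the resulting vectors would be used as extra input node features.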

## Usage Summary
```
Interface for DE-GNN framework [-h] [--dataset DATASET] [--test_ratio TEST_RATIO]
[--model {DE-GNN,GIN,GCN,GraphSAGE,GAT}] [--layers LAYERS]
[--hidden_features HIDDEN_FEATURES] [--metric {acc,auc}] [--seed SEED] [--gpu GPU]
[--data_usage DATA_USAGE] [--directed DIRECTED] [--parallel] [--prop_depth PROP_DEPTH]
[--use_degree USE_DEGREE] [--use_attributes USE_ATTRIBUTES] [--feature FEATURE]
[--rw_depth RW_DEPTH] [--max_sp MAX_SP] [--epoch EPOCH] [--bs BS] [--lr LR]
[--optimizer OPTIMIZER] [--l2 L2] [--dropout DROPOUT] [--k K] [--n [N [N ...]]]
[--N N] [--T T] [--log_dir LOG_DIR] [--summary_file SUMMARY_FILE] [--debug]
```
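
For readers unfamiliar with how such an interface is declared, the following sketch reproduces a handful of the flags above with `argparse`. It is illustrative only and is not copied from `main.py`: the argument types are inferred from the example commands, no defaults are set because the real ones are not shown here, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare a small subset of the flags listed in the usage summary."""
    parser = argparse.ArgumentParser("Interface for DE-GNN framework")
    parser.add_argument("--dataset", type=str, help="dataset name")
    parser.add_argument("--test_ratio", type=float,
                        help="ratio of the test against whole")
    parser.add_argument("--metric", type=str, choices=["acc", "auc"],
                        help="metric for evaluating performance")
    parser.add_argument("--feature", type=str,
                        help="distance encoding category")
    parser.add_argument("--max_sp", type=int,
                        help="maximum distance to be encoded")
    parser.add_argument("--epoch", type=int, help="number of epochs to train")
    parser.add_argument("--debug", action="store_true",
                        help="whether to use debug mode")
    return parser


# Parse one of the Quick Start style commands (without the script name).
command = "--dataset celegans --feature sp --test_ratio 0.1 --epoch 300"
args = build_parser().parse_args(command.split())
print(args.dataset, args.feature, args.test_ratio, args.epoch)
```

Flags that are omitted on the command line come back as `None` in this sketch, since no defaults are assumed.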

## Optional Arguments
```
-h, --help show this help message and exit
# general settings
--dataset DATASET dataset name
--test_ratio TEST_RATIO
ratio of the test against whole
--model {DE-GNN,GIN,GAT,GCN,GraphSAGE}
model to use
--layers LAYERS largest number of layers
--hidden_features HIDDEN_FEATURES
hidden dimension
--metric {acc,auc} metric for evaluating performance
--seed SEED seed to initialize all the random modules
--gpu GPU gpu id
--adj_norm {asym,sym,None}
how to normalize adj
--data_usage DATA_USAGE
use partial dataset
--directed DIRECTED (Currently unavailable) whether to treat the graph as directed
--parallel (Currently unavailable) whether to use multi cpu cores to prepare data
# positional encoding settings
--prop_depth PROP_DEPTH
propagation depth (number of hops) for one layer
--use_degree USE_DEGREE
whether to use node degree as the initial feature
--use_attributes USE_ATTRIBUTES
whether to use node attributes as the initial feature
--feature FEATURE distance encoding category: shortest path or random walk (landing probabilities)
--rw_depth RW_DEPTH random walk steps
--max_sp MAX_SP maximum distance to be encoded for shortest path feature
# training settings
--epoch EPOCH number of epochs to train
--bs BS minibatch size
--lr LR learning rate
--optimizer OPTIMIZER
optimizer to use
--l2 L2 l2 regularization weight
--dropout DROPOUT dropout rate
# simulation settings (valid only when dataset == 'simulation')
--k K node degree (k) of synthetic k-regular graph
--n [N [N ...]] a list of number of nodes in each connected k-regular subgraph
--N N total number of nodes in simulation
--T T largest number of layers to be tested
# logging
--log_dir LOG_DIR log directory
--summary_file SUMMARY_FILE
brief summary of training result
--debug whether to use debug mode
```
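
The `--feature` and `--rw_depth` options select random-walk landing probabilities as the distance encoding. A minimal sketch of that quantity follows, assuming a simple unweighted random walk on an undirected adjacency dictionary: for a source node, it records the probability of being at each node after 0, 1, ..., `rw_depth` steps. This is an illustration under those assumptions rather than the repository's code, and `landing_probabilities` is a hypothetical name.

```python
def landing_probabilities(adj, source, rw_depth):
    """Per-node list of landing probabilities for steps 0..rw_depth.

    The walk starts at `source` and moves to a uniformly random neighbor
    at each step. Probability mass sitting on a node without neighbors is
    dropped in this sketch.
    """
    prob = {u: 0.0 for u in adj}
    prob[source] = 1.0
    history = {u: [prob[u]] for u in adj}
    for _ in range(rw_depth):
        nxt = {u: 0.0 for u in adj}
        for u, p in prob.items():
            if p == 0.0 or not adj[u]:
                continue
            share = p / len(adj[u])
            for w in adj[u]:
                nxt[w] += share
        prob = nxt
        for u in adj:
            history[u].append(prob[u])
    return history


# Triangle graph: every node is connected to the other two.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
lp = landing_probabilities(triangle, source=0, rw_depth=2)
print(lp[0])  # [1.0, 0.0, 0.5]
print(lp[1])  # [0.0, 0.5, 0.25]
```

Each node's list has `rw_depth + 1` entries, which is one way such a feature could be attached to a node relative to a given source.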


## Reference
If you use the code or experiments of Distance Encoding in your work, please cite our paper:
```text
@article{li2020distance,
title={Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning},
author={Li, Pan and Wang, Yanbang and Wang, Hongwei and Leskovec, Jure},
journal={Advances in Neural Information Processing Systems},
volume={33},
year={2020}
}
```
3 changes: 3 additions & 0 deletions data/link_prediction/README
@@ -0,0 +1,3 @@
=== ABOUT ===

This folder contains the raw network data. For txt files, every row represents two vertices that form a link.
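
As a sketch of how such a txt file could be read, assuming each row holds two whitespace-separated integer vertex ids with no header line (the exact delimiter and id type are assumptions, and the function names are hypothetical):

```python
import io


def read_edge_list(lines):
    """Parse rows of 'u v' into a list of (u, v) integer pairs."""
    edges = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        edges.append((int(fields[0]), int(fields[1])))
    return edges


def build_adjacency(edges):
    """Build an undirected adjacency mapping from the edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# An in-memory stand-in for a file; open("<path>.txt") would work the same way.
sample = io.StringIO("0 1\n1 2\n\n2 0\n")
edges = read_edge_list(sample)
adjacency = build_adjacency(edges)
print(edges)
print(adjacency)
```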