Commit 6d2c3ab8 by Wenjie Huang

Merge branch 'xsw_doc' into 'master'

Xsw doc

See merge request wjie98/starrygl!7
parents 36c9a61a 96ccc310
...@@ -3,7 +3,12 @@ Preparing the Temporal Graph Dataset
In this tutorial, we will show the preparation process of the temporal graph dataset that can be used by StarryGL.
1 Preparing the Temporal Graph Dataset for CTDG
------------------------------------------------
This section describes the steps to prepare the dataset for CTDG.
1.1 Read Raw Data
------------------
Take the Wikipedia dataset as an example; the raw data files are as follows:
...@@ -37,7 +42,7 @@ Here is an example to read the raw data files:
print('the number of nodes in graph is {}, \
    the number of edges in graph is {}'.format(num_nodes, num_edges))
1.2 Preprocess Data
--------------------
After reading the raw data, we need to preprocess it into a format that can be used by StarryGL. The following code shows the preprocessing process:
...@@ -114,3 +119,157 @@ Finally, we can partition the graph and save the data:
partition_save('./dataset/here/'+data_name, data, 16, 'metis_for_tgnn',
    edge_weight_dict=edge_weight_dict)
2 Preparing the Temporal Graph Dataset for DTDG
------------------------------------------------
This section describes the steps to prepare the dataset for DTDG.
2.1 Processing the Raw Data
----------------------------
Take the elliptic dataset as an example; the raw data files are as follows:
- `elliptic_txs_features.csv`: the node features of the graph
- `elliptic_txs_edgelist.csv`: the edges of the graph over all time
- `elliptic_txs_classes.csv`: the classes of all the nodes of the graph
To better use this dataset with a discrete-time dynamic graph model, we apply some preprocessing to it and end up with three more files:
- `elliptic_txs_orig2contiguos.csv`: the mapping between the original node IDs and the contiguous node IDs
- `elliptic_txs_nodetime.csv`: the timestamps of the nodes of the graph
- `elliptic_txs_edgelist_timed.csv`: the temporal edges of the graph
This dataset is then called elliptic_temporal. The process of generating the most important file, `elliptic_txs_edgelist_timed.csv`, is as follows:
.. code-block:: python
import pandas as pd
mapping = pd.read_csv('elliptic_txs_orig2contiguos.csv')
edgelist = pd.read_csv('elliptic_txs_edgelist.csv')
nodetime = pd.read_csv('elliptic_txs_nodetime.csv')
temp1 = pd.merge(edgelist, mapping, left_on='txId1', right_on='originalId', how='left')
temp1.drop(['originalId'], axis=1, inplace=True)
temp1.drop(['txId1'], axis=1, inplace=True)
temp1 = temp1[['contiguosId', 'txId2']]
temp1.columns = ['txId1', 'txId2']
temp2 = pd.merge(temp1, mapping, left_on='txId2', right_on='originalId', how='left')
temp2.drop(['originalId'], axis=1, inplace=True)
temp2.drop(['txId2'], axis=1, inplace=True)
temp2.columns = ['txId1', 'txId2']
temp3 = pd.merge(temp2, nodetime, left_on='txId1', right_on='txId', how='left')
temp3.drop(['txId'], axis=1, inplace=True)
temp3.columns = ['txId1', 'txId2', 'timestep1']
edgelist_timed = pd.merge(temp3, nodetime, left_on='txId2', right_on='txId', how='left')
edgelist_timed.drop(['txId'], axis=1, inplace=True)
edgelist_timed.columns = ['txId1', 'txId2', 'timestep1', 'timestep2']
edgelist_timed.drop(['timestep2'], axis=1, inplace=True)
edgelist_timed.columns = ['txId1', 'txId2', 'timestep']
edgelist_timed.to_csv('elliptic_txs_edgelist_timed.csv', index=False)
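As an optional sanity check (illustrative only), the generated file can be read back to confirm the column layout produced by the renaming above:
.. code-block:: python

    # Verify the generated edge list: columns should be ['txId1', 'txId2', 'timestep'].
    check = pd.read_csv('elliptic_txs_edgelist_timed.csv')
    print(check.columns.tolist())
    print(check['timestep'].min(), check['timestep'].max())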
2.2 Read and Preprocess the Data
---------------------------------
After the previous step, we read in the dataset and use a dedicated wrapper class to process the corresponding data:
.. code-block:: python
class Elliptic_Temporal_Dataset():
def __init__(self, path):
tar_file = os.path.join(path, 'elliptic.tar.gz')
tar_archive = tarfile.open(tar_file, 'r:gz')
self.nodes_labels_times = self.load_node_labels(tar_archive)
self.edges = self.load_transactions(tar_archive)
self.nodes, self.nodes_feats = self.load_node_feats(tar_archive)
self.max_degree = get_max_degs(self)
def load_node_feats(self, tar_archive):
data = load_data_from_tar('elliptic_txs_features.csv', tar_archive, starting_line=0)
nodes = data
nodes_feats = nodes[:,1:]
self.num_nodes = len(nodes)
self.feats_per_node = data.size(1) - 1
return nodes, nodes_feats.float()
def load_node_labels(self, tar_archive):
labels = load_data_from_tar('elliptic_txs_classes.csv', tar_archive, replace_unknow=True).long()
times = load_data_from_tar('elliptic_txs_nodetime.csv', tar_archive, replace_unknow=True).long()
lcols = Namespace({'nid': 0, 'label': 1})
tcols = Namespace({'nid':0, 'time':1})
nodes_labels_times =[]
for i in range(len(labels)):
label = labels[i,[lcols.label]].long()
if label>=0:
nid=labels[i,[lcols.nid]].long()
time=times[nid,[tcols.time]].long()
nodes_labels_times.append([nid , label, time])
nodes_labels_times = torch.tensor(nodes_labels_times)
return nodes_labels_times
def load_transactions(self, tar_archive):
data = load_data_from_tar('elliptic_txs_edgelist_timed.csv', tar_archive, type_fn=float, tensor_const=torch.LongTensor)
tcols = Namespace({'source': 0,
'target': 1,
'time': 2})
data = torch.cat([data,data[:,[1,0,2]]])
self.max_time = data[:,tcols.time].max()
self.min_time = data[:,tcols.time].min()
return {'idx': data, 'vals': torch.ones(data.size(0))}
We construct a wrapper Elliptic_Temporal_Dataset object to store the data; a short usage sketch follows the list below. The data object contains the following attributes:
- `nodes_labels_times`: a tensor that contains the label and time information of each node. Each element holds the node ID, the label, and the time.
- `edges`: a dictionary with two keys, idx and vals. The value of idx is a tensor whose rows contain the source node, the destination node, and the time; the value of vals is a tensor of ones.
- `nodes`: a tensor that contains the features of each node. Each row holds the node ID followed by its attributes.
- `nodes_feats`: a tensor that contains only the features of each node.
- `max_degree`: a tensor that stores the maximum out-degree over all time steps in the dataset.
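Below is a minimal usage sketch. It assumes the archive `elliptic.tar.gz` sits under a hypothetical `./data/elliptic_temporal` folder and that the helpers used above (`load_data_from_tar`, `get_max_degs`, `Namespace`) are importable from the project's utilities:
.. code-block:: python

    # Hypothetical data directory; adjust to wherever elliptic.tar.gz is stored.
    dataset = Elliptic_Temporal_Dataset('./data/elliptic_temporal')

    print(dataset.num_nodes, dataset.feats_per_node)
    print(dataset.edges['idx'].shape)   # rows of (source, destination, time); both edge directions are stored
    print(dataset.max_degree)           # maximum out-degree over all time steps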
2.3 Generate a Graph from the Graph Data
-----------------------------------------
To facilitate further processing, the corresponding graph is generated from the dataset using the following wrapper function:
.. code-block:: python
graph, dataset = prepare_data2(args, data_root, dist.get_world_size(group), dataset)
def prepare_data2(args, root, num_partitions, dataset):
hist_adj_list, hist_ndFeats_list, hist_mask_list, existing_nodes = preprocess(dataset)
edge_index, edge_attr, edge_times, x, exists = [], [], [], [], []
num_snapshots = len(hist_adj_list)
for i in range(num_snapshots):
edge_index.append(hist_adj_list[i]['idx'].t())
edge_attr.append(hist_adj_list[i]['vals'])
edge_times.append(torch.full_like(edge_attr[i], i))
x.append(make_sparse_tensor(hist_ndFeats_list[i], tensor_type='float',
torch_size=[dataset.num_nodes, dataset.feats_per_node]).to_dense()[:, None, :])
edge_index = torch.cat(edge_index, dim=1)
edge_times = torch.cat(edge_times, dim=0)
x = torch.cat(x, dim=1)
edge_attr = torch.cat(edge_attr, dim=0).type_as(x)
g = GraphData(edge_index, num_nodes=x.size(0))
g.node()["x"] = x
g.edge()["time"] = edge_times
g.edge()["attr"] = edge_attr
g.meta()["num_nodes"] = x.size(0)
g.meta()["num_snapshots"] = num_snapshots
g.save_partition(root, num_partitions, algorithm="random")
return g, dataset
Finally, a GraphData object g is obtained; it is partitioned and saved at the same time (a loading sketch follows the list below). It contains the following attributes:
- `x`: the attributes of the nodes
- `edge_times`: the time steps of the edges
- `edge_attr`: the attributes of the edges
- `num_nodes`: the global number of nodes
- `num_snapshots`: the global number of snapshots
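As a minimal sketch of how the saved partitions can be consumed afterwards, assuming the `get_graph` helper and the distributed context (`ctx`, `group`) introduced in the distributed-training tutorial below:
.. code-block:: python

    # data_root is the same root directory that was passed to save_partition above.
    g = get_graph(data_root, group).to(ctx.device)
    print(g.meta()["num_nodes"], g.meta()["num_snapshots"])
    print(g.node()["x"].shape)     # node features: (local nodes, snapshots, feats_per_node)
    print(g.edge()["time"].shape)  # snapshot id of each local edge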
Distributed Training
====================
1. Preparing the Distributed Environment for CTDG
--------------------------------------------------
Before starting training, we need to prepare the environment for distributed training, including the following steps:
...@@ -201,5 +201,237 @@ Before start training, we need to prepare the environment for distributed traini
ap, auc = eval('val')
print('\ttest AP:{:4f} test MRR:{:4f}'.format(ap, auc))
2. Preparing the Distributed Environment for DTDG
--------------------------------------------------
Before starting training, we need to prepare the environment for distributed training, including the following steps:
1. Initialize the distributed context
.. code-block:: python
ctx = DistributedContext.init(backend="nccl", use_gpu=True)
group = ctx.get_default_group()
2. Import the partitioned dataset using the wrapper function, and let the main process (ctx.rank == 0) do the data preparation
.. code-block:: python
data_root = "./dataset"
dataset = build_dataset(args)
if ctx.rank == 0:
graph, dataset = prepare_data(args, data_root, dist.get_world_size(group), dataset)
dist.barrier()
g = get_graph(data_root, group).to(ctx.device)
def prepare_data(root: str, num_parts):
dataset = TwitterTennisDatasetLoader().get_dataset()
x = []
y = []
edge_index = []
edge_times = []
edge_attr = []
snapshot_count = 0
for i, data in enumerate(dataset):
x.append(data.x[:,None,:])
y.append(data.y[:,None])
edge_index.append(data.edge_index)
edge_times.append(torch.full_like(data.edge_index[0], i))
edge_attr.append(data.edge_attr)
snapshot_count += 1
x = torch.cat(x, dim=1)
y = torch.cat(y, dim=1)
edge_index = torch.cat(edge_index, dim=1)
edge_times = torch.cat(edge_times, dim=0)
edge_attr = torch.cat(edge_attr, dim=0)
g = GraphData(edge_index, num_nodes=x.size(0))
g.node()["x"] = x
g.node()["y"] = y
g.edge()["time"] = edge_times
g.edge()["attr"] = edge_attr
g.meta()["num_nodes"] = x.size(0)
g.meta()["num_snapshots"] = snapshot_count
logging.info(f"GraphData.meta().keys(): {g.meta().keys()}")
logging.info(f"GraphData.node().keys(): {g.node().keys()}")
logging.info(f"GraphData.edge().keys(): {g.edge().keys()}")
g.save_partition(root, num_parts, algorithm="random")
return g
3. Create a partition-parallel GNN model :code:`sync_gnn`, together with a classifier and a splitter (the splitter's expected interface is sketched after the code below)
.. code-block:: python
sync_gnn = build_model(args, graph=g, group=group)
sync_gnn = sync_gnn.to(ctx.device)
classifier = Classifier(args.hidden_dim, args.hidden_dim)
classifier = classifier.to(ctx.device)
spl = splitter(args, min_time, max_time)
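The splitter is expected to yield, for each sample, the historical snapshot ids and the label snapshot id that the Trainer below consumes as `s['hist_ts']` and `s['label_ts']`. A minimal inspection sketch under that assumption:
.. code-block:: python

    # Peek at one sample from the training split (field names follow Trainer.run_epoch below).
    for s in spl.train:
        print(s['hist_ts'], s['label_ts'])
        break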
4. Start training the model
.. code-block:: python
trainer = Trainer(args, spl, sync_gnn, classifier, dataset, ctx)
trainer.train()
class Trainer():
def __init__(self, args, splitter, gcn, classifier, dataset, ctx):
self.args = args
self.splitter = splitter
self.gcn = gcn
self.classifier = classifier
self.comp_loss = nn.BCELoss()
self.group = self.gcn.group
self.graph = self.gcn.graph
self.ctx = ctx
self.logger = logger.Logger(args, 1)
self.num_nodes = dataset.num_nodes
self.data = dataset
self.time = {'TRAIN': [], 'VALID': [], 'TEST':[]}
self.init_optimizers(args)
def init_optimizers(self, args):
params = self.gcn.parameters()
self.gcn_opt = torch.optim.Adam(params, lr=args.learning_rate)
params = self.classifier.parameters()
self.classifier_opt = torch.optim.Adam(params, lr=args.learning_rate)
self.gcn_opt.zero_grad()
self.classifier_opt.zero_grad()
def save_checkpoint(self, state, filename='checkpoint.pth.tar'):
torch.save(state, filename)
def load_checkpoint(self, filename, model):
if os.path.isfile(filename):
print("=> loading checkpoint '{}'".format(filename))
checkpoint = torch.load(filename)
epoch = checkpoint['epoch']
self.gcn.load_state_dict(checkpoint['gcn_dict'])
self.classifier.load_state_dict(checkpoint['classifier_dict'])
self.gcn_opt.load_state_dict(checkpoint['gcn_optimizer'])
self.classifier_opt.load_state_dict(checkpoint['classifier_optimizer'])
self.logger.log_str("=> loaded checkpoint '{}' (epoch {})".format(filename, checkpoint['epoch']))
return epoch
else:
self.logger.log_str("=> no checkpoint found at '{}'".format(filename))
return 0
def train(self):
self.tr_step = 0
best_eval_valid = 0
eval_valid = 0
epochs_without_impr = 0
for e in range(self.args.num_epochs):
eval_train = self.run_epoch(self.splitter.train, e, 'TRAIN', grad=True)
if len(self.splitter.dev) > 0 and e > self.args.eval_after_epochs:
eval_valid = self.run_epoch(self.splitter.dev, e, 'VALID', grad=False)
eval_test = self.run_epoch(self.splitter.test, e, 'TEST', grad=False)
if eval_valid > best_eval_valid:
best_eval_valid = eval_valid
best_test = eval_test
epochs_without_impr = 0
for tmp in self.time.keys():
self.ctx.sync_print(tmp, np.mean(self.time[tmp]))
print(eval_test)
def run_epoch(self, split, epoch, set_name, grad):
t0 = time.time()
log_interval = 1
if set_name == 'TEST':
log_interval = 1
self.logger.log_epoch_start(epoch, len(split), set_name, minibatch_log_interval=log_interval)
torch.set_grad_enabled(grad)
for s in split:
hist_snap_ids = s['hist_ts']
label_snap_id = s['label_ts']
predictions, labels, label_edge = self.predict(hist_snap_ids, label_snap_id, set_name)
loss = self.comp_loss(predictions, labels)
if set_name == 'TRAIN':
loss.backward()
all_reduce_gradients(self.gcn)
all_reduce_buffers(self.gcn)
all_reduce_gradients(self.classifier)
all_reduce_buffers(self.classifier)
self.gcn_opt.step()
self.classifier_opt.step()
self.gcn_opt.zero_grad()
self.classifier_opt.zero_grad()
if set_name in ['TEST', 'VALID'] and self.args.task == 'link_pred':
self.logger.log_minibatch(predictions, labels, loss.detach(), adj=label_edge)
dist.barrier()
else:
self.logger.log_minibatch(predictions, labels, loss.detach())
torch.set_grad_enabled(True)
eval_measure = self.logger.log_epoch_done()
t1 = time.time()
self.time[set_name].append(t1-t0)
return eval_measure
def predict(self, hist_snap_ids, label_snap_id, set_name):
nodes_embs_dst = self.gcn(hist_snap_ids)
num_dst = nodes_embs_dst.shape[0]
nodes_embs_src = self.gcn.route.apply(nodes_embs_dst)
num_src = nodes_embs_src.shape[0]
num_nodes, x, pos_edge_index, edge_attr = self.gcn.get_snapshot(label_snap_id)
neg_edge_index = self.negative_sampling(num_src, num_dst, edge_attr.shape[0], set_name)
pos_cls_input = self.gather_node_embs(nodes_embs_src, pos_edge_index, nodes_embs_dst)
neg_cls_input = self.gather_node_embs(nodes_embs_src, neg_edge_index, nodes_embs_dst)
pos_predictions = self.classifier(pos_cls_input)
neg_predictions = self.classifier(neg_cls_input)
pos_label = torch.ones_like(pos_predictions)
neg_label = torch.zeros_like(neg_predictions)
pred = torch.cat([pos_predictions, neg_predictions], dim=0)
label = torch.cat([pos_label, neg_label], dim=0)
label_edge = torch.cat([pos_edge_index, neg_edge_index], dim=1)
return pred.sigmoid(), label, label_edge
def gather_node_embs(self, nodes_embs_src, node_indices, nodes_embs_dist):
return torch.cat([nodes_embs_src[node_indices[0,:]], nodes_embs_dist[node_indices[1,:]]], dim=1)
def optim_step(self, loss):
self.tr_step += 1
loss.backward()
if self.tr_step % self.args.steps_accum_gradients == 0:
self.gcn_opt.step()
self.classifier_opt.step()
self.gcn_opt.zero_grad()
self.classifier_opt.zero_grad()
def negative_sampling(self, num_src, num_dst, num_edge, set_name):
if set_name == 'TRAIN':
num_sample = num_edge * self.args.negative_mult_training
else:
num_sample = num_edge * self.args.negative_mult_test
src = torch.randint(low=0, high=num_src, size=(num_sample,))
dst = torch.randint(low=0, high=num_dst, size=(num_sample,))
return torch.vstack([src, dst]).to(self.ctx.device)
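Inside :code:`run_epoch`, gradients and buffers are synchronized across workers with :code:`all_reduce_gradients` and :code:`all_reduce_buffers` before the optimizer step. The following is only a rough sketch of what such a gradient helper typically does, not StarryGL's actual implementation:
.. code-block:: python

    import torch.distributed as dist

    def all_reduce_gradients_sketch(module, group=None):
        # Sum each parameter's gradient across all workers so that every replica
        # applies the same update (divide by the world size to average instead).
        for param in module.parameters():
            if param.grad is not None:
                dist.all_reduce(param.grad, op=dist.ReduceOp.SUM, group=group)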
Creating Temporal GNN Models
============================
1. Continuous-time Temporal GNN Models
---------------------------------------
To create a continuous-time temporal GNN model, we first need to define a configuration file with the suffix .yml to specify the model structure and parameters. Here we use the configuration file :code:`TGN.yml` for the TGN model as an example:
...@@ -72,4 +72,139 @@ Then a :code:`GeneralModel` object is created. If needed, we can adjust the mode
- :code:`DyRep`: The DyRep model proposed in `Representation Learning and Reasoning on Temporal Knowledge Graphs <https://arxiv.org/abs/1803.04051>`__.
- :code:`TIGER`: The TIGER model proposed in `TIGER: A Transformer-Based Framework for Temporal Knowledge Graph Completion <https://arxiv.org/abs/2302.06057>`__.
- :code:`Jodie`: The Jodie model proposed in `JODIE: Joint Optimization of Dynamics and Importance for Online Embedding <https://arxiv.org/abs/1908.01207>`__.
- :code:`TGAT`: The TGAT model proposed in `Temporal Graph Attention for Deep Temporal Modeling <https://arxiv.org/abs/2002.07962>`__.
\ No newline at end of file
2. Discrete-time Temporal GNN Models
-------------------------------------
To create a discrete-time temporal GNN model, we first need to define a configuration file with the suffix .yaml to specify the model structure and parameters. Here we use the configuration file :code:`parameters_elliptic_egcn_o.yaml` for the egcn_o model as an example:
.. code-block:: yaml
dataset_args:
data: elliptic_temporal
elliptic_args:
folder: ./data/elliptic_temporal
tar_file: elliptic_bitcoin_dataset_cont.tar.gz
feats_file: elliptic_bitcoin_dataset_cont/elliptic_txs_features.csv
edges_file: elliptic_bitcoin_dataset_cont/elliptic_txs_edgelist_timed.csv
classes_file: elliptic_bitcoin_dataset_cont/elliptic_txs_classes.csv
times_file: elliptic_bitcoin_dataset_cont/elliptic_txs_nodetime.csv
aggr_time: 1
train:
use_cuda: True
use_logfile: True
model: egcn_o
task: node_cls
class_weights: [ 0.35, 0.65]
use_2_hot_node_feats: False
use_1_hot_node_feats: False
save_node_embeddings: True
train_proportion: 0.65
dev_proportion: 0.1
num_epochs: 800
steps_accum_gradients: 1
learning_rate: 0.001
learning_rate_min: 0.001
learning_rate_max: 0.02
negative_mult_training: 20
negative_mult_test: 100
smart_neg_sampling: False
seed: 1234
target_measure: F1
target_class: 1
early_stop_patience: 100
eval_after_epochs: 5
adj_mat_time_window: 1
adj_mat_time_window_min: 1
adj_mat_time_window_max: 10
num_hist_steps: 5 # number of previous steps used for prediction
num_hist_steps_min: 3 # only used if num_hist_steps: None
num_hist_steps_max: 10 # only used if num_hist_steps: None
data_loading_params:
batch_size: 1
num_workers: 6
gcn_parameters:
feats_per_node: 50
feats_per_node_min: 30
feats_per_node_max: 312
layer_1_feats: 256
layer_1_feats_min: 30
layer_1_feats_max: 500
layer_2_feats: None
layer_2_feats_same_as_l1: True
k_top_grcu: 200
num_layers: 2
lstm_l1_layers: 125
lstm_l1_feats: 100
lstm_l1_feats_min: 50
lstm_l1_feats_max: 500
lstm_l2_layers: 1
lstm_l2_feats: 400
lstm_l2_feats_same_as_l1: True
cls_feats: 307
cls_feats_min: 100
cls_feats_max: 700
The configuration file is composed of three parts: :code:`dataset_args`, :code:`train` and :code:`gcn_parameters`. Here are their meanings:
- :code:`dataset_args`: This part specifies some configurations for the dataset used. :code:`data` specifies the name of the dataset. :code:`elliptic_args` contains parameters related to the dataset, including the folder location of the dataset, the name of the data file, and so on.
- :code:`train`: This part specifies the training parameters. :code:`use_cuda` indicates whether CUDA is used for computation, and :code:`use_logfile` whether a log file records the run. :code:`model` gives the model name and :code:`task` the type of task. :code:`class_weights` sets the class weights used to handle class imbalance. :code:`use_2_hot_node_feats` and :code:`use_1_hot_node_feats` indicate whether two-hot or one-hot node features are used. :code:`save_node_embeddings` indicates whether the node embeddings are saved. :code:`train_proportion` and :code:`dev_proportion` give the proportions of the training and validation sets. :code:`num_epochs` is the total number of training epochs, and :code:`steps_accum_gradients` the number of steps over which gradients are accumulated. :code:`learning_rate` is the learning rate. :code:`negative_mult_training` and :code:`negative_mult_test` give the negative-sampling multiples at training and test time, and :code:`smart_neg_sampling` indicates whether smart negative sampling is used. :code:`seed` is the random seed. :code:`target_measure` and :code:`target_class` denote the target evaluation metric and target class, respectively. :code:`early_stop_patience` is the early-stopping patience: training stops if the validation performance does not improve within that many epochs. :code:`eval_after_epochs` indicates after how many epochs evaluation starts. :code:`adj_mat_time_window` is the time window of the adjacency matrix, and :code:`num_hist_steps` the number of historical steps used for prediction. :code:`data_loading_params` contains the data loading parameters, including batch size and number of worker threads.
- :code:`gcn_parameters`: This part specifies the GCN module, including the number of features per node, the number of features per layer, and the parameters of the LSTM layers. Notice that several parameters come with min and max variants: when the base parameter is set to None, its value is randomly generated between min and max based on the random number seed, as sketched below.
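A minimal sketch of how such a `random_param_value` helper can behave, based only on the semantics implied by the configuration comments above (the actual helper in the codebase may differ):
.. code-block:: python

    import math
    import random

    def random_param_value(param, param_min, param_max, type='int'):
        # Keep an explicitly given value; otherwise sample between min and max.
        if param is None or str(param).lower() == 'none':
            if type == 'int':
                return random.randrange(param_min, param_max + 1)
            elif type == 'logscale':
                log_lo, log_hi = math.log10(param_min), math.log10(param_max)
                return 10 ** random.uniform(log_lo, log_hi)
            else:
                return random.uniform(param_min, param_max)
        return param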
After defining the configuration file, we first read the parameters from it and create a GNN model that supports partition parallelism, which will be used for training later:
.. code-block:: python
def create_parser():
parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter)
parser.add_argument('--config_file', default='experiments/parameters_elliptic_egcn_o.yaml', type=argparse.FileType(mode='r'), help='optional, yaml file containing parameters to be used, overrides command line parameters')
parser.add_argument('--gpu', default=0, type=int, help='gpu id')
return parser
def parse_args(parser):
args = parser.parse_args()
if args.config_file:
data = yaml.load(args.config_file, Loader=yaml.FullLoader)
delattr(args, 'config_file')
arg_dict = args.__dict__
for key, value in data.items():
arg_dict[key] = value
if args.model in ['dysat', 'gcrn']:
return args
args.learning_rate = random_param_value(args.learning_rate, args.learning_rate_min, args.learning_rate_max, type='logscale')
args.num_hist_steps = random_param_value(args.num_hist_steps, args.num_hist_steps_min, args.num_hist_steps_max, type='int')
args.gcn_parameters['feats_per_node'] = random_param_value(args.gcn_parameters['feats_per_node'], args.gcn_parameters['feats_per_node_min'], args.gcn_parameters['feats_per_node_max'], type='int')
args.gcn_parameters['layer_1_feats'] = random_param_value(args.gcn_parameters['layer_1_feats'], args.gcn_parameters['layer_1_feats_min'], args.gcn_parameters['layer_1_feats_max'], type='int')
if args.gcn_parameters['layer_2_feats_same_as_l1'] or args.gcn_parameters['layer_2_feats_same_as_l1'].lower()=='true':
args.gcn_parameters['layer_2_feats'] = args.gcn_parameters['layer_1_feats']
else:
args.gcn_parameters['layer_2_feats'] = random_param_value(args.gcn_parameters['layer_2_feats'], args.gcn_parameters['layer_1_feats_min'], args.gcn_parameters['layer_1_feats_max'], type='int')
args.gcn_parameters['lstm_l1_feats'] = random_param_value(args.gcn_parameters['lstm_l1_feats'], args.gcn_parameters['lstm_l1_feats_min'], args.gcn_parameters['lstm_l1_feats_max'], type='int')
if args.gcn_parameters['lstm_l2_feats_same_as_l1'] or args.gcn_parameters['lstm_l2_feats_same_as_l1'].lower()=='true':
args.gcn_parameters['lstm_l2_feats'] = args.gcn_parameters['lstm_l1_feats']
else:
args.gcn_parameters['lstm_l2_feats'] = random_param_value(args.gcn_parameters['lstm_l2_feats'], args.gcn_parameters['lstm_l1_feats_min'], args.gcn_parameters['lstm_l1_feats_max'], type='int')
args.gcn_parameters['cls_feats'] = random_param_value(args.gcn_parameters['cls_feats'], args.gcn_parameters['cls_feats_min'], args.gcn_parameters['cls_feats_max'], type='int')
return args
parser = u.create_parser()
args = u.parse_args(parser)
sync_gnn = build_model(args, graph=g, group=group)
Then a model object is created by :code:`build_model`. If needed, we can adjust the model's parameters by modifying the contents of the configuration file; a condensed end-to-end sketch follows the list below. Here we provide four models for discrete-time temporal GNNs:
- :code:`EvolveGCN`: The EGCN model proposed in `Evolving graph convolutional networks for dynamic graphs <https://ojs.aaai.org/index.php/AAAI/article/download/5984/5840>`__.
- :code:`DySAT`: The DySAT model proposed in `Deep neural representation learning on dynamic graphs via self-attention networks <https://sci-hub.yncjkj.com/10.1145/3336191.3371845>`__.
- :code:`GCRN`: The GCRN model proposed in `Structured sequence modeling with graph convolutional recurrent networks <https://arxiv.dosf.top/pdf/1612.07659.pdf>`__.
- :code:`TGCN`: The TGCN model proposed in `Tag graph convolutional network for tag-aware recommendation <https://xinxin-me.github.io/papers/TGCN.pdf>`__.
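Putting the pieces together, a condensed end-to-end sketch for the DTDG pipeline, reusing the objects introduced in the earlier tutorials (`ctx`, `group`, `g`, `dataset`, `splitter`, `Classifier`, `Trainer`):
.. code-block:: python

    # Parse the yaml configuration and build the partition-parallel model.
    parser = u.create_parser()
    args = u.parse_args(parser)

    sync_gnn = build_model(args, graph=g, group=group).to(ctx.device)
    classifier = Classifier(args.hidden_dim, args.hidden_dim).to(ctx.device)
    spl = splitter(args, min_time, max_time)

    # Train with the Trainer shown in the distributed-training tutorial.
    trainer = Trainer(args, spl, sync_gnn, classifier, dataset, ctx)
    trainer.train()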