Commit 71cb0206 by zlj

add examples

parent 16cff1d2
LOCAL RANK 0, RANK0
use cuda on 0
9228
get_neighbors consume: 0.0103759s
Epoch 0:
train loss:377.5712 train ap:0.903848 val ap:0.886584 val auc:0.904656
total time:11.40s prep time:9.88s
fetch time:0.00s write back time:0.00s
Epoch 1:
train loss:329.1190 train ap:0.920000 val ap:0.885216 val auc:0.904735
total time:11.32s prep time:9.79s
fetch time:0.00s write back time:0.00s
Epoch 2:
train loss:316.1359 train ap:0.924376 val ap:0.895123 val auc:0.912622
total time:11.49s prep time:9.95s
fetch time:0.00s write back time:0.00s
Epoch 3:
train loss:311.4889 train ap:0.926138 val ap:0.893922 val auc:0.912589
total time:11.50s prep time:9.97s
fetch time:0.00s write back time:0.00s
Epoch 4:
train loss:302.2057 train ap:0.929684 val ap:0.889695 val auc:0.909766
total time:11.48s prep time:9.95s
fetch time:0.00s write back time:0.00s
Epoch 5:
train loss:300.2464 train ap:0.931034 val ap:0.897774 val auc:0.916421
total time:11.48s prep time:9.95s
fetch time:0.00s write back time:0.00s
Epoch 6:
train loss:293.5465 train ap:0.934657 val ap:0.896159 val auc:0.914983
total time:11.55s prep time:10.02s
fetch time:0.00s write back time:0.00s
Epoch 7:
train loss:285.9396 train ap:0.937834 val ap:0.905351 val auc:0.922268
total time:11.52s prep time:9.99s
fetch time:0.00s write back time:0.00s
Epoch 8:
train loss:281.7048 train ap:0.941035 val ap:0.909690 val auc:0.924262
total time:11.51s prep time:9.98s
fetch time:0.00s write back time:0.00s
Epoch 9:
train loss:273.8330 train ap:0.945250 val ap:0.913860 val auc:0.928068
total time:11.56s prep time:10.00s
fetch time:0.00s write back time:0.00s
Epoch 10:
train loss:268.6164 train ap:0.947141 val ap:0.917379 val auc:0.930309
total time:11.77s prep time:10.19s
fetch time:0.00s write back time:0.00s
Epoch 11:
train loss:265.0121 train ap:0.949457 val ap:0.918648 val auc:0.931452
total time:11.62s prep time:10.08s
fetch time:0.00s write back time:0.00s
Epoch 12:
train loss:255.6320 train ap:0.953506 val ap:0.919272 val auc:0.932783
total time:11.50s prep time:9.98s
fetch time:0.00s write back time:0.00s
Epoch 13:
train loss:252.6296 train ap:0.954798 val ap:0.924649 val auc:0.936515
total time:11.50s prep time:9.96s
fetch time:0.00s write back time:0.00s
Epoch 14:
train loss:248.4476 train ap:0.956243 val ap:0.925952 val auc:0.938199
total time:11.53s prep time:10.00s
fetch time:0.00s write back time:0.00s
Epoch 15:
train loss:243.4459 train ap:0.958749 val ap:0.929440 val auc:0.940865
total time:11.54s prep time:10.01s
fetch time:0.00s write back time:0.00s
Epoch 16:
train loss:238.6286 train ap:0.960667 val ap:0.936339 val auc:0.946161
total time:17.48s prep time:15.12s
fetch time:0.00s write back time:0.00s
Epoch 17:
train loss:234.5283 train ap:0.961787 val ap:0.933828 val auc:0.944680
total time:18.09s prep time:15.69s
fetch time:0.00s write back time:0.00s
Epoch 18:
train loss:227.3527 train ap:0.964591 val ap:0.932110 val auc:0.943765
total time:18.17s prep time:15.46s
fetch time:0.00s write back time:0.00s
Epoch 19:
train loss:223.7772 train ap:0.965486 val ap:0.937780 val auc:0.947312
total time:17.80s prep time:15.43s
fetch time:0.00s write back time:0.00s
Epoch 20:
train loss:221.9428 train ap:0.966139 val ap:0.938104 val auc:0.948022
total time:18.31s prep time:15.82s
fetch time:0.00s write back time:0.00s
Epoch 21:
train loss:216.8870 train ap:0.968285 val ap:0.942088 val auc:0.950660
total time:18.14s prep time:15.48s
fetch time:0.00s write back time:0.00s
Epoch 22:
train loss:213.5077 train ap:0.968911 val ap:0.944023 val auc:0.951869
total time:18.09s prep time:15.56s
fetch time:0.00s write back time:0.00s
Epoch 23:
train loss:210.1412 train ap:0.970743 val ap:0.944840 val auc:0.952554
total time:17.74s prep time:15.47s
fetch time:0.00s write back time:0.00s
Epoch 24:
train loss:208.9109 train ap:0.971101 val ap:0.944029 val auc:0.952720
total time:18.47s prep time:15.73s
fetch time:0.00s write back time:0.00s
Epoch 25:
train loss:207.5198 train ap:0.970606 val ap:0.944518 val auc:0.952912
total time:17.97s prep time:15.66s
fetch time:0.00s write back time:0.00s
Epoch 26:
train loss:203.6585 train ap:0.971611 val ap:0.940218 val auc:0.949371
total time:17.70s prep time:15.42s
fetch time:0.00s write back time:0.00s
Epoch 27:
train loss:203.3531 train ap:0.972317 val ap:0.949000 val auc:0.956595
total time:18.01s prep time:15.33s
fetch time:0.00s write back time:0.00s
Epoch 28:
train loss:198.1525 train ap:0.973525 val ap:0.948420 val auc:0.955604
total time:17.78s prep time:15.31s
fetch time:0.00s write back time:0.00s
Epoch 29:
train loss:197.6365 train ap:0.973818 val ap:0.944911 val auc:0.953313
total time:17.74s prep time:15.49s
fetch time:0.00s write back time:0.00s
Epoch 30:
train loss:197.7800 train ap:0.973573 val ap:0.950356 val auc:0.958595
total time:18.24s prep time:15.60s
fetch time:0.00s write back time:0.00s
Epoch 31:
train loss:194.4391 train ap:0.974730 val ap:0.952775 val auc:0.959729
total time:17.84s prep time:15.23s
fetch time:0.00s write back time:0.00s
Epoch 32:
train loss:190.1150 train ap:0.976038 val ap:0.953111 val auc:0.959360
total time:17.72s prep time:15.46s
fetch time:0.00s write back time:0.00s
Epoch 33:
train loss:185.7417 train ap:0.976925 val ap:0.954769 val auc:0.961057
total time:18.04s prep time:15.56s
fetch time:0.00s write back time:0.00s
Epoch 34:
train loss:189.0004 train ap:0.976267 val ap:0.954641 val auc:0.961198
total time:17.89s prep time:15.12s
fetch time:0.00s write back time:0.00s
Epoch 35:
train loss:185.4487 train ap:0.977420 val ap:0.954675 val auc:0.960969
total time:17.65s prep time:15.13s
fetch time:0.00s write back time:0.00s
Epoch 36:
train loss:185.9187 train ap:0.977260 val ap:0.955284 val auc:0.961039
total time:17.67s prep time:15.36s
fetch time:0.00s write back time:0.00s
Epoch 37:
train loss:184.6686 train ap:0.977626 val ap:0.955124 val auc:0.961923
total time:17.90s prep time:15.42s
fetch time:0.00s write back time:0.00s
Epoch 38:
train loss:183.1190 train ap:0.977930 val ap:0.956069 val auc:0.962114
total time:18.10s prep time:15.26s
fetch time:0.00s write back time:0.00s
Epoch 39:
train loss:179.3445 train ap:0.978350 val ap:0.958382 val auc:0.963833
total time:18.05s prep time:15.60s
fetch time:0.00s write back time:0.00s
Epoch 40:
train loss:174.6380 train ap:0.980014 val ap:0.956793 val auc:0.963013
total time:18.28s prep time:15.77s
fetch time:0.00s write back time:0.00s
Epoch 41:
train loss:178.2737 train ap:0.979067 val ap:0.958580 val auc:0.964227
total time:18.24s prep time:15.51s
fetch time:0.00s write back time:0.00s
Epoch 42:
train loss:175.7294 train ap:0.979611 val ap:0.960288 val auc:0.965754
total time:17.98s prep time:15.62s
fetch time:0.00s write back time:0.00s
Epoch 43:
train loss:173.2326 train ap:0.980324 val ap:0.960428 val auc:0.965867
total time:18.21s prep time:15.60s
fetch time:0.00s write back time:0.00s
Epoch 44:
train loss:172.3492 train ap:0.980196 val ap:0.962143 val auc:0.966774
total time:18.35s prep time:15.80s
fetch time:0.00s write back time:0.00s
Epoch 45:
train loss:168.8601 train ap:0.981180 val ap:0.963014 val auc:0.968132
total time:17.73s prep time:15.50s
fetch time:0.00s write back time:0.00s
Epoch 46:
train loss:169.5997 train ap:0.981473 val ap:0.961124 val auc:0.966405
total time:13.20s prep time:11.67s
fetch time:0.00s write back time:0.00s
Epoch 47:
train loss:167.5232 train ap:0.981394 val ap:0.961333 val auc:0.966534
total time:11.49s prep time:9.96s
fetch time:0.00s write back time:0.00s
Epoch 48:
train loss:165.6863 train ap:0.981684 val ap:0.960024 val auc:0.965201
total time:11.50s prep time:9.97s
fetch time:0.00s write back time:0.00s
Epoch 49:
train loss:165.3790 train ap:0.981795 val ap:0.962299 val auc:0.967019
total time:11.54s prep time:9.98s
fetch time:0.00s write back time:0.00s
Loading the best model at epoch 45
test AP:0.946485 test AUC:0.954197
test_dataset 23621 avg_time 13.31522078514099
import argparse
import os
import sys
from os.path import abspath, join, dirname
from pathlib import Path
from starrygl.distributed.context import DistributedContext
from starrygl.distributed.utils import DistIndex
from starrygl.evaluation.get_evalute_data import get_link_prediction_data
from starrygl.module.modules import GeneralModel
from starrygl.module.utils import parse_config, EarlyStopMonitor
from starrygl.sample.cache.fetch_cache import FetchFeatureCache
from starrygl.sample.graph_core import DataSet, DistributedGraphStore, TemporalNeighborSampleGraph
from starrygl.sample.memory.shared_mailbox import SharedMailBox
from starrygl.sample.sample_core.EvaluateNegativeSampling import EvaluateNegativeSampling
from starrygl.sample.sample_core.base import NegativeSampling
from starrygl.sample.sample_core.neighbor_sampler import NeighborSampler
from starrygl.sample.part_utils.partition_tgnn import partition_load
import torch
import time
import torch.nn.functional as F
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed import init_process_group, destroy_process_group
from starrygl.sample.data_loader import DistributedDataLoader
from starrygl.sample.batch_data import SAMPLE_TYPE
from starrygl.sample.stream_manager import getPipelineManger
parser = argparse.ArgumentParser(
    description="StarryGL temporal GNN link prediction example",
    formatter_class=argparse.ArgumentDefaultsHelpFormatter,
)
parser.add_argument('--rank', default=0, type=int, metavar='W',
                    help='rank of this process')
parser.add_argument('--patience', type=int, default=5,
                    help='patience for early stopping')
parser.add_argument('--world_size', default=1, type=int, metavar='W',
                    help='number of processes')
parser.add_argument('--dataname', default=None, type=str, metavar='W',
                    help='name of dataset')
parser.add_argument('--model', default='TGN', type=str, metavar='W',
                    help='name of model')
parser.add_argument('--negative_sample_strategy', default='random', type=str, metavar='W',
                    help='name of negative sample strategy')
args = parser.parse_args()
from sklearn.metrics import average_precision_score, roc_auc_score
import random
import dgl
import numpy as np
#os.environ['CUDA_VISIBLE_DEVICES'] = str(args.rank)
#os.environ["RANK"] = str(args.rank)
#os.environ["WORLD_SIZE"] = str(args.world_size)
#os.environ["LOCAL_RANK"] = str(0)
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
os.environ["MASTER_ADDR"] = '10.214.211.187'
os.environ["MASTER_PORT"] = '9337'
def seed_everything(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
seed_everything(1234)
def main():
    print('main')
    use_cuda = True
    sample_param, memory_param, gnn_param, train_param = parse_config('./config/{}.yml'.format(args.model))
    torch.set_num_threads(12)
    ctx = DistributedContext.init(backend="nccl", use_gpu=True)
    device_id = torch.cuda.current_device()
    pdata = partition_load("/mnt/data/part_data/evaluate/{}".format(args.dataname), algo="metis_for_tgnn")
    graph = DistributedGraphStore(pdata=pdata, uvm_edge=False)
    gnn_param['dyrep'] = True if args.model == 'DyRep' else False
    use_src_emb = gnn_param['use_src_emb'] if 'use_src_emb' in gnn_param else False
    use_dst_emb = gnn_param['use_dst_emb'] if 'use_dst_emb' in gnn_param else False
    gnn_dim_node = 0 if graph.x is None else pdata.x.shape[1]
    gnn_dim_edge = 0 if graph.edge_attr is None else pdata.edge_attr.shape[1]
    print(gnn_dim_node, gnn_dim_edge)
    avg_time = 0
    MODEL_SAVE_PATH = f'./saved_models/{args.model}-{args.dataname}.pth'
    if use_cuda:
        model = GeneralModel(gnn_dim_node, gnn_dim_edge, sample_param, memory_param, gnn_param, train_param).cuda()
        device = torch.device('cuda')
    else:
        model = GeneralModel(gnn_dim_node, gnn_dim_edge, sample_param, memory_param, gnn_param, train_param)
        device = torch.device('cpu')
    model.load_state_dict(torch.load(MODEL_SAVE_PATH))
    sample_graph = TemporalNeighborSampleGraph(sample_graph=pdata.sample_graph, mode='full')
    if memory_param['type'] != 'none':
        mailbox = SharedMailBox(pdata.ids.shape[0], memory_param,
                                dim_edge_feat=pdata.edge_attr.shape[1] if pdata.edge_attr is not None else 0)
    else:
        mailbox = None
    num_layers = sample_param['layer'] if 'layer' in sample_param else 1
    fanout = sample_param['neighbor'] if 'neighbor' in sample_param else [10]
    policy = sample_param['strategy'] if 'strategy' in sample_param else 'recent'
    sampler = NeighborSampler(num_nodes=graph.num_nodes, num_layers=num_layers, fanout=fanout,
                              graph_data=sample_graph, workers=10, policy=policy, graph_name="wiki_train")
    train_data = torch.masked_select(graph.edge_index, pdata.train_mask.to(device)).reshape(2, -1)
    train_ts = torch.masked_select(graph.edge_ts, pdata.train_mask.to(device))
    val_data = torch.masked_select(graph.edge_index, pdata.val_mask.to(device)).reshape(2, -1)
    val_ts = torch.masked_select(graph.edge_ts, pdata.val_mask.to(device))
    test_data = torch.masked_select(graph.edge_index, pdata.test_mask.to(device)).reshape(2, -1)
    test_ts = torch.masked_select(graph.edge_ts, pdata.test_mask.to(device))
    #print(train_data.shape[1],val_data.shape[1],test_data.shape[1])
    train_data = DataSet(edges=train_data, ts=train_ts, eids=torch.nonzero(pdata.train_mask).view(-1))
    test_data = DataSet(edges=test_data, ts=test_ts, eids=torch.nonzero(pdata.test_mask).view(-1))
    val_data = DataSet(edges=val_data, ts=val_ts, eids=torch.nonzero(pdata.val_mask).view(-1))
    new_node_val_data = torch.masked_select(graph.edge_index, pdata.new_node_val_mask.to(device)).reshape(2, -1)
    new_node_val_ts = torch.masked_select(graph.edge_ts, pdata.new_node_val_mask.to(device))
    new_node_test_data = torch.masked_select(graph.edge_index, pdata.new_node_test_mask.to(device)).reshape(2, -1)
    new_node_test_ts = torch.masked_select(graph.edge_ts, pdata.new_node_test_mask.to(device))
    # 'edis' in the original was a typo for 'eids'
    new_node_val_data = DataSet(edges=new_node_val_data, ts=new_node_val_ts,
                                eids=torch.nonzero(pdata.new_node_val_mask).view(-1))
    new_node_test_data = DataSet(edges=new_node_test_data, ts=new_node_test_ts,
                                 eids=torch.nonzero(pdata.new_node_test_mask).view(-1))
    if args.negative_sample_strategy != 'random':
        val_neg_edge_sampler = EvaluateNegativeSampling(src_node_ids=graph.edge_index[0, :], dst_node_ids=graph.edge_index[1, :],
                                                        interact_times=graph.edge_ts, last_observed_time=train_data.ts[-1],
                                                        negative_sample_strategy=args.negative_sample_strategy, seed=0)
        new_node_val_neg_edge_sampler = EvaluateNegativeSampling(src_node_ids=new_node_val_data.edges[0, :], dst_node_ids=new_node_val_data.edges[1, :],
                                                                 interact_times=new_node_val_data.ts, last_observed_time=train_data.ts[-1],
                                                                 negative_sample_strategy=args.negative_sample_strategy, seed=1)
        test_neg_edge_sampler = EvaluateNegativeSampling(src_node_ids=graph.edge_index[0, :], dst_node_ids=graph.edge_index[1, :],
                                                         interact_times=graph.edge_ts, last_observed_time=val_data.ts[-1],
                                                         negative_sample_strategy=args.negative_sample_strategy, seed=2)
        new_node_test_neg_edge_sampler = EvaluateNegativeSampling(src_node_ids=new_node_test_data.edges[0, :], dst_node_ids=new_node_test_data.edges[1, :],
                                                                  interact_times=new_node_test_data.ts, last_observed_time=val_data.ts[-1],
                                                                  negative_sample_strategy=args.negative_sample_strategy, seed=3)
    else:
        val_neg_edge_sampler = EvaluateNegativeSampling(src_node_ids=graph.edge_index[0, :], dst_node_ids=graph.edge_index[1, :], seed=0)
        new_node_val_neg_edge_sampler = EvaluateNegativeSampling(src_node_ids=new_node_val_data.edges[0, :], dst_node_ids=new_node_val_data.edges[1, :], seed=1)
        test_neg_edge_sampler = EvaluateNegativeSampling(src_node_ids=graph.edge_index[0, :], dst_node_ids=graph.edge_index[1, :], seed=2)
        new_node_test_neg_edge_sampler = EvaluateNegativeSampling(src_node_ids=new_node_test_data.edges[0, :], dst_node_ids=new_node_test_data.edges[1, :], seed=3)
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl.function as fn
from dgl.nn import SAGEConv
import numpy as np
class TimeEncode(torch.nn.Module):
    def __init__(self, dim):
        super(TimeEncode, self).__init__()
        self.dim = dim
        self.w = torch.nn.Linear(1, dim)
        self.w.weight = torch.nn.Parameter((torch.from_numpy(1 / 10 ** np.linspace(0, 9, dim, dtype=np.float32))).reshape(dim, -1))
        self.w.bias = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, t):
        output = torch.cos(self.w(t.float().reshape((-1, 1))))
        return output
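# Hedged usage sketch (illustrative only, not called anywhere in this file):
# TimeEncode maps scalar time deltas to cos(W*t + b) features with fixed
# log-spaced frequencies 1 / 10**linspace(0, 9, dim); sizes below are made up.
def _demo_time_encode():
    enc = TimeEncode(dim=100)
    dt = torch.tensor([0.0, 1.0, 3600.0])  # hypothetical inter-event time deltas
    return enc(dt)                          # -> shape (3, 100), values in [-1, 1]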
class GraphSAGE(nn.Module):
    def __init__(self, in_feats, h_feats):
        super(GraphSAGE, self).__init__()
        self.conv1 = SAGEConv(in_feats, h_feats, "mean")
        self.conv2 = SAGEConv(h_feats, h_feats, "mean")

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h
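# Hedged usage sketch (illustrative only, not called anywhere in this file):
# run the two-layer GraphSAGE above on a small random DGL graph; the graph
# and feature sizes are made up for the example.
def _demo_graphsage():
    g = dgl.rand_graph(8, 20)        # 8 nodes, 20 random edges
    feats = torch.randn(8, 16)       # hypothetical node features
    model = GraphSAGE(in_feats=16, h_feats=32)
    return model(g, feats)           # -> shape (8, 32)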
class TSAGELayer(nn.Module):
    def __init__(self, in_dim=0, edge_dim=0, time_dim=0, h_feats=0):
        super(TSAGELayer, self).__init__()
        assert in_dim + time_dim != 0 and h_feats != 0
        self.time_dim = time_dim
        self.time_enc = TimeEncode(time_dim)
        # SAGEConv input is node feature || edge feature || time encoding
        # (the original passed only in_dim and had a stray trailing comma
        # that made self.sage a tuple)
        self.sage = SAGEConv(in_dim + edge_dim + time_dim, h_feats, "mean")

    def forward(self, b):
        # per-edge time encoding; dst nodes get zero rows prepended so the
        # result lines up row-for-row with b.srcdata (dst nodes come first,
        # then one sampled neighbor per edge)
        time_f = self.time_enc(b.edata['dt'])
        time_f = torch.cat((torch.zeros(b.num_dst_nodes(), self.time_dim,
                                        dtype=time_f.dtype, device=time_f.device), time_f), dim=0)
        if 'f' in b.edata:
            edge_f = torch.cat((torch.zeros(b.num_dst_nodes(), b.edata['f'].shape[1],
                                            dtype=b.edata['f'].dtype, device=b.edata['f'].device),
                                b.edata['f']), dim=0)
            if 'h' in b.srcdata:
                b.srcdata['h'] = torch.cat((b.srcdata['h'], edge_f, time_f), dim=1)
            else:
                b.srcdata['h'] = torch.cat((edge_f, time_f), dim=1)
        else:
            if 'h' in b.srcdata:
                b.srcdata['h'] = torch.cat((b.srcdata['h'], time_f), dim=1)
            else:
                b.srcdata['h'] = time_f
        return F.relu(self.sage(b, b.srcdata['h']))
class TSAGEModel(nn.Module):
    def __init__(self, num_layer, node_dim, edge_dim, time_dim, h_dim):
        super(TSAGEModel, self).__init__()
        self.num_layer = num_layer
        layers = []
        for i in range(num_layer):
            if i != 0:
                layers.append(TSAGELayer(h_dim, edge_dim, time_dim, h_dim))
            else:
                layers.append(TSAGELayer(node_dim, edge_dim, time_dim, h_dim))
        # ModuleList so the sub-layers are registered with their parameters
        self.layers = nn.ModuleList(layers)

    def forward(self, mfgs):
        for l in range(len(mfgs)):
            for h in range(len(mfgs[l])):
                if l < self.num_layer - 1:
                    mfgs[l + 1][h].srcdata['h'] = self.layers[l](mfgs[l][h])
                else:
                    return self.layers[l](mfgs[l][h])
class DotPredictor(nn.Module):
    def forward(self, g, h):
        with g.local_scope():
            g.ndata["h"] = h
            # Compute a new edge feature named 'score' by a dot-product between the
            # source node feature 'h' and destination node feature 'h'.
            g.apply_edges(fn.u_dot_v("h", "h", "score"))
            # u_dot_v returns a 1-element vector for each edge so you need to squeeze it.
            return g.edata["score"][:, 0]
# Thumbnail credits: Link Prediction with Neo4j, Mark Needham
# sphinx_gallery_thumbnail_path = '_static/blitz_4_link_predict.png'
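# Hedged usage sketch (illustrative only, not called anywhere in this file):
# score every edge of a toy graph by the dot product of its endpoint
# embeddings; graph and embedding sizes are made up for the example.
def _demo_dot_predictor():
    g = dgl.rand_graph(6, 12)        # 6 nodes, 12 random edges
    h = torch.randn(6, 32)           # hypothetical node embeddings
    return DotPredictor()(g, h)      # -> shape (12,), one score per edge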
#torchrun --standalone --nproc-per-node 1 train_tgnn.py --dataname tgbl-wiki --model TGN > tgbl_wiki_train.out &
#wait
#torchrun --standalone --nproc-per-node 4 train_tgnn.py --dataname tgbl-wiki --model TGN > tgbl_wiki_train_4.out &
#wait
#torchrun --standalone --nproc-per-node 1 train_tgnn.py --dataname tgbl-review --model TGN > tgbl_review_train.out &
#wait
#torchrun --standalone --nproc-per-node 4 train_tgnn.py --dataname tgbl-review --model TGN > tgbl_review_train_4.out &
#wait
torchrun --standalone --nproc-per-node 1 train_tgnn.py --dataname tgbl-coin --model TGN_600 > tgbl_coin_train.out &
wait
torchrun --standalone --nproc-per-node 4 train_tgnn.py --dataname tgbl-coin --model TGN_600 > tgbl_coin_train_4.out &
wait
torchrun --standalone --nproc-per-node 1 train_tgnn.py --dataname tgbl-comment --model TGN_600 > tgbl_comment_train.out &
wait
torchrun --standalone --nproc-per-node 4 train_tgnn.py --dataname tgbl-comment --model TGN_600 > tgbl_comment_train_4.out &
wait
torchrun --standalone --nproc-per-node 1 train_tgnn.py --dataname tgbl-flight --model TGN_600 > tgbl_flight_train.out &
wait
torchrun --standalone --nproc-per-node 4 train_tgnn.py --dataname tgbl-flight --model TGN_600 > tgbl_flight_train_4.out &
wait
#torchrun --standalone --nproc-per-node 1 evaluate_tgbl_predict.py --dataname tgbl-wiki --model TGN --train_world_size 1 > tgbl_wiki_4.out &
#wait
#torchrun --standalone --nproc-per-node 1 evaluate_tgbl_predict.py --dataname tgbl-wiki --model TGN --train_world_size 4 > tgbl_wiki.out &
#wait
torchrun --standalone --nproc-per-node 1 evaluate_tgbl_predict.py --dataname tgbl-review --model TGN --train_world_size 1 > tgbl_review.out &
wait
torchrun --standalone --nproc-per-node 1 evaluate_tgbl_predict.py --dataname tgbl-review --model TGN --train_world_size 4 > tgbl_review_4.out &
wait
torchrun --standalone --nproc-per-node 1 evaluate_tgbl_predict.py --dataname tgbl-coin --model TGN_600 --train_world_size 1 > tgbl_coin.out &
wait
torchrun --standalone --nproc-per-node 1 evaluate_tgbl_predict.py --dataname tgbl-coin --model TGN_600 --train_world_size 4 > tgbl_coin_4.out &
wait
torchrun --standalone --nproc-per-node 1 evaluate_tgbl_predict.py --dataname tgbl-comment --model TGN_600 --train_world_size 1 > tgbl_comment.out &
wait
torchrun --standalone --nproc-per-node 1 evaluate_tgbl_predict.py --dataname tgbl-comment --model TGN_600 --train_world_size 4 > tgbl_comment_4.out &
wait
torchrun --standalone --nproc-per-node 1 evaluate_tgbl_predict.py --dataname tgbl-flight --model TGN_600 --train_world_size 1 > tgbl_flight.out &
wait
torchrun --standalone --nproc-per-node 1 evaluate_tgbl_predict.py --dataname tgbl-flight --model TGN_600 --train_world_size 4 > tgbl_flight_4.out &
wait
LOCAL RANK 0, RANK0
initialize distributed
use cuda on 0
638486
get_neighbors consume: 4.52747s
raw file found, skipping download
Dataset directory is /home/zlj/.miniconda3/envs/dgnn-3.10/lib/python3.10/site-packages/tgb/datasets/tgbl_coin
loading processed file
tensor([1648811421, 1648811421, 1648811424, ..., 1667278439, 1667278439,
1667278439], device='cuda:0') tensor([1648811421, 1648811421, 1648811424, ..., 1667278439, 1667278439,
1667278439])
tensor([1662096249, 1662096249, 1662096249, 1662096249, 1662096249, 1662096249,
1662096249, 1662096249, 1662096249, 1662096249, 1662096254, 1662096254,
1662096254, 1662096254, 1662096254, 1662096254, 1662096254, 1662096254,
1662096254, 1662096254, 1662096254, 1662096254, 1662096254, 1662096254,
1662096254, 1662096254, 1662096254, 1662096254, 1662096254, 1662096254,
1662096254, 1662096254, 1662096254, 1662096254, 1662096254, 1662096254,
1662096276, 1662096276, 1662096276, 1662096276, 1662096276, 1662096286,
1662096290, 1662096290, 1662096290, 1662096290, 1662096290, 1662096290,
1662096290, 1662096290, 1662096293, 1662096293, 1662096293, 1662096293,
1662096293, 1662096293, 1662096297, 1662096297, 1662096297, 1662096297,
1662096297, 1662096297, 1662096297, 1662096297, 1662096297, 1662096297,
1662096297, 1662096297, 1662096297, 1662096297, 1662096297, 1662096297,
1662096297, 1662096297, 1662096297, 1662096297, 1662096297, 1662096297,
1662096297, 1662096297, 1662096297, 1662096297, 1662096297, 1662096297,
1662096297, 1662096297, 1662096325, 1662096325, 1662096325, 1662096325,
1662096325, 1662096325, 1662096325, 1662096325, 1662096325, 1662096325,
1662096325, 1662096325, 1662096325, 1662096325], device='cuda:0') tensor([1664482319, 1664482319, 1664482319, ..., 1667278439, 1667278439,
1667278439], device='cuda:0') tensor([1648811421, 1648811421, 1648811424, ..., 1662096217, 1662096217,
1662096217], device='cuda:0')
LOCAL RANK 0, RANK0
use cuda on 0
638486
get_neighbors consume: 4.12395s
Epoch 0:
LOCAL RANK 0, RANK0
use cuda on 0
994790
get_neighbors consume: 6.18508s
Epoch 0:
train loss:12236.7578 train ap:0.986976 val ap:0.934674 val auc:0.946284
total time:630.37s prep time:545.79s
fetch time:0.00s write back time:0.00s
Epoch 1:
train loss:11833.1818 train ap:0.987815 val ap:0.960581 val auc:0.965728
total time:628.44s prep time:542.56s
fetch time:0.00s write back time:0.00s
Epoch 2:
train loss:11622.9559 train ap:0.988244 val ap:0.956752 val auc:0.963083
total time:622.89s prep time:538.77s
fetch time:0.00s write back time:0.00s
Epoch 3:
train loss:11679.1400 train ap:0.988072 val ap:0.929351 val auc:0.943797
total time:681.88s prep time:569.50s
fetch time:0.00s write back time:0.00s
Epoch 4:
train loss:11676.1710 train ap:0.988098 val ap:0.936353 val auc:0.948531
total time:849.98s prep time:741.47s
fetch time:0.00s write back time:0.00s
Epoch 5:
train loss:11745.6001 train ap:0.987897 val ap:0.950828 val auc:0.958958
total time:862.77s prep time:750.90s
fetch time:0.00s write back time:0.00s
Epoch 6:
Early stopping at epoch 6
Loading the best model at epoch 1
0.9248434901237488 0.929413378238678
0.8653780221939087 0.861071765422821
test AP:0.847958 test AUC:0.837159
test_dataset 6647176 avg_time 87.00003329753876
LOCAL RANK 0, RANK0
LOCAL RANK 2, RANK2
LOCAL RANK 1, RANK1
LOCAL RANK 3, RANK3
use cuda on 0
use cuda on 1
use cuda on 2
use cuda on 3
994790
994790
994790
994790
get_neighbors consume: 6.11692s
get_neighbors consume: 6.12671s
get_neighbors consume: 6.03983s
get_neighbors consume: 6.05302s
num_batchs: tensor([17384], device='cuda:0')
num_batchs: tensor([16139], device='cuda:1')
num_batchs: tensor([3931], device='cuda:2')
num_batchs: tensor([14244], device='cuda:3')
num_batchs: tensor([850], device='cuda:0')
num_batchs: tensor([1395], device='cuda:1')
num_batchs: tensor([6920], device='cuda:2')
num_batchs: tensor([1915], device='cuda:3')
num_batchs: tensor([1015], device='cuda:0')
num_batchs: tensor([1736], device='cuda:1')
num_batchs: tensor([5545], device='cuda:2')
num_batchs: tensor([2785], device='cuda:3')
Epoch 0:
Epoch 0:
Epoch 0:
Epoch 0:
train loss:1331.9399 train ap:0.977517 val ap:0.961959 val auc:0.965566
train loss:1162.0666 train ap:0.981900 val ap:0.961959 val auc:0.965566
train loss:1244.0312 train ap:0.978548 val ap:0.961959 val auc:0.965566
train loss:1308.8701 train ap:0.979221 val ap:0.961959 val auc:0.965566
total time:125.18s prep time:60.37s
total time:125.18s prep time:60.37s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
total time:125.18s prep time:60.37s
total time:125.18s prep time:60.37s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
Epoch 1:
Epoch 1:
Epoch 1:
Epoch 1:
train loss:1169.3326 train ap:0.981283 val ap:0.965914 val auc:0.968475
train loss:1227.6728 train ap:0.981686 val ap:0.965914 val auc:0.968475
train loss:1226.7282 train ap:0.980509 val ap:0.965914 val auc:0.968475
train loss:1078.3342 train ap:0.984551 val ap:0.965914 val auc:0.968475
total time:125.97s prep time:61.45s
total time:125.97s prep time:61.45s
total time:125.97s prep time:61.45s
total time:125.97s prep time:61.45s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
Epoch 2:
Epoch 2:
Epoch 2:
Epoch 2:
train loss:1157.2600 train ap:0.981707 val ap:0.967135 val auc:0.969484
train loss:1227.8567 train ap:0.981577 val ap:0.967135 val auc:0.969484
train loss:1224.2131 train ap:0.980388 val ap:0.967135 val auc:0.969484
total time:125.33s prep time:60.54s
total time:125.33s prep time:60.54s
total time:125.33s prep time:60.54s
fetch time:0.00s write back time:0.00s
train loss:1071.5106 train ap:0.984690 val ap:0.967135 val auc:0.969484
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
total time:125.33s prep time:60.54s
fetch time:0.00s write back time:0.00s
Epoch 3:
Epoch 3:
Epoch 3:
Epoch 3:
train loss:1154.4245 train ap:0.981666 val ap:0.939874 val auc:0.947249
train loss:1221.6654 train ap:0.981759 val ap:0.939874 val auc:0.947249
train loss:1217.4941 train ap:0.980394 val ap:0.939874 val auc:0.947249
train loss:1064.5069 train ap:0.984769 val ap:0.939874 val auc:0.947249
total time:124.82s prep time:60.42s
total time:124.82s prep time:60.42s
total time:124.82s prep time:60.42s
fetch time:0.00s write back time:0.00s
total time:124.82s prep time:60.42s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
Epoch 4:
Epoch 4:
Epoch 4:
Epoch 4:
train loss:1206.6170 train ap:0.980821 val ap:0.958336 val auc:0.962538
train loss:1058.5493 train ap:0.984994 val ap:0.958336 val auc:0.962538
train loss:1153.8455 train ap:0.981657 val ap:0.958336 val auc:0.962538
train loss:1214.2795 train ap:0.981938 val ap:0.958336 val auc:0.962538
total time:124.91s prep time:60.22s
total time:124.91s prep time:60.22s
total time:124.91s prep time:60.22s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
total time:124.91s prep time:60.22s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
Epoch 5:
Epoch 5:
Epoch 5:
Epoch 5:
train loss:1211.5216 train ap:0.982009 val ap:0.949435 val auc:0.953953
train loss:1140.1948 train ap:0.982149 val ap:0.949435 val auc:0.953953
train loss:1050.7336 train ap:0.985195 val ap:0.949435 val auc:0.953953
total time:124.92s prep time:60.40s
train loss:1205.8990 train ap:0.980842 val ap:0.949435 val auc:0.953953
total time:124.92s prep time:60.40s
fetch time:0.00s write back time:0.00s
total time:124.92s prep time:60.40s
total time:124.92s prep time:60.40s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
Epoch 6:
Epoch 6:
Epoch 6:
Epoch 6:
train loss:1180.1657 train ap:0.981860 val ap:0.960236 val auc:0.963957
train loss:1120.2221 train ap:0.982891 val ap:0.960236 val auc:0.963957
train loss:1180.3385 train ap:0.983051 val ap:0.960236 val auc:0.963957
total time:124.69s prep time:60.01s
total time:124.69s prep time:60.01s
fetch time:0.00s write back time:0.00s
total time:124.69s prep time:60.01s
fetch time:0.00s write back time:0.00s
train loss:1026.1377 train ap:0.985808 val ap:0.960236 val auc:0.963957
fetch time:0.00s write back time:0.00s
total time:124.69s prep time:60.01s
fetch time:0.00s write back time:0.00s
Epoch 7:
Epoch 7:
Epoch 7:
Epoch 7:
Early stopping at epoch 7
Early stopping at epoch 7
Early stopping at epoch 7
Early stopping at epoch 7
Loading the best model at epoch 2
Loading the best model at epoch 2
Loading the best model at epoch 2
Loading the best model at epoch 2
0.9759191870689392 0.977138340473175
0.9759191870689392 0.977138340473175
0.9759191870689392 0.977138340473175
0.9759191870689392 0.977138340473175
0.9553558826446533 0.9581618309020996
0.9553558826446533 0.9581618309020996
0.9553558826446533 0.9581618309020996
0.9553558826446533 0.9581618309020996
test AP:0.940169 test AUC:0.942460
test AP:0.940169 test AUC:0.942460
test AP:0.940169 test AUC:0.942460
test AP:0.940169 test AUC:0.942460
test_dataset 836763 avg_time 9.689588661193847
test_dataset 509738 avg_time 9.689606781005859
test_dataset 1148929 avg_time 9.689572229385377
test_dataset 4151746 avg_time 9.689626097679138
LOCAL RANK 0, RANK0
initialize distributed
use cuda on 0
18143
get_neighbors consume: 5.92616s
raw file found, skipping download
Dataset directory is /home/zlj/.miniconda3/envs/dgnn-3.10/lib/python3.10/site-packages/tgb/datasets/tgbl_flight
loading processed file
tensor([1546318800, 1546318800, 1546318800, ..., 1667188800, 1667188800,
1667188800], device='cuda:0') tensor([1546318800, 1546318800, 1546318800, ..., 1667188800, 1667188800,
1667188800])
tensor([1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000, 1638162000, 1638162000,
1638162000, 1638162000, 1638162000, 1638162000], device='cuda:0') tensor([1653796800, 1653796800, 1653796800, ..., 1667188800, 1667188800,
1667188800], device='cuda:0') tensor([1546318800, 1546318800, 1546318800, ..., 1638075600, 1638075600,
1638075600], device='cuda:0')
LOCAL RANK 0, RANK0
use cuda on 0
18143
get_neighbors consume: 5.32513s
Epoch 0:
train loss:17328.4542 train ap:0.989251 val ap:0.991792 val auc:0.992826
total time:985.43s prep time:855.49s
fetch time:0.00s write back time:0.00s
Epoch 1:
train loss:14294.4074 train ap:0.992597 val ap:0.987955 val auc:0.988882
total time:981.59s prep time:851.99s
fetch time:0.00s write back time:0.00s
Epoch 2:
train loss:14297.0658 train ap:0.992655 val ap:0.991572 val auc:0.992562
total time:980.14s prep time:851.62s
fetch time:0.00s write back time:0.00s
Epoch 3:
train loss:14622.0160 train ap:0.992315 val ap:0.989207 val auc:0.990117
total time:1115.17s prep time:951.63s
fetch time:0.00s write back time:0.00s
Epoch 4:
train loss:14551.7185 train ap:0.992456 val ap:0.986963 val auc:0.988173
total time:1225.09s prep time:1063.97s
fetch time:0.00s write back time:0.00s
Epoch 5:
Early stopping at epoch 5
Loading the best model at epoch 0
0.975067675113678 0.9767603874206543
0.9743184447288513 0.9764328598976135
test AP:0.970799 test AUC:0.973111
test_dataset 10026943 avg_time 112.53466749191284
LOCAL RANK 0, RANK0
LOCAL RANK 2, RANK2
LOCAL RANK 1, RANK1
LOCAL RANK 3, RANK3
use cuda on 1
use cuda on 0
use cuda on 3
use cuda on 2
18143
18143
18143
18143
get_neighbors consume: 5.95983s
get_neighbors consume: 6.05613s
get_neighbors consume: 6.14142s
num_batchs: tensor([19750], device='cuda:3')
num_batchs: tensor([17645], device='cuda:2')
num_batchs: tensor([20015], device='cuda:1')
get_neighbors consume: 6.29594s
num_batchs: tensor([20994], device='cuda:0')
num_batchs: tensor([3877], device='cuda:0')
num_batchs: tensor([4708], device='cuda:1')
num_batchs: tensor([3605], device='cuda:2')
num_batchs: tensor([4524], device='cuda:3')
num_batchs: tensor([4149], device='cuda:0')
num_batchs: tensor([4298], device='cuda:1')
num_batchs: tensor([3638], device='cuda:2')
num_batchs: tensor([4748], device='cuda:3')
Epoch 0:
Epoch 0:
Epoch 0:
Epoch 0:
train loss:8041.7524 train ap:0.963479 val ap:0.976991 val auc:0.979452
train loss:5974.9788 train ap:0.976456 val ap:0.976991 val auc:0.979452
train loss:6391.0674 train ap:0.969010 val ap:0.976991 val auc:0.979452
total time:369.28s prep time:310.18s
total time:369.28s prep time:310.18s
fetch time:0.00s write back time:0.00s
total time:369.28s prep time:310.18s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
train loss:6098.5427 train ap:0.979030 val ap:0.976991 val auc:0.979452
total time:369.28s prep time:310.18s
fetch time:0.00s write back time:0.00s
Epoch 1:
Epoch 1:
Epoch 1:
Epoch 1:
train loss:6785.4083 train ap:0.974335 val ap:0.981245 val auc:0.983090
train loss:5226.3762 train ap:0.979351 val ap:0.981245 val auc:0.983090
train loss:4392.9569 train ap:0.986932 val ap:0.981245 val auc:0.983090
total time:369.13s prep time:309.22s
total time:369.13s prep time:309.22s
total time:369.13s prep time:309.22s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
train loss:4860.7541 train ap:0.986432 val ap:0.981245 val auc:0.983090
total time:369.13s prep time:309.22s
fetch time:0.00s write back time:0.00s
Epoch 2:
Epoch 2:
Epoch 2:
Epoch 2:
train loss:4129.9429 train ap:0.988314 val ap:0.980251 val auc:0.982577
train loss:6498.5192 train ap:0.976384 val ap:0.980251 val auc:0.982577
train loss:4974.4880 train ap:0.981104 val ap:0.980251 val auc:0.982577
total time:365.46s prep time:306.60s
total time:365.46s prep time:306.60s
total time:365.46s prep time:306.60s
train loss:4656.6260 train ap:0.987497 val ap:0.980251 val auc:0.982577
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
total time:365.46s prep time:306.60s
fetch time:0.00s write back time:0.00s
Epoch 3:
Epoch 3:
Epoch 3:
Epoch 3:
train loss:6346.4929 train ap:0.977303 val ap:0.978576 val auc:0.980634
train loss:4068.8824 train ap:0.988540 val ap:0.978576 val auc:0.980634
train loss:4918.4225 train ap:0.981662 val ap:0.978576 val auc:0.980634
total time:362.13s prep time:303.87s
train loss:4560.1019 train ap:0.987865 val ap:0.978576 val auc:0.980634
total time:362.13s prep time:303.87s
total time:362.13s prep time:303.87s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
total time:362.13s prep time:303.87s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
Epoch 4:
Epoch 4:
Epoch 4:
Epoch 4:
train loss:6310.2800 train ap:0.977605 val ap:0.974950 val auc:0.978057
train loss:3919.8300 train ap:0.989225 val ap:0.974950 val auc:0.978057
train loss:4854.9052 train ap:0.981859 val ap:0.974950 val auc:0.978057
total time:363.65s prep time:304.71s
total time:363.65s prep time:304.71s
fetch time:0.00s write back time:0.00s
total time:363.65s prep time:304.71s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
train loss:4467.4543 train ap:0.988407 val ap:0.974950 val auc:0.978057
total time:363.65s prep time:304.71s
fetch time:0.00s write back time:0.00s
Epoch 5:
Epoch 5:
Epoch 5:
Epoch 5:
train loss:6215.2998 train ap:0.978244 val ap:0.970081 val auc:0.973433
train loss:3909.4435 train ap:0.989068 val ap:0.970081 val auc:0.973433
train loss:4805.6621 train ap:0.982218 val ap:0.970081 val auc:0.973433
total time:367.40s prep time:307.71s
total time:367.40s prep time:307.71s
total time:367.40s prep time:307.71s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
fetch time:0.00s write back time:0.00s
train loss:4363.8407 train ap:0.988786 val ap:0.970081 val auc:0.973433
total time:367.40s prep time:307.71s
fetch time:0.00s write back time:0.00s
Epoch 6:
Epoch 6:
Epoch 6:
Epoch 6:
LOCAL RANK 0, RANK0
initialize distributed
use cuda on 0
352637
get_neighbors consume: 1.38667s
raw file found, skipping download
Dataset directory is /home/zlj/.miniconda3/envs/dgnn-3.10/lib/python3.10/site-packages/tgb/datasets/tgbl_review
loading processed file
tensor([ 929232000, 930787200, 931824000, ..., 1538524800, 1538611200,
1538611200], device='cuda:0') tensor([ 929232000, 930787200, 931824000, ..., 1538524800, 1538611200,
1538611200])
tensor([1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000, 1464912000, 1464912000,
1464912000, 1464912000, 1464912000, 1464912000], device='cuda:0') tensor([1488844800, 1488844800, 1488844800, ..., 1538524800, 1538611200,
1538611200], device='cuda:0') tensor([ 929232000, 930787200, 931824000, ..., 1464825600, 1464825600,
1464825600], device='cuda:0')
val metric
metric hits@ is 0.545051806698618
metric mrr is 0.24419529163892895
metric ap is 0.05492426247254426
metric auc is 0.8336828640087952
test metric
metric hits@ is 0.50441967217847
metric mrr is 0.20610860927523572
metric ap is 0.03974634949228331
metric auc is 0.8243427933946369
test_dataset 728919 avg_time 0.0
LOCAL RANK 0, RANK0
use cuda on 0
352637
get_neighbors consume: 1.137s
Epoch 0:
train loss:13642.4319 train ap:0.904599 val ap:0.877952 val auc:0.875205
total time:237.42s prep time:206.95s
fetch time:0.00s write back time:0.00s
Epoch 1:
train loss:13355.5800 train ap:0.909874 val ap:0.893142 val auc:0.885588
total time:254.75s prep time:224.74s
fetch time:0.00s write back time:0.00s
Epoch 2:
train loss:13198.5205 train ap:0.912843 val ap:0.880398 val auc:0.881145
total time:254.27s prep time:223.80s
fetch time:0.00s write back time:0.00s
Epoch 3:
train loss:13217.6608 train ap:0.912531 val ap:0.900547 val auc:0.891135
total time:246.83s prep time:217.90s
fetch time:0.00s write back time:0.00s
Epoch 4:
train loss:13175.3821 train ap:0.913456 val ap:0.896569 val auc:0.888247
total time:254.71s prep time:222.90s
fetch time:0.00s write back time:0.00s
Epoch 5:
train loss:13129.5668 train ap:0.914303 val ap:0.901932 val auc:0.891772
total time:256.61s prep time:225.98s
fetch time:0.00s write back time:0.00s
Epoch 6:
train loss:13110.9403 train ap:0.914760 val ap:0.899841 val auc:0.890011
total time:258.74s prep time:228.35s
fetch time:0.00s write back time:0.00s
Epoch 7:
train loss:13174.3921 train ap:0.913513 val ap:0.899906 val auc:0.890777
total time:246.85s prep time:217.08s
fetch time:0.00s write back time:0.00s
Epoch 8:
train loss:13082.4202 train ap:0.915001 val ap:0.896229 val auc:0.888280
total time:248.71s prep time:218.14s
fetch time:0.00s write back time:0.00s
Epoch 9:
train loss:13044.7357 train ap:0.915478 val ap:0.900687 val auc:0.891358
total time:251.59s prep time:220.47s
fetch time:0.00s write back time:0.00s
Epoch 10:
Early stopping at epoch 10
Loading the best model at epoch 5
0.7501348853111267 0.711144745349884
0.8450031280517578 0.8455020189285278
test AP:0.803709 test AUC:0.829060
test_dataset 728919 avg_time 48.60917709827423
LOCAL RANK 0, RANK0
initialize distributed
use cuda on 0
9227
get_neighbors consume: 0.0126036s
raw file found, skipping download
Dataset directory is /home/zlj/.miniconda3/envs/dgnn-3.10/lib/python3.10/site-packages/tgb/datasets/tgbl_wiki
loading processed file
998
998
val metric
metric hits@ is 0.6441952827610548
metric mrr is 0.4429445469178228
metric ap is 0.13983213193432092
metric auc is 0.9535034098327113
998
998
test metric
metric hits@ is 0.5930238971918947
metric mrr is 0.4000823857097686
metric ap is 0.11770197066761524
metric auc is 0.9466253653660359
test_dataset 23621 avg_time 0.0
LOCAL RANK 0, RANK0
initialize distributed
use cuda on 0
9227
get_neighbors consume: 0.0121001s
raw file found, skipping download
Dataset directory is /home/zlj/.miniconda3/envs/dgnn-3.10/lib/python3.10/site-packages/tgb/datasets/tgbl_wiki
loading processed file
998
998
val metric
metric hits@ is 0.5399417177408556
metric mrr is 0.2723149118104061
metric ap is 0.05431445631127528
metric auc is 0.934435814512715
998
998
test metric
metric hits@ is 0.4802390880222562
metric mrr is 0.23669700951704495
metric ap is 0.040643292031104734
metric auc is 0.923300786458167
test_dataset 23621 avg_time 0.0
LOCAL RANK 0, RANK0
use cuda on 0
9227
get_neighbors consume: 0.0119422s
Epoch 0:
train loss:438.1820 train ap:0.905229 val ap:0.929162 val auc:0.928815
total time:10.91s prep time:9.85s
fetch time:0.00s write back time:0.00s
Epoch 1:
train loss:372.5819 train ap:0.926295 val ap:0.929234 val auc:0.926704
total time:8.88s prep time:7.75s
fetch time:0.00s write back time:0.00s
Epoch 2:
train loss:348.3494 train ap:0.939406 val ap:0.951674 val auc:0.948898
total time:9.72s prep time:8.57s
fetch time:0.00s write back time:0.00s
Epoch 3:
train loss:311.1370 train ap:0.952314 val ap:0.959255 val auc:0.956934
total time:9.88s prep time:8.77s
fetch time:0.00s write back time:0.00s
Epoch 4:
train loss:288.5227 train ap:0.959499 val ap:0.967144 val auc:0.964830
total time:9.83s prep time:8.71s
fetch time:0.00s write back time:0.00s
Epoch 5:
train loss:270.2249 train ap:0.964637 val ap:0.971685 val auc:0.969840
total time:9.53s prep time:8.41s
fetch time:0.00s write back time:0.00s
Epoch 6:
train loss:262.1692 train ap:0.966888 val ap:0.972277 val auc:0.970054
total time:9.58s prep time:8.50s
fetch time:0.00s write back time:0.00s
Epoch 7:
train loss:255.6310 train ap:0.968568 val ap:0.972964 val auc:0.971109
total time:9.37s prep time:8.28s
fetch time:0.00s write back time:0.00s
Epoch 8:
train loss:248.8265 train ap:0.970266 val ap:0.975745 val auc:0.974002
total time:9.68s prep time:8.53s
fetch time:0.00s write back time:0.00s
Epoch 9:
train loss:243.9315 train ap:0.971279 val ap:0.976228 val auc:0.974509
total time:9.86s prep time:8.69s
fetch time:0.00s write back time:0.00s
Epoch 10:
train loss:239.1431 train ap:0.972743 val ap:0.977532 val auc:0.975514
total time:9.80s prep time:8.62s
fetch time:0.00s write back time:0.00s
Epoch 11:
train loss:237.8836 train ap:0.972920 val ap:0.977701 val auc:0.976333
total time:9.93s prep time:8.76s
fetch time:0.00s write back time:0.00s
Epoch 12:
train loss:231.7141 train ap:0.973696 val ap:0.978296 val auc:0.976799
total time:9.90s prep time:8.73s
fetch time:0.00s write back time:0.00s
Epoch 13:
train loss:230.5749 train ap:0.974243 val ap:0.978770 val auc:0.977488
total time:9.77s prep time:8.62s
fetch time:0.00s write back time:0.00s
Epoch 14:
train loss:227.9846 train ap:0.974771 val ap:0.978397 val auc:0.977168
total time:9.80s prep time:8.63s
fetch time:0.00s write back time:0.00s
Epoch 15:
train loss:224.3624 train ap:0.975223 val ap:0.980206 val auc:0.979011
total time:9.78s prep time:8.68s
fetch time:0.00s write back time:0.00s
Epoch 16:
train loss:223.1655 train ap:0.975816 val ap:0.981120 val auc:0.979900
total time:9.62s prep time:8.51s
fetch time:0.00s write back time:0.00s
Epoch 17:
train loss:219.1989 train ap:0.976670 val ap:0.981726 val auc:0.980641
total time:9.79s prep time:8.69s
fetch time:0.00s write back time:0.00s
Epoch 18:
train loss:215.7983 train ap:0.977476 val ap:0.981537 val auc:0.980316
total time:10.03s prep time:8.84s
fetch time:0.00s write back time:0.00s
Epoch 19:
train loss:217.2757 train ap:0.976921 val ap:0.981455 val auc:0.980277
total time:9.82s prep time:8.66s
fetch time:0.00s write back time:0.00s
Epoch 20:
train loss:219.2030 train ap:0.976782 val ap:0.981089 val auc:0.980011
total time:10.38s prep time:9.28s
fetch time:0.00s write back time:0.00s
Epoch 21:
train loss:219.9309 train ap:0.976416 val ap:0.981670 val auc:0.980690
total time:9.43s prep time:8.32s
fetch time:0.00s write back time:0.00s
Epoch 22:
train loss:214.2197 train ap:0.977587 val ap:0.982226 val auc:0.981129
total time:9.75s prep time:8.53s
fetch time:0.00s write back time:0.00s
Epoch 23:
train loss:208.9837 train ap:0.978911 val ap:0.982907 val auc:0.981704
total time:11.65s prep time:10.40s
fetch time:0.00s write back time:0.00s
Epoch 24:
train loss:210.4146 train ap:0.978243 val ap:0.982097 val auc:0.980691
total time:10.86s prep time:9.73s
fetch time:0.00s write back time:0.00s
Epoch 25:
train loss:210.4207 train ap:0.978632 val ap:0.982267 val auc:0.981537
total time:9.95s prep time:8.77s
fetch time:0.00s write back time:0.00s
Epoch 26:
train loss:205.5232 train ap:0.979174 val ap:0.983918 val auc:0.982727
total time:9.51s prep time:8.42s
fetch time:0.00s write back time:0.00s
Epoch 27:
train loss:204.8931 train ap:0.979227 val ap:0.983066 val auc:0.982013
total time:9.13s prep time:8.08s
fetch time:0.00s write back time:0.00s
Epoch 28:
train loss:199.6552 train ap:0.980440 val ap:0.982168 val auc:0.981335
total time:9.11s prep time:8.04s
fetch time:0.00s write back time:0.00s
Epoch 29:
train loss:202.9698 train ap:0.979732 val ap:0.981972 val auc:0.981716
total time:9.21s prep time:8.11s
fetch time:0.00s write back time:0.00s
Epoch 30:
train loss:201.1851 train ap:0.980338 val ap:0.983631 val auc:0.982533
total time:9.40s prep time:8.30s
fetch time:0.00s write back time:0.00s
Epoch 31:
train loss:202.0885 train ap:0.979852 val ap:0.984241 val auc:0.983195
total time:9.11s prep time:7.96s
fetch time:0.00s write back time:0.00s
Epoch 32:
train loss:195.8186 train ap:0.981171 val ap:0.985042 val auc:0.984066
total time:9.39s prep time:8.17s
fetch time:0.00s write back time:0.00s
Epoch 33:
train loss:195.5999 train ap:0.980943 val ap:0.984088 val auc:0.983118
total time:9.81s prep time:8.62s
fetch time:0.00s write back time:0.00s
Epoch 34:
train loss:195.3828 train ap:0.981070 val ap:0.984907 val auc:0.983951
total time:10.13s prep time:8.90s
fetch time:0.00s write back time:0.00s
Epoch 35:
train loss:194.4766 train ap:0.981191 val ap:0.985196 val auc:0.984022
total time:10.03s prep time:8.93s
fetch time:0.00s write back time:0.00s
Epoch 36:
train loss:194.5252 train ap:0.981201 val ap:0.984551 val auc:0.983756
total time:9.26s prep time:8.19s
fetch time:0.00s write back time:0.00s
Epoch 37:
train loss:193.6458 train ap:0.981244 val ap:0.985164 val auc:0.984187
total time:9.13s prep time:8.07s
fetch time:0.00s write back time:0.00s
Epoch 38:
train loss:195.7096 train ap:0.981002 val ap:0.983946 val auc:0.982756
total time:9.05s prep time:7.99s
fetch time:0.00s write back time:0.00s
Epoch 39:
train loss:195.2296 train ap:0.981059 val ap:0.985010 val auc:0.983845
total time:9.11s prep time:8.04s
fetch time:0.00s write back time:0.00s
Epoch 40:
train loss:191.4868 train ap:0.981932 val ap:0.985390 val auc:0.984618
total time:9.09s prep time:8.03s
fetch time:0.00s write back time:0.00s
Epoch 41:
train loss:189.6005 train ap:0.981906 val ap:0.984920 val auc:0.984112
total time:9.05s prep time:7.98s
fetch time:0.00s write back time:0.00s
Epoch 42:
train loss:191.7600 train ap:0.981801 val ap:0.984937 val auc:0.984145
total time:9.18s prep time:8.12s
fetch time:0.00s write back time:0.00s
Epoch 43:
train loss:193.1520 train ap:0.981423 val ap:0.984112 val auc:0.983052
total time:9.07s prep time:7.98s
fetch time:0.00s write back time:0.00s
Epoch 44:
train loss:193.6521 train ap:0.981466 val ap:0.984804 val auc:0.983935
total time:9.35s prep time:8.24s
fetch time:0.00s write back time:0.00s
Epoch 45:
Early stopping at epoch 45
Loading the best model at epoch 40
0.9833124876022339 0.9827097654342651
0.9832088351249695 0.982053816318512
test AP:0.978572 test AUC:0.977290
test_dataset 23621 avg_time 7.839827566146851