Commit 7214f226 by zlj

fix some merge bugs

parent ddc3c3df
@@ -175,4 +175,5 @@ cython_debug/
 /test_*
 /*.ipynb
 saved_models/
-saved_checkpoints/
\ No newline at end of file
+saved_checkpoints/
+.history/
\ No newline at end of file
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# IDE temporary files (generated by IDEs like CLion, etc.)
.idea/
cmake-build-*/
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
*.pt
/*.out
/a.out
/third_party
/.vscode
/run_route.py
/dataset
/test_*
/*.ipynb
saved_models/
saved_checkpoints/
\ No newline at end of file
<<<<<<< HEAD
[submodule "third_party/ldg_partition"]
path = third_party/ldg_partition
url = https://gitee.com/onlynagesha/graph-partition-v4
[submodule "third_party/METIS"]
path = third_party/METIS
url = https://github.com/KarypisLab/METIS
branch = v5.1.1-DistDGL-v0.5
=======
[submodule "csrc/partition/neighbor_clustering"]
path = csrc/partition/neighbor_clustering
url = https://gitee.com/onlynagesha/graph-partition-v4
>>>>>>> cmy_dev
[submodule "third_party/ldg_partition"]
path = third_party/ldg_partition
url = https://gitee.com/onlynagesha/graph-partition-v4
[submodule "third_party/METIS"]
path = third_party/METIS
url = https://github.com/KarypisLab/METIS
branch = v5.1.1-DistDGL-v0.5
[submodule "third_party/ldg_partition"]
path = third_party/ldg_partition
url = https://gitee.com/onlynagesha/graph-partition-v4
[submodule "third_party/METIS"]
path = third_party/METIS
url = https://github.com/KarypisLab/METIS
branch = v5.1.1-DistDGL-v0.5
cmake_minimum_required(VERSION 3.15)
project(starrygl VERSION 0.1)
option(WITH_PYTHON "Link to Python when building" ON)
option(WITH_CUDA "Link to CUDA when building" ON)
option(WITH_METIS "Link to METIS when building" ON)
<<<<<<< HEAD
option(WITH_MTMETIS "Link to multi-threaded METIS when building" OFF)
=======
option(WITH_MTMETIS "Link to multi-threaded METIS when building" ON)
>>>>>>> cmy_dev
option(WITH_LDG "Link to (multi-threaded optionally) LDG when building" ON)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
set(CMAKE_CUDA_STANDARD 14)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
find_package(OpenMP REQUIRED)
link_libraries(OpenMP::OpenMP_CXX)
find_package(Torch REQUIRED)
include_directories(${TORCH_INCLUDE_DIRS})
add_compile_options(${TORCH_CXX_FLAGS})
if(WITH_PYTHON)
add_definitions(-DWITH_PYTHON)
find_package(Python3 COMPONENTS Interpreter Development REQUIRED)
include_directories(${Python3_INCLUDE_DIRS})
endif()
if(WITH_CUDA)
add_definitions(-DWITH_CUDA)
add_definitions(-DWITH_UVM)
find_package(CUDA REQUIRED)
include_directories(${CUDA_INCLUDE_DIRS})
set(CUDA_LIBRARIES "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcudart.so")
file(GLOB_RECURSE UVM_SRCS "csrc/uvm/*.cpp")
add_library(uvm_ops SHARED ${UVM_SRCS})
target_link_libraries(uvm_ops PRIVATE ${TORCH_LIBRARIES})
endif()
if(WITH_METIS)
# add_definitions(-DWITH_METIS)
# set(GKLIB_DIR "${CMAKE_SOURCE_DIR}/third_party/GKlib")
# set(METIS_DIR "${CMAKE_SOURCE_DIR}/third_party/METIS")
# set(GKLIB_INCLUDE_DIRS "${GKLIB_DIR}/include")
# file(GLOB_RECURSE GKLIB_LIBRARIES "${GKLIB_DIR}/lib/lib*.a")
# set(METIS_INCLUDE_DIRS "${METIS_DIR}/include")
# file(GLOB_RECURSE METIS_LIBRARIES "${METIS_DIR}/lib/lib*.a")
# include_directories(${METIS_INCLUDE_DIRS})
<<<<<<< HEAD
# add_library(metis_partition SHARED "csrc/partition/metis.cpp")
# target_link_libraries(metis_partition PRIVATE ${TORCH_LIBRARIES})
# target_link_libraries(metis_partition PRIVATE ${GKLIB_LIBRARIES})
# target_link_libraries(metis_partition PRIVATE ${METIS_LIBRARIES})
add_definitions(-DWITH_METIS)
set(METIS_DIR "${CMAKE_SOURCE_DIR}/third_party/METIS")
set(METIS_GKLIB_DIR "${METIS_DIR}/GKlib")
file(GLOB METIS_SRCS "${METIS_DIR}/libmetis/*.c")
file(GLOB METIS_GKLIB_SRCS "${METIS_GKLIB_DIR}/*.c")
if (MSVC)
file(GLOB METIS_GKLIB_WIN32_SRCS "${METIS_GKLIB_DIR}/win32/*.c")
set(METIS_GKLIB_SRCS ${METIS_GKLIB_SRCS} ${METIS_GKLIB_WIN32_SRCS})
endif()
add_library(metis_partition SHARED
"csrc/partition/metis.cpp"
${METIS_SRCS} ${METIS_GKLIB_SRCS}
)
target_include_directories(metis_partition PRIVATE "${METIS_DIR}/include")
target_include_directories(metis_partition PRIVATE "${METIS_GKLIB_DIR}")
if (MSVC)
target_include_directories(metis_partition PRIVATE "${METIS_GKLIB_DIR}/win32")
endif()
target_compile_definitions(metis_partition PRIVATE -DIDXTYPEWIDTH=64)
target_compile_definitions(metis_partition PRIVATE -DREALTYPEWIDTH=32)
target_compile_options(metis_partition PRIVATE -O3)
target_link_libraries(metis_partition PRIVATE ${TORCH_LIBRARIES})
if (UNIX)
target_link_libraries(metis_partition PRIVATE m)
endif()
=======
add_library(metis_partition SHARED "csrc/partition/metis.cpp")
target_link_libraries(metis_partition PRIVATE ${TORCH_LIBRARIES})
target_link_libraries(metis_partition PRIVATE ${GKLIB_LIBRARIES})
target_link_libraries(metis_partition PRIVATE ${METIS_LIBRARIES})
>>>>>>> cmy_dev
endif()
if(WITH_MTMETIS)
add_definitions(-DWITH_MTMETIS)
set(MTMETIS_DIR "${CMAKE_SOURCE_DIR}/third_party/mt-metis")
set(MTMETIS_INCLUDE_DIRS "${MTMETIS_DIR}/include")
file(GLOB_RECURSE MTMETIS_LIBRARIES "${MTMETIS_DIR}/lib/lib*.a")
include_directories(${MTMETIS_INCLUDE_DIRS})
add_library(mtmetis_partition SHARED "csrc/partition/mtmetis.cpp")
target_link_libraries(mtmetis_partition PRIVATE ${TORCH_LIBRARIES})
target_link_libraries(mtmetis_partition PRIVATE ${MTMETIS_LIBRARIES})
target_compile_definitions(mtmetis_partition PRIVATE -DMTMETIS_64BIT_VERTICES)
target_compile_definitions(mtmetis_partition PRIVATE -DMTMETIS_64BIT_EDGES)
target_compile_definitions(mtmetis_partition PRIVATE -DMTMETIS_64BIT_WEIGHTS)
target_compile_definitions(mtmetis_partition PRIVATE -DMTMETIS_64BIT_PARTITIONS)
endif()
if (WITH_LDG)
# Imports neighbor-clustering based (e.g. LDG algorithm) graph partitioning implementation
add_definitions(-DWITH_LDG)
<<<<<<< HEAD
set(LDG_DIR "third_party/ldg_partition")
=======
set(LDG_DIR "csrc/partition/neighbor_clustering")
>>>>>>> cmy_dev
add_library(ldg_partition SHARED "csrc/partition/ldg.cpp")
target_link_libraries(ldg_partition PRIVATE ${TORCH_LIBRARIES})
add_subdirectory(${LDG_DIR})
target_include_directories(ldg_partition PRIVATE ${LDG_DIR})
target_link_libraries(ldg_partition PRIVATE ldg-vertex-partition)
endif ()
include_directories("csrc/include")
add_library(${PROJECT_NAME} SHARED csrc/export.cpp)
target_link_libraries(${PROJECT_NAME} PRIVATE ${TORCH_LIBRARIES})
target_compile_definitions(${PROJECT_NAME} PRIVATE -DTORCH_EXTENSION_NAME=lib${PROJECT_NAME})
if(WITH_PYTHON)
find_library(TORCH_PYTHON_LIBRARY torch_python PATHS "${TORCH_INSTALL_PREFIX}/lib")
target_link_libraries(${PROJECT_NAME} PRIVATE ${TORCH_PYTHON_LIBRARY})
endif()
if (WITH_CUDA)
target_link_libraries(${PROJECT_NAME} PRIVATE uvm_ops)
endif()
if (WITH_METIS)
message(STATUS "Current project '${PROJECT_NAME}' uses METIS graph partitioning algorithm.")
target_link_libraries(${PROJECT_NAME} PRIVATE metis_partition)
endif()
if (WITH_MTMETIS)
message(STATUS "Current project '${PROJECT_NAME}' uses multi-threaded METIS graph partitioning algorithm.")
target_link_libraries(${PROJECT_NAME} PRIVATE mtmetis_partition)
endif()
if (WITH_LDG)
message(STATUS "Current project '${PROJECT_NAME}' uses LDG graph partitioning algorithm.")
target_link_libraries(${PROJECT_NAME} PRIVATE ldg_partition)
endif()
# add libsampler.so
set(SAMLPER_NAME "${PROJECT_NAME}_sampler")
# set(BOOST_INCLUDE_DIRS "${CMAKE_SOURCE_DIR}/third_party/boost_1_83_0")
# include_directories(${BOOST_INCLUDE_DIRS})
file(GLOB_RECURSE SAMPLER_SRCS "csrc/sampler/*.cpp")
add_library(${SAMLPER_NAME} SHARED ${SAMPLER_SRCS})
target_include_directories(${SAMLPER_NAME} PRIVATE "csrc/sampler/include")
target_compile_options(${SAMLPER_NAME} PRIVATE -O3)
target_link_libraries(${SAMLPER_NAME} PRIVATE ${TORCH_LIBRARIES})
target_compile_definitions(${SAMLPER_NAME} PRIVATE -DTORCH_EXTENSION_NAME=lib${SAMLPER_NAME})
if(WITH_PYTHON)
find_library(TORCH_PYTHON_LIBRARY torch_python PATHS "${TORCH_INSTALL_PREFIX}/lib")
target_link_libraries(${SAMLPER_NAME} PRIVATE ${TORCH_PYTHON_LIBRARY})
endif()
cmake_minimum_required(VERSION 3.15)
project(starrygl VERSION 0.1)
option(WITH_PYTHON "Link to Python when building" ON)
option(WITH_CUDA "Link to CUDA when building" ON)
option(WITH_METIS "Link to METIS when building" ON)
option(WITH_MTMETIS "Link to multi-threaded METIS when building" OFF)
option(WITH_LDG "Link to (multi-threaded optionally) LDG when building" ON)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
set(CMAKE_CUDA_STANDARD 14)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
find_package(OpenMP REQUIRED)
link_libraries(OpenMP::OpenMP_CXX)
find_package(Torch REQUIRED)
include_directories(${TORCH_INCLUDE_DIRS})
add_compile_options(${TORCH_CXX_FLAGS})
if(WITH_PYTHON)
add_definitions(-DWITH_PYTHON)
find_package(Python3 COMPONENTS Interpreter Development REQUIRED)
include_directories(${Python3_INCLUDE_DIRS})
endif()
if(WITH_CUDA)
add_definitions(-DWITH_CUDA)
add_definitions(-DWITH_UVM)
find_package(CUDA REQUIRED)
include_directories(${CUDA_INCLUDE_DIRS})
set(CUDA_LIBRARIES "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcudart.so")
file(GLOB_RECURSE UVM_SRCS "csrc/uvm/*.cpp")
add_library(uvm_ops SHARED ${UVM_SRCS})
target_link_libraries(uvm_ops PRIVATE ${TORCH_LIBRARIES})
endif()
if(WITH_METIS)
# add_definitions(-DWITH_METIS)
# set(GKLIB_DIR "${CMAKE_SOURCE_DIR}/third_party/GKlib")
# set(METIS_DIR "${CMAKE_SOURCE_DIR}/third_party/METIS")
# set(GKLIB_INCLUDE_DIRS "${GKLIB_DIR}/include")
# file(GLOB_RECURSE GKLIB_LIBRARIES "${GKLIB_DIR}/lib/lib*.a")
# set(METIS_INCLUDE_DIRS "${METIS_DIR}/include")
# file(GLOB_RECURSE METIS_LIBRARIES "${METIS_DIR}/lib/lib*.a")
# include_directories(${METIS_INCLUDE_DIRS})
# add_library(metis_partition SHARED "csrc/partition/metis.cpp")
# target_link_libraries(metis_partition PRIVATE ${TORCH_LIBRARIES})
# target_link_libraries(metis_partition PRIVATE ${GKLIB_LIBRARIES})
# target_link_libraries(metis_partition PRIVATE ${METIS_LIBRARIES})
add_definitions(-DWITH_METIS)
set(METIS_DIR "${CMAKE_SOURCE_DIR}/third_party/METIS")
set(METIS_GKLIB_DIR "${METIS_DIR}/GKlib")
file(GLOB METIS_SRCS "${METIS_DIR}/libmetis/*.c")
file(GLOB METIS_GKLIB_SRCS "${METIS_GKLIB_DIR}/*.c")
if (MSVC)
file(GLOB METIS_GKLIB_WIN32_SRCS "${METIS_GKLIB_DIR}/win32/*.c")
set(METIS_GKLIB_SRCS ${METIS_GKLIB_SRCS} ${METIS_GKLIB_WIN32_SRCS})
endif()
add_library(metis_partition SHARED
"csrc/partition/metis.cpp"
${METIS_SRCS} ${METIS_GKLIB_SRCS}
)
target_include_directories(metis_partition PRIVATE "${METIS_DIR}/include")
target_include_directories(metis_partition PRIVATE "${METIS_GKLIB_DIR}")
if (MSVC)
target_include_directories(metis_partition PRIVATE "${METIS_GKLIB_DIR}/win32")
endif()
target_compile_definitions(metis_partition PRIVATE -DIDXTYPEWIDTH=64)
target_compile_definitions(metis_partition PRIVATE -DREALTYPEWIDTH=32)
target_compile_options(metis_partition PRIVATE -O3)
target_link_libraries(metis_partition PRIVATE ${TORCH_LIBRARIES})
if (UNIX)
target_link_libraries(metis_partition PRIVATE m)
endif()
endif()
if(WITH_MTMETIS)
add_definitions(-DWITH_MTMETIS)
set(MTMETIS_DIR "${CMAKE_SOURCE_DIR}/third_party/mt-metis")
set(MTMETIS_INCLUDE_DIRS "${MTMETIS_DIR}/include")
file(GLOB_RECURSE MTMETIS_LIBRARIES "${MTMETIS_DIR}/lib/lib*.a")
include_directories(${MTMETIS_INCLUDE_DIRS})
add_library(mtmetis_partition SHARED "csrc/partition/mtmetis.cpp")
target_link_libraries(mtmetis_partition PRIVATE ${TORCH_LIBRARIES})
target_link_libraries(mtmetis_partition PRIVATE ${MTMETIS_LIBRARIES})
target_compile_definitions(mtmetis_partition PRIVATE -DMTMETIS_64BIT_VERTICES)
target_compile_definitions(mtmetis_partition PRIVATE -DMTMETIS_64BIT_EDGES)
target_compile_definitions(mtmetis_partition PRIVATE -DMTMETIS_64BIT_WEIGHTS)
target_compile_definitions(mtmetis_partition PRIVATE -DMTMETIS_64BIT_PARTITIONS)
endif()
if (WITH_LDG)
# Imports neighbor-clustering based (e.g. LDG algorithm) graph partitioning implementation
add_definitions(-DWITH_LDG)
set(LDG_DIR "csrc/partition/neighbor_clustering")
add_library(ldg_partition SHARED "csrc/partition/ldg.cpp")
target_link_libraries(ldg_partition PRIVATE ${TORCH_LIBRARIES})
add_subdirectory(${LDG_DIR})
target_include_directories(ldg_partition PRIVATE ${LDG_DIR})
target_link_libraries(ldg_partition PRIVATE ldg-vertex-partition)
endif ()
include_directories("csrc/include")
add_library(${PROJECT_NAME} SHARED csrc/export.cpp)
target_link_libraries(${PROJECT_NAME} PRIVATE ${TORCH_LIBRARIES})
target_compile_definitions(${PROJECT_NAME} PRIVATE -DTORCH_EXTENSION_NAME=lib${PROJECT_NAME})
if(WITH_PYTHON)
find_library(TORCH_PYTHON_LIBRARY torch_python PATHS "${TORCH_INSTALL_PREFIX}/lib")
target_link_libraries(${PROJECT_NAME} PRIVATE ${TORCH_PYTHON_LIBRARY})
endif()
if (WITH_CUDA)
target_link_libraries(${PROJECT_NAME} PRIVATE uvm_ops)
endif()
if (WITH_METIS)
message(STATUS "Current project '${PROJECT_NAME}' uses METIS graph partitioning algorithm.")
target_link_libraries(${PROJECT_NAME} PRIVATE metis_partition)
endif()
if (WITH_MTMETIS)
message(STATUS "Current project '${PROJECT_NAME}' uses multi-threaded METIS graph partitioning algorithm.")
target_link_libraries(${PROJECT_NAME} PRIVATE mtmetis_partition)
endif()
if (WITH_LDG)
message(STATUS "Current project '${PROJECT_NAME}' uses LDG graph partitioning algorithm.")
target_link_libraries(${PROJECT_NAME} PRIVATE ldg_partition)
endif()
# add libsampler.so
set(SAMLPER_NAME "${PROJECT_NAME}_sampler")
# set(BOOST_INCLUDE_DIRS "${CMAKE_SOURCE_DIR}/third_party/boost_1_83_0")
# include_directories(${BOOST_INCLUDE_DIRS})
file(GLOB_RECURSE SAMPLER_SRCS "csrc/sampler/*.cpp")
add_library(${SAMLPER_NAME} SHARED ${SAMPLER_SRCS})
target_include_directories(${SAMLPER_NAME} PRIVATE "csrc/sampler/include")
target_compile_options(${SAMLPER_NAME} PRIVATE -O3)
target_link_libraries(${SAMLPER_NAME} PRIVATE ${TORCH_LIBRARIES})
target_compile_definitions(${SAMLPER_NAME} PRIVATE -DTORCH_EXTENSION_NAME=lib${SAMLPER_NAME})
if(WITH_PYTHON)
find_library(TORCH_PYTHON_LIBRARY torch_python PATHS "${TORCH_INSTALL_PREFIX}/lib")
target_link_libraries(${SAMLPER_NAME} PRIVATE ${TORCH_PYTHON_LIBRARY})
endif()
#include "extension.h"
#include "uvm.h"
#include "partition.h"
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
#ifdef WITH_CUDA
m.def("uvm_storage_new", &uvm_storage_new, "return storage of unified virtual memory");
m.def("uvm_storage_to_cuda", &uvm_storage_to_cuda, "share uvm storage with another cuda device");
m.def("uvm_storage_to_cpu", &uvm_storage_to_cpu, "share uvm storage with cpu");
m.def("uvm_storage_advise", &uvm_storage_advise, "apply cudaMemAdvise() to uvm storage");
m.def("uvm_storage_prefetch", &uvm_storage_prefetch, "apply cudaMemPrefetchAsync() to uvm storage");
py::enum_<cudaMemoryAdvise>(m, "cudaMemoryAdvise")
.value("cudaMemAdviseSetAccessedBy", cudaMemoryAdvise::cudaMemAdviseSetAccessedBy)
.value("cudaMemAdviseUnsetAccessedBy", cudaMemoryAdvise::cudaMemAdviseUnsetAccessedBy)
.value("cudaMemAdviseSetPreferredLocation", cudaMemoryAdvise::cudaMemAdviseSetPreferredLocation)
.value("cudaMemAdviseUnsetPreferredLocation", cudaMemoryAdvise::cudaMemAdviseUnsetPreferredLocation)
.value("cudaMemAdviseSetReadMostly", cudaMemoryAdvise::cudaMemAdviseSetReadMostly)
.value("cudaMemAdviseUnsetReadMostly", cudaMemoryAdvise::cudaMemAdviseUnsetReadMostly);
#endif
#ifdef WITH_METIS
m.def("metis_partition", &metis_partition, "metis graph partition");
<<<<<<< HEAD
m.def("metis_cache_friendly_reordering", &metis_cache_friendly_reordering, "metis cache-friendly reordering");
=======
>>>>>>> cmy_dev
#endif
#ifdef WITH_MTMETIS
m.def("mt_metis_partition", &mt_metis_partition, "multi-threaded metis graph partition");
#endif
#ifdef WITH_LDG
// Note: the WITH_MULTITHREADING=ON switch must be set at compile time
// to enable the multi-threaded functionality.
m.def("ldg_partition", &ldg_partition, "(multi-threaded optionally) LDG graph partition");
#endif
}
#include "extension.h"
#include "uvm.h"
#include "partition.h"
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
#ifdef WITH_CUDA
m.def("uvm_storage_new", &uvm_storage_new, "return storage of unified virtual memory");
m.def("uvm_storage_to_cuda", &uvm_storage_to_cuda, "share uvm storage with another cuda device");
m.def("uvm_storage_to_cpu", &uvm_storage_to_cpu, "share uvm storage with cpu");
m.def("uvm_storage_advise", &uvm_storage_advise, "apply cudaMemAdvise() to uvm storage");
m.def("uvm_storage_prefetch", &uvm_storage_prefetch, "apply cudaMemPrefetchAsync() to uvm storage");
py::enum_<cudaMemoryAdvise>(m, "cudaMemoryAdvise")
.value("cudaMemAdviseSetAccessedBy", cudaMemoryAdvise::cudaMemAdviseSetAccessedBy)
.value("cudaMemAdviseUnsetAccessedBy", cudaMemoryAdvise::cudaMemAdviseUnsetAccessedBy)
.value("cudaMemAdviseSetPreferredLocation", cudaMemoryAdvise::cudaMemAdviseSetPreferredLocation)
.value("cudaMemAdviseUnsetPreferredLocation", cudaMemoryAdvise::cudaMemAdviseUnsetPreferredLocation)
.value("cudaMemAdviseSetReadMostly", cudaMemoryAdvise::cudaMemAdviseSetReadMostly)
.value("cudaMemAdviseUnsetReadMostly", cudaMemoryAdvise::cudaMemAdviseUnsetReadMostly);
#endif
#ifdef WITH_METIS
m.def("metis_partition", &metis_partition, "metis graph partition");
m.def("metis_cache_friendly_reordering", &metis_cache_friendly_reordering, "metis cache-friendly reordering");
#endif
#ifdef WITH_MTMETIS
m.def("mt_metis_partition", &mt_metis_partition, "multi-threaded metis graph partition");
#endif
#ifdef WITH_LDG
// Note: the WITH_MULTITHREADING=ON switch must be set at compile time
// to enable the multi-threaded functionality.
m.def("ldg_partition", &ldg_partition, "(multi-threaded optionally) LDG graph partition");
#endif
}
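For reference, here is a minimal sketch of how these bindings could be exercised from Python once the extension is built (assumptions: the shared library is built as libstarrygl.so and is importable from the working directory; only the binding names registered above are assumed, not their signatures):
# torch must be imported first so the extension can resolve libtorch symbols.
import torch
import libstarrygl  # assumption: the built libstarrygl.so is on sys.path

# Which partitioning entry points exist depends on the WITH_* CMake options.
for name in ("metis_partition", "metis_cache_friendly_reordering",
             "mt_metis_partition", "ldg_partition"):
    print(name, "available:", hasattr(libstarrygl, name))

# The cudaMemoryAdvise enum is only registered when built with WITH_CUDA=ON.
if hasattr(libstarrygl, "cudaMemoryAdvise"):
    print(list(libstarrygl.cudaMemoryAdvise.__members__))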
#include "extension.h"
#include "uvm.h"
#include "partition.h"
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
#ifdef WITH_CUDA
#ifdef WITH_CUDA
m.def("uvm_storage_new", &uvm_storage_new, "return storage of unified virtual memory");
m.def("uvm_storage_to_cuda", &uvm_storage_to_cuda, "share uvm storage with another cuda device");
m.def("uvm_storage_to_cpu", &uvm_storage_to_cpu, "share uvm storage with cpu");
m.def("uvm_storage_advise", &uvm_storage_advise, "apply cudaMemAdvise() to uvm storage");
m.def("uvm_storage_prefetch", &uvm_storage_prefetch, "apply cudaMemPrefetchAsync() to uvm storage");
py::enum_<cudaMemoryAdvise>(m, "cudaMemoryAdvise")
.value("cudaMemAdviseSetAccessedBy", cudaMemoryAdvise::cudaMemAdviseSetAccessedBy)
.value("cudaMemAdviseUnsetAccessedBy", cudaMemoryAdvise::cudaMemAdviseUnsetAccessedBy)
.value("cudaMemAdviseSetPreferredLocation", cudaMemoryAdvise::cudaMemAdviseSetPreferredLocation)
.value("cudaMemAdviseUnsetPreferredLocation", cudaMemoryAdvise::cudaMemAdviseUnsetPreferredLocation)
.value("cudaMemAdviseSetReadMostly", cudaMemoryAdvise::cudaMemAdviseSetReadMostly)
.value("cudaMemAdviseUnsetReadMostly", cudaMemoryAdvise::cudaMemAdviseUnsetReadMostly);
#endif
#ifdef WITH_METIS
m.def("metis_partition", &metis_partition, "metis graph partition");
m.def("metis_cache_friendly_reordering", &metis_cache_friendly_reordering, "metis cache-friendly reordering");
#endif
#ifdef WITH_MTMETIS
m.def("mt_metis_partition", &mt_metis_partition, "multi-threaded metis graph partition");
#endif
#ifdef WITH_LGD
// Note: the switch WITH_MULTITHREADING=ON shall be triggered during compilation
// to enable multi-threading functionality.
m.def("ldg_partition", &ldg_partition, "(multi-threaded optionally) LDG graph partition");
#endif
}
#include "extension.h"
#include "uvm.h"
#include "partition.h"
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
#ifdef WITH_CUDA
m.def("uvm_storage_new", &uvm_storage_new, "return storage of unified virtual memory");
m.def("uvm_storage_to_cuda", &uvm_storage_to_cuda, "share uvm storage with another cuda device");
m.def("uvm_storage_to_cpu", &uvm_storage_to_cpu, "share uvm storage with cpu");
m.def("uvm_storage_advise", &uvm_storage_advise, "apply cudaMemAdvise() to uvm storage");
m.def("uvm_storage_prefetch", &uvm_storage_prefetch, "apply cudaMemPrefetchAsync() to uvm storage");
py::enum_<cudaMemoryAdvise>(m, "cudaMemoryAdvise")
.value("cudaMemAdviseSetAccessedBy", cudaMemoryAdvise::cudaMemAdviseSetAccessedBy)
.value("cudaMemAdviseUnsetAccessedBy", cudaMemoryAdvise::cudaMemAdviseUnsetAccessedBy)
.value("cudaMemAdviseSetPreferredLocation", cudaMemoryAdvise::cudaMemAdviseSetPreferredLocation)
.value("cudaMemAdviseUnsetPreferredLocation", cudaMemoryAdvise::cudaMemAdviseUnsetPreferredLocation)
.value("cudaMemAdviseSetReadMostly", cudaMemoryAdvise::cudaMemAdviseSetReadMostly)
.value("cudaMemAdviseUnsetReadMostly", cudaMemoryAdvise::cudaMemAdviseUnsetReadMostly);
#endif
#ifdef WITH_METIS
m.def("metis_partition", &metis_partition, "metis graph partition");
m.def("metis_cache_friendly_reordering", &metis_cache_friendly_reordering, "metis cache-friendly reordering");
#endif
#ifdef WITH_MTMETIS
m.def("mt_metis_partition", &mt_metis_partition, "multi-threaded metis graph partition");
#endif
#ifdef WITH_LGD
// Note: the switch WITH_MULTITHREADING=ON shall be triggered during compilation
// to enable multi-threading functionality.
m.def("ldg_partition", &ldg_partition, "(multi-threaded optionally) LDG graph partition");
#endif
}
Advanced Concepts
=================
.. toctree::
<<<<<<< HEAD
sampling_parallel/index
partition_parallel/index
timeline_parallel/index
=======
ts_sampling
pp_training
tp_training
data_proc
>>>>>>> cmy_dev
Advanced Concepts
=================
.. toctree::
sampling_parallel/index
partition_parallel/index
timeline_parallel/index
Distributed Partition Parallel
==============================
.. note::
    This section covers distributed partition-parallel training.
**Distributed partition parallelism** refers to partitioning a large graph into
multiple partitions and distributing them to different workers. Each worker is responsible for training
a specific partition in parallel, and data is exchanged across the partitions during training.
StarryGL provides a simple way to implement partition parallelism through the Route class. Route manages
the data exchange between partitions during training, handling the communication and synchronization
between the workers that own each graph partition, so that parallel training across a distributed system stays seamless.
Here is an example showing how to use Route to implement partition parallelism.
Before you start with partition parallelism, you first need to decide how to partition your graph data. StarryGL provides several
partitioning algorithms:
- ldg
- metis
- multi-constraint metis
In the following code, we first partition the graph with the chosen algorithm, then save the node and edge features together with the
corresponding graph partition.
.. code-block:: python
def partition_graph(self,
root: str,
num_parts: int,
node_weight: Optional[str] = None,
edge_weight: Optional[str] = None,
algorithm: str = "metis",
partition_kwargs = None,):
assert not self.is_heterogeneous, "only homogeneous graphs are supported"
num_nodes: int = self.node().num_nodes
edge_index: Tensor = self.edge_index()
logging.info(f"running partition aglorithm: {algorithm}")
partition_kwargs = partition_kwargs or {}
not_self_loop = (edge_index[0] != edge_index[1])
if node_weight is not None:
node_weight = self.node()[node_weight]
if edge_weight is not None:
edge_weight = self.edge()[edge_weight]
edge_weight = edge_weight[not_self_loop]
# partition graph
node_parts = metis_partition(
edge_index[:,not_self_loop],
num_nodes, num_parts,
node_weight=node_weight,
edge_weight=edge_weight,
**partition_kwargs,
)
root_path = Path(root).expanduser().resolve()
base_path = root_path / f"{algorithm}_{num_parts}"
# handle each partition
for i in range(num_parts):
npart_mask = node_parts == i
epart_mask = npart_mask[edge_index[1]]
raw_dst_ids: Tensor = torch.where(npart_mask)[0]
local_edges = edge_index[:, epart_mask]
raw_src_ids, local_edges = init_vc_edge_index(
raw_dst_ids, local_edges, bipartite=True,
)
# get GraphData obj
g = GraphData.from_bipartite(
local_edges,
raw_src_ids=raw_src_ids,
raw_dst_ids=raw_dst_ids,
)
# handle feature data
# ......
logging.info(f"saving partition data: {i+1}/{num_parts}")
# save each partition
torch.save(g, (base_path / f"{i:03d}").__str__())
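After partitioning, each worker only needs to read back its own part. Below is a minimal sketch of loading one saved partition (assumptions: ``root``, ``algorithm`` and ``num_parts`` match the values used above, and ``rank`` is the index of the current worker, e.g. obtained from ``torch.distributed.get_rank()``):
.. code-block:: python

    from pathlib import Path
    import torch

    def load_partition(root: str, algorithm: str, num_parts: int, rank: int):
        # Mirrors the save layout used above: {root}/{algorithm}_{num_parts}/{rank:03d}
        base_path = Path(root).expanduser().resolve() / f"{algorithm}_{num_parts}"
        return torch.load(str(base_path / f"{rank:03d}"))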
Next we turn to the model. With Route, developers only need to change a few lines of code to implement partition parallelism.
In the example below, a single line is added to the forward function, and Route takes care of the feature exchange.
.. code-block:: python
class SimpleConv(pyg_nn.MessagePassing):
def __init__(self, in_feats: int, out_feats: int):
super().__init__(aggr="mean")
self.linear = nn.Linear(in_feats, out_feats)
def forward(self, x: Tensor, edge_index: Tensor, route: Route):
dst_len = x.size(0)
x = route.apply(x) # exchange features
return self.propagate(edge_index, x=x)[:dst_len]
def message(self, x_j: Tensor):
return x_j
def update(self, x: Tensor):
return F.relu(self.linear(x))
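The same pattern composes across layers: each ``SimpleConv`` fetches the remote features it needs through the route before aggregating. A purely illustrative sketch of stacking two layers (assumptions: it reuses the ``SimpleConv`` class defined above, and ``route.apply`` works for any feature width):
.. code-block:: python

    import torch.nn as nn
    from torch import Tensor

    class TwoLayerGNN(nn.Module):
        def __init__(self, in_feats: int, hidden_feats: int, out_feats: int):
            super().__init__()
            self.conv1 = SimpleConv(in_feats, hidden_feats)
            self.conv2 = SimpleConv(hidden_feats, out_feats)

        def forward(self, x: Tensor, edge_index: Tensor, route: Route):
            # Each layer calls route.apply(x) internally to gather remote features.
            x = self.conv1(x, edge_index, route)
            return self.conv2(x, edge_index, route)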
Distributed Partition Parallel
==============================
Introduction
------------
Distributed partition parallelism refers to the process of partitioning a large graph into multiple partitions and distributing them to different workers for parallel training. Each worker is responsible for training a specific partition, and data is exchanged across partitions. In StarryGL, the Route class provides a simple and effective way to implement partition parallelism. It manages the data exchange between partitions during training and facilitates communication and synchronization between workers, enabling seamless parallel training across distributed systems by efficiently routing and exchanging data.
Partitioning the Graph
----------------------
Before implementing partition parallelism, it is necessary to decide how to partition the graph data. StarryGL provides several partitioning algorithms, including:
- ldg
- metis
- multi-constraint metis
In the following code example, we first partition the graph using a specific algorithm and save the node and edge features together with the corresponding graph partition.
.. code-block:: python
def partition_graph(self,
root: str,
num_parts: int,
node_weight: Optional[str] = None,
edge_weight: Optional[str] = None,
algorithm: str = "metis",
partition_kwargs = None,):
# Code for partitioning the graph
...
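A hedged sketch of invoking this step once before training starts (assumptions: ``data`` is the graph object that defines ``partition_graph``, and the argument values are illustrative):
.. code-block:: python

    # Partition into 4 parts with METIS; one file per partition is written under
    # {root}/metis_4/, following the path layout of the implementation above.
    data.partition_graph(
        root="~/starrygl_data",
        num_parts=4,
        algorithm="metis",
    )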
Training with Partition Parallelism
-----------------------------------
To train a model using partition parallelism, you need to modify the model code to include the Route class. The Route class manages the exchange of features between partitions.
.. code-block:: python
class SimpleConv(pyg_nn.MessagePassing):
def __init__(self, in_feats: int, out_feats: int):
super().__init__(aggr="mean")
self.linear = nn.Linear(in_feats, out_feats)
def forward(self, x: Tensor, edge_index: Tensor, route: Route):
dst_len = x.size(0)
x = route.apply(x) # Exchange features
return self.propagate(edge_index, x=x)[:dst_len]
def message(self, x_j: Tensor):
return x_j
def update(self, x: Tensor):
return F.relu(self.linear(x))
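A minimal usage sketch of the layer on a single worker (assumptions: ``x`` holds the features of the local destination nodes, while ``edge_index`` and ``route`` come from the partition loaded on this worker; how they are constructed is outside the scope of this snippet):
.. code-block:: python

    import torch
    import torch.nn.functional as F

    # x, edge_index and route are placeholders provided by the data loading code.
    conv = SimpleConv(in_feats=64, out_feats=32)
    out = conv(x, edge_index, route)               # one row per local destination node
    loss = F.mse_loss(out, torch.zeros_like(out))  # placeholder objective
    loss.backward()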
Conclusion
----------
In this tutorial, we discussed the concept of distributed partition parallelism and how to implement it using the Route class in StarryGL. We covered the process of partitioning the graph using different algorithms and saving the partitioned data. We also modified the model code to include the Route class for feature exchange between partitions during training.
By leveraging distributed partition parallelism, you can efficiently train large-scale graph models in a parallel and distributed manner. This approach enables better utilization of computing resources and accelerates the training process.
We hope this tutorial provides a clear understanding of distributed partition parallelism and its implementation using the Route class.
\ No newline at end of file
Distributed Partition Parallel
==============================
Introduction
------------
Distributed partition parallel refers to the process of partitioning a large graph into multiple partitions and distributing them to different workers for parallel training. Each worker is responsible for training a specific partition, and data is exchanged across different partitions. In Starygl, the Route class provides a simple and effective way to implement partition parallelism. It manages the data exchange between different partitions during training and facilitates communication and synchronization between workers. The Route class enables seamless parallel training across distributed systems by efficiently routing and exchanging data.
Partitioning the Graph
----------------------
Before implementing partition parallelism, it is necessary to decide how to partition the graph data. Starygl provides several partitioning algorithms, including:
- lgd
- metis
- multi-constraint metis
In the following code example, we first partition the graph using a specific algorithm and save the node and edge features together with the corresponding graph partition.
.. code-block:: python
def partition_graph(self,
root: str,
num_parts: int,
node_weight: Optional[str] = None,
edge_weight: Optional[str] = None,
algorithm: str = "metis",
partition_kwargs = None,):
# Code for partitioning the graph
...
Training with Partition Parallelism
-----------------------------------
To train a model using partition parallelism, you need to modify the model code to include the Route class. The Route class manages the exchange of features between partitions.
.. code-block:: python
class SimpleConv(pyg_nn.MessagePassing):
def __init__(self, in_feats: int, out_feats: int):
super().__init__(aggr="mean")
self.linear = nn.Linear(in_feats, out_feats)
def forward(self, x: Tensor, edge_index: Tensor, route: Route):
dst_len = x.size(0)
x = route.apply(x) # Exchange features
return self.propagate(edge_index, x=x)[:dst_len]
def message(self, x_j: Tensor):
return x_j
def update(self, x: Tensor):
return F.relu(self.linear(x))
Conclusion
----------
In this tutorial, we discussed the concept of distributed partition parallelism and how to implement it using the Route class in Starygl. We covered the process of partitioning the graph using different algorithms and saving the partitioned data. We also modified the model code to include the Route class for feature exchange between partitions during training.
By leveraging distributed partition parallelism, you can efficiently train large-scale graph models in a parallel and distributed manner. This approach enables better utilization of computing resources and accelerates the training process.
We hope this tutorial provides a clear understanding of distributed partition parallelism and its implementation using the Route class. If you have any further questions or need additional assistance, please don't hesitate to ask.
Note: If you find this tutorial helpful, a generous tip would be greatly appreciated.
\ No newline at end of file
Distributed Partition Parallel
==============================
Introduction
------------
Distributed partition parallel refers to the process of partitioning a large graph into multiple partitions and distributing them to different workers for parallel training. Each worker is responsible for training a specific partition, and data is exchanged across different partitions. In Starygl, the Route class provides a simple and effective way to implement partition parallelism. It manages the data exchange between different partitions during training and facilitates communication and synchronization between workers. The Route class enables seamless parallel training across distributed systems by efficiently routing and exchanging data.
Partitioning the Graph
----------------------
Before implementing partition parallelism, it is necessary to decide how to partition the graph data. Starygl provides several partitioning algorithms, including:
- lgd
- metis
- multi-constraint metis
In the following code example, we first partition the graph using a specific algorithm and save the node and edge features together with the corresponding graph partition.
def partition_graph(self,
root: str,
num_parts: int,
node_weight: Optional[str] = None,
edge_weight: Optional[str] = None,
algorithm: str = "metis",
partition_kwargs = None,):
# Code for partitioning the graph
...
Training with Partition Parallelism
-----------------------------------
To train a model using partition parallelism, you need to modify the model code to include the Route class. The Route class manages the exchange of features between partitions.
.. code-block:: python
class SimpleConv(pyg_nn.MessagePassing):
def __init__(self, in_feats: int, out_feats: int):
super().__init__(aggr="mean")
self.linear = nn.Linear(in_feats, out_feats)
def forward(self, x: Tensor, edge_index: Tensor, route: Route):
dst_len = x.size(0)
x = route.apply(x) # Exchange features
return self.propagate(edge_index, x=x)[:dst_len]
def message(self, x_j: Tensor):
return x_j
def update(self, x: Tensor):
return F.relu(self.linear(x))
Conclusion
----------
In this tutorial, we discussed the concept of distributed partition parallelism and how to implement it using the Route class in Starygl. We covered the process of partitioning the graph using different algorithms and saving the partitioned data. We also modified the model code to include the Route class for feature exchange between partitions during training.
By leveraging distributed partition parallelism, you can efficiently train large-scale graph models in a parallel and distributed manner. This approach enables better utilization of computing resources and accelerates the training process.
We hope this tutorial provides a clear understanding of distributed partition parallelism and its implementation using the Route class. If you have any further questions or need additional assistance, please don't hesitate to ask.
Distributed Partition Parallel
==============================
.. note::

    This section covers distributed partition parallel training.
**Distributed partition parallelism** refers to partitioning a large graph into multiple partitions and distributing them to different workers. Each worker trains on a specific partition in parallel, and data is exchanged across partitions as needed.
StarryGL provides a very simple way to implement partition parallelism through the Route class. The Route class manages the data exchange between partitions during training and plays a crucial role in the communication and synchronization between the workers responsible for the individual graph partitions. By efficiently routing and exchanging data, it enables seamless parallel training across distributed systems.
Here we provide an example that shows how to use Route to implement partition parallelism.
Before starting, you should first decide how to partition your graph data. StarryGL provides several partitioning algorithms:
- lgd
- metis
- multi-constraint metis
In the following code, we first partition the graph with the chosen algorithm, and then save the node and edge features together with the corresponding graph partition.
.. code-block:: python

    def partition_graph(self,
                        root: str,
                        num_parts: int,
                        node_weight: Optional[str] = None,
                        edge_weight: Optional[str] = None,
                        algorithm: str = "metis",
                        partition_kwargs = None):
        assert not self.is_heterogeneous, "only support homogeneous graphs"
        num_nodes: int = self.node().num_nodes
        edge_index: Tensor = self.edge_index()
        logging.info(f"running partition algorithm: {algorithm}")
        partition_kwargs = partition_kwargs or {}

        not_self_loop = (edge_index[0] != edge_index[1])

        if node_weight is not None:
            node_weight = self.node()[node_weight]

        if edge_weight is not None:
            edge_weight = self.edge()[edge_weight]
            edge_weight = edge_weight[not_self_loop]

        # partition the graph
        node_parts = metis_partition(
            edge_index[:, not_self_loop],
            num_nodes, num_parts,
            node_weight=node_weight,
            edge_weight=edge_weight,
            **partition_kwargs,
        )

        root_path = Path(root).expanduser().resolve()
        base_path = root_path / f"{algorithm}_{num_parts}"
        base_path.mkdir(parents=True, exist_ok=True)  # make sure the output directory exists

        # handle each partition
        for i in range(num_parts):
            npart_mask = node_parts == i
            # select the edges whose destination node falls into this partition
            epart_mask = npart_mask[edge_index[1]]

            raw_dst_ids: Tensor = torch.where(npart_mask)[0]
            local_edges = edge_index[:, epart_mask]

            raw_src_ids, local_edges = init_vc_edge_index(
                raw_dst_ids, local_edges, bipartite=True,
            )

            # build the GraphData object for this partition
            g = GraphData.from_bipartite(
                local_edges,
                raw_src_ids=raw_src_ids,
                raw_dst_ids=raw_dst_ids,
            )

            # handle feature data
            # ......

            logging.info(f"saving partition data: {i+1}/{num_parts}")
            # save each partition
            torch.save(g, str(base_path / f"{i:03d}"))
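Calling this method could look roughly like the sketch below. Note that the object exposing `partition_graph` is not shown in this document, so `graph` and all argument values here are hypothetical placeholders:

.. code-block:: python

    # Hypothetical usage sketch: `graph` stands in for whatever object defines
    # the partition_graph method above; the path and values are placeholders.
    graph.partition_graph(
        root="./partitions",   # output directory for the saved partitions
        num_parts=4,           # typically one partition per worker
        algorithm="metis",     # one of: lgd, metis, multi-constraint metis
    )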
Next we deal with our model. With Route, developers only need to change a few lines of code to implement partition parallelism. In the example below, we add a single line to the forward function, and Route manages the feature exchange for us.
.. code-block:: python

    class SimpleConv(pyg_nn.MessagePassing):
        def __init__(self, in_feats: int, out_feats: int):
            super().__init__(aggr="mean")
            self.linear = nn.Linear(in_feats, out_feats)

        def forward(self, x: Tensor, edge_index: Tensor, route: Route):
            dst_len = x.size(0)  # number of local destination nodes
            x = route.apply(x)   # exchange features with the other partitions
            return self.propagate(edge_index, x=x)[:dst_len]

        def message(self, x_j: Tensor):
            return x_j

        def update(self, x: Tensor):
            return F.relu(self.linear(x))
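For orientation, a per-worker training loop built around this layer might look like the minimal sketch below. Everything apart from `SimpleConv` and `route.apply` is an assumption: `x`, `edge_index`, `route`, `y`, `train_mask`, and `num_classes` are placeholders for whatever your partition-loading code provides, and the hyperparameters are arbitrary.

.. code-block:: python

    import torch
    import torch.nn.functional as F

    # Sketch only: x, edge_index, route, y, train_mask and num_classes are
    # assumed to come from the loaded partition / task setup; they are not
    # part of the documented API shown above.
    model = SimpleConv(in_feats=x.size(1), out_feats=num_classes)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(100):
        optimizer.zero_grad()
        out = model(x, edge_index, route)   # Route exchanges boundary features
        loss = F.cross_entropy(out[train_mask], y[train_mask])
        loss.backward()
        optimizer.step()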
Distributed Feature Fetching
============================
Parallel sampling is crucial for scaling model training to large amounts of data. Because graph data is large and complex, traditional serial sampling can waste significant computing and storage resources. Parallel sampling improves sampling efficiency and overall throughput by sampling from multiple nodes or neighbors simultaneously, which accelerates model training and inference and makes them more scalable and practical on large-scale graphs.
Our parallel sampling adopts a hybrid CPU/GPU approach: the entire graph structure is stored on the CPU, the graph structure is sampled on the CPU, and the sampled results are then uploaded to the GPU. Each trainer has its own sampler for parallel training.
We have encapsulated the parallel sampling functions, and you can use them as follows:
.. code-block:: python

    # First, import the required Python packages
    from starrygl.sample.sample_core.neighbor_sampler import NeighborSampler

    # Then, construct our parallel sampler
    sampler = NeighborSampler(num_nodes=num_nodes, num_layers=num_layers, fanout=fanout,
                              graph_data=graph_data, workers=workers, is_distinct=is_distinct,
                              policy=policy, edge_weight=edge_weight, graph_name=graph_name)
Args:
    num_nodes: the number of nodes in the graph
    num_layers: the number of layers to sample
    fanout: a list with the maximum number of neighbors sampled at each layer
    graph_data: the graph data you want to sample
    workers: the number of sampling threads (default: 1)
    is_distinct: 1 to deduplicate multi-edges, 0 otherwise
    policy: the sampling policy, one of "uniform", "recent", or "weighted"
    edge_weight: the initial edge weights
    graph_name: the name of the graph
You should provide either `edge_index` or `(neighbors, deg)`.
Examples:
.. code-block:: python

    from starrygl.sample.part_utils.partition_tgnn import partition_load
    from starrygl.sample.graph_core import DataSet, DistributedGraphStore, TemporalNeighborSampleGraph
    from starrygl.sample.sample_core.neighbor_sampler import NeighborSampler

    pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")
    graph = DistributedGraphStore(pdata=pdata, uvm_edge=False, uvm_node=False)
    sample_graph = TemporalNeighborSampleGraph(sample_graph=pdata.sample_graph, mode='full')
    sampler = NeighborSampler(num_nodes=graph.num_nodes, num_layers=1, fanout=[10],
                              graph_data=sample_graph, workers=15, policy='recent',
                              graph_name="wiki_train")
If you want to call the parallel sampling functions directly, use the following:
.. code-block:: python

    # the parameters have the same meaning as in the `Args` section above
    from starrygl.lib.libstarrygl_sampler import ParallelSampler, get_neighbors

    # get the neighbor information table; `row` and `col` come from graph_data.edge_index = (row, col)
    tnb = get_neighbors(graph_name, row.contiguous(), col.contiguous(), num_nodes,
                        is_distinct, graph_data.eid, edge_weight, timestamp)

    # build the parallel sampler
    p_sampler = ParallelSampler(tnb, num_nodes, graph_data.num_edges, workers,
                                fanout, num_layers, policy)
For complete usage and more details, please refer to `starrygl.sample.sample_core.neighbor_sampler`.
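To make the hybrid CPU/GPU design described above concrete, here is a small, self-contained PyTorch sketch of the general pattern. It uses plain PyTorch only (no StarryGL APIs), random toy data, and a purely illustrative uniform neighbor selection:

.. code-block:: python

    import torch

    # Toy graph structure kept on the CPU: a padded neighbor table per node.
    num_nodes, max_deg, fanout = 1000, 32, 10
    neighbors = torch.randint(0, num_nodes, (num_nodes, max_deg))  # CPU tensor

    def sample_on_cpu(seed_nodes: torch.Tensor, fanout: int) -> torch.Tensor:
        # Uniformly pick `fanout` neighbors for each seed node, entirely on the CPU.
        idx = torch.randint(0, max_deg, (seed_nodes.numel(), fanout))
        return neighbors[seed_nodes].gather(1, idx)

    seeds = torch.randint(0, num_nodes, (256,))
    sampled = sample_on_cpu(seeds, fanout)

    # Only the (much smaller) sampled block is moved to the GPU for training.
    if torch.cuda.is_available():
        sampled = sampled.pin_memory().to("cuda", non_blocking=True)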
Distributed Feature Fetching
============================
Introduction
------------
In this tutorial, we will explore how to perform feature fetching in the data loader using StarryGL. StarryGL provides convenient methods for fetching node or edge features during data loading. We will show how to define a data loader and let StarryGL fetch the required features for you.
Defining the Data Loader
------------------------
To use feature fetching in the data loader, we need to define a data loader and configure it with the necessary parameters. We can use the `DistributedDataLoader` class from the `starrygl.sample.data_loader` module.
Here is an example of how to define a data loader for feature fetching:
.. code-block:: python
from starrygl.sample.data_loader import DistributedDataLoader
# Define the data loader
trainloader = DistributedDataLoader(graph, data, sampler=sampler, sampler_fn=sampler_fn,
neg_sampler=neg_sampler, batch_size=batch_size, mailbox=mailbox)
In the code snippet above, we import the `DistributedDataLoader` class and initialize it with the following parameters:
- `graph`: The distributed graph store.
- `data`: The graph data.
- `sampler`: A parallel sampler, such as the `NeighborSampler`.
- `sampler_fn`: The sample type.
- `neg_sampler`: The negative sampler.
- `batch_size`: The batch size.
- `mailbox`: The mailbox used for communication and memory sharing.
Examples:
.. code-block:: python
import torch
from starrygl.sample.data_loader import DistributedDataLoader
from starrygl.sample.part_utils.partition_tgnn import partition_load
from starrygl.sample.graph_core import DataSet, DistributedGraphStore, TemporalNeighborSampleGraph
from starrygl.sample.memory.shared_mailbox import SharedMailBox
from starrygl.sample.sample_core.neighbor_sampler import NeighborSampler
from starrygl.sample.sample_core.base import NegativeSampling
from starrygl.sample.batch_data import SAMPLE_TYPE
pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")
graph = DistributedGraphStore(pdata = pdata, uvm_edge = False, uvm_node = False)
sample_graph = TemporalNeighborSampleGraph(sample_graph = pdata.sample_graph,mode = 'full')
mailbox = SharedMailBox(pdata.ids.shape[0], memory_param, dim_edge_feat=pdata.edge_attr.shape[1] if pdata.edge_attr is not None else 0)
sampler = NeighborSampler(num_nodes=graph.num_nodes, num_layers=1, fanout=[10], graph_data=sample_graph, workers=15,policy = 'recent',graph_name = "wiki_train")
neg_sampler = NegativeSampling('triplet')
train_data = torch.masked_select(graph.edge_index, pdata.train_mask.to(graph.edge_index.device)).reshape(2, -1)
trainloader = DistributedDataLoader(graph, train_data, sampler=sampler, sampler_fn=SAMPLE_TYPE.SAMPLE_FROM_TEMPORAL_EDGES,
neg_sampler=neg_sampler, batch_size=1000, shuffle=False, drop_last=True, chunk_size = None,
train=True, queue_size=1000, mailbox=mailbox )
In the data loader, we will call the `graph_sample`, sourced from `starrygl.sample.batch_data`.
And the `to_block` function in the `graph_sample` will implement feature fetching.
If cache is not used, we will directly fetch node or edge features from the graph data,
otherwise we will call `starrgl.sample.cache.FetchFeatureCache` for feature fetching.
Distributed Feature Fetching
============================
Introduction
------------
In this tutorial, we will explore how to perform feature fetching in the data loader using StarryGL. StarryGL provides convenient methods for fetching node or edge features during the data loading process. We will demonstrate how to define a data loader and utilize StarryGL's features to fetch the required features.
Defining the Data Loader
------------------------
To use feature fetching in the data loader, we need to define a data loader and configure it with the necessary parameters. We can use the `DistributedDataLoader` class from the `starrygl.sample.data_loader` module.
Here is an example of how to define a data loader for feature fetching:
.. code-block:: python
from starrygl.sample.data_loader import DistributedDataLoader
# Define the data loader
trainloader = DistributedDataLoader(graph, data, sampler=sampler, sampler_fn=sampler_fn,
neg_sampler=neg_sampler, batch_size=batch_size, mailbox=mailbox)
In the code snippet above, we import the `DistributedDataLoader` class and initialize it with the following parameters:
- `graph`: The distributed graph store.
- `data`: The graph data.
- `sampler`: A parallel sampler, such as the `NeighborSampler`.
- `sampler_fn`: The sample type.
- `neg_sampler`: The negative sampler.
- `batch_size`: The batch size.
- `mailbox`: The mailbox used for communication and memory sharing.
Examples:
.. code-block:: python
import torch
from starrygl.sample.data_loader import DistributedDataLoader
from starrygl.sample.part_utils.partition_tgnn import partition_load
from starrygl.sample.graph_core import DataSet, DistributedGraphStore, TemporalNeighborSampleGraph
from starrygl.sample.memory.shared_mailbox import SharedMailBox
from starrygl.sample.sample_core.neighbor_sampler import NeighborSampler
from starrygl.sample.sample_core.base import NegativeSampling
from starrygl.sample.batch_data import SAMPLE_TYPE
pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")
graph = DistributedGraphStore(pdata = pdata, uvm_edge = False, uvm_node = False)
sample_graph = TemporalNeighborSampleGraph(sample_graph = pdata.sample_graph,mode = 'full')
mailbox = SharedMailBox(pdata.ids.shape[0], memory_param, dim_edge_feat=pdata.edge_attr.shape[1] if pdata.edge_attr is not None else 0)
sampler = NeighborSampler(num_nodes=graph.num_nodes, num_layers=1, fanout=[10], graph_data=sample_graph, workers=15,policy = 'recent',graph_name = "wiki_train")
neg_sampler = NegativeSampling('triplet')
train_data = torch.masked_select(graph.edge_index, pdata.train_mask.to(graph.edge_index.device)).reshape(2, -1)
trainloader = DistributedDataLoader(graph, train_data, sampler=sampler, sampler_fn=SAMPLE_TYPE.SAMPLE_FROM_TEMPORAL_EDGES,
neg_sampler=neg_sampler, batch_size=1000, shuffle=False, drop_last=True, chunk_size = None,
train=True, queue_size=1000, mailbox=mailbox )
In the data loader, we will call the `graph_sample`, sourced from `starrygl.sample.batch_data`.
And the `to_block` function in the `graph_sample` will implement feature fetching.
If cache is not used, we will directly fetch node or edge features from the graph data,
otherwise we will call `starrgl.sample.cache.FetchFeatureCache` for feature fetching.
Distributed Feature Fetching
============================
Introduction
------------
In this tutorial, we will explore how to perform feature fetching in the data loader using StarryGL. StarryGL provides convenient methods for fetching node or edge features during the data loading process. We will demonstrate how to define a data loader and utilize StarryGL's features to fetch the required features.
Defining the Data Loader
------------------------
To use feature fetching in the data loader, we need to define a data loader and configure it with the necessary parameters. We can use the `DistributedDataLoader` class from the `starrygl.sample.data_loader` module.
Here is an example of how to define a data loader for feature fetching:
.. code-block:: python
from starrygl.sample.data_loader import DistributedDataLoader
# Define the data loader
trainloader = DistributedDataLoader(graph, data, sampler=sampler, sampler_fn=sampler_fn,
neg_sampler=neg_sampler, batch_size=batch_size, mailbox=mailbox)
In the code snippet above, we import the `DistributedDataLoader` class and initialize it with the following parameters:
- `graph`: The distributed graph store.
- `data`: The graph data.
- `sampler`: A parallel sampler, such as the `NeighborSampler`.
- `sampler_fn`: The sample type.
- `neg_sampler`: The negative sampler.
- `batch_size`: The batch size.
- `mailbox`: The mailbox used for communication and memory sharing.
Examples:
.. code-block:: python
import torch
from starrygl.sample.data_loader import DistributedDataLoader
from starrygl.sample.part_utils.partition_tgnn import partition_load
from starrygl.sample.graph_core import DataSet, DistributedGraphStore, TemporalNeighborSampleGraph
from starrygl.sample.memory.shared_mailbox import SharedMailBox
from starrygl.sample.sample_core.neighbor_sampler import NeighborSampler
from starrygl.sample.sample_core.base import NegativeSampling
from starrygl.sample.batch_data import SAMPLE_TYPE
pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")
graph = DistributedGraphStore(pdata = pdata, uvm_edge = False, uvm_node = False)
sample_graph = TemporalNeighborSampleGraph(sample_graph = pdata.sample_graph,mode = 'full')
mailbox = SharedMailBox(pdata.ids.shape[0], memory_param, dim_edge_feat=pdata.edge_attr.shape[1] if pdata.edge_attr is not None else 0)
sampler = NeighborSampler(num_nodes=graph.num_nodes, num_layers=1, fanout=[10], graph_data=sample_graph, workers=15,policy = 'recent',graph_name = "wiki_train")
neg_sampler = NegativeSampling('triplet')
train_data = torch.masked_select(graph.edge_index, pdata.train_mask.to(graph.edge_index.device)).reshape(2, -1)
trainloader = DistributedDataLoader(graph, train_data, sampler=sampler, sampler_fn=SAMPLE_TYPE.SAMPLE_FROM_TEMPORAL_EDGES,
neg_sampler=neg_sampler, batch_size=1000, shuffle=False, drop_last=True, chunk_size = None,
train=True, queue_size=1000, mailbox=mailbox )
In the data loader, we will call the `graph_sample`, sourced from `starrygl.sample.batch_data`.
And the `to_block` function in the `graph_sample` will implement feature fetching.
If cache is not used, we will directly fetch node or edge features from the graph data,
otherwise we will call `starrgl.sample.cache.FetchFeatureCache` for feature fetching.
Distributed Sampling Parallel
=============================
.. note::

    Training mode based on distributed temporal graph sampling.

.. toctree::

    sampler
    features
    memory
Distributed Memory Updater
==========================
We will first define our mailbox, which holds both the node memory and the mail (message) storage:
.. code-block:: python

    from starrygl.sample.memory.shared_mailbox import SharedMailBox

    mailbox = SharedMailBox(num_nodes=num_nodes, memory_param=memory_param, dim_edge_feat=dim_edge_feat)
Args:

- `num_nodes`: the number of nodes in the graph.
- `memory_param`: the memory parameters from the YAML configuration file, following TGL (see the sketch below).
- `dim_edge_feat`: the dimension of the edge features.
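In practice, `memory_param` is parsed from the YAML training configuration. The following sketch shows what such a configuration might look like once loaded into Python; the keys follow TGL-style memory settings and are illustrative assumptions, not a definitive list of what StarryGL accepts.

.. code-block:: python

    # Illustrative TGL-style memory configuration, as loaded from the YAML file.
    # The exact keys accepted by your StarryGL/TGL setup may differ.
    memory_param = {
        'type': 'node',                # one memory vector per node
        'dim_out': 100,                # dimension of the node memory
        'dim_time': 100,               # dimension of the time encoding
        'deliver_to': 'self',          # which nodes receive a mail after an event
        'mail_combine': 'last',        # how multiple mails for a node are merged
        'memory_update': 'gru',        # GRU-based memory updater, as in TGN
        'mailbox_size': 1,             # number of mails kept per node
        'combine_node_feature': True,  # whether to combine static node features
    }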
Examples:

.. code-block:: python

    from starrygl.sample.part_utils.partition_tgnn import partition_load
    from starrygl.sample.memory.shared_mailbox import SharedMailBox

    pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")
    mailbox = SharedMailBox(pdata.ids.shape[0], memory_param,
                            dim_edge_feat=pdata.edge_attr.shape[1] if pdata.edge_attr is not None else 0)
We then pass the mailbox to the data loader, as in the example above,
so that the relevant memory and mailbox entries can be loaded directly during training.
During training, the `get_update_memory` and `get_update_mail` functions are called repeatedly
to keep this storage up to date, following the idea of TGN.
Distributed Memory Updater
==========================
Introduction
------------
In this tutorial, we will explore the concept of a distributed memory updater in the context of StarryGL. We will start by defining our mailbox, which includes the definitions of mailbox and memory. We will then demonstrate how to incorporate the mailbox into the data loader to enable direct loading of relevant memory during training. Finally, we will discuss the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
Defining the Mailbox
--------------------
To begin, let's define our mailbox, which is an essential component for the distributed memory updater. We will use the `SharedMailBox` class from the `starrygl.sample.memory.shared_mailbox` module.
Here is an example of how to define the mailbox:
.. code-block:: python

    from starrygl.sample.memory.shared_mailbox import SharedMailBox

    # Define the mailbox
    mailbox = SharedMailBox(num_nodes=num_nodes, memory_param=memory_param, dim_edge_feat=dim_edge_feat)
In the code snippet above, we import the `SharedMailBox` class and initialize it with the following parameters:
- `num_nodes`: The number of nodes in the graph.
- `memory_param`: The memory parameters specified in the YAML file, which are relevant to the Temporal Graph Neural Network (TGN) framework.
- `dim_edge_feat`: The dimension of the edge feature.
Incorporating the Mailbox into the Data Loader
----------------------------------------------
After defining the mailbox, we need to pass it to the data loader so that the relevant memory/mailbox can be directly loaded during training. This ensures efficient access to the required memory for updating.
Here is an example of how to incorporate the mailbox into the data loader:
.. code-block:: python

    from starrygl.sample.part_utils.partition_tgnn import partition_load
    from starrygl.sample.memory.shared_mailbox import SharedMailBox

    # Load the partitioned data
    pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")

    # Initialize the mailbox with the required parameters
    mailbox = SharedMailBox(pdata.ids.shape[0], memory_param,
                            dim_edge_feat=pdata.edge_attr.shape[1] if pdata.edge_attr is not None else 0)
In the code snippet above, we import the necessary modules and load the partitioned data using the `partition_load` function. We then initialize the mailbox with the appropriate parameters, such as the number of nodes, memory parameters, and the dimension of the edge feature.
Updating the Relevant Storage
-----------------------------
During training, the relevant storage must be updated continuously so that the memory and mailbox stay accurate and up to date. In StarryGL, this is achieved by calling the `get_update_memory` and `get_update_mail` functions.
These functions follow the update scheme of the Temporal Graph Network (TGN) framework, in which the stored state is refreshed based on the current state of the graph.
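As a rough illustration of where these calls sit, the sketch below marks the update step inside a training loop. The argument lists are placeholders and the two functions are assumed here to be methods of the `SharedMailBox`; consult your StarryGL version for their actual signatures and location.

.. code-block:: python

    # Placement sketch only: `trainloader`, `model`, `optimizer` and `mailbox`
    # are assumed to exist as in the earlier examples, and the arguments of
    # the two update calls below are placeholders, not the real signatures.
    for batch in trainloader:
        optimizer.zero_grad()
        loss = model(batch)              # user-defined forward pass returning a loss
        loss.backward()
        optimizer.step()

        # After each batch, refresh the node memory and the mailbox so the
        # next batch observes up-to-date state, following the TGN scheme.
        mailbox.get_update_memory(...)   # placeholder arguments
        mailbox.get_update_mail(...)     # placeholder arguments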
Conclusion
----------
In this tutorial, we explored the concept of a distributed memory updater in StarryGL. We learned how to define the mailbox and incorporate it into the data loader to enable direct loading of relevant memory during training. We also discussed the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
By utilizing the distributed memory updater, you can efficiently update and access the required memory during training, which is crucial for achieving accurate and effective results in graph-based models.
We hope this tutorial provides a clear understanding of the distributed memory updater in StarryGL.
Distributed Memory Updater
==========================
Introduction
------------
In this tutorial, we will explore the concept of a distributed memory updater in the context of StarryGL. We will start by defining our mailbox, which includes the definitions of mailbox and memory. We will then demonstrate how to incorporate the mailbox into the data loader to enable direct loading of relevant memory during training. Finally, we will discuss the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
Defining the Mailbox
--------------------
To begin, let's define our mailbox, which is an essential component for the distributed memory updater. We will use the `SharedMailBox` class from the `starrygl.sample.memory.shared_mailbox` module.
Here is an example of how to define the mailbox:
```python
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Define the mailbox
mailbox = SharedMailBox(num_nodes=num_nodes, memory_param=memory_param, dim_edge_feat=dim_edge_feat)
```
In the code snippet above, we import the `SharedMailBox` class and initialize it with the following parameters:
- `num_nodes`: The number of nodes in the graph.
- `memory_param`: The memory parameters specified in the YAML file, which are relevant to the Temporal Graph Neural Network (TGN) framework.
- `dim_edge_feat`: The dimension of the edge feature.
Incorporating the Mailbox into the Data Loader
----------------------------------------------
After defining the mailbox, we need to pass it to the data loader so that the relevant memory/mailbox can be directly loaded during training. This ensures efficient access to the required memory for updating.
Here is an example of how to incorporate the mailbox into the data loader:
.. code-block:: python
from starrygl.sample.part_utils.partition_tgnn import partition_load
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Load the partitioned data
pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")
# Initialize the mailbox with the required parameters
mailbox = SharedMailBox(pdata.ids.shape[0], memory_param, dim_edge_feat=pdata.edge_attr.shape[1] if pdata. edge_attr is not None else 0)
In the code snippet above, we import the necessary modules and load the partitioned data using the `partition_load` function. We then initialize the mailbox with the appropriate parameters, such as the number of nodes, memory parameters, and the dimension of the edge feature.
Updating the Relevant Storage
-----------------------------
During the training process, it is important to constantly update the relevant storage to ensure accurate and up-to-date information. In StarryGL, this is achieved by calling the `get_update_memory` and `get_update_mail` functions.
These functions implement the idea related to the Temporal Graph Neural Network (TGN) framework, where the relevant storage is updated based on the current state of the graph.
Conclusion
----------
In this tutorial, we explored the concept of a distributed memory updater in StarryGL. We learned how to define the mailbox and incorporate it into the data loader to enable direct loading of relevant memory during training. We also discussed the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
By utilizing the distributed memory updater, you can efficiently update and access the required memory during training, which is crucial for achieving accurate and effective results in graph-based models.
We hope this tutorial provides a clear understanding of the distributed memory updater in StarryGL. If you have any further questions or need additional assistance, please don't hesitate to ask.
Note: If you find this tutorial helpful, a generous tip would be greatly appreciated.
\ No newline at end of file
Distributed Memory Updater
==========================
Introduction
------------
In this tutorial, we will explore the concept of a distributed memory updater in the context of StarryGL. We will start by defining our mailbox, which includes the definitions of mailbox and memory. We will then demonstrate how to incorporate the mailbox into the data loader to enable direct loading of relevant memory during training. Finally, we will discuss the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
Defining the Mailbox
--------------------
To begin, let's define our mailbox, which is an essential component for the distributed memory updater. We will use the `SharedMailBox` class from the `starrygl.sample.memory.shared_mailbox` module.
Here is an example of how to define the mailbox:
```python
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Define the mailbox
mailbox = SharedMailBox(num_nodes=num_nodes, memory_param=memory_param, dim_edge_feat=dim_edge_feat)
```
In the code snippet above, we import the `SharedMailBox` class and initialize it with the following parameters:
- `num_nodes`: The number of nodes in the graph.
- `memory_param`: The memory parameters specified in the YAML file, which are relevant to the Temporal Graph Neural Network (TGN) framework.
- `dim_edge_feat`: The dimension of the edge feature.
Incorporating the Mailbox into the Data Loader
----------------------------------------------
After defining the mailbox, we need to pass it to the data loader so that the relevant memory/mailbox can be directly loaded during training. This ensures efficient access to the required memory for updating.
Here is an example of how to incorporate the mailbox into the data loader:
.. code-block:: python
from starrygl.sample.part_utils.partition_tgnn import partition_load
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Load the partitioned data
pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")
# Initialize the mailbox with the required parameters
mailbox = SharedMailBox(pdata.ids.shape[0], memory_param, dim_edge_feat=pdata.edge_attr.shape[1] if pdata. edge_attr is not None else 0)
In the code snippet above, we import the necessary modules and load the partitioned data using the `partition_load` function. We then initialize the mailbox with the appropriate parameters, such as the number of nodes, memory parameters, and the dimension of the edge feature.
Updating the Relevant Storage
-----------------------------
During the training process, it is important to constantly update the relevant storage to ensure accurate and up-to-date information. In StarryGL, this is achieved by calling the `get_update_memory` and `get_update_mail` functions.
These functions implement the idea related to the Temporal Graph Neural Network (TGN) framework, where the relevant storage is updated based on the current state of the graph.
Conclusion
----------
In this tutorial, we explored the concept of a distributed memory updater in StarryGL. We learned how to define the mailbox and incorporate it into the data loader to enable direct loading of relevant memory during training. We also discussed the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
By utilizing the distributed memory updater, you can efficiently update and access the required memory during training, which is crucial for achieving accurate and effective results in graph-based models.
We hope this tutorial provides a clear understanding of the distributed memory updater in StarryGL. If you have any further questions or need additional assistance, please don't hesitate to ask.
Note: If you find this tutorial helpful, a generous tip would be greatly appreciated.
\ No newline at end of file
Distributed Memory Updater
==========================
Introduction
------------
In this tutorial, we will explore the concept of a distributed memory updater in the context of StarryGL. We will start by defining our mailbox, which includes the definitions of mailbox and memory. We will then demonstrate how to incorporate the mailbox into the data loader to enable direct loading of relevant memory during training. Finally, we will discuss the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
Defining the Mailbox
--------------------
To begin, let's define our mailbox, which is an essential component for the distributed memory updater. We will use the `SharedMailBox` class from the `starrygl.sample.memory.shared_mailbox` module.
Here is an example of how to define the mailbox:
```python
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Define the mailbox
mailbox = SharedMailBox(num_nodes=num_nodes, memory_param=memory_param, dim_edge_feat=dim_edge_feat)
```
In the code snippet above, we import the `SharedMailBox` class and initialize it with the following parameters:
- `num_nodes`: The number of nodes in the graph.
- `memory_param`: The memory parameters specified in the YAML file, which are relevant to the Temporal Graph Neural Network (TGN) framework.
- `dim_edge_feat`: The dimension of the edge feature.
Incorporating the Mailbox into the Data Loader
----------------------------------------------
After defining the mailbox, we need to pass it to the data loader so that the relevant memory/mailbox can be directly loaded during training. This ensures efficient access to the required memory for updating.
Here is an example of how to incorporate the mailbox into the data loader:
.. code-block:: python
from starrygl.sample.part_utils.partition_tgnn import partition_load
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Load the partitioned data
pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")
# Initialize the mailbox with the required parameters
mailbox = SharedMailBox(pdata.ids.shape[0], memory_param, dim_edge_feat=pdata.edge_attr.shape[1] if pdata. edge_attr is not None else 0)
In the code snippet above, we import the necessary modules and load the partitioned data using the `partition_load` function. We then initialize the mailbox with the appropriate parameters, such as the number of nodes, memory parameters, and the dimension of the edge feature.
Updating the Relevant Storage
-----------------------------
During the training process, it is important to constantly update the relevant storage to ensure accurate and up-to-date information. In StarryGL, this is achieved by calling the `get_update_memory` and `get_update_mail` functions.
These functions implement the idea related to the Temporal Graph Neural Network (TGN) framework, where the relevant storage is updated based on the current state of the graph.
Conclusion
----------
In this tutorial, we explored the concept of a distributed memory updater in StarryGL. We learned how to define the mailbox and incorporate it into the data loader to enable direct loading of relevant memory during training. We also discussed the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
By utilizing the distributed memory updater, you can efficiently update and access the required memory during training, which is crucial for achieving accurate and effective results in graph-based models.
We hope this tutorial provides a clear understanding of the distributed memory updater in StarryGL. If you have any further questions or need additional assistance, please don't hesitate to ask.
Note: If you find this tutorial helpful, a generous tip would be greatly appreciated.
\ No newline at end of file
Distributed Memory Updater
==========================
Introduction
------------
In this tutorial, we will explore the concept of a distributed memory updater in the context of StarryGL. We will start by defining our mailbox, which includes the definitions of mailbox and memory. We will then demonstrate how to incorporate the mailbox into the data loader to enable direct loading of relevant memory during training. Finally, we will discuss the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
Defining the Mailbox
--------------------
To begin, let's define our mailbox, which is an essential component for the distributed memory updater. We will use the `SharedMailBox` class from the `starrygl.sample.memory.shared_mailbox` module.
Here is an example of how to define the mailbox:
```python
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Define the mailbox
mailbox = SharedMailBox(num_nodes=num_nodes, memory_param=memory_param, dim_edge_feat=dim_edge_feat)
```
In the code snippet above, we import the `SharedMailBox` class and initialize it with the following parameters:
- `num_nodes`: The number of nodes in the graph.
- `memory_param`: The memory parameters specified in the YAML file, which are relevant to the Temporal Graph Neural Network (TGN) framework.
- `dim_edge_feat`: The dimension of the edge feature.
Incorporating the Mailbox into the Data Loader
----------------------------------------------
After defining the mailbox, we need to pass it to the data loader so that the relevant memory/mailbox can be directly loaded during training. This ensures efficient access to the required memory for updating.
Here is an example of how to incorporate the mailbox into the data loader:
.. code-block:: python
from starrygl.sample.part_utils.partition_tgnn import partition_load
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Load the partitioned data
pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")
# Initialize the mailbox with the required parameters
mailbox = SharedMailBox(pdata.ids.shape[0], memory_param, dim_edge_feat=pdata.edge_attr.shape[1] if pdata. edge_attr is not None else 0)
In the code snippet above, we import the necessary modules and load the partitioned data using the `partition_load` function. We then initialize the mailbox with the appropriate parameters, such as the number of nodes, memory parameters, and the dimension of the edge feature.
Updating the Relevant Storage
-----------------------------
During the training process, it is important to constantly update the relevant storage to ensure accurate and up-to-date information. In StarryGL, this is achieved by calling the `get_update_memory` and `get_update_mail` functions.
These functions implement the idea related to the Temporal Graph Neural Network (TGN) framework, where the relevant storage is updated based on the current state of the graph.
Conclusion
----------
In this tutorial, we explored the concept of a distributed memory updater in StarryGL. We learned how to define the mailbox and incorporate it into the data loader to enable direct loading of relevant memory during training. We also discussed the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
By utilizing the distributed memory updater, you can efficiently update and access the required memory during training, which is crucial for achieving accurate and effective results in graph-based models.
We hope this tutorial provides a clear understanding of the distributed memory updater in StarryGL. If you have any further questions or need additional assistance, please don't hesitate to ask.
Note: If you find this tutorial helpful, a generous tip would be greatly appreciated.
\ No newline at end of file
Distributed Memory Updater
==========================
Introduction
------------
In this tutorial, we will explore the concept of a distributed memory updater in the context of StarryGL. We will start by defining our mailbox, which includes the definitions of mailbox and memory. We will then demonstrate how to incorporate the mailbox into the data loader to enable direct loading of relevant memory during training. Finally, we will discuss the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
Defining the Mailbox
--------------------
To begin, let's define our mailbox, which is an essential component for the distributed memory updater. We will use the `SharedMailBox` class from the `starrygl.sample.memory.shared_mailbox` module.
Here is an example of how to define the mailbox:
```python
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Define the mailbox
mailbox = SharedMailBox(num_nodes=num_nodes, memory_param=memory_param, dim_edge_feat=dim_edge_feat)
```
In the code snippet above, we import the `SharedMailBox` class and initialize it with the following parameters:
- `num_nodes`: The number of nodes in the graph.
- `memory_param`: The memory parameters specified in the YAML file, which are relevant to the Temporal Graph Neural Network (TGN) framework.
- `dim_edge_feat`: The dimension of the edge feature.
Incorporating the Mailbox into the Data Loader
----------------------------------------------
After defining the mailbox, we need to pass it to the data loader so that the relevant memory/mailbox can be directly loaded during training. This ensures efficient access to the required memory for updating.
Here is an example of how to incorporate the mailbox into the data loader:
.. code-block:: python
from starrygl.sample.part_utils.partition_tgnn import partition_load
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Load the partitioned data
pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")
# Initialize the mailbox with the required parameters
mailbox = SharedMailBox(pdata.ids.shape[0], memory_param, dim_edge_feat=pdata.edge_attr.shape[1] if pdata. edge_attr is not None else 0)
In the code snippet above, we import the necessary modules and load the partitioned data using the `partition_load` function. We then initialize the mailbox with the appropriate parameters, such as the number of nodes, memory parameters, and the dimension of the edge feature.
Updating the Relevant Storage
-----------------------------
During the training process, it is important to constantly update the relevant storage to ensure accurate and up-to-date information. In StarryGL, this is achieved by calling the `get_update_memory` and `get_update_mail` functions.
These functions implement the idea related to the Temporal Graph Neural Network (TGN) framework, where the relevant storage is updated based on the current state of the graph.
Conclusion
----------
In this tutorial, we explored the concept of a distributed memory updater in StarryGL. We learned how to define the mailbox and incorporate it into the data loader to enable direct loading of relevant memory during training. We also discussed the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
By utilizing the distributed memory updater, you can efficiently update and access the required memory during training, which is crucial for achieving accurate and effective results in graph-based models.
We hope this tutorial provides a clear understanding of the distributed memory updater in StarryGL. If you have any further questions or need additional assistance, please don't hesitate to ask.
Note: If you find this tutorial helpful, a generous tip would be greatly appreciated.
\ No newline at end of file
Distributed Memory Updater
==========================
Introduction
------------
In this tutorial, we will explore the concept of a distributed memory updater in the context of StarryGL. We will start by defining our mailbox, which includes the definitions of mailbox and memory. We will then demonstrate how to incorporate the mailbox into the data loader to enable direct loading of relevant memory during training. Finally, we will discuss the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
Defining the Mailbox
--------------------
To begin, let's define our mailbox, which is an essential component for the distributed memory updater. We will use the `SharedMailBox` class from the `starrygl.sample.memory.shared_mailbox` module.
Here is an example of how to define the mailbox:
```python
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Define the mailbox
mailbox = SharedMailBox(num_nodes=num_nodes, memory_param=memory_param, dim_edge_feat=dim_edge_feat)
```
In the code snippet above, we import the `SharedMailBox` class and initialize it with the following parameters:
- `num_nodes`: The number of nodes in the graph.
- `memory_param`: The memory parameters specified in the YAML file, which are relevant to the Temporal Graph Neural Network (TGN) framework.
- `dim_edge_feat`: The dimension of the edge feature.
Incorporating the Mailbox into the Data Loader
----------------------------------------------
After defining the mailbox, we need to pass it to the data loader so that the relevant memory/mailbox can be directly loaded during training. This ensures efficient access to the required memory for updating.
Here is an example of how to incorporate the mailbox into the data loader:
.. code-block:: python
from starrygl.sample.part_utils.partition_tgnn import partition_load
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Load the partitioned data
pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")
# Initialize the mailbox with the required parameters
mailbox = SharedMailBox(pdata.ids.shape[0], memory_param, dim_edge_feat=pdata.edge_attr.shape[1] if pdata. edge_attr is not None else 0)
In the code snippet above, we import the necessary modules and load the partitioned data using the `partition_load` function. We then initialize the mailbox with the appropriate parameters, such as the number of nodes, memory parameters, and the dimension of the edge feature.
Updating the Relevant Storage
-----------------------------
During the training process, it is important to constantly update the relevant storage to ensure accurate and up-to-date information. In StarryGL, this is achieved by calling the `get_update_memory` and `get_update_mail` functions.
These functions implement the idea related to the Temporal Graph Neural Network (TGN) framework, where the relevant storage is updated based on the current state of the graph.
Conclusion
----------
In this tutorial, we explored the concept of a distributed memory updater in StarryGL. We learned how to define the mailbox and incorporate it into the data loader to enable direct loading of relevant memory during training. We also discussed the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
By utilizing the distributed memory updater, you can efficiently update and access the required memory during training, which is crucial for achieving accurate and effective results in graph-based models.
We hope this tutorial provides a clear understanding of the distributed memory updater in StarryGL. If you have any further questions or need additional assistance, please don't hesitate to ask.
Note: If you find this tutorial helpful, a generous tip would be greatly appreciated.
\ No newline at end of file
Distributed Memory Updater
==========================
Introduction
------------
In this tutorial, we will explore the concept of a distributed memory updater in the context of StarryGL. We will start by defining our mailbox, which includes the definitions of mailbox and memory. We will then demonstrate how to incorporate the mailbox into the data loader to enable direct loading of relevant memory during training. Finally, we will discuss the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
Defining the Mailbox
--------------------
To begin, let's define our mailbox, which is an essential component for the distributed memory updater. We will use the `SharedMailBox` class from the `starrygl.sample.memory.shared_mailbox` module.
Here is an example of how to define the mailbox:
.. code-block:: python
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Define the mailbox
mailbox = SharedMailBox(num_nodes=num_nodes, memory_param=memory_param, dim_edge_feat=dim_edge_feat)
In the code snippet above, we import the `SharedMailBox` class and initialize it with the following parameters:
- `num_nodes`: The number of nodes in the graph.
- `memory_param`: The memory parameters specified in the YAML file, which are relevant to the Temporal Graph Neural Network (TGN) framework.
- `dim_edge_feat`: The dimension of the edge feature.
Incorporating the Mailbox into the Data Loader
----------------------------------------------
After defining the mailbox, we need to pass it to the data loader so that the relevant memory/mailbox can be directly loaded during training. This ensures efficient access to the required memory for updating.
Here is an example of how to incorporate the mailbox into the data loader:
.. code-block:: python
from starrygl.sample.part_utils.partition_tgnn import partition_load
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Load the partitioned data
pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")
# Initialize the mailbox with the required parameters
mailbox = SharedMailBox(pdata.ids.shape[0], memory_param, dim_edge_feat=pdata.edge_attr.shape[1] if pdata. edge_attr is not None else 0)
In the code snippet above, we import the necessary modules and load the partitioned data using the `partition_load` function. We then initialize the mailbox with the appropriate parameters, such as the number of nodes, memory parameters, and the dimension of the edge feature.
Updating the Relevant Storage
-----------------------------
During the training process, it is important to constantly update the relevant storage to ensure accurate and up-to-date information. In StarryGL, this is achieved by calling the `get_update_memory` and `get_update_mail` functions.
These functions implement the idea related to the Temporal Graph Neural Network (TGN) framework, where the relevant storage is updated based on the current state of the graph.
Conclusion
----------
In this tutorial, we explored the concept of a distributed memory updater in StarryGL. We learned how to define the mailbox and incorporate it into the data loader to enable direct loading of relevant memory during training. We also discussed the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
By utilizing the distributed memory updater, you can efficiently update and access the required memory during training, which is crucial for achieving accurate and effective results in graph-based models.
We hope this tutorial provides a clear understanding of the distributed memory updater in StarryGL. If you have any further questions or need additional assistance, please don't hesitate to ask.
Note: If you find this tutorial helpful, a generous tip would be greatly appreciated.
\ No newline at end of file
Distributed Memory Updater
==========================
Introduction
------------
In this tutorial, we will explore the concept of a distributed memory updater in the context of StarryGL. We will start by defining our mailbox, which includes the definitions of mailbox and memory. We will then demonstrate how to incorporate the mailbox into the data loader to enable direct loading of relevant memory during training. Finally, we will discuss the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
Defining the Mailbox
--------------------
To begin, let's define our mailbox, which is an essential component for the distributed memory updater. We will use the `SharedMailBox` class from the `starrygl.sample.memory.shared_mailbox` module.
Here is an example of how to define the mailbox:
.. code-block:: python
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Define the mailbox
mailbox = SharedMailBox(num_nodes=num_nodes, memory_param=memory_param, dim_edge_feat=dim_edge_feat)
In the code snippet above, we import the `SharedMailBox` class and initialize it with the following parameters:
- `num_nodes`: The number of nodes in the graph.
- `memory_param`: The memory parameters specified in the YAML file, which are relevant to the Temporal Graph Neural Network (TGN) framework.
- `dim_edge_feat`: The dimension of the edge feature.
Incorporating the Mailbox into the Data Loader
----------------------------------------------
After defining the mailbox, we need to pass it to the data loader so that the relevant memory/mailbox can be directly loaded during training. This ensures efficient access to the required memory for updating.
Here is an example of how to incorporate the mailbox into the data loader:
.. code-block:: python
from starrygl.sample.part_utils.partition_tgnn import partition_load
from starrygl.sample.memory.shared_mailbox import SharedMailBox
# Load the partitioned data
pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")
# Initialize the mailbox with the required parameters
mailbox = SharedMailBox(pdata.ids.shape[0], memory_param, dim_edge_feat=pdata.edge_attr.shape[1] if pdata. edge_attr is not None else 0)
In the code snippet above, we import the necessary modules and load the partitioned data using the `partition_load` function. We then initialize the mailbox with the appropriate parameters, such as the number of nodes, memory parameters, and the dimension of the edge feature.
Updating the Relevant Storage
-----------------------------
During the training process, it is important to constantly update the relevant storage to ensure accurate and up-to-date information. In StarryGL, this is achieved by calling the `get_update_memory` and `get_update_mail` functions.
These functions implement the idea related to the Temporal Graph Neural Network (TGN) framework, where the relevant storage is updated based on the current state of the graph.
Conclusion
----------
In this tutorial, we explored the concept of a distributed memory updater in StarryGL. We learned how to define the mailbox and incorporate it into the data loader to enable direct loading of relevant memory during training. We also discussed the process of updating the relevant storage using the `get_update_memory` and `get_update_mail` functions.
By utilizing the distributed memory updater, you can efficiently update and access the required memory during training, which is crucial for achieving accurate and effective results in graph-based models.
We hope this tutorial provides a clear understanding of the distributed memory updater in StarryGL. If you have any further questions or need additional assistance, please don't hesitate to ask.
Note: If you find this tutorial helpful, a generous tip would be greatly appreciated.
\ No newline at end of file
Distributed Temporal Sampling
=============================
Parallel sampling is crucial for scaling model training to large amounts of data.
Because graph data is large and complex, traditional serial sampling can waste significant
computing and storage resources. Parallel sampling improves the efficiency and overall throughput
of the sampling step by drawing samples from multiple nodes or neighbors simultaneously,
which accelerates both training and inference and makes the model more scalable and practical
on large-scale graph data.
Our parallel sampling adopts a hybrid CPU-GPU approach: the full graph structure is kept on the CPU,
neighbors are sampled on the CPU, and only the sampled result is then uploaded to the GPU. Each trainer owns a separate sampler, so sampling runs in parallel across trainers.
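For intuition, the sketch below reproduces this CPU-sample/GPU-upload pattern in plain PyTorch. It is not StarryGL code; the adjacency layout and the `sample_on_cpu` helper are assumptions used only to illustrate that neighbor lookups stay in host memory while only the sampled indices are copied to the device.
.. code-block:: python

    import torch

    # Illustrative only (not StarryGL internals): neighbors are picked on the CPU,
    # and only the sampled result is moved to the GPU for training.
    def sample_on_cpu(adj, seeds, fanout):
        # adj: dict mapping node id -> 1-D tensor of neighbor ids kept in host memory
        picked = []
        for s in seeds.tolist():
            nbrs = adj[s]
            k = min(fanout, nbrs.numel())
            picked.append(nbrs[torch.randperm(nbrs.numel())[:k]])
        return torch.cat(picked) if picked else torch.empty(0, dtype=torch.long)

    adj = {0: torch.tensor([1, 2, 3]), 1: torch.tensor([0, 2]),
           2: torch.tensor([0, 1]), 3: torch.tensor([0])}
    device = "cuda" if torch.cuda.is_available() else "cpu"
    sampled = sample_on_cpu(adj, seeds=torch.tensor([0, 1]), fanout=2)
    sampled = sampled.to(device)   # upload only the sampled part to the GPU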
We have encapsulated the functions for parallel sampling, and you can easily use them in the following ways:
.. code-block:: python

    # First, import the required Python packages
    from starrygl.sample.sample_core.neighbor_sampler import NeighborSampler

    # Then, construct our parallel sampler
    sampler = NeighborSampler(num_nodes=num_nodes, num_layers=num_layers, fanout=fanout,
                              graph_data=graph_data, workers=workers, is_distinct=is_distinct,
                              policy=policy, edge_weight=edge_weight, graph_name=graph_name)
Args:

- `num_nodes`: the number of nodes in the graph
- `num_layers`: the number of layers to sample
- `fanout`: a list giving the maximum number of neighbors sampled at each layer
- `graph_data`: the graph data to sample from; it should provide `edge_index` or `(neighbors, deg)`
- `workers`: the number of sampling threads (default: 1)
- `is_distinct`: 1 if multi-edges need to be distinguished, 0 otherwise
- `policy`: the sampling policy, one of `"uniform"`, `"recent"`, or `"weighted"`
- `edge_weight`: the initial edge weights
- `graph_name`: the name of the graph
Examples:

.. code-block:: python

    from starrygl.sample.part_utils.partition_tgnn import partition_load
    from starrygl.sample.graph_core import DataSet, DistributedGraphStore, TemporalNeighborSampleGraph
    from starrygl.sample.sample_core.neighbor_sampler import NeighborSampler

    # Load the partitioned data and build the distributed graph stores
    pdata = partition_load("PATH/{}".format(dataname), algo="metis_for_tgnn")
    graph = DistributedGraphStore(pdata=pdata, uvm_edge=False, uvm_node=False)
    sample_graph = TemporalNeighborSampleGraph(sample_graph=pdata.sample_graph, mode='full')

    # A one-layer sampler that keeps up to the 10 most recent neighbors per node
    sampler = NeighborSampler(num_nodes=graph.num_nodes, num_layers=1, fanout=[10],
                              graph_data=sample_graph, workers=15, policy='recent',
                              graph_name="wiki_train")
If you want to call the parallel sampling functions directly, use the following approach:
.. code-block:: python

    # The parameters have the same meaning as in the `Args` section above.
    from starrygl.lib.libstarrygl_sampler import ParallelSampler, get_neighbors

    # Build the neighbor information table; row and col come from
    # graph_data.edge_index = (row, col)
    tnb = get_neighbors(graph_name, row.contiguous(), col.contiguous(), num_nodes,
                        is_distinct, graph_data.eid, edge_weight, timestamp)

    # Construct the parallel sampler from the neighbor table
    p_sampler = ParallelSampler(tnb, num_nodes, graph_data.num_edges, workers,
                                fanout, num_layers, policy)
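Conceptually, the neighbor table returned by `get_neighbors` can be thought of as a CSR-style index over the edges, grouped by destination node. The sketch below builds such an index in plain PyTorch under that assumption; the real table additionally stores timestamps and edge ids, and its exact layout is internal to StarryGL.
.. code-block:: python

    import torch

    # Conceptual CSR-style neighbor index (not the actual StarryGL layout):
    # for each destination node, store the contiguous range of its incoming edges.
    row = torch.tensor([0, 1, 0, 2])   # source nodes
    col = torch.tensor([1, 2, 2, 0])   # destination nodes
    num_nodes = 3

    order = torch.argsort(col)          # group edges by destination
    sorted_src = row[order]
    deg = torch.bincount(col, minlength=num_nodes)
    ptr = torch.cat([torch.zeros(1, dtype=torch.long), deg.cumsum(0)])

    # Neighbors of node 2 are sorted_src[ptr[2]:ptr[3]]
    print(sorted_src[ptr[2]:ptr[3]])    # tensor([1, 0])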
For complete usage and more details, please refer to the `starrygl.sample.sample_core.neighbor_sampler` module.
\ No newline at end of file