@dayshah
Contributor

@dayshah dayshah commented Dec 8, 2025

Description

Adds a new public API for registering a tensor transport for RDT at runtime. Call register_tensor_transport with a name, a list of supported devices, and a class that implements the TensorTransportManager interface.

Also adds documentation for the new API.

Tested by registering NIXL, NCCL, and GLOO at runtime through this API and confirming that all existing tests for those transports pass.

Signed-off-by: dayshah <dhyey2019@gmail.com>
…port

Signed-off-by: dayshah <dhyey2019@gmail.com>
@dayshah dayshah added the "go" label (add ONLY when ready to merge, run all tests) Dec 8, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new public API, register_tensor_transport, to allow registering custom tensor transports for RDT at runtime. This is a valuable feature for extensibility. The implementation refactors TensorTransportEnum to use strings for transport types and migrates existing transports (NIXL, NCCL, GLOO) to the new registration API. The documentation is also updated accordingly.

However, I've identified a couple of issues. The most significant is that ray.put and ray.get still contain hardcoded checks that only permit "NIXL" and "OBJECT_STORE", which undermines the goal of supporting custom transports. Additionally, there's some duplicated code for validating transport names across different files. My review provides specific feedback to address these points.
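
One way to address both points at once is a single shared validation helper that consults the set of registered transport names rather than a hardcoded list. The sketch below is illustrative only and is not Ray's code: `validate_transport_name`, `BUILTIN_TRANSPORTS`, and the example transport name "MY_TRANSPORT" are hypothetical, and treating "OBJECT_STORE" as always available is an assumption based on the check described above.

```python
from typing import Iterable

# Assumption: OBJECT_STORE is always accepted, independent of registration.
BUILTIN_TRANSPORTS = frozenset({"OBJECT_STORE"})


def validate_transport_name(name: str, registered: Iterable[str]) -> str:
    """Return `name` if it is built in or registered; otherwise raise ValueError.

    Keeping this check in one place means put/get-style entry points accept
    custom transports as soon as they are registered, with no duplicated logic.
    """
    allowed = set(BUILTIN_TRANSPORTS) | set(registered)
    if name not in allowed:
        raise ValueError(
            f"Unknown tensor transport {name!r}; expected one of {sorted(allowed)}"
        )
    return name


# A custom transport passes once its name is in the registered set.
registered_names = {"NIXL", "NCCL", "GLOO", "MY_TRANSPORT"}
validate_transport_name("MY_TRANSPORT", registered_names)
validate_transport_name("OBJECT_STORE", registered_names)
```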

@dayshah dayshah force-pushed the bring-your-transport branch from e3a266a to 55d0da9 Compare December 8, 2025 23:40
@dayshah dayshah force-pushed the bring-your-transport branch from 55d0da9 to 20918be Compare December 9, 2025 01:34
@dayshah dayshah marked this pull request as ready for review December 9, 2025 06:40
@dayshah dayshah requested a review from a team as a code owner December 9, 2025 06:40
@dayshah dayshah requested a review from a team as a code owner December 9, 2025 06:40
@ray-gardener ray-gardener bot added the "docs" (An issue or change related to documentation), "train" (Ray Train Related Issue), and "core" (Issues that should be addressed in Ray Core) labels Dec 9, 2025
Labels

core: Issues that should be addressed in Ray Core
docs: An issue or change related to documentation
go: add ONLY when ready to merge, run all tests
train: Ray Train Related Issue

3 participants