[core][rdt] Register your own transport at runtime for RDT #59255
base: master
Signed-off-by: dayshah <dhyey2019@gmail.com>
…port Signed-off-by: dayshah <dhyey2019@gmail.com>
Code Review
This pull request introduces a new public API, register_tensor_transport, to allow registering custom tensor transports for RDT at runtime. This is a valuable feature for extensibility. The implementation refactors TensorTransportEnum to use strings for transport types and migrates existing transports (NIXL, NCCL, GLOO) to the new registration API. The documentation is also updated accordingly.
However, I've identified a couple of issues. The most significant is that ray.put and ray.get still contain hardcoded checks that only permit "NIXL" and "OBJECT_STORE", which undermines the goal of supporting custom transports. Additionally, there's some duplicated code for validating transport names across different files. My review provides specific feedback to address these points.
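One way to address both points in the review is a single validation helper that checks names against whatever has been registered, rather than against a hardcoded pair of names. The following is a minimal, self-contained sketch of that idea and not Ray's actual code: `_KNOWN_TRANSPORTS`, `add_known_transport`, `validate_transport_name`, `put_sketch`, and `get_sketch` are all hypothetical names, and normalizing names to upper case is an assumption made for illustration.

```python
# Hypothetical set of known transport names, seeded with the names mentioned
# in this review purely for illustration. This is not Ray's registry.
_KNOWN_TRANSPORTS = {"OBJECT_STORE", "NIXL", "NCCL", "GLOO"}


def add_known_transport(name):
    """Record a new transport name (stand-in for a runtime registration call)."""
    _KNOWN_TRANSPORTS.add(name.strip().upper())


def validate_transport_name(name):
    """Single shared check: normalize the name and reject unregistered ones."""
    if not isinstance(name, str):
        raise TypeError(f"tensor transport must be a str, got {type(name).__name__}")
    normalized = name.strip().upper()  # Assumption: names are case-insensitive.
    if normalized not in _KNOWN_TRANSPORTS:
        raise ValueError(
            f"Unknown tensor transport {name!r}; known: {sorted(_KNOWN_TRANSPORTS)}"
        )
    return normalized


def put_sketch(value, tensor_transport="OBJECT_STORE"):
    """Toy 'put' that defers to the shared validator instead of a hardcoded list."""
    return validate_transport_name(tensor_transport), value


def get_sketch(stored, tensor_transport="OBJECT_STORE"):
    """Toy 'get' that reuses the same validator, so the two paths cannot drift."""
    validate_transport_name(tensor_transport)
    return stored[1]
```

With this shape, a custom transport becomes usable in both code paths as soon as its name is recorded, and there is only one place where the validation rule lives.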
python/ray/experimental/gpu_object_manager/gpu_object_manager.py
e3a266a to 55d0da9
55d0da9 to 20918be
Description
This PR adds a new public API for registering a tensor transport for RDT at runtime. Call register_tensor_transport with a name, a list of supported devices, and a class that implements the TensorTransportManager interface. The PR also adds documentation for the new API.
Tested by registering NIXL, NCCL, and GLOO at runtime through this API and confirming that all existing tests for those transports pass.
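The registration pattern described above (a name, a list of supported devices, and a manager class) can be illustrated with a small self-contained registry. This is a sketch under stated assumptions and does not reproduce Ray's API or the real TensorTransportManager interface: `ToyTransportManager`, its `transfer` method, `register_toy_transport`, `get_toy_transport`, and `LoopbackTransport` are hypothetical names invented for this example.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, NamedTuple, Type


class ToyTransportManager(ABC):
    """Hypothetical stand-in for a transport manager interface."""

    @abstractmethod
    def transfer(self, payload: bytes) -> bytes:
        """Move a payload and return what the receiver would see."""


class _Entry(NamedTuple):
    devices: List[str]
    manager_cls: Type[ToyTransportManager]


# Hypothetical in-process registry keyed by transport name.
_registry: Dict[str, _Entry] = {}


def register_toy_transport(
    name: str, devices: List[str], manager_cls: Type[ToyTransportManager]
) -> None:
    """Register a transport under a name with the devices it supports."""
    if name in _registry:
        raise ValueError(f"transport {name!r} is already registered")
    if not devices:
        raise ValueError("at least one supported device is required")
    if not (isinstance(manager_cls, type) and issubclass(manager_cls, ToyTransportManager)):
        raise TypeError("manager_cls must subclass ToyTransportManager")
    _registry[name] = _Entry(list(devices), manager_cls)


def get_toy_transport(name: str, device: str) -> ToyTransportManager:
    """Look up a registered transport and check that it supports the device."""
    entry = _registry.get(name)
    if entry is None:
        raise KeyError(f"transport {name!r} is not registered")
    if device not in entry.devices:
        raise ValueError(f"transport {name!r} does not support device {device!r}")
    return entry.manager_cls()


class LoopbackTransport(ToyTransportManager):
    """Trivial transport that returns the payload unchanged."""

    def transfer(self, payload: bytes) -> bytes:
        return payload


# Usage: register at runtime, then resolve by name and device.
register_toy_transport("LOOPBACK", ["cpu"], LoopbackTransport)
manager = get_toy_transport("LOOPBACK", "cpu")
```

Registering the built-in transports through the same entry point as user-defined ones, as the testing note describes, means the existing transport tests also exercise the registration path.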