PyTorch Adds Symmetric Ops for XPU to Support Asynchronous Tensor Parallelism
Summary
PyTorch trunk now enables symmetric communication ops for Intel's XPU backend. The ops are designed for asynchronous tensor parallelism (async TP): they let computation overlap with communication, which reduces tensor-parallelism overhead on Intel client GPUs. The work has two parts: backend changes in intel/torch-xpu-ops#2041 and the Python-level op enabling in this pull request (#185102). Correctness was verified by the tests in intel/torch-xpu-ops#3747, and the PR was approved by multiple reviewers.
Key points
PyTorch trunk enables symmetric communication ops for the Intel XPU backend.
These ops let computation and communication overlap, reducing tensor-parallelism overhead on Intel GPUs.
The change spans backend modifications in intel/torch-xpu-ops#2041 and Python-level op enabling in this PR (#185102).
Correctness was verified by the tests in intel/torch-xpu-ops#3747, and the PR was approved by multiple reviewers.
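The overlap that async TP relies on can be illustrated with a toy, single-process sketch. It is not code from the PR and calls no PyTorch or XPU API; the function names and the tiny matrices are hypothetical. It only shows why an all-gather followed by a matmul can be split into one matmul per shard, which is the property that lets each shard's compute run while the next shard is still being transferred.

```python
# Toy illustration only: plain Python lists stand in for tensors, and the
# "ranks" are simulated in one process. No real communication happens.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def all_gather_then_matmul(shards, weight):
    """Baseline: collect every shard first, then run one large matmul."""
    gathered = [row for shard in shards for row in shard]
    return matmul(gathered, weight)

def decomposed_all_gather_matmul(shards, weight):
    """Decomposed form: one small matmul per shard.

    In a real async TP run, the matmul for shard i could execute while
    shard i + 1 is still in flight. Here the steps simply run in sequence.
    """
    out = []
    for shard in shards:  # each shard stands in for one rank's rows
        out.extend(matmul(shard, weight))
    return out

# Two hypothetical ranks, each holding a 2x2 slice of the activation rows.
shards = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
weight = [[1, 0], [1, 1]]

baseline = all_gather_then_matmul(shards, weight)
decomposed = decomposed_all_gather_matmul(shards, weight)
```

Both paths produce the same rows in the same order, so splitting the work changes when each piece of compute can start without changing the result.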