pytorch/pytorch: viable/strict/1781002135
English summary
This release note covers a commit that folds the decomposed GELU operation back into the native CUTLASS GELU implementation. The change belongs to the Inductor CUTLASS backend in PyTorch. Its stated aim is to cut the overhead introduced by the decomposition, so that GELU is handled as a single native operation rather than as a chain of primitive ops. Models that use GELU activations on this backend may run more efficiently as a result.
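To make "decomposed" concrete, here is a minimal sketch in plain Python. It assumes the exact, erf-based definition of GELU; the release note does not say which variant or which op sequence the commit matches, so the step names below are illustrative rather than Inductor's actual graph nodes. The sketch only shows that the chain of primitive ops and the single-expression form compute the same value, which is the property that makes folding the chain back into one native GELU safe.

```python
import math


def gelu_reference(x: float) -> float:
    """Exact (erf-based) GELU written as a single expression."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_decomposed(x: float) -> float:
    """The same function as a chain of primitive ops.

    Assumption: this particular op sequence is hypothetical and chosen for
    illustration; it is not taken from the commit.
    """
    scaled = x * (1.0 / math.sqrt(2.0))  # mul
    erf_out = math.erf(scaled)           # erf
    shifted = erf_out + 1.0              # add
    half_x = x * 0.5                     # mul
    return half_x * shifted              # mul


samples = [-3.0, -1.0, 0.0, 0.5, 2.0]
max_abs_diff = max(abs(gelu_reference(v) - gelu_decomposed(v)) for v in samples)
print(f"max |reference - decomposed| = {max_abs_diff:.3e}")
```

The decomposed form uses five separate operations where the reference uses one expression. A backend that recognizes the chain and emits a single native GELU avoids treating each step as its own operation.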
Key points
Fold decomposed gelu back into native CUTLASS GELU
Optimization for inductor and cutlass backend in PyTorch