details: High-Performance and Scalable GPU Graph Traversal
I read this paper in detail
BFS
approach
components
Contract-Expand
Two-Phase
gathering
Coarse-Grained, Warp-Based Gathering
Fine-Grained, Scan-Based Gathering
Scan+Warp+CTA Gathering
Example
filter (removes already-visited vertices; vertex_frontier -> edge_frontier)
bitmask
Warp Culling
History Culling
code
Duane Merrill. 2011. Back40 computing: Fast and efficient software primitives for GPU computing.
http://code.google.com/p/back40computing/
This has now been merged into https://nvlabs.github.io/cub/
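
To make the outline above concrete, here is a minimal sequential sketch of one way the expand and contract steps could fit together. It is not the Back40/CUB implementation and it does not run on a GPU. It only illustrates two ideas from the notes: scan-based gathering (an exclusive prefix sum over adjacency-list lengths gives each frontier vertex its write offset in the edge frontier) and a bitmask filter that drops already-visited vertices. The function name `bfs_expand_contract` and the CSR argument names `row_offsets` and `col_indices` are assumptions chosen for illustration.

```python
from itertools import accumulate


def bfs_expand_contract(row_offsets, col_indices, source):
    """Sequential BFS sketch over a CSR graph; returns per-vertex depth (-1 = unreached)."""
    n = len(row_offsets) - 1
    visited = bytearray((n + 7) // 8)  # bitmask: one bit per vertex
    dist = [-1] * n

    visited[source >> 3] |= 1 << (source & 7)
    dist[source] = 0
    vertex_frontier = [source]
    depth = 0

    while vertex_frontier:
        # Expand: vertex frontier -> edge frontier.
        # The prefix sum of degrees tells each vertex where its neighbors go,
        # so every output slot is known before any neighbor is copied.
        degrees = [row_offsets[v + 1] - row_offsets[v] for v in vertex_frontier]
        offsets = [0] + list(accumulate(degrees))
        edge_frontier = [0] * offsets[-1]
        for i, v in enumerate(vertex_frontier):
            start = row_offsets[v]
            for j in range(degrees[i]):
                edge_frontier[offsets[i] + j] = col_indices[start + j]

        # Contract: filter the edge frontier with the visited bitmask.
        # In this sequential sketch the bitmask is exact, so duplicates
        # within the same iteration are removed as well.
        depth += 1
        next_frontier = []
        for u in edge_frontier:
            byte, bit = u >> 3, 1 << (u & 7)
            if visited[byte] & bit:
                continue
            visited[byte] |= bit
            dist[u] = depth
            next_frontier.append(u)
        vertex_frontier = next_frontier

    return dist
```

On a GPU, the two inner loops would be spread across threads, and the prefix sum is what lets them write to the edge frontier without conflicts. The warp culling and history culling entries in the outline are additional duplicate-removal heuristics that this sketch does not model.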