details: High-Performance and Scalable GPU Graph Traversal
I read this detailed version of the paper.
BFS

approach

components
Contract-Expand

Two-Phase

gathering
Coarse-Grained, Warp-Based Gathering


Fine-Grained, Scan-Based Gathering


Scan+Warp+CTA Gathering


Example

filter (removes already-visited vertices; vertex_frontier -> edge_frontier)
bitmask
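
A minimal serial sketch of the bitmask idea, assuming one bit per vertex where 1 means "already visited". The names `VisitedBitmask` and `filter_visited` are hypothetical and do not come from the paper or from back40computing; the sketch also ignores the concurrency issues that a real GPU kernel has to handle.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: one bit per vertex, 1 = already visited.
struct VisitedBitmask {
    std::vector<std::uint32_t> words;

    explicit VisitedBitmask(std::size_t num_vertices)
        : words((num_vertices + 31) / 32, 0u) {}

    // Word index is v / 32, bit index is v % 32.
    bool test(std::uint32_t v) const {
        return ((words[v >> 5] >> (v & 31u)) & 1u) != 0u;
    }

    void set(std::uint32_t v) {
        words[v >> 5] |= (1u << (v & 31u));
    }
};

// Keep only the candidates that are not yet marked visited,
// marking each kept vertex so later duplicates are dropped too.
std::vector<std::uint32_t> filter_visited(
        const std::vector<std::uint32_t>& candidates,
        VisitedBitmask& visited) {
    std::vector<std::uint32_t> kept;
    for (std::uint32_t v : candidates) {
        if (!visited.test(v)) {
            visited.set(v);
            kept.push_back(v);
        }
    }
    return kept;
}
```

The appeal of a bitmask is its size: 32 vertices share one 32-bit word, so the visited status of many vertices fits into a small, cache-friendly array.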

Warp Culling


History Culling

code

Duane Merrill. 2011. Back40 computing: Fast and efficient software primitives for GPU computing.
http://code.google.com/p/back40computing/
This has since been merged into https://nvlabs.github.io/cub/