今日目标：性能瓶颈弄清楚，单线程版优化；设计和开写多线程版
周二三四（做ppt）加油！

上午

1
- operf分线程2500x
2
operf结论：

下午

毕设中期答辩

1
- operf choice1，确定单线程的优化方向
- 思考choice1(visited数组)和choice2(visited集合)哪个更适合多线程
  - 都少不了这个竞争，差不多
2
- 重新对比潜力：choice2可以避免锁，决定优化这个和写这个的多线程
- 得到choice3（晚上继续想想！）

晚上

1
- 设计choice3算法
2
- code choice3
3
- code choice3: besides graph load
4
- code choice3: graph load
5
- debug choice3

operf分析

结论：实现choice1+choice2=choice3:
step1：从spine沿父亲搜索到interface，(为每个spine记录interface，并)更新board_safety[leaf]board，同时维护全顶点visited数组(内容是可达的interface)用于优化搜索
board_safety: CSR: leaf_offset, board_safety
可以设计成(Vertex,bool)是因为，如果<2个spine可达这个boardId，则最多只有一个spine可达，之后用来去除visited信息时有这个spine信息就可以
读文件leaf->interface类型边时，分配board_safety
多线程：多个spine一个线程，
per visited[v] a lock 读写锁（读者多+visited[v]只会有一个写者）
per board_safety[leaf][board] a lock 读写锁
step2：
对于board_safety[leaf][board]bool==0的spine，走到interface一路把visited数组清零（这样等下扫visited数组构造时就不会加入这些spine->leaf的路径了），发现visited[v]已经是0了则不用继续走
此过程没有data race，因为都是写成0，即使写相同的visited[v]也没问题；而对于读，不可能读到假的零，而若读到假的非零，也只是变身写者
然后扫描visited数组，(visited[v]是interface)根据visited[v].leaf将p[v]->v记录到对应pattern

step1:
Case2Tree::Graph::risky(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)::{lambda()#1}::operator()() const
step2:
Case2Tree::Graph::risky(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)::{lambda()#2}::operator()() const
output_edges:
Case2Tree::Graph::output_edges(std::vector<unsigned int, std::allocator >&, unsigned long, Case2Tree::Graph::OutputType, std::ostream&)

choice2(not record path when search)

step1: 5,053,860us 一个线程x个spine，perleaf[leaf][board] per lock(由于预先不知道每个board有多少spine，因此不能提前画好位置)
36.095% map[]: perleaf[v][bdid] (perleaf[v]:mapspines>)
create entry in perleaf[v] for each meeting board
解决方案：pre-allocate
21.6855% perleaf[v][bdid].push_back
push_back spine to leaf’s board
解决方案：每个leaf的board的spine的bool数组：不对，我预先不知道每个board有多少spine
perleaf结构修改：为每个spine记录可达的interface即可，perleaf[v][bdid]改成++
13.0142% self
8.1145% isEnable[] 2.7896% std::_Bit_reference::operator bool() const
解决方案：可以仅map isEnable==1的spine，不过要存储中间漏了哪些，用于翻译回string-id
2.4856% parent[] boardId[] lower bound: 125,335us

step2(normal output): 72,152,582us 一个线程一个leaf’s board，无锁
45.8412% output_edges
39,683,920us:
22.6213% visited.insert
15.5867% visited.count
2.8535% ~unordered_set(visited)
解决方案：全顶点visited数组替代unordered_set
且多线程的时候(一个leaf’s board一个线程)，不同的(leaf,board)一个visited数组即可，因为不同的(leaf,board)之间顶点没有交集，不会有竞争
2.1778% self
0.6288% append_edge lower bound: 249,532us
0.3643% parent[] lower bound: 144,568us

choice1(record path when search)

若有办法把属于一个leaf’s board(或者属于同一个leaf)的spine放在一个线程里面，这里就不需要锁了
由spine.isEnable可以知道spine是否可以到达某leaf
step1: 9,852,573us 理想的是只要visited[] parent[] boardId[]
40.5746% map[]: perleaf[v][bdid] (perleaf[v]:mapedges>)
create entry in perleaf[v] for each meeting board
解决方案：pre-allocate
17.5239% perleaf[v][bdid].push_back lower bound: 1,724,200us
push_back edges to leaf’s board
9.9925% self
9.2607% std::vector visited(start[node_ty_num], (VertexIdType)-1ull);
init visited
解决方案：由于interface的vertex-id不等于0（可以在图加载时控制），初始化不用全bit1，0可以
4.4837% isEnable[] 1.4993% std::_Bit_reference::operator bool() const
解决方案：可以仅map isEnable==1的spine，不过要存储中间漏了哪些，用于翻译回string-id
这样还可以缩短self的时间
2.1628% visited[] parent[] boardId[] lower bound: 212,815us

step2: 132,514,316us
94.3131% output_edges
1.1779% ostream::flush
for we delete file?
0.9913% std::basic_filebuf<char, std::char_traits >::sync()
~1% std::basic_ostream
3,975,429us:
map.erase
…