On 2022/4/14 6:04, Andrew Morton wrote:
> On Wed, 13 Apr 2022 14:27:54 +0800 "liupeng (DM)" wrote:
>
>> On 2022/4/13 12:42, Andrew Morton wrote:
>>> On Wed, 13 Apr 2022 03:29:12 +0000 Peng Liu wrote:
>>>
>>>> Certain systems are designed to have sparse/discontiguous nodes. In
>>>> this case, nr_online_nodes can not be used to walk through numa node.
>>>> Also, a valid node may be greater than nr_online_nodes.
>>>>
>>>> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
>>>> all the places that use nr_online_nodes, and repair them one by one.
>>>>
>>> What are the runtime effects of this shortcoming?
>>> .
>> For sparse/discontiguous nodes, the current code may treat a valid node
>> as invalid, and will fail to allocate all hugepages on a valid node that
>> "nid >= nr_online_nodes".
>>
>> As David suggested:
>>     if (tmp >= nr_online_nodes)
>>         goto invalid;
>>
>> Just imagine node 0 and node 2 are online, and node 1 is offline. Assuming
>> that "node < 2" is valid is wrong.
> So do you think we should backport this fix into earlier kernel releases?
> .

I don't think this is an urgent bug, because:

1) QEMU does not support sparse nodes so far, although there are some open
   issues about making QEMU support them.
2) So far I have not found an actual machine that reports sparse nodes and
   also needs to use hugepages.
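
To make the "node 0 and node 2 online, node 1 offline" case concrete, here is a minimal userspace sketch. It is not kernel code: MAX_NODES, online_mask and the helper names are hypothetical stand-ins, with count_online() playing the role of nr_online_nodes. It only illustrates why comparing a node id against the count of online nodes gives the wrong answer when nodes are sparse.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8 /* hypothetical upper bound, for this sketch only */

/* Hypothetical online-node mask: bit n set means node n is online.
 * Nodes 0 and 2 are online, node 1 is offline. */
static const unsigned int online_mask = (1u << 0) | (1u << 2);

/* Stand-in for nr_online_nodes: the number of online nodes (2 here). */
static int count_online(void)
{
	int nid, n = 0;

	for (nid = 0; nid < MAX_NODES; nid++)
		if (online_mask & (1u << nid))
			n++;
	return n;
}

/* Contiguity assumption: any id below the online count is treated as valid. */
static bool valid_by_count(int nid)
{
	return nid >= 0 && nid < count_online();
}

/* Per-node check: valid only if this particular node is marked online. */
static bool valid_by_mask(int nid)
{
	return nid >= 0 && nid < MAX_NODES && (online_mask & (1u << nid)) != 0;
}

/* Records what each check reports for the sparse layout above. */
static void check_sparse_example(void)
{
	assert(valid_by_count(1));  /* offline node 1 is wrongly accepted */
	assert(!valid_by_count(2)); /* online node 2 is wrongly rejected  */
	assert(!valid_by_mask(1));  /* per-node check rejects node 1      */
	assert(valid_by_mask(2));   /* per-node check accepts node 2      */
}
```

Calling check_sparse_example() from a main() passes all four assertions. The count-based check fails in both directions for this layout, which matches the described symptom: a valid node with "nid >= nr_online_nodes" is treated as invalid.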