From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH] populated_map: fix !NUMA case, remove comment
From: Lee Schermerhorn
In-Reply-To:
References: <20070611225213.GB14458@us.ibm.com>
	<20070611234155.GG14458@us.ibm.com>
	<20070612000705.GH14458@us.ibm.com>
	<20070612020257.GF3798@us.ibm.com>
	<20070612023209.GJ3798@us.ibm.com>
	<20070612032055.GQ3798@us.ibm.com>
	<1181660782.5592.50.camel@localhost>
Content-Type: text/plain
Date: Tue, 12 Jun 2007 15:07:27 -0400
Message-Id: <1181675248.5592.112.camel@localhost>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Christoph Lameter
Cc: Nishanth Aravamudan, anton@samba.org, akpm@linux-foundation.org,
	linux-mm@kvack.org
List-ID:

On Tue, 2007-06-12 at 11:41 -0700, Christoph Lameter wrote:
> On Tue, 12 Jun 2007, Lee Schermerhorn wrote:
> 
> > 	page = alloc_pages_node(nid,
> > 			GFP_HIGHUSER|__GFP_COMP|GFP_THISNODE,
> > 			HUGETLB_PAGE_ORDER);
> > 
> > I need to get a page that is on nid.  On our platform, GFP_HIGHUSER is
> > going to specify the zonelist for ZONE_NORMAL.  The first zone on this
> > list needs to be on-node for nid.  With the changes you've made to the
> > definition of the populated map, I think this won't be the case.  I
> > need to test your latest patches and fix that, if it's broken.
> 
> Yes, that is the intent of the fixes.
> 
> > I still think using policy zone is the "right way" to go, here.  After
> > all, only pages in the policy zone are controlled by policy, and that's
> > the goal of spreading out the huge pages across nodes--to make them
> > available to satisfy memory policy at allocation time.  But that would
> > need some adjustments for x86_64 systems that have some nodes that are
> > all/mostly DMA32 and other nodes that are populated in zones > DMA32,
> > if we want to allocate huge pages out of the DMA32 zone.
> 
> GFP_THISNODE will work right for that case if we get the intended fix in.

OK.  So, allocations with 'THISNODE will ensure that we don't get an
off-node page if any zone in the zonelist indicated by the gfp zone
happens to point off-node?  That wasn't the case previously, and that's
why I created the populated map with the semantics I did.  I agree it's
better to have alloc_page_*() handle this.

> 
> > As far as the static variable and round-robin allocation:  the current
> > method "works" both for huge pages allocated at boot time and for huge
> > pages allocated at run time via the vm.nr_hugepages sysctl.  By
> > "works", I mean that it continues to spread the pages evenly across
> > the "populated" nodes.  If, however, you use the task-local counter to
> > interleave fresh huge pages, each write to nr_hugepages from a
> > different task ["echo NN > .../nr_hugepages"] will start at node zero
> > or the first populated node--assuming you're interleaving across
> > populated nodes and not on-line nodes.  That's probably OK if you
> > always change
> 
> We may want to change that behavior.  Interleave should start at the
> local node and then proceed from there.  If there are just a few pages
> needed, then they would be better placed local to the process.

For page interleaving for some memory object, I agree.  The usage here
was for the allocation of reserved huge pages--trying to spread those
evenly across nodes with appropriate [non-DMA] memory.  Then, normal
interleaving will work as huge pages are allocated from the per-node
reserved lists until some node's huge pages are exhausted.
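To make sure we're talking about the same thing, here's roughly the
shape of what I'm describing--a simplified sketch only, NOT the actual
mm/hugetlb.c code; the name huge_page_next_nid is a placeholder for the
static cursor, and node_online_map stands in for whatever "populated"
node mask we end up iterating over:

	#include <linux/mm.h>
	#include <linux/nodemask.h>

	/* static cursor: survives across writes to nr_hugepages */
	static int huge_page_next_nid;

	static struct page *alloc_fresh_huge_page_sketch(void)
	{
		struct page *page;
		int nid = huge_page_next_nid;

		/* huge page on this node only -- no off-node fallback */
		page = alloc_pages_node(nid,
				GFP_HIGHUSER|__GFP_COMP|GFP_THISNODE,
				HUGETLB_PAGE_ORDER);

		/* advance cursor to the next node in the mask, wrapping */
		nid = next_node(nid, node_online_map);
		if (nid == MAX_NUMNODES)
			nid = first_node(node_online_map);
		huge_page_next_nid = nid;

		return page;
	}

The question for this discussion is really just where that cursor
lives:  in a static, as above, or in a task-local variable, as you
suggest.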
I think that using a local "cursor", as you propose, will work,
tho'--even for spreading huge page allocations for the reserved lists.
We may tend to favor lower-numbered nodes if one incrementally
increases nr_hugepages via the sysctl, but I don't think that's too
frequent an occurrence.

I'm not sure Nish can use the mempolicy huge page interleaving
allocator, tho'.  That allocates FROM the per-node reserved lists, and
alloc_fresh_huge_page[_node]() is used to fill those lists.

Lee

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org