From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 13 Jul 2007 13:05:08 -0700
From: Paul Jackson
Subject: Re: [PATCH 5/5] [hugetlb] Try to grow pool for MAP_SHARED mappings
Message-Id: <20070713130508.6f5b9bbb.pj@sgi.com>
In-Reply-To: <20070713151717.17750.44865.stgit@kernel>
References: <20070713151621.17750.58171.stgit@kernel> <20070713151717.17750.44865.stgit@kernel>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Adam Litke
Cc: linux-mm@kvack.org, mel@skynet.ie, apw@shadowen.org, wli@holomorphy.com, clameter@sgi.com, kenchen@google.com
List-ID:

Adam wrote:
> +	/*
> +	 * I haven't figured out how to incorporate this cpuset bodge into
> +	 * the dynamic hugetlb pool yet.  Hopefully someone more familiar with
> +	 * cpusets can weigh in on their desired semantics.  Maybe we can just
> +	 * drop this check?
> +	 *
>  	if (chg > cpuset_mems_nr(free_huge_pages_node))
>  		return -ENOMEM;
> +	 */

I can't figure out the value of this check either -- Ken Chen added it,
perhaps he can comment.

But the cpuset behaviour of this hugetlb stuff looks suspicious to me:

 1) The code in alloc_fresh_huge_page() seems to round robin
    over the entire system, spreading the hugetlb pages uniformly
    on all nodes.  If a task in one small cpuset starts aggressively
    allocating hugetlb pages, do you think this will work, Adam?
    It looks to me like we will end up calling alloc_fresh_huge_page()
    many times, most of which will fail to alloc_pages_node() anything
    because the 'static nid' clock hand will be pointing at a node
    outside of the current task's cpuset (not in that task's
    mems_allowed).  Inefficient, but I guess ok.

 2) I don't see what keeps us from picking hugetlb pages off -any-
    node in the system, perhaps way outside the current cpuset.  We
    shouldn't be looking for enough available (free_huge_pages -
    resv_huge_pages) pages in the whole system.
    Rather we should be looking for and reserving enough such pages
    that are in the current task's cpuset (set in its mems_allowed,
    to be precise).  Folks aren't going to want their hugetlb pages
    coming from outside their task's cpuset.

 3) If there is some code I missed (good chance) that enforces
    the rule that a task can only get a hugetlb page from a node
    in its cpuset, then this uniform global allocation of hugetlb
    pages, as noted in (1) above, can't be right.  Either it will
    force all nodes, including many nodes outside of the current
    task's cpuset, to bulk up on free hugetlb pages, just to get
    enough of them on nodes allowed by the current task's cpuset,
    or else it will fail to get enough on nodes local to the
    current task's cpuset.  I don't understand the logic well
    enough to know which, but either way sucks.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson 1.925.600.0401
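
For what it's worth, the inefficiency described in point (1) can be
illustrated with a small userspace sketch.  This is not kernel code:
NR_NODES, next_node(), try_alloc_fresh_huge_page() and calls_needed()
are hypothetical stand-ins, and the sketch assumes that an allocation
attempt simply fails whenever the round-robin clock hand points at a
node that is not set in the mask passed in as mems_allowed.

```c
#include <assert.h>

#define NR_NODES 8	/* hypothetical node count, for illustration only */

/*
 * Mimics a 'static nid' clock hand: it advances one node per call,
 * no matter which task (or cpuset) is asking.
 */
static int next_node(void)
{
	static int nid;
	int cur = nid;

	nid = (nid + 1) % NR_NODES;
	return cur;
}

/*
 * Stand-in for a single allocation attempt.  Assumption: the attempt
 * succeeds only if the clock hand lands on a node whose bit is set in
 * mems_allowed.  Returns 1 on success, 0 on failure.
 */
static int try_alloc_fresh_huge_page(unsigned int mems_allowed)
{
	int nid = next_node();

	return (mems_allowed >> nid) & 1u;
}

/* Counts how many attempts it takes to obtain 'want' pages. */
int calls_needed(unsigned int mems_allowed, int want)
{
	int calls = 0;
	int got = 0;

	/* With no allowed node among the first NR_NODES, we'd loop forever. */
	assert((mems_allowed & ((1u << NR_NODES) - 1)) != 0);

	while (got < want) {
		got += try_alloc_fresh_huge_page(mems_allowed);
		calls++;
	}
	return calls;
}
```

With 8 nodes and a mask of 0x0C (nodes 2 and 3 only), a first call
to calls_needed(0x0C, 4) takes 12 attempts to obtain 4 pages, so
two thirds of the attempts are wasted on nodes outside the mask.
The smaller the cpuset relative to the system, the worse that
ratio gets.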