Date: Wed, 16 Feb 2005 02:20:09 -0800
From: Paul Jackson <pj@sgi.com>
Subject: Re: manual page migration -- issue list
To: Robin Holt
Cc: raybry@sgi.com, linux-mm@kvack.org, ak@muc.de, haveblue@us.ibm.com,
    marcello@cyclades.com, stevel@mwwireless.net, peterc@gelato.unsw.edu.au

Robin wrote:
> What that would result in is a syscall for each
> non-overlapping vma per node.

My latest, most radical, proposal did not take an address range.  It
was simply:

    sys_page_migrate(pid, oldnode, newnode)

It would be called once per node.  In your example, this would be 128
calls.  Nothing "for each non-overlapping vma".  Just once per node.

Until I drove you to near distraction, and you spelled out the details
of an example that migrated 96% of the address space in the first call
and needed only 3 calls total, I would have presumed that the API:

    sys_page_migrate(pid, va_start, va_end, count, old_nodes, new_nodes)

would have required one call per pid, or 256 calls, for your example.
My method did not look insanely worse to me; indeed, it would have
looked better in this example with two tasks per node, since I did one
call per node, and I thought you did one per task.

... However, I see now that you can routinely get by with dramatically
fewer calls than the number of tasks, by noticing which portions of the
typically huge shared address space have already been covered, and not
covering them again.

There is no need to convince me that 384 syscalls and 128 full scans
are insanely worse than 3 syscalls with 1 full scan, and no need to get
frustrated that I cannot see the insanity of it.

However, you might have wanted to allow for the possibility, when you
reduced what you thought I was proposing to insanity, that rather than
my proposing something insane, perhaps we had different numbers ... as
happened here.  Your numbers for the array API had 80 times fewer
system calls than I would have expected, and your numbers for the
single parameter call had 3 times _more_ system calls than I had in
mind (I had one call per node, period, not one per node per vma or
whatever).

> How much opposition is there to the array of integers?

My opposition to the array was not profound.  It needed to provide an
advantage, which I didn't see that it did.  I now see that it provides
an advantage, dramatically reducing the number of system calls and
scans in typical cases, to substantially fewer than either the number
of tasks or of nodes.

Ok ... onward.  I'll take the node arrays.
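To make that call pattern concrete, here is a rough user-space sketch.
The sys_page_migrate() below is only a local stub standing in for the
proposed interface, not any existing kernel API, and the pids,
addresses, and node numbers are all invented for illustration:

    #include <stdio.h>
    #include <sys/types.h>

    /* Stub for the proposed call: in task 'pid', scan [va_start, va_end)
     * once, moving any page found on old_nodes[i] to new_nodes[i], for
     * each of 'count' node pairs. */
    static long sys_page_migrate(pid_t pid, unsigned long va_start,
                                 unsigned long va_end, int count,
                                 const int *old_nodes, const int *new_nodes)
    {
        printf("migrate pid %d: [%#lx, %#lx), %d pairs, e.g. %d -> %d\n",
               (int)pid, va_start, va_end, count,
               old_nodes[0], new_nodes[0]);
        return 0;
    }

    int main(void)
    {
        enum { NPAIRS = 128 };          /* vacate nodes 0..127 onto 128..255 */
        int old_nodes[NPAIRS], new_nodes[NPAIRS], i;

        for (i = 0; i < NPAIRS; i++) {
            old_nodes[i] = i;           /* source node */
            new_nodes[i] = i + NPAIRS;  /* destination node */
        }

        /* Call 1: one scan of the big shared region (the 96% of the
         * address space in the example above), via any one member task,
         * covering all 128 node pairs at once. */
        sys_page_migrate(1234, 0x20000000UL, 0x70000000UL,
                         NPAIRS, old_nodes, new_nodes);

        /* Calls 2 and 3: only the remaining ranges not already covered,
         * giving 3 calls and 1 full scan total, rather than one call
         * per task (256) or per node (128). */
        sys_page_migrate(1234, 0x01000000UL, 0x02000000UL,
                         NPAIRS, old_nodes, new_nodes);
        sys_page_migrate(5678, 0x01000000UL, 0x02000000UL,
                         NPAIRS, old_nodes, new_nodes);
        return 0;
    }

The arrays are what let one scan serve all node pairs: each page is
examined once, and its source node indexes into the pair list.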
The next concern that rises to the top for me was best expressed by
Andi:

> > The main reasons for that is that I don't think external
> > processes should mess with virtual addresses of another process.
> > It just feels unclean and has many drawbacks (parsing /proc/*/maps
> > needs complicated user code, racy, locking difficult).
> >
> > In kernel space handling full VMs is much easier and safer due to
> > better locking facilities.

I share Andi's concerns, but I don't see what to do about this.
Andi's recommendations seem to be about memory policies (which guide
future allocations), and not about migration of already allocated
physical pages.  So for now at least, his recommendations don't seem
like answers to me.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson  1.650.933.1373, 1.925.600.0401