Message-ID: <3DAB5DF2.5000002@us.ibm.com>
Date: Mon, 14 Oct 2002 17:14:42 -0700
From: Matthew Dobson
Reply-To: colpatch@us.ibm.com
Subject: Re: [rfc][patch] Memory Binding API v0.3 2.5.41
References: <3DA4D3E4.6080401@us.ibm.com>
To: "Eric W. Biederman"
Cc: linux-kernel, linux-mm@kvack.org, LSE, Andrew Morton, Martin Bligh, Michael Hohnbaum

Eric W. Biederman wrote:
> Matthew Dobson writes:
>>Greetings & Salutations,
>> Here's a wonderful patch that I know you're all dying for... Memory
>>Binding! It works just like CPU Affinity (binding) except that it binds a
>>process's memory allocations (just buddy allocator for now) to specific memory
>>blocks.

> Do we want this per numa area or simply per zone?  My suspicion is that
> internally at least we want this per zone.

I think that per memory block is better. We already have a method for allocating from specific zones (the GFP_* flags). Also, per-zone binding would require some way of enumerating the zones, and that enumeration would not be immediately obvious to users of the API. The memory block already has a straightforward definition, and users can easily look up the number of the block they want through the in-kernel topology. I'm not fanatically opposed to per-zone binding, though. If there is general agreement that it would be better, I don't think it would be unreasonably difficult to change.

> The API doesn't make much sense at the moment.

Hmm... That is unfortunate; I'd aimed to make it as simple as possible.

> 1) You are operating on tasks and not mm's, or preferably vmas.

Correct. There are plans (somewhere inside my cranium) to allow binding at that granularity. For now, per task seemed an appropriate level.

> 2) sys_mem_setbinding does not move the mm to the new binding.
Also correct. A task may wish to allocate several large data structures from one memory area, rebind, do more allocations, rebind, ad nauseam. There are plans to add a flag that, if set, would force relocation of all currently allocated memory.

> 3) You specify a pid and then change current task instead of
> the specified one.

Yep... That was definitely a typo... fixed.

> 4) An ordered zone list is probably the more natural mapping.

See my comments above about per-zone vs. per-memblk binding. And you reemphasize my point: how do we order the zone lists so that a user of the API can easily find out what zone #5 is?

> 5) mprotect is the more natural model rather than set_cpu_affinity.

Well, I think that may be true for the API you are imagining (per zone, per mm/vma, etc.), but not for the one that I've written.

> 6) The code belongs in mm/* not kernel/*

Possibly... I just put it in kernel/sys.c along with the vast majority of the other syscalls. Those changes are purely code additions, so they can easily be moved if that is deemed appropriate.

Cheers!

-Matt