Message-ID: <6599ad830611291357w34f9427bje775dfefcd000dfa@mail.gmail.com>
Date: Wed, 29 Nov 2006 13:57:37 -0800
From: "Paul Menage"
Subject: Re: [RFC][PATCH 1/1] Expose per-node reclaim and migration to userspace
In-Reply-To: <456D23A0.9020008@yahoo.com.au>
To: Nick Piggin
Cc: linux-mm@kvack.org, akpm@osdl.org

On 11/28/06, Nick Piggin wrote:
> menage@google.com wrote:
> > Currently the page migration APIs allow you to migrate pages from
> > particular processes, but don't provide a clean and efficient way to
> > migrate and/or reclaim memory from individual nodes.
>
> The mechanism for that should probably go in mm/migrate.c, shouldn't
> it?

Quite possibly - I don't have a strong feeling for exactly where the
code should go. There's existing code (sys_migrate_pages) that uses the
migration mechanism that's in mm/mempolicy.c rather than migrate.c, and
this was a pretty simple function to write.

>
> Also, why don't you scan the lru lists of the zones in the node, which
> will a) be much more efficient if there are lots of non LRU pages, and
> b) allow you to batch the lru lock.

I'll take a look at that.
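For readers less familiar with the "batch the lru lock" suggestion, here is a rough userspace sketch of the idea. It is not kernel code and not the patch under discussion. Rather than taking the lock once per page, it takes the lock once per batch, detaches up to BATCH pages onto a private list, drops the lock, and then does the expensive per-page work unlocked. The struct layouts, the pthread mutex standing in for the zone's lru_lock, process_page() as a stand-in for migration or reclaim, and BATCH = 32 (chosen to mirror the kernel's SWAP_CLUSTER_MAX) are all illustrative assumptions. The sketch covers only point b). Point a) follows from walking the LRU list at all, since that touches only pages that are actually on it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical batch size, mirroring the kernel's SWAP_CLUSTER_MAX (32). */
#define BATCH 32

struct page {
    struct page *next;
    int processed;              /* set once the page has been handled */
};

struct lru {
    pthread_mutex_t lock;       /* stand-in for the zone's lru_lock */
    struct page *head;          /* singly linked for simplicity */
    unsigned long lock_acquisitions;
};

/* Stand-in for the expensive per-page work (migration or reclaim),
 * which runs without the LRU lock held. */
static void process_page(struct page *p)
{
    p->processed = 1;
}

/* Detach up to 'max' pages from the LRU under one lock acquisition
 * and return them as a private list; *taken receives the count. */
static struct page *isolate_batch(struct lru *lru, int max, int *taken)
{
    struct page *batch = NULL, **tail = &batch;
    int n = 0;

    assert(max > 0);
    pthread_mutex_lock(&lru->lock);
    lru->lock_acquisitions++;
    while (lru->head != NULL && n < max) {
        struct page *p = lru->head;
        lru->head = p->next;
        p->next = NULL;
        *tail = p;              /* append, preserving LRU order */
        tail = &p->next;
        n++;
    }
    pthread_mutex_unlock(&lru->lock);
    *taken = n;
    return batch;
}

/* Drain the whole LRU in batches; returns the number of pages handled. */
static int drain_lru(struct lru *lru)
{
    int total = 0, taken;
    struct page *batch;

    while ((batch = isolate_batch(lru, BATCH, &taken)) != NULL) {
        while (batch != NULL) {
            struct page *next = batch->next;
            process_page(batch);
            batch = next;
        }
        total += taken;
    }
    return total;
}
```

Draining 100 pages this way takes the lock 5 times (batches of 32, 32, 32 and 4, plus one final check that finds the list empty) instead of once per page.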
>
> >
> > - a way to trigger try_to_free_pages() for a given node with a given
> > minimum priority, by writing an integer to
> > /sys/devices/system/node/node/try_to_free_pages
>
> ... especially not to userspace. Why does this have to be exposed to
> userspace at all?

We don't need to expose the raw "priority" value, but it would be
really nice for userspace to be able to specify how hard the kernel
should try to free some memory. Then each job can specify a "reclaim
pressure", i.e. how much back-pressure should be applied to its
allocated memory, so you can get a good idea of how much memory the
job is really using for a given level of performance. High reclaim
pressure results in a smaller working set but possibly more paging in
from disk; low reclaim pressure uses more memory but gets higher
performance.

> Can you not wire it up to your resource isolation
> implementation in the kernel?

This *is* the resource isolation implementation (plus the existing
cpusets and fake-numa code). The intention is to expose just enough
knobs/hooks to userspace that it can be handled there.

>
> ... yeah it would obviously be much nicer to do it in kernel space,
> behind your higher level APIs.

I don't think it would - keeping as much of the code as possible in
userspace makes development and deployment much faster. We don't
really have any higher-level APIs at this point - just userspace
middleware manipulating cpusets.

Paul