From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from westrelay01.boulder.ibm.com (westrelay01.boulder.ibm.com [9.17.195.10]) by e34.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j35Ivt5b119242 for ; Tue, 5 Apr 2005 14:57:57 -0400 Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by westrelay01.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j35IvtOq197762 for ; Tue, 5 Apr 2005 12:57:55 -0600 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j35IvtHe021971 for ; Tue, 5 Apr 2005 12:57:55 -0600 Subject: Re: [PATCH 1/6] CKRM: Basic changes to the core kernel From: Dave Hansen In-Reply-To: <20050405182240.GE32645@chandralinux.beaverton.ibm.com> References: <20050402031206.GB23284@chandralinux.beaverton.ibm.com> <1112622313.7189.50.camel@localhost> <20050405172519.GC32645@chandralinux.beaverton.ibm.com> <1112723661.19430.71.camel@localhost> <20050405182240.GE32645@chandralinux.beaverton.ibm.com> Content-Type: text/plain Date: Tue, 05 Apr 2005 11:57:50 -0700 Message-Id: <1112727471.19430.116.camel@localhost> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Chandra Seetharaman Cc: ckrm-tech@lists.sourceforge.net, linux-mm List-ID: On Tue, 2005-04-05 at 11:22 -0700, Chandra Seetharaman wrote: > On Tue, Apr 05, 2005 at 10:54:20AM -0700, Dave Hansen wrote: > > If you find a way to track things based on files, you could keep your > > class pointers in the struct address_space, or even in the vma, > > depending on what behavior you want. You could keep anonymous stuff in > > the anon_vma, just like the objrmap code. > > This is the first version of memory controller... Handling shared pages > appropriately are in the plans. Perhaps it's a better idea to wait until you have this more mature version before submitting it. It would be a shame to put all of this per-page stuff in, only to rip it out. Doing it that way isn't very incremental, but I don't think they'd share too much code anyway. > > ... if the class is behaving itself. Somebody trying to take down a > > machine, or a single badly-behaved or runaway app might not behave like > > that. > > There are checks in that code to make sure that a runaway app doesn't > get the kernel into this code path often and bring down the system... > instead the runaway app(its class) is penalised. Penalized how? Reducing the task's scheduler slices? Can you point to the code? > > > Also, the loop is just to wakeup kswapd once.. > > > may be I can get rid of that and use pgdat_list directly. > > > > I'd try to be a little more selective than a big for loop like that. > > 'big' for loop ? in that code path ? > ckrm_class_limit_ok(struct ckrm_mem_res *cls) > { ... > + for (i = 0; i < MAX_NR_ZONES; i++) > + pg_total += cls->pg_total[i]; Sorry, I was confusing this with something equivalent to for_each_node(). That brings another question, though. How does this interact with NUMA? The classes don't appear to track any per-node information. > + if (cls->pg_limit == CKRM_SHARE_DONTCARE) { > + struct ckrm_mem_res *parcls = ckrm_get_res_class(cls->parent, > + mem_rcbs.resid, struct ckrm_mem_res); > + ret = (parcls ? ckrm_class_limit_ok(parcls) : 0); > + } else > + ret = (pg_total <= cls->pg_limit); > + > + return ret; That looks suspiciously like recursion. How is the recursion limited? > > > > SGI's machines? What about an 8-node x44[05]? Why can't you call it > > > > from interrupts? > > > > > > I just wanted to avoid limit related failures in interrupt context, as it > > > might lead to wierd problems. > > > > You mean you didn't want to make your code robust enough to handle it? > > Is there something fundamental keeping you from checking limits when in > > an interrupt? > > It is not the 'checking limit' part that I meant in my reply. It is the > failure due to over limit(that the class is over its limit). > > This is my thinking: if a class is not configured properly, and is over > its limit in interrupt context, we are going to fail the memory alloc, > which 'could' lead to unwanted results in the system depending on how the > interrupt handler treats the alloc failure ;)... No. Interrupt handlers must use GFP_ATOMIC when allocating. This allocations are likely to fail, and the writers of the handlers know it. Interrupts handlers must be equipped to deal with these, or it's a bug in the interrupt handler. Is there any other reason to have the !in_interrupt() part? -- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: aart@kvack.org