linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ray Bryant <raybry@sgi.com>
To: Andi Kleen <ak@suse.de>
Cc: Ray Bryant <raybry@austin.rr.com>,
	William Lee Irwin III <wli@holomorphy.com>,
	Andrew Morton <akpm@osdl.org>, linux-mm <linux-mm@kvack.org>,
	Jesse Barnes <jbarnes@sgi.com>, Dan Higgins <djh@sgi.com>,
	lse-tech <lse-tech@lists.sourceforge.net>,
	Brent Casavant <bcasavan@sgi.com>,
	"Martin J. Bligh" <mbligh@aracnet.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Nick Piggin <piggin@cyberone.com.au>, Paul Jackson <pj@sgi.com>,
	Dave Hansen <haveblue@us.ibm.com>
Subject: Re: [PATCH 2/2] mm: eliminate node 0 bias in MPOL_INTERLEAVE
Date: Fri, 24 Sep 2004 01:33:38 -0500	[thread overview]
Message-ID: <4153BFC2.9000500@sgi.com> (raw)
In-Reply-To: <20040923092954.GA4836@wotan.suse.de>

Andi Kleen wrote:
> On Wed, Sep 22, 2004 at 11:32:45PM -0500, Ray Bryant wrote:
> 
>>Each of these cases potentially breaks the (assumed) invariant of
> 
> 
> I would prefer to keep the invariant.
> 

I understand, but read on...

> 
>>+++ linux-2.6.9-rc2-mm1/mm/mempolicy.c	2004-09-21 17:44:58.000000000 -0700
>>@@ -435,7 +435,7 @@ asmlinkage long sys_set_mempolicy(int re
>> 		default_policy[policy] = new;
>> 	}
>> 	if (new && new->policy == MPOL_INTERLEAVE)
>>-		current->il_next = find_first_bit(new->v.nodes, MAX_NUMNODES);
>>+		current->il_next = current->pid % MAX_NUMNODES;
> 
> 
> Please do the find_next/find_first bit here in the slow path. 
> 
> Another useful change may be to check if il_next points to a node
> that is in the current interleaving mask. If yes don't change it.
> This way skew when interleaving policy is set often could be avoided.
> 
> 
>> 	return 0;
>> }
>> 
>>@@ -714,6 +714,11 @@ static unsigned interleave_nodes(struct 
>> 
>> 	nid = me->il_next;
>> 	BUG_ON(nid >= MAX_NUMNODES);
>>+	if (!test_bit(nid, policy->v.nodes)) {
>>+		nid = find_next_bit(policy->v.nodes, MAX_NUMNODES, 1+nid);
>>+		if (nid >= MAX_NUMNODES)
>>+			nid = find_first_bit(policy->v.nodes, MAX_NUMNODES);
>>+	}
> 
> 
> And remove it here.
>

Regardless of whether we remove this or not, then we have a potential problem, 
I think.  The reason is that there is a single il_next for all policies.  So 
we get into trouble if the current process's page allocation policy and
its page cache allocation policy are MPOL_INTERLEAVE, but the node masks for
the two policies are significantly different. Just to be specific, suppose 
there are 64 nodes, and the page allocation policy selects nodes 0-53 and
the page cache allocation policy chooses nodes 54-63.  Further suppose that
allocation requests are page, page cache, page, page cache, etc....

Then if il_next starts out at zero, here are the nodes that will be selected:
(I'm assuming here that the code I inserted above is not present.)

request a page, get 0 and using the page allocation mask, next is set to 1

request page cache, get 1 and using the page cache allocation mask, next is 
set to 54

request a page, get 54 and using the page allocation mask, next is set to 0

request page cache, get 0  and using the page cache allocation mask, next is 
set to 54

request a page, get 54 and using the page allocation mask, next is set to 0
etc...

This is not good.  Generally speaking, all of the pages are allocated from the 
1st page cache node and all of the page cache pages are allocated from the 1st 
page allocation node.

I guess I am back to passing an offset etc in via page cache alloc.  Or we
have to have a second il_next for the page cache policy, and that is more
cruft than we are willing to live with, I expect.

I'll look at Steve's patch and see how he handles this.

> 
>> 	next = find_next_bit(policy->v.nodes, MAX_NUMNODES, 1+nid);
>> 	if (next >= MAX_NUMNODES)
>> 		next = find_first_bit(policy->v.nodes, MAX_NUMNODES);
>>Index: linux-2.6.9-rc2-mm1/kernel/fork.c
>>===================================================================
>>--- linux-2.6.9-rc2-mm1.orig/kernel/fork.c	2004-09-21 16:24:49.000000000 -0700
>>+++ linux-2.6.9-rc2-mm1/kernel/fork.c	2004-09-21 17:41:12.000000000 -0700
>>@@ -873,6 +873,8 @@ static task_t *copy_process(unsigned lon
>> 			goto bad_fork_cleanup;
>> 		}
>> 	}
>>+	/* randomize placement of first page across nodes */
>>+	p->il_next = p->pid % MAX_NUMNODES;
> 
> 
> Same here.
> 
> -Andi
> 


-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

  reply	other threads:[~2004-09-24  6:33 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-09-23  4:32 [PATCH 0/2] mm: memory policy for page cache allocation Ray Bryant
2004-09-23  4:32 ` [PATCH 1/2] mm: page cache mempolicy " Ray Bryant
2004-09-23  9:24   ` Andi Kleen
2004-09-24  4:12     ` Ray Bryant
2004-09-23  4:32 ` [PATCH 2/2] mm: eliminate node 0 bias in MPOL_INTERLEAVE Ray Bryant
2004-09-23  9:29   ` Andi Kleen
2004-09-24  6:33     ` Ray Bryant [this message]
2004-09-24  6:43     ` Ray Bryant
2004-09-23  9:09 ` [PATCH 0/2] mm: memory policy for page cache allocation Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4153BFC2.9000500@sgi.com \
    --to=raybry@sgi.com \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=bcasavan@sgi.com \
    --cc=djh@sgi.com \
    --cc=haveblue@us.ibm.com \
    --cc=jbarnes@sgi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lse-tech@lists.sourceforge.net \
    --cc=mbligh@aracnet.com \
    --cc=piggin@cyberone.com.au \
    --cc=pj@sgi.com \
    --cc=raybry@austin.rr.com \
    --cc=wli@holomorphy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox