Re: [PATCH 1/9] add defrag flags

linux-mm.kvack.org archive mirror

From: Mel Gorman <mel@csn.ul.ie>
To: Paul Jackson <pj@sgi.com>
Cc: Dave Hansen <haveblue@us.ibm.com>,
	mrmacman_g4@mac.com, jschopp@austin.ibm.com, akpm@osdl.org,
	lhms-devel@lists.sourceforge.net, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kravetz@us.ibm.com
Subject: Re: [PATCH 1/9] add defrag flags
Date: Tue, 27 Sep 2005 14:34:40 +0100 (IST)
Message-ID: <Pine.LNX.4.58.0509271415460.12421@skynet>
In-Reply-To: <20050926224439.056eaf8d.pj@sgi.com>

On Mon, 26 Sep 2005, Paul Jackson wrote:

> Dave wrote:
> > I think Joel simply made an error in his description.
>
> Looks like he made the same mistake in the actual code comments:
>
> +/* Allocation type modifiers, group together if possible
> + * __GPF_USER: Allocation for user page or a buffer page
> + * __GFP_KERNRCLM: Short-lived or reclaimable kernel allocation
> + */
> +#define __GFP_USER	0x40000u /* Kernel page that is easily reclaimable */
> +#define __GFP_KERNRCLM	0x80000u /* User is a userspace user */
>
> I'd guess you meant to write more like the following:
>
> #define __GFP_USER   0x40000u /* Page for user address space */
> #define __GFP_KERNRCLM 0x80000u /* Kernel page that is easily reclaimable */
>

yep

> And the block comment seems to needlessly repeat the inline comments,
> add a dubious claim, and omit the interesting stuff ...  In other words:
>
>     Does it actually matter if these two bits are grouped, or not?  I
>     suspect that some of your other code, such as shifting the gfpmask by
>     RCLM_SHIFT bits, _requires_ that these two bits be adjacent.  So the
>     "if possible" in the comment above is misleading.
>

The "if possible" is misleading. The bits have to be beside each other,
as assumptions are made later in the code about this. The "group together"
comment refers to the pages that are allocated with gfp flags that
include __GFP_USER or __GFP_KERNRCLM. Those pages should be "grouped
together if possible". The bits themselves must always be adjacent.

>     And I suspect that gfp.h should contain the RCLM_SHIFT define, or
>     at least mention in comment that RCLM_SHIFT depends on the position
>     of the above two __GFP_* bits.
>
>     And I don't see any mention in the comments in gfp.h that these
>     two bits, in tandem, have an additional meaning - both bits off
>     means, I guess, not reclaimable, well at least not easily.
>
> My HARDWALL patch appears to already be in Linus's kernel, so you
> probably also need to do a global substitute of all instances in
> the kernel of __GFP_HARDWALL, replacing it with __GFP_USER.

I am not sure that is a good idea, as I explain below.

> Here
> is the list of files I see affected, with a count of the number of
> __GFP_HARDWALL strings in each:
>
>     include/linux/gfp.h:4
>     kernel/cpuset.c:6
>     mm/page_alloc.c:2
>     mm/vmscan.c:4
>
> The comment in the next line looks like it needs to be changed to match
> the code change:
>
> +#define __GFP_BITS_SHIFT 21	/* Room for 20 __GFP_FOO bits */
>
> On the other hand, why did you change __GFP_BITS_SHIFT?  Isn't 20
> enough - just enough?
>

Yep, you're right, it is just enough.

> Why was the flag change in fs/buffer.c:grow_dev_page() to add the
> __GFP_USER bit, not to add the __GFP_KERNRCLM bit?

Because these are buffer pages that get reclaimed very quickly. The
KERNRCLM pages are generally slab pages. These can be reclaimed by reaping
certain slab caches, but it's very hit-and-miss. Trust me, the whole
scheme works better if buffer pages are treated as __GFP_USER pages, not
__GFP_KERNRCLM.

> Aha - I just read one of the comments above that I cut+pasted.
> It says that __GFP_USER means user *OR* buffer page.  That certainly
> explains the fs/buffer.c code using __GFP_USER.  But it causes me to
> wonder if we can equate __GFP_USER with __GFP_HARDWALL.

I don't think it should be.

> I'm reluctant,
> but more on principle than concrete experience, to modify the meaning
> of hardwall cpusets to constrain both user address space pages *AND*
> buffer pages.  How open would you be to making buffers __GFP_KERNRCLM
> instead of __GFP_USER?
>

Not very open at all. I would prefer to add another flag rather than do
that. The anti-fragmentation does not work anywhere near as well when
buffer pages are KERNRCLM pages, because there are a large number of
pages that are easily reclaimable by cleaning the buffers and discarding
them. If they were mixed with slab pages, it would not be very effective
when we try to make a large allocation.

> If you have good reason to keep __GFP_USER meaning either user or buffer,
> then perhaps the name __GFP_USER is misleading.
>

Possibly, but we are stuck for terminology here. It's hard to think of a
good term that reflects the intention.

> What sort of performance claims can you make for this change?

I don't have figures for this patchset. The figures I do have are for
another version that I'm currently trying to merge with Joel's. In my own
set, there are no performance regressions or gains.

> How does
> it impact kernel text size?

These figures are again from my own patchset, but they should be
essentially the same for Joel's:

linux-2.6.13-clean/vmlinux
   text    data     bss     dec     hex filename
2992829  686212  212708 3891749  3b6225 linux-2.6.13-clean/vmlinux

linux-2.6.13-mbuddy-v14/vmlinux
   text    data     bss     dec     hex filename
2995335  687852  212708 3895895  3b7257 linux-2.6.13-mbuddy-v14/vmlinux

Is that what you are looking for?

> Could we see a diffstat for the entire
> patchset?

I don't have this at the moment.

> Under what sort of loads or conditions would you expect
> this patchset to do more harm than good?
>

I cannot think of a case where it does more harm. At worst, it does not
help fragmentation. For that to happen, the system needs to be very
heavily loaded, under heavy memory pressure for a long time, with
RCLM_NORCLM pages being retained for very long periods even after the
load eases. In this case, fallbacks will eventually fragment memory.

A second case where it could hurt is allocator scalability over a large
number of CPUs, as there are now additional per-cpu lists. I am having
trouble thinking of a test case that would trigger it, though. Someone
used to dealing with large numbers of processors might be able to make a
suggestion.

-- 
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .


Thread overview: 28+ messages
2005-09-26 20:01 [PATCH 0/9] fragmentation avoidance Joel Schopp
2005-09-26 20:03 ` [PATCH 1/9] add defrag flags Joel Schopp
2005-09-27  0:16   ` Kyle Moffett
2005-09-27  0:24     ` Dave Hansen
2005-09-27  0:43       ` Kyle Moffett
2005-09-27  5:44       ` Paul Jackson
2005-09-27 13:34         ` Mel Gorman [this message]
2005-09-27 16:26           ` [Lhms-devel] " Paul Jackson
2005-09-27 18:38         ` Joel Schopp
2005-09-27 19:30           ` Paul Jackson
2005-09-27 21:00             ` [Lhms-devel] " Joel Schopp
2005-09-27 21:23               ` Paul Jackson
2005-09-27 22:03                 ` Joel Schopp
2005-09-27 22:45                   ` Paul Jackson
2005-09-26 20:05 ` [PATCH 2/9] declare defrag structs Joel Schopp
2005-09-26 20:06 ` [PATCH 3/9] initialize defrag Joel Schopp
2005-09-26 20:09 ` [PATCH 4/9] defrag helper functions Joel Schopp
2005-09-26 22:29   ` Alex Bligh - linux-kernel
2005-09-27 16:08     ` Joel Schopp
2005-09-26 20:11 ` [PATCH 5/9] propagate defrag alloc types Joel Schopp
2005-09-26 20:13 ` [PATCH 6/9] fragmentation avoidance core Joel Schopp
2005-09-26 20:14 ` [PATCH 7/9] try harder on large allocations Joel Schopp
2005-09-27  7:21   ` Coywolf Qi Hunt
2005-09-27 16:17     ` Joel Schopp
2005-09-26 20:16 ` [PATCH 8/9] defrag fallback Joel Schopp
2005-09-26 20:17 ` [PATCH 9/9] free memory is user reclaimable Joel Schopp
2005-09-26 20:19 ` [PATCH 10/9] percpu splitout Joel Schopp
2005-09-26 21:49 ` [Lhms-devel] [PATCH 0/9] fragmentation avoidance Joel Schopp
