From: mel@skynet.ie (Mel Gorman)
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: nicolas.mailhot@laposte.net, clameter@sgi.com, apw@shadowen.org,
	akpm@linux-foundation.org, linux-mm@kvack.org
Subject: Re: [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations
Date: Wed, 16 May 2007 15:00:39 +0100
Message-ID: <20070516140038.GA10225@skynet.ie>
In-Reply-To: <464B089C.9070805@yahoo.com.au>

On (16/05/07 23:35), Nick Piggin didst pronounce:
> Mel Gorman wrote:
> >On (16/05/07 22:14), Nick Piggin didst pronounce:
> >
> >>Mel Gorman wrote:
> >>
> >>>zone_watermark_ok() checks if there are enough free pages including a
> >>>reserve. High-order allocations additionally check if there are enough
> >>>free high-order pages in relation to the watermark adjusted based on the
> >>>requested size. If there are not enough free high-order pages available,
> >>>0 is returned so that the caller enters direct reclaim.
> >>>
> >>>ALLOC_HIGH and ALLOC_HARDER allocations are allowed to dip further into
> >>>the reserves but also take into account if the number of free high-order
> >>>pages meet the adjusted watermarks. As these allocations cannot sleep,
> >>
> >>Why can't ALLOC_HIGH or ALLOC_HARDER sleep? This patch seems wrong to
> >>me.
> >>
> >
> >
> >In page_alloc.c
> >
> >        if ((unlikely(rt_task(p)) && !in_interrupt()) || !wait)
> >                alloc_flags |= ALLOC_HARDER;
> >
> >See the !wait part.
> 
> And the || part.
> 

I doubt a rt_task is thrilled to be entering direct reclaim.
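
For reference, the flag derivation being argued over can be modelled in
user space roughly as below. This is only a sketch: struct alloc_ctx and
model_alloc_flags() are hypothetical names, the flag values are
illustrative, and the real code reads rt_task(p), in_interrupt() and the
gfp mask directly.

```c
#include <assert.h>

/* Flag values are illustrative; only their distinctness matters here. */
#define ALLOC_HARDER 0x10
#define ALLOC_HIGH   0x20

/* Hypothetical stand-in for the state the allocator inspects. */
struct alloc_ctx {
	int rt_task;		/* caller is a realtime task */
	int in_interrupt;	/* running in interrupt context */
	int wait;		/* caller is allowed to sleep */
	int gfp_high;		/* __GFP_HIGH was passed */
};

/* Mirrors the condition quoted above plus the __GFP_HIGH mapping. */
static int model_alloc_flags(const struct alloc_ctx *c)
{
	int alloc_flags = 0;

	assert(c);
	if ((c->rt_task && !c->in_interrupt) || !c->wait)
		alloc_flags |= ALLOC_HARDER;
	if (c->gfp_high)
		alloc_flags |= ALLOC_HIGH;
	return alloc_flags;
}
```

In this model a realtime task outside interrupt context gets ALLOC_HARDER
even when wait is set, and a sleeping __GFP_HIGH caller gets ALLOC_HIGH,
so neither flag on its own says the caller cannot sleep.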

> 
> >The ALLOC_HIGH applies to __GFP_HIGH allocations which are allowed to
> >dip into emergency pools and go below the reserve.
> 
> And some of them can sleep too.
> 

If you feel very strongly about it, I can back out the ALLOC_HIGH part for
__GFP_HIGH allocations, but at a glance the users of __GFP_HIGH do not look
too keen on sleeping:

drivers/block/rd.c;
	Comment
	Deep badness.  rd_blkdev_pagecache_IO() needs to allocate
	pagecache pages within a request_fn.  We cannot recur back
	into the filesytem which is mounted atop the ramdisk

fs/ext4/writeback.c;
	Using __GFP_HIGH when allocating bios

kernel/power/swap.c;
	Using __GFP_HIGH when allocating bios

The change is still obeying watermarks, just at order-0 instead of
strictly observing the higher orders.
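
To make the difference concrete, here is a user-space sketch of the check
as I read it. It is not the patch itself: struct zone_model,
model_free_pages(), model_watermark_ok() and the order0_only switch are
hypothetical names, MODEL_MAX_ORDER is arbitrarily small, and the flag
values are illustrative.

```c
#include <assert.h>

/* Flag values are illustrative; only their distinctness matters here. */
#define ALLOC_HARDER 0x10
#define ALLOC_HIGH   0x20

#define MODEL_MAX_ORDER 4	/* hypothetical; small to keep the model readable */

/* Hypothetical stand-in for the zone fields the check reads. */
struct zone_model {
	unsigned long nr_free[MODEL_MAX_ORDER];	/* free blocks at each order */
	long lowmem_reserve;			/* reserve for the class zone */
};

/* Total free pages: each free block at order o contributes 2^o pages. */
static long model_free_pages(const struct zone_model *z)
{
	long total = 0;
	int o;

	for (o = 0; o < MODEL_MAX_ORDER; o++)
		total += (long)(z->nr_free[o] << o);
	return total;
}

/*
 * Returns 1 if the allocation may proceed, 0 otherwise.
 * order0_only == 0 models the strict per-order check; order0_only != 0
 * models stopping after the order-0 check for ALLOC_HIGH/ALLOC_HARDER.
 */
static int model_watermark_ok(const struct zone_model *z, int order, long mark,
			      int alloc_flags, int order0_only)
{
	long min = mark;
	long free_pages;
	int o;

	assert(order >= 0 && order < MODEL_MAX_ORDER);
	free_pages = model_free_pages(z) - (1L << order) + 1;

	/* Urgent classes are allowed to dip further into the reserve. */
	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;

	/* The order-0 watermark is enforced in both variants. */
	if (free_pages <= min + z->lowmem_reserve)
		return 0;

	if (order0_only && (alloc_flags & (ALLOC_HIGH | ALLOC_HARDER)))
		return 1;

	for (o = 0; o < order; o++) {
		/* Pages in blocks of this order cannot serve the request. */
		free_pages -= (long)(z->nr_free[o] << o);
		/* Require fewer free pages at each higher order. */
		min >>= 1;
		if (free_pages <= min)
			return 0;
	}
	return 1;
}
```

Take a zone with 200 free order-0 pages and one free order-3 block (208
pages in total) and a mark of 32. An order-2 ALLOC_HARDER request fails the
strict check at the first loop step (5 pages left against a scaled minimum
of 12) even though a suitable block is free, and passes when only the
order-0 watermark is checked. A zone with just 20 free order-0 pages fails
either way.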

> 
> >>>they cannot enter direct reclaim so the allocation can fail even though
> >>>the pages are available and the number of free pages is well above the
> >>>watermark for order-0.
> >>>
> >>>This patch alters the behaviour of zone_watermark_ok() slightly.
> >>>Watermarks are still obeyed but when an allocator is flagged ALLOC_HIGH
> >>>or ALLOC_HARDER, we only check that there is sufficient memory over the
> >>>reserve to satisfy the allocation, allocation size is ignored.  This
> >>>patch also documents better what zone_watermark_ok() is doing.
> >>
> >>This is wrong because now you lose the buffering of higher order pages
> >>for more urgent allocation classes against less urgent ones.
> >>
> >
> >
> >ALLOC_HARDER is an urgent allocation class.
> 
> And HIGH is even more, and MEMALLOC even more again.
> 

HIGH => ALLOC_HIGH => obey watermarks at order-0.

Somewhat counter-intuitively, with the current code, if an allocation is
really high priority but can sleep, it can actually allocate without any
watermarks at all.

> 
> >>Think of how the order-0 allocation buffering works with the watermarks
> >>and consider that we're trying to do the same exact thing for higher order
> >>allocations here.
> >>
> >
> >
> >What actually happens is that high-order allocations fail even though
> >the watermarks are met because they cannot enter direct reclaim.
> 
> Yeah, they fail leaving some spare for more urgent allocations. Like
> how the order-0 allocations work.

order-0 watermarks are still in place. After the patch, it is still not
possible for the allocations to break the watermarks there.

> They should also kick kswapd to start freeing pages _before_ they start
> failing too.
> 

Perhaps they should, but from what I read, kswapd is only kicked into
action when the first allocation attempt has already failed.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
