From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Mel Gorman <mgorman@suse.com>, Linux-MM <linux-mm@kvack.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Rik van Riel <riel@redhat.com>, Pintu Kumar <pintu.k@samsung.com>,
	Xishi Qiu <qiuxishi@huawei.com>, Gioh Kim <gioh.kim@lge.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 09/10] mm, page_alloc: Reserve pageblocks for high-order atomic allocations on demand
Date: Fri, 31 Jul 2015 10:28:52 +0200
Message-ID: <55BB31C4.3000107@suse.cz>
In-Reply-To: <20150729125334.GC19352@techsingularity.net>

On 07/29/2015 02:53 PM, Mel Gorman wrote:
> On Wed, Jul 29, 2015 at 01:35:17PM +0200, Vlastimil Babka wrote:
>> On 07/20/2015 10:00 AM, Mel Gorman wrote:
>>> +/*
>>> + * Used when an allocation is about to fail under memory pressure. This
>>> + * potentially hurts the reliability of high-order allocations when under
>>> + * intense memory pressure but failed atomic allocations should be easier
>>> + * to recover from than an OOM.
>>> + */
>>> +static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
>>> +{
>>> +	struct zonelist *zonelist = ac->zonelist;
>>> +	unsigned long flags;
>>> +	struct zoneref *z;
>>> +	struct zone *zone;
>>> +	struct page *page;
>>> +	int order;
>>> +
>>> +	for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
>>> +								ac->nodemask) {
>>
>> This fixed order might bias some zones over others wrt unreserving. Is it OK?
>
> I could not think of a situation where it mattered. It'll always
> prefer the highest zone over lower zones. Allocation requests that can
> use any zone do not care. Allocation requests that are limited to
> lower zones are protected as long as possible.

Hmm... allocation requests will follow the fair zone policy and thus the 
highatomic reservations will be spread fairly among all zones? Unless 
the allocations require lower zones, of course.

But for unreserving, normal/high allocations failing under memory 
pressure will unreserve highatomic pageblocks in the higher zones first 
and only then in the lower zones, and that was my concern. But it's true 
that failing allocations that require lower zones will unreserve the 
lower zones, so it might be OK in the end.

>
>>
>>> +		/* Preserve at least one pageblock */
>>> +		if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
>>> +			continue;
>>> +
>>> +		spin_lock_irqsave(&zone->lock, flags);
>>> +		for (order = 0; order < MAX_ORDER; order++) {
>>
>> Would it make more sense to look in descending order for a higher chance of
>> unreserving a pageblock that's mostly free? Like the traditional page stealing does?
>>
>
> I don't think it's worth the search cost. Traditional page stealing is
> searching because it's trying to minimise events that cause external
> fragmentation. Here we'd gain very little. We are under some memory
> pressure here; if enough pages are not free then another one will get
> freed shortly. Either way, I doubt the difference is measurable.

Hmm, I guess...
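
For the record, the descending search I had in mind would be a minimal 
change, roughly like the sketch below (the unreserve body staying as in 
your patch):

	/*
	 * Hypothetical: scan from the highest order down so that a
	 * mostly-free pageblock is found and unreserved first.
	 */
	for (order = MAX_ORDER - 1; order >= 0; order--) {
		struct free_area *area = &(zone->free_area[order]);

		if (list_empty(&area->free_list[MIGRATE_HIGHATOMIC]))
			continue;

		/* ... unreserve the pageblock as in the patch ... */
	}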

>
>>> +			struct free_area *area = &(zone->free_area[order]);
>>> +
>>> +			if (list_empty(&area->free_list[MIGRATE_HIGHATOMIC]))
>>> +				continue;
>>> +
>>> +			page = list_entry(area->free_list[MIGRATE_HIGHATOMIC].next,
>>> +						struct page, lru);
>>> +
>>> +			zone->nr_reserved_highatomic -= pageblock_nr_pages;
>>> +			set_pageblock_migratetype(page, ac->migratetype);
>>
>> Would it make more sense to assume MIGRATE_UNMOVABLE, as high-order allocations
>> present in the pageblock typically would be, and apply the traditional page
>> stealing heuristics to decide if it should be changed to ac->migratetype (if
>> that differs)?
>>
>
> Superb spot, I had to think about this for a while and initially I was
> thinking your suggestion was a no-brainer and obviously the right thing
> to do.
>
> On the pro side, it preserves the fragmentation logic because it'll force
> the normal page stealing logic to be applied.
>
> On the con side, we may reassign the pageblock twice -- once to
> MIGRATE_UNMOVABLE and once to ac->migratetype. That one does not matter
> much, but the second con is that we inadvertently increase the number of
> unmovable blocks in some cases.
>
> Let's say we default to MIGRATE_UNMOVABLE, ac->migratetype is MIGRATE_MOVABLE
> and there are enough free pages to satisfy the allocation but not steal
> the whole pageblock. The end result is that we have a new unmovable
> pageblock that may not be necessary. The next unmovable allocation placed
> there is potentially there forever. The key observation is that previously
> the pageblock could have contained only short-lived high-order allocations
> and could be completely free soon if it was assigned MIGRATE_MOVABLE. This
> may not apply when SLUB is using high-order allocations but the point
> still holds.

Yeah, I see the point. The obvious counterexample is a pageblock that we 
designate as MOVABLE and yet it contains some long-lived unmovable 
allocation. More unmovable allocations could lead to choosing another 
movable block as a fallback, while if we marked this pageblock as 
unmovable, they could go there and not increase fragmentation.

The problem is, we can't know which one is the case. I've toyed with the 
idea of MIGRATE_MIXED blocks for cases where the heuristics decide that 
e.g. an UNMOVABLE block is empty enough to change it to MOVABLE, but it 
may still contain some unmovable allocations. Such pageblocks should be 
preferred fallbacks for future unmovable allocations before truly 
pristine movable pageblocks.

The scenario here could be another where MIGRATE_MIXED would make sense.
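
To sketch the idea (MIGRATE_MIXED is purely hypothetical, nothing like 
it exists today): the fallback table could then prefer mixed blocks 
over pristine movable ones, e.g.:

	/*
	 * Hypothetical fallback ordering with a MIGRATE_MIXED type:
	 * unmovable and reclaimable allocations would steal from mixed
	 * blocks before polluting fully movable ones.
	 */
	static int fallbacks[MIGRATE_TYPES][5] = {
		[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MIXED,
					  MIGRATE_MOVABLE, MIGRATE_TYPES },
		[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MIXED,
					  MIGRATE_MOVABLE, MIGRATE_TYPES },
		/* ... */
	};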

> Grouping pages by mobility really needs to strive to keep the number of
> unmovable blocks as low as possible.

More precisely, the number of blocks with unmovable allocations in them, 
which is a much harder objective.

> If ac->migratetype is
> MIGRATE_UNMOVABLE then we lose nothing. If it's any other type then the
> current code keeps the number of unmovable blocks as low as possible.
>
> On that basis I think the current code is fine but it needs a comment to
> record why it's like this.
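
Agreed, a comment would help. Perhaps something along these lines (just 
a draft):

	/*
	 * Convert to ac->migratetype and avoid the normal pageblock
	 * stealing heuristics. Minimally, the caller is doing a
	 * high-order allocation and unreserving one pageblock may be
	 * enough. More importantly, always converting to e.g.
	 * MIGRATE_UNMOVABLE here would tend to permanently grow the
	 * number of unmovable pageblocks, which grouping by mobility
	 * tries hard to avoid.
	 */
	set_pageblock_migratetype(page, ac->migratetype);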
>
>>> @@ -2175,15 +2257,23 @@ static bool __zone_watermark_ok(struct zone *z, unsigned int order,
>>>   			unsigned long mark, int classzone_idx, int alloc_flags,
>>>   			long free_pages)
>>>   {
>>> -	/* free_pages may go negative - that's OK */
>>>   	long min = mark;
>>>   	int o;
>>>   	long free_cma = 0;
>>>
>>> +	/* free_pages may go negative - that's OK */
>>>   	free_pages -= (1 << order) - 1;
>>> +
>>>   	if (alloc_flags & ALLOC_HIGH)
>>>   		min -= min / 2;
>>> -	if (alloc_flags & ALLOC_HARDER)
>>> +
>>> +	/*
>>> +	 * If the caller is not atomic then discount the reserves. This will
>>> +	 * over-estimate the size of the atomic reserve but it avoids a search
>>> +	 */
>>> +	if (likely(!(alloc_flags & ALLOC_HARDER)))
>>> +		free_pages -= z->nr_reserved_highatomic;
>>
>> Hm, so in the case where the maximum of 10% of blocks is already reserved
>> and full, we deny the allocation access to another 10% of the memory and
>> push it to reclaim. This seems rather excessive.
>
> It's necessary. If normal callers can use it then the reserve fills with
> normal pages, the memory gets fragmented and high-order atomic allocations
> fail due to fragmentation. Similarly, the number of MIGRATE_HIGHATOMIC
> pageblocks cannot be unbounded or everything else will be continually pushed
> into reclaim even if there is plenty of memory free.

I understand that denying normal allocations access to the highatomic 
reserves via watermarks is necessary. But my concern is that for each 
reserved pageblock we effectively deny up to two pageblocks' worth of 
pages to normal allocations: one pageblock that is marked as 
MIGRATE_HIGHATOMIC, and, once it becomes full, free_pages above is 
decreased twice - once by the pageblock becoming full, and again by 
subtracting z->nr_reserved_highatomic. This extra gap is still usable 
by highatomic allocations, but they will also potentially mark more 
pageblocks highatomic and further increase the gap. In the worst case 
we have 10% of pageblocks marked highatomic and full, and another 10% 
that is only usable by highatomic allocations (but won't be marked as 
such), and if no more highatomic allocations come, that 10% is wasted.
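
To put made-up numbers on it: with 2MB pageblocks, a 4GB zone has ~2048 
pageblocks, so up to ~205 of them (10%) can be reserved, i.e. ~410MB. 
Once those fill up, free_pages has already dropped by that ~410MB, and 
the check above subtracts nr_reserved_highatomic on top, so normal 
allocations see the zone as if up to ~820MB (20%) were unavailable.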
