Re: reducing fragmentation of unmovable pages

linux-mm.kvack.org archive mirror

* Re: reducing fragmentation of unmovable pages
From: Vlastimil Babka @ 2017-11-16  7:42 UTC
  To: David Rientjes
  Cc: Christoph Lameter, Joonsoo Kim, Mel Gorman, Kirill A. Shutemov, linux-mm

(Since you say at the end of the mail that going off-list was a mistake, CCing the list.)

On 11/15/2017 01:21 AM, David Rientjes wrote:
> On Tue, 7 Nov 2017, Vlastimil Babka wrote:
> 
>>> I'm looking at ways to improve unmovable page fragmentation, specifically
>>> unreclaimable slab, but extendable to all non __GFP_RECLAIMABLE and non
>>> __GFP_MOVABLE allocations.  The big idea is to reduce the amount of fall
>>> back to other migratetypes when trying to allocate non-movable pages.
>>
>> Ultimately, fall back to other migratetype is the only thing that can be
>> done, if existing MIGRATE_UNMOVABLE pageblocks are full. The key goal
>> should then be to fully use existing pageblocks with unmovable pages by
>> further unmovable allocations, instead of scattering them among other
>> pageblocks.
>>
> 
> Yes, MIGRATE_UNMOVABLE pageblocks are already fully utilized with the 
> exception of draining pcp lists.  The goal is to reduce the amount of 
> fallback for unmovable pages to other migrate types by trying to increase 
> the amount of memory available on MIGRATE_UNMOVABLE pageblocks using 
> various means.

OK

>>> Specifically:
>>>
>>>  - Do not steal entire MIGRATE_MOVABLE pageblocks during fallback the 
>>>    vast majority of the time.  The page allocator prefers to fallback to
>>>    larger page orders first to prevent the need for subsequent fallback.
>>
>> Which is consistent with what I said above. If it steals a small part,
>> it would pollute the pageblock with few unmovable pages, likely without
>> marking the pageblock itself as unmovable. The following unmovable
>> allocations would then pollute another pageblock... If they are short
>> lived ones, it's no big deal, but we don't know that.
>>
>>>    As a result, move_freepages_block() typically converts the fallback
>>>    pageblock, MIGRATE_RECLAIMABLE or MIGRATE_MOVABLE, to 
>>>    MIGRATE_UNMOVABLE increasing fragmentation and making it difficult to
>>
>> What exactly does "increasing fragmentation" mean here?
>>
> 
> Sorry, increasing the number of pageblocks that are MIGRATE_UNMOVABLE and 
> not available for various modes of compaction.  I see this as two 
> different problems:
> 
>  - without memory pressure, no reclaimable slab is freed from unmovable
>    pageblocks that could make memory for subsequent MIGRATE_UNMOVABLE
>    allocations available, and

Well, slab reclaim is a story of its own. Due to its internal
fragmentation, there is no guarantee that it frees whole pages.
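A toy model of the internal fragmentation point, assuming a slab page can go back to the page allocator only once every object on it is free. The helper name is hypothetical and this is not SLUB code; it only shows that freeing most objects may still return no pages.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: live[i] is the number of live objects on slab page i.
 * A page can be returned to the page allocator only when its count is 0.
 * Hypothetical helper, not a kernel API. */
static size_t freeable_pages(const unsigned int *live, size_t npages)
{
	size_t n = 0;

	assert(live != NULL || npages == 0);
	for (size_t i = 0; i < npages; i++)
		if (live[i] == 0)
			n++;
	return n;
}
```

For example, four pages of 32 objects that each keep one live object return no pages even though 124 of 128 objects were freed, while the same four live objects packed onto one page would let three pages go.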

>  - existing movable memory gets stranded on MIGRATE_UNMOVABLE pageblocks
>    due to the conversion and it is hard to switch back to MIGRATE_MOVABLE
>    later.

Yeah the switching back might be a problem in the "theoretical worst
case" scenario quoted below.

>>>    convert back to MIGRATE_MOVABLE due to long-lived slab allocations.
>>
>> Having the long-lived slab allocations spread in multiple pageblocks
>> would be worse. They could be marked MIGRATE_MOVABLE, but in fact
>> contain some unmovable pages, thus still not available for huge pages.
>> Ultimately we don't know which unmovable allocations are long-lived and
>> which aren't. We can only strive to limit the number of pageblocks they
>> pollute. The theoretical worst case is a large burst of unmovable
>> allocations mixing short and long lived ones, where we eventually fill
>> most pageblocks with them, and then the short lived ones are freed and
>> we are left with each pageblock containing few long lived ones. IMHO no
>> scheme can prevent that unless it can predict the allocation age or have
>> truly useful hints, in order to keep the short and long lived ones in
>> separate pageblocks.
>>
> 
> Absent some kind of annotation where we group various types of kmem 
> together, I'm wondering if the reverse of what I wrote would actually be
> better?  In other words, when falling back to MIGRATE_MOVABLE pageblocks, 
> always convert the entire pageblock to MIGRATE_UNMOVABLE with the 
> rationale that we want to exhaust that particular pageblock before falling 
> back again, regardless of whether half of it is free or not.

Yes, that's what I was trying to say. We already steal all free pages
from the pageblock in that case, although we don't necessarily mark it
MIGRATE_UNMOVABLE.
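A simplified sketch of the decision being discussed, assuming the rule is "re-mark the fallback pageblock only if free plus compatible in-use pages reach half of it" (David's "1/2 of the pageblock"). The struct, the `alike` counter and the function name are hypothetical; the free pages are taken either way, and only the pageblock marking depends on the threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages and 2 MiB pageblocks, i.e. pageblock_order 9. */
#define PAGEBLOCK_ORDER    9
#define PAGEBLOCK_NR_PAGES (1u << PAGEBLOCK_ORDER)

enum mt { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE };

struct pageblock {
	enum mt type;       /* current pageblock marking */
	unsigned int free;  /* free pages in the block */
	unsigned int alike; /* in-use pages compatible with the stealing type */
};

/* Hypothetical model of a fragmenting fallback. The free pages are always
 * taken for start_type (not modeled here); the block is re-marked only if
 * free + alike pages reach half the pageblock. Returns true if re-marked. */
static bool steal_fallback_block(struct pageblock *pb, enum mt start_type)
{
	assert(pb->free + pb->alike <= PAGEBLOCK_NR_PAGES);
	if (pb->free + pb->alike >= PAGEBLOCK_NR_PAGES / 2) {
		pb->type = start_type;
		return true;
	}
	/* Block keeps its old marking but now contains foreign pages. */
	return false;
}
```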

So one idea I also tried to develop at some point (IIRC even posted some
version, and maybe Joonsoo did as well?) is to introduce a new
MIGRATE_MIXED migratetype for marking blocks that were used as a
fragmenting fallback, but didn't steal enough to mark them UNMOVABLE or
RECLAIMABLE. Then they are used first in the fallback preference order.

Theoretically this should help the heuristics, because a) we prevent
unmovable allocations from falling back to more MOVABLE pageblocks by
first reusing those already lightly "polluted", because now they are
marked as MIXED and preferred, while previously they would be marked as
MOVABLE and chosen at random. And b) if a burst of unmovable allocations
subsides and short-lived ones are freed, we might now have fewer
UNMOVABLE pageblocks, where further unmovable allocations will be
contained, while in MIXED pageblocks the existing unmovable allocations
would only be freed, so eventually those pageblocks might get converted
to pure MOVABLE again.
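The preference order of the MIXED proposal can be sketched as a fallback table in which a hypothetical MIGRATE_MIXED is tried before clean pageblocks of other types. MIGRATE_MIXED does not exist; the enum, table and helper below are illustrative assumptions. The rest of the order keeps RECLAIMABLE ahead of MOVABLE for unmovable requests, matching the preference described earlier in this thread, and how movable requests should treat MIXED blocks is deliberately left open.

```c
#include <assert.h>
#include <stddef.h>

/* MT_MIXED stands for the hypothetical MIGRATE_MIXED proposed above. */
enum mt { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE, MT_MIXED, MT_TYPES };

/* Fallback order per requested type, terminated by MT_TYPES. MIXED comes
 * first so lightly polluted blocks are reused before a clean block of
 * another type gets polluted. Movable requests do not use MIXED here. */
static const enum mt fallbacks[MT_TYPES][4] = {
	[MT_UNMOVABLE]   = { MT_MIXED, MT_RECLAIMABLE, MT_MOVABLE, MT_TYPES },
	[MT_RECLAIMABLE] = { MT_MIXED, MT_UNMOVABLE, MT_MOVABLE, MT_TYPES },
	[MT_MOVABLE]     = { MT_RECLAIMABLE, MT_UNMOVABLE, MT_TYPES, MT_TYPES },
	[MT_MIXED]       = { MT_TYPES, MT_TYPES, MT_TYPES, MT_TYPES },
};

/* free_blocks[t] is the number of pageblocks of type t with free pages.
 * Returns the type to fall back to, or MT_TYPES if none is available. */
static enum mt pick_fallback(enum mt start,
			     const unsigned int free_blocks[MT_TYPES])
{
	assert(start < MT_TYPES);
	for (size_t i = 0; i < 4 && fallbacks[start][i] != MT_TYPES; i++)
		if (free_blocks[fallbacks[start][i]] > 0)
			return fallbacks[start][i];
	return MT_TYPES;
}
```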

But for this to fully work, we might need more mechanisms for
converting the pageblock marking according to the current number of
movable/unmovable pages in the pageblock than just the fallback events
(which used to only care about free pages, but since commit
02aa0cdd72483 also count movable ones) or a pageblock becoming fully free.
Compaction scanner would be a natural fit for that.
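A hypothetical sketch of how a compaction-style scanner could re-mark a pageblock from counts gathered while walking it, instead of waiting for a fallback event or for the block to become fully free. The thresholds (any unmovable page makes the block MIXED, half the block makes it UNMOVABLE) and all names are assumptions for illustration.

```c
#include <assert.h>

#define PAGEBLOCK_NR_PAGES 512u /* assumption: pageblock_order 9 */

/* MT_MIXED is the hypothetical migratetype discussed in this thread. */
enum mt { MT_UNMOVABLE, MT_MOVABLE, MT_MIXED };

/* Counts a scanner might gather while walking one pageblock. */
struct pb_counts {
	unsigned int free;
	unsigned int movable;
	unsigned int unmovable; /* reclaimable counted here too, for simplicity */
};

/* Hypothetical classification: no unmovable pages -> MOVABLE; unmovable
 * pages make up at least half the block -> UNMOVABLE; otherwise MIXED. */
static enum mt classify_pageblock(const struct pb_counts *c)
{
	assert(c->free + c->movable + c->unmovable == PAGEBLOCK_NR_PAGES);
	if (c->unmovable == 0)
		return MT_MOVABLE;
	if (c->unmovable >= PAGEBLOCK_NR_PAGES / 2)
		return MT_UNMOVABLE;
	return MT_MIXED;
}
```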

>>>  - Trigger kcompactd in the background to migrate eligible memory only
>>>    from MIGRATE_UNMOVABLE pageblocks when the allocator falls back to
>>>    pageblocks of different migratetype.
>>
>> Yeah, there were such suggestions in the past, we could trigger the
>> migration specifically from the pageblock which was used as the fall back.
>>
> 
> Yeah, a MIGRATE_ASYNC-ish type of compaction that migrates from the 
> fallback pageblock to MIGRATE_MOVABLE pageblocks in an attempt to free as 
> much of the fallback pageblock as possible.
> 
>>>  - Trigger shrink_slab() in the background to free reclaimable slab even
>>>    when per-zone watermarks have not been met when falling back to
>>>    pageblocks of different migratetype to hopefully make pages eligible
>>>    from MIGRATE_UNMOVABLE pageblocks for subsequent allocations.
>>
>> That would free MIGRATE_RECLAIMABLE pages, not unmovable. But perhaps
>> still an improvement, because fallback to reclaimable is preferred to
>> movable.
>>
> 
> s/MIGRATE_UNMOVABLE/MIGRATE_RECLAIMABLE/ in mine

Right.

> ,s/preferred to 
> movable/preferred to unmovable/ in yours?

Yes.

> Yeah, so what I was thinking was to trigger shrink_slab() from 
> MIGRATE_RECLAIMABLE pageblocks anytime there is fallback for 
> MIGRATE_UNMOVABLE pages, regardless of whether it falls back to 
> MIGRATE_RECLAIMABLE or MIGRATE_UNMOVABLE.

Could be worth trying, but note the internal fragmentation problem. Also
we wouldn't want to harm performance by shrinking the caches too much.
Maybe the workload would have a natural working set of both reclaimable
and unmovable allocations and we might be thrashing it prematurely.

>>> The goal is to make more MIGRATE_UNMOVABLE memory available for kmem
>>> allocations to avoid falling back to MIGRATE_RECLAIMABLE and 
>>> MIGRATE_MOVABLE pageblocks.  This results in a higher amount of memory
>>> available for hugepage allocation and less work needed to be done by
>>> compaction to constantly try to make MIGRATE_MOVABLE entirely free for
>>> hugepage allocation.
>>>
>>> Thoughts?  More ideas?
>>
>> Well, I don't see why keep such discussions off-list :)
>>
> 
> Unintentional, sorry.  Do you have any other ideas beyond this that might 
> help reduce kmem fragmentation, which makes compaction harder and less 
> memory available for high-order allocations?
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: reducing fragmentation of unmovable pages
From: Vlastimil Babka @ 2017-11-16  7:55 UTC
  To: David Rientjes, Mel Gorman
  Cc: Christoph Lameter, Joonsoo Kim, Kirill A. Shutemov, linux-mm

CC linux-mm

On 11/15/2017 01:52 AM, David Rientjes wrote:
> On Tue, 7 Nov 2017, Mel Gorman wrote:
> 
>>>  - Do not steal entire MIGRATE_MOVABLE pageblocks during fallback the 
>>>    vast majority of the time.  The page allocator prefers to fallback to
>>>    larger page orders first to prevent the need for subsequent fallback.
>>>    As a result, move_freepages_block() typically converts the fallback
>>>    pageblock, MIGRATE_RECLAIMABLE or MIGRATE_MOVABLE, to 
>>>    MIGRATE_UNMOVABLE increasing fragmentation and making it difficult to
>>>    convert back to MIGRATE_MOVABLE due to long-lived slab allocations.
>>>
>>
>> The thinking behind it was that once a fallback occurs, the change
>> is potentially permanent. Hence, once the pageblock is unusable for
>> hugepage-sized allocations, the damage should be confined there as much
>> as possible.
>>
> 
> Ok, sounds reasonable, thanks.  I'm wondering if we should do two things 
> when falling back to a MIGRATE_UNMOVABLE pageblock:
> 
>  - trigger kcompactd compaction to migrate all movable memory to
>    MIGRATE_MOVABLE, then
> 
>  - unconditionally convert to MIGRATE_UNMOVABLE to use the entire 
>    pageblock.
> 
> Or is there some perceived benefit to not doing the conversion when less 
> than 1/2 of the pageblock is free?  We lack insight into how short-lived or
> temporary the allocation is, so it may sit there forever in a 
> MIGRATE_MOVABLE pageblock that just puts more stress on compaction and 
> will never allow the full pageblock to become free.
> 
> This could also be done for MIGRATE_RECLAIMABLE pageblocks:
> 
>  - trigger shrink_slab() in the background to free all reclaimable
>    memory possible, then
> 
>  - unconditionally convert to MIGRATE_UNMOVABLE.
> 
> Subsequent fallback to MIGRATE_RECLAIMABLE may occur, but that should 
> still be preferred over falling back to MIGRATE_MOVABLE.
> 
>>>  - Trigger kcompactd in the background to migrate eligible memory only
>>>    from MIGRATE_UNMOVABLE pageblocks when the allocator falls back to
>>>    pageblocks of different migratetype.
>>>
>>
>> That is potentially worthwhile as long as the cost is willing to be
>> paid. I never kept a list of issues that were encountered when
>> attempting to reduce fallbacks, but some of the concerns were:
>>
>> 1. During the migration, minor fault stalls due to migration may increase.
>>
>> 2. It's inherently race-prone if kcompactd or any sort of parallel work
>>    does the work given a list of pageblocks that recently were used
>>    as fallback as a further fallback can occur before the migration is
>>    complete. If the work is done synchronously, the cost is too high
>>    and the calling context may not even allow it.
>>
> 
> Yeah, all the work being proposed here is asynchronous.  What exactly is
> the concern about additional fallbacks being done while 
> compacting/reclaiming?  If it's a burst of unmovable allocations, I still 
> see a benefit to migrating MIGRATE_MOVABLE pages away from 
> MIGRATE_UNMOVABLE pageblocks and converting the entire fallback pageblock
> to MIGRATE_UNMOVABLE in the hope of consolidating future allocations there.
> I don't have data that would suggest this would improve the situation,
> but I could collect it if that would be interesting.
> 
>> 3. There is no guarantee that the pages can be migrated due to memory
>>    pressure.
>>
> 
> Right, I think all of this is considered best effort.
> 
>> 4. You also have to take into account that if the movable pages are
>>    migrated out then a future movable allocation request may attempt to
>>    steal it right back. i.e. by reducing potential fallbacks from unmovable
>>    and reclaimable requests, you increase the chances of a future fallback
>>    of a movable one unless you are willing to reclaim in some instances.
>>
> 
> Interesting, I hadn't considered that.  I was working under the assumption 
> that MIGRATE_MOVABLE memory wasn't low, which matches the systems that I 
> am trying to optimize this for (some with ~40% of memory free on the 
> system).  I hadn't thought of MIGRATE_MOVABLE allocations doing the same 
> thing to other migrate types.

I don't think this is a big issue, because the criteria for movable
allocation stealing all pages from pageblock are more strict than
unmovable allocation stealing. Also if we are in the situation that
unmovable allocation has to fallback, we are still above at least the
min watermark, which means there is free memory, which thus has to be of
another migratetype - MOVABLE or RECLAIMABLE. The amount of it will be
higher than the single pageblock we are stealing, and movable allocation
will prefer that before trying to steal back. The low/min watermark
would be hit first before exhausting that, so reclaim would then create
more. But yeah this might not be so simple if it's a fallback due to
unmovable high-order allocation and most free memory being in unmovable
pageblocks, but fragmented...
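The asymmetry described above can be sketched as a predicate deciding whether a fallback may claim the whole pageblock. The rule that a movable request needs an order of at least half of pageblock_order is an assumption for illustration, as are the names; this is not a copy of the allocator's code.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGEBLOCK_ORDER 9 /* assumption: 2 MiB pageblocks with 4 KiB pages */

enum mt { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE };

/* Hypothetical: may a request of this order and type claim the whole
 * fallback pageblock, rather than just take the pages it needs? */
static bool may_claim_whole_block(unsigned int order, enum mt start)
{
	assert(start <= MT_RECLAIMABLE);
	/* Unmovable and reclaimable requests always try to claim. */
	if (start == MT_UNMOVABLE || start == MT_RECLAIMABLE)
		return true;
	/* Movable requests claim only for large orders, so small movable
	 * allocations rarely steal a freshly converted block back. */
	return order >= PAGEBLOCK_ORDER / 2;
}
```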

>> 5. If you try partitioning the system and never allowing fallbacks, it
>>    hits into weird OOM issues. If you try and limit it, you need
>>    per-migratetype counters for each pageblock in the system. The former
>>    turns into a functional failure. The latter costs too much.
>>
> 
> Ah, this came up before when I proposed a patch for tracking the number of 
> movable pages per zone and limiting synchronous compaction when that 
> number was deemed to be too low.  I remember per-migratetype counters 
> being too costly.
> 
>>>  - Trigger shrink_slab() in the background to free reclaimable slab even
>>>    when per-zone watermarks have not been met when falling back to
>>>    pageblocks of different migratetype to hopefully make pages eligible
>>>    from MIGRATE_UNMOVABLE pageblocks for subsequent allocations.
>>>
>>
>> Worth prototyping but people may claim that you are disrupting the
>> system now by reclaiming slab in case a high-order allocation is needed
>> in the future. This is similar to, but not as bad as, lumpy reclaim.
>>
> 
> Would it be more palatable if this shrink_slab() when falling back to 
> different migratetypes only occurred if zone_watermark_ok() failed for 
> pageblock_order at high_wmark_pages()?
> 
> Are there any other ideas that you or others may have that reduces kmem 
> fragmentation over pageblocks?
> 
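A rough sketch of the gate David proposes, using a deliberately simplified watermark check: after the allocation, free pages must stay at or above the mark, and a free block of at least the requested order must exist. The real zone_watermark_ok() also accounts for lowmem reserves and allocation flags; the struct and function names here are hypothetical. A background shrink_slab() would be queued only when the check fails.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER       11 /* assumption: orders 0..10 */
#define PAGEBLOCK_ORDER 9  /* assumption: 2 MiB pageblocks with 4 KiB pages */

/* Hypothetical, simplified zone state. */
struct zone_model {
	unsigned long free_pages;
	unsigned long high_wmark;
	unsigned long nr_free[MAX_ORDER]; /* free blocks per order */
};

/* Simplified watermark check for one allocation of 2^order pages. */
static bool watermark_ok_model(const struct zone_model *z, unsigned int order,
			       unsigned long mark)
{
	assert(order < MAX_ORDER);
	if (z->free_pages < (1UL << order) ||
	    z->free_pages - (1UL << order) < mark)
		return false;
	for (unsigned int o = order; o < MAX_ORDER; o++)
		if (z->nr_free[o] > 0)
			return true;
	return false;
}

/* On a migratetype fallback, shrink slab in the background only if a
 * pageblock-sized allocation would fail the high watermark. */
static bool should_shrink_on_fallback(const struct zone_model *z)
{
	return !watermark_ok_model(z, PAGEBLOCK_ORDER, z->high_wmark);
}
```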
