Re: + mm-introduce-reported-pages.patch added to -mm tree

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Michal Hocko <mhocko@kernel.org>
Cc: akpm@linux-foundation.org, aarcange@redhat.com,
	dan.j.williams@intel.com, dave.hansen@intel.com,
	konrad.wilk@oracle.com, lcapitulino@redhat.com,
	mgorman@techsingularity.net, mm-commits@vger.kernel.org,
	mst@redhat.com, osalvador@suse.de, pagupta@redhat.com,
	pbonzini@redhat.com, riel@surriel.com, vbabka@suse.cz,
	wei.w.wang@intel.com, willy@infradead.org,
	yang.zhang.wz@gmail.com, linux-mm@kvack.org
Subject: Re: + mm-introduce-reported-pages.patch added to -mm tree
Date: Tue, 12 Nov 2019 23:17:17 +0100	[thread overview]
Message-ID: <2177e6e7-a02a-a75f-3d60-93f1a18f96a5@redhat.com> (raw)
In-Reply-To: <cfeeb6e4-ba73-bbf7-0cc3-d0c011d00971@redhat.com>

On 12.11.19 22:05, David Hildenbrand wrote:
> On 12.11.19 19:34, Alexander Duyck wrote:
>> On Tue, 2019-11-12 at 14:04 +0100, David Hildenbrand wrote:
>>>>>> fact is it is still invasive, just to different parts of the mm subsystem.
>>>>>
>>>>> I'd love to see how it uses the page isolation framework, and only has a 
>>>>> single hook to queue pages. I don't like the way pages are pulled out of 
>>>>> the buddy in Niteshs approach currently. What you have is cleaner.
>>>>
>>>> I don't see how you could use the page isolation framework to pull out
>>>> free pages. Is there a thread somewhere on the topic that I missed?
>>>
>>> It's basically only isolating pages while reporting them, and not
>>> pulling them out of the buddy (IOW, you move the pages to the isolate
>>> queues where nobody is allowed to touch them, and setting the
>>> migratetype properly). This e.g., makes other user of page isolation
>>> (e.g., memory offlining, alloc_contig_range()) play nicely with these
>>> isolated pages. "somebody else just isolated them, please try again."
>>
>> How so? If I understand correctly there isn't anything that prevents you
>> from isolating an already isolated page, is there? Last I knew isolated
> 
> mm/page_isolation.c:set_migratetype_isolate()
> ...
> if (is_migrate_isolate_page(page))
> 	goto out;
> ...
> -> Currently -EBUSY
> 
>> pages are still considered "movable" since they are still buddy pages
>> aren't they?
> 
> They are neither movable nor unmovable AFAIK. They are temporarily
> blocked. E.g., memory offlining currently returns -EBUSY if it cannot
> isolate the page range. alloc_contig_range() does the same. Imagine
> somebody allocating a gigantic page. You certainly cannot move the pages
> that are isolated while allocating the page. But you can signal to the
> caller to try again later.
> 
>>
>> Also this seems like it would have other implications since isolating a
>> page kicks of the memory notifier so as a result a balloon driver would
>> then free the pages back out so that they could be isolated with the
>> assumption the region is going offline.
> 
> Memory notifier? Balloon pages getting freed? No.
> 
> The memory notifier is used for onlining/offlining, it is not involved here.
> 
> I think what you mean is the "isolate notifier", which is only used by
> CMM on PPC.
> 
> See https://lkml.org/lkml/2019/10/31/487, where I rip that notifier out.
> 
>>
>>> start_isolate_page_range()/undo_isolate_page_range()/test_pages_isolated()
>>> along with a lockless check if the page is free.
>>
>> Okay, that part I think I get. However doesn't all that logic more or less
>> ignore the watermarks? It seems like you could cause an OOM if you don't
>> have the necessary checks in place for that.
> 
> Any approach that temporarily blocks some free pages from getting
> allocated will essentially have this issue, no? I think one main design
> point to minimize false OOMs was to limit the number of pages we report
> at a time. Or what do you propose here in addition to that?
> 
>>
>>> I think it should be something like this (ignoring different
>>> migratetypes and such for now)
>>>
>>> 1. Test lockless if page is free: Not free? Done.
>>
>> So this should help to reduce the liklihood of races in the steps below.
>> However it might also be useful if the code had some other check to see if
>> it was done other than just making a pass through the bitmap.
> 
> Yes.
> 
>>
>> One thing I had brought up with Nitesh was the idea of maybe doing some
>> sort of RCU bitmap type approach. Basically while we hold the zone lock we
>> could swap out the old bitmap for a new one. We could probably even keep a
>> counter at the start of the structure so that we could track how many bits
>> are actually set there. Then it becomes less likely of having a race where
>> you free a page and set the bit and the hinting thread tests and clears
>> the bit but doesn't see the freed page since it is not synchronized.
>> Otherwise your notification setup and reporting thread may need a few smp
>> barriers added where necessary.
> 
> Yes, swapping out the bitmap via RCU is also be a way to make memory
> hotplug work.
> 
> I was also thinking about a different bitmap approach. Store for each
> section a bitmap. Use a meta bitmap with a bit for each section that
> contains pages to report. Sparse zones and memory hot(un)plug would not
> be a real issue anymore.
> 
> One could go one step further and only have a bitmap with a bit for each
> section. Only remember that some (large) page was not reported in that
> section (e.g., after buddy merging). In the reporting thread, report all
> free pages within that section. You could end up reporting the same page
> a couple of times, but the question would be if this is relevant at all.
> One would have to prototype and measure that.
> 
> Long story short, I am not 100% a fan of the current "bitmap per zone"
> approach but is is fairly simple to start with :)
> 
>>
>>> 2. start_isolate_page_range(): Busy? Rare race (with other isolate users
>>
>> Doesn't this have the side effect of draining all the percpu caches in
>> order to make certain to flush the pages we isolated from there?
> 
> While alloc_contig_range() e.g., calls lru_add_drain_all(), I don't
> think isolation will. Where did you spot something like this in
> mm/page_isolation.c?
> 
>>
>>> or with an allocation). Done.
>>> 3. test_pages_isolated()
>>
>> So I have reviewed the code and I don't see how this could conflict with
>> other callers isolating the pages. If anything it seems like if another
>> thread has already isolated the pages you would end up getting a false
>> positive, reporting the pages, and pulling them back out of isolation.
> 
> Isolated pages cannot be isolated. This is tracked via the migratetype.
> 
>>
>>> 3a. no? Rare race, page not free anymore. undo_isolate_page_range()
>>
>> I would hope it is rare. However for something like a max order page I
>> could easily see a piece of it having been pulled out. I would think this
>> case would be exceedingly expensive since you would have to put back any
>> pages you had previous moved into isolation.
> 
> I guess it is rare, there is a tiny slot between checking if the page is
> free and isolating it. Would have to see that in action.
> 
>>
>>> 3b. yes? Report, then undo_isolate_page_range()
>>>
>>> If we would run into performance issues with the current page isolation
>>> implementation (esp. locking), I think there are some nice
>>> cleanups/reworks possible of which all current users could benefit
>>> (especially accross pageblocks).
>>
>> To me this feels a lot like what you had for this solution near the start.
>> Only now instead of placing the pages into an array you are tracking a
>> bitmap and then using that bitmap to populate the MIGRATE_ISOLATE lists.
> 
> Now we have a clean MM interface to do that :) And yes, which data
> structure we're using becomes irrelevant.
> 
>>
>> This sounds far more complex to me then it probably needs to be since just
>> holding the pages with the buddy type cleared should be enough to make
>> them temporarily unusable for other threads, and even in your case you are
> 
> If you have a page that is not PageBuddy() and not movable within
> ZONE_MOVABLE, has_unmovable_pages() will WARN_ON_ONCE(zone_idx(zone) ==
> ZONE_MOVABLE). This can be triggered via memory offlining, when
> isolating the page range.
> 
> If your approach does exactly that (clear PageBuddy() on a
> ZONE_MOVABLE), it would be a bug. The only safe way is to have the
> pageblock(s) isolated.
> 

Minor correction: Only if your refcount is > 0.

-- 

Thanks,

David / dhildenb

next prev parent reply	other threads:[~2019-11-12 22:17 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20191106000547.juQRi83gi%akpm@linux-foundation.org>
2019-11-06 12:16 ` Michal Hocko
2019-11-06 14:09   ` David Hildenbrand
2019-11-06 16:35     ` Alexander Duyck
2019-11-06 16:54       ` Michal Hocko
2019-11-06 17:48         ` Alexander Duyck
2019-11-06 22:11           ` Mel Gorman
2019-11-06 23:38             ` David Hildenbrand
2019-11-07  0:20             ` Alexander Duyck
2019-11-07 10:20               ` Mel Gorman
2019-11-07 16:07                 ` Alexander Duyck
2019-11-08  9:43                   ` Mel Gorman
2019-11-08 16:17                     ` Alexander Duyck
2019-11-08 18:41                       ` Mel Gorman
2019-11-08 20:29                         ` Alexander Duyck
2019-11-09 14:57                           ` Mel Gorman
2019-11-10 18:03                             ` Alexander Duyck
2019-11-06 23:33           ` David Hildenbrand
2019-11-07  0:20             ` Dave Hansen
2019-11-07  0:52               ` David Hildenbrand
2019-11-07 17:12                 ` Dave Hansen
2019-11-07 17:46                   ` Michal Hocko
2019-11-07 18:08                     ` Dave Hansen
2019-11-07 18:12                     ` Alexander Duyck
2019-11-08  9:57                       ` Michal Hocko
2019-11-08 16:43                         ` Alexander Duyck
2019-11-07 18:46                   ` Qian Cai
2019-11-07 18:02             ` Alexander Duyck
2019-11-07 19:37               ` Nitesh Narayan Lal
2019-11-07 22:46                 ` Alexander Duyck
2019-11-07 22:43               ` David Hildenbrand
2019-11-08  0:42                 ` Alexander Duyck
2019-11-08  7:06                   ` David Hildenbrand
2019-11-08 17:18                     ` Alexander Duyck
2019-11-12 13:04                       ` David Hildenbrand
2019-11-12 18:34                         ` Alexander Duyck
2019-11-12 21:05                           ` David Hildenbrand
2019-11-12 22:17                             ` David Hildenbrand [this message]
2019-11-12 22:19                             ` Alexander Duyck
2019-11-12 23:10                               ` David Hildenbrand
2019-11-13  0:31                                 ` Alexander Duyck
2019-11-13 18:51                           ` Nitesh Narayan Lal
2019-11-06 16:49   ` Nitesh Narayan Lal
2019-11-11 18:52   ` Nitesh Narayan Lal
2019-11-11 22:00     ` Alexander Duyck
2019-11-12 15:19       ` Nitesh Narayan Lal
2019-11-12 16:18         ` Alexander Duyck
2019-11-13 18:39           ` Nitesh Narayan Lal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2177e6e7-a02a-a75f-3d60-93f1a18f96a5@redhat.com \
    --to=david@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=konrad.wilk@oracle.com \
    --cc=lcapitulino@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=mm-commits@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=osalvador@suse.de \
    --cc=pagupta@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=riel@surriel.com \
    --cc=vbabka@suse.cz \
    --cc=wei.w.wang@intel.com \
    --cc=willy@infradead.org \
    --cc=yang.zhang.wz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox