linux-mm.kvack.org archive mirror
From: Michal Hocko <mhocko@suse.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
Date: Thu, 3 Sep 2020 08:38:06 +0200
Message-ID: <20200903063806.GM4617@dhcp22.suse.cz>
In-Reply-To: <e6bf05cb-044c-47a9-3c65-e41b1e42b702@suse.cz>

On Wed 02-09-20 19:51:45, Vlastimil Babka wrote:
> On 9/2/20 5:13 PM, Michal Hocko wrote:
> > On Wed 02-09-20 16:55:05, Vlastimil Babka wrote:
> >> On 9/2/20 4:26 PM, Pavel Tatashin wrote:
> >> > On Wed, Sep 2, 2020 at 10:08 AM Michal Hocko <mhocko@suse.com> wrote:
> >> >>
> >> >> >
> >> >> > Thread#1 - continue
> >> >> >          free_unref_page_commit
> >> >> >            migratetype = get_pcppage_migratetype(page);
> >> >> >               // get old migration type
> >> >> >            list_add(&page->lru, &pcp->lists[migratetype]);
> >> >> >               // add new page to already drained pcp list
> >> >> >
> >> >> > Thread#2
> >> >> > Never drains pcp again, and therefore gets stuck in the loop.
> >> >> >
> >> >> > The fix is to try to drain per-cpu lists again after
> >> >> > check_pages_isolated_cb() fails.
> >> >>
> >> >> But this means that the page is not isolated and so it could be reused
> >> >> for something else. No?
> >> > 
> >> > The page is in a movable zone, has zero references, and the section is
> >> > isolated (i.e. set_pageblock_migratetype(page, MIGRATE_ISOLATE);) is
> >> > set. The page should be offlinable, but it is lost in a pcp list as
> >> > that list is never drained again after the first failure to migrate
> >> > all pages in the range.
> >> 
> >> Yeah. To answer Michal's "it could be reused for something else" - yes, somebody
> >> could allocate it from the pcplist before we do the extra drain. But then it
> >> becomes "visible again" and the loop in __offline_pages() should catch it by
> >> scan_movable_pages() - do_migrate_range(). And this time the pageblock is
> >> already marked as isolated, so the page (freed by migration) won't end up on the
> >> pcplist again.
> > 
> > So the page block is marked MIGRATE_ISOLATE but the allocation itself
> > could be used for non migrateable objects. Or does anything prevent that
> > from happening?
> 
> In a movable zone, the allocation should not be used for non migrateable
> objects. E.g. if the zone was not ZONE_MOVABLE, the offlining could fail
> regardless of this race (analogically for migrating away from CMA pageblocks).
> 
> > We really do depend on isolation to not allow reuse when offlining.
> 
> This is not really different than if the page on pcplist was allocated just a
> moment before the offlining, thus isolation started. We ultimately rely on being
> able to migrate any allocated pages away during the isolation. This "freeing to
> pcplists" race doesn't fundamentally change anything in this regard. We just
> have to guarantee that pages on pcplists will be eventually flushed, to make
> forward progress, and there was a bug in this aspect.

You are right. I managed to confuse myself yesterday. The race is
impossible for !ZONE_MOVABLE because we do a PageBuddy check there. And
in the movable zone we do not lose the migratability property.

Pavel, I think this would be useful information to add to the changelog.
We should also document this in the code to prevent further confusion.
I would suggest something like the following:

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 242c03121d73..56d4892bceb8 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -170,6 +170,14 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * pageblocks we may have modified and return -EBUSY to caller. This
  * prevents two threads from simultaneously working on overlapping ranges.
  *
+ * Please note that there is no strong synchronization with the page allocator
+ * either. Pages might be freed while their page blocks are marked ISOLATED.
+ * In some cases pages might still end up on pcp lists, which would allow
+ * them to be allocated even though they are in fact already isolated. Depending
+ * on how strong a guarantee the caller needs, drain_all_pages() might be needed
+ * (e.g. __offline_pages() will need to call it after checking for an isolated
+ * range, before the next retry).
+ *
  * Return: the number of isolated pageblocks on success and -EBUSY if any part
  * of range cannot be isolated.
  */
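
To make the forward-progress argument concrete, here is a minimal user-space
sketch of the retry loop. It is a toy model and not kernel code: the toy_*
names, the three page states and the max_passes bound are all assumptions
made for illustration. A page that starts as PAGE_ON_PCP stands for one that
raced onto a pcp list after the initial drain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not a kernel API. */
enum page_state { PAGE_IN_USE, PAGE_ON_PCP, PAGE_IN_BUDDY };

struct toy_range {
	enum page_state *pages;
	size_t nr_pages;
};

/* Stand-in for drain_all_pages(): flush pcp pages to the free lists. */
static void toy_drain_pcp(struct toy_range *r)
{
	for (size_t i = 0; i < r->nr_pages; i++)
		if (r->pages[i] == PAGE_ON_PCP)
			r->pages[i] = PAGE_IN_BUDDY;
}

/*
 * Stand-in for migrating allocated pages away. The pageblock is assumed
 * to be isolated already, so the freed page goes straight to the free
 * lists and not back onto a pcp list.
 */
static void toy_migrate(struct toy_range *r)
{
	for (size_t i = 0; i < r->nr_pages; i++)
		if (r->pages[i] == PAGE_IN_USE)
			r->pages[i] = PAGE_IN_BUDDY;
}

/* Stand-in for the isolated-range check: every page must be free. */
static bool toy_range_isolated(const struct toy_range *r)
{
	for (size_t i = 0; i < r->nr_pages; i++)
		if (r->pages[i] != PAGE_IN_BUDDY)
			return false;
	return true;
}

/*
 * Returns the number of passes needed to offline the range, or -1 if
 * the loop gives up after max_passes (the real loop has no such bound).
 */
static int toy_offline(struct toy_range *r, bool drain_on_retry,
		       int max_passes)
{
	assert(r != NULL && max_passes > 0);

	for (int pass = 1; pass <= max_passes; pass++) {
		toy_migrate(r);
		if (toy_range_isolated(r))
			return pass;
		/* The fix under discussion: drain again before retrying. */
		if (drain_on_retry)
			toy_drain_pcp(r);
	}
	return -1;
}
```

With a range of { PAGE_IN_USE, PAGE_ON_PCP, PAGE_IN_BUDDY }, calling
toy_offline(&r, false, 5) never succeeds and returns -1, while
toy_offline(&r, true, 5) on a fresh copy of the same range returns 2: the
first pass fails the check and drains, the second pass finds every page free.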
-- 
Michal Hocko
SUSE Labs

