From: Dan Williams <dan.j.williams@intel.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Linux MM <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@redhat.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions
Date: Tue, 12 Jan 2021 01:15:12 -0800 [thread overview]
Message-ID: <CAPcyv4jc_+UvA=y7-AX-wi-nMgDmCiukfkPwTYBye8Es=fecgw@mail.gmail.com> (raw)
In-Reply-To: <20210106095520.GJ13207@dhcp22.suse.cz>
On Wed, Jan 6, 2021 at 1:55 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 05-01-21 20:07:18, Dan Williams wrote:
> > While pfn_to_online_page() is able to determine pfn_valid() at
> > subsection granularity it is not able to reliably determine if a given
> > pfn is also online if the section is mixed with ZONE_DEVICE pfns.
>
> I would call out the problem more explicitly. E.g. something like
> "
> This means that pfn_to_online_page can lead to false positives and allow
> to return a struct page which is not fully initialized because it
> belongs to ZONE_DEVICE (PUT AN EXAMPLE OF SUCH AN UNITIALIZED PAGE
> HERE). That can lead to either crash on PagePoisoned assertion or a
> silently broken page state with debugging disabled.
> "
Apologies for the long pause in this conversation, I went off and
wrote a test to trigger this condition so I could quote it directly.
It turns out soft_offline_page(), even before fixing
pfn_to_online_page(), is broken as it leaks a page reference.
>
> I would also appreciate a more specific note about how the existing HW
> can trigger this. You have mentioned 64MB subsection examples in the
> other email. It would be great to mention it here as well.
Sure.
>
> > Update move_pfn_range_to_zone() to flag (SECTION_TAINT_ZONE_DEVICE) a
> > section that mixes ZONE_DEVICE pfns with other online pfns. With
> > SECTION_TAINT_ZONE_DEVICE to delineate, pfn_to_online_page() can fall
> > back to a slow-path check for ZONE_DEVICE pfns in an online section.
> >
> > With this implementation of pfn_to_online_page() pfn-walkers mostly only
> > need to check section metadata to determine pfn validity. In the
> > rare case of mixed-zone sections the pfn-walker will skip offline
> > ZONE_DEVICE pfns as expected.
>
> The above paragraph is slightly confusing. You do not require
> pfn-walkers to check anything right? Section metadata is something that
> is and should be hidden from them. They are asking for an online page
> and get it or NULL. Nothing more nothing less.
>
Yeah, I'll drop it. I was describing what service pfn_to_online_page()
performs for a pfn-walker, but it's awkwardly worded.
>
> > Other notes:
> >
> > Because the collision case is rare, and for simplicity, the
> > SECTION_TAINT_ZONE_DEVICE flag is never cleared once set.
>
> I do not see a problem with that.
>
> > pfn_to_online_page() was already borderline too large to be a macro /
> > inline function, but the additional logic certainly pushed it over that
> > threshold, and so it is moved to an out-of-line helper.
>
> Worth a separate patch.
>
> The approach is sensible. Thanks!
>
> I was worried that we do not have sufficient space for a new flag but
> the comment explains we have 6 bits available. I haven't double checked
> that for the current state of the code. The comment is quite recent and
> I do not remember any substantial changes in this area. Still something
> that is rather fragile because an unexpected alignment would be a
> runtime failure which is good to stop random corruptions but not ideal.
>
> Is there any way to explicitly test for this? E.g. enforce a shared
> section by qemu and then trigger a pfn walk?
>
> > Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Reported-by: Michal Hocko <mhocko@suse.com>
> > Reported-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> [...]
>
> > +static int zone_id(const struct zone *zone)
> > +{
> > + struct pglist_data *pgdat = zone->zone_pgdat;
> > +
> > + return zone - pgdat->node_zones;
> > +}
>
> We already have zone_idx()
Noted.
>
> > +
> > +static void section_taint_zone_device(struct zone *zone, unsigned long pfn)
> > +{
> > + struct mem_section *ms = __nr_to_section(pfn_to_section_nr(pfn));
> > +
> > + if (zone_id(zone) != ZONE_DEVICE)
> > + return;
> > +
> > + if (IS_ALIGNED(pfn, PAGES_PER_SECTION))
> > + return;
> > +
> > + ms->section_mem_map |= SECTION_TAINT_ZONE_DEVICE;
> > +}
> > +
> > /*
> > * Associate the pfn range with the given zone, initializing the memmaps
> > * and resizing the pgdat/zone data to span the added pages. After this
> > @@ -707,6 +769,15 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> > resize_pgdat_range(pgdat, start_pfn, nr_pages);
> > pgdat_resize_unlock(pgdat, &flags);
> >
> > + /*
> > + * Subsection population requires care in pfn_to_online_page().
> > + * Set the taint to enable the slow path detection of
> > + * ZONE_DEVICE pages in an otherwise ZONE_{NORMAL,MOVABLE}
> > + * section.
> > + */
> > + section_taint_zone_device(zone, start_pfn);
> > + section_taint_zone_device(zone, start_pfn + nr_pages);
>
> I think it would be better to add the checks here and only set the flag
> in the called function. SECTION_TAINT_ZONE_DEVICE should go to where we
> define helpers for ther flags.
>
Done.
next prev parent reply other threads:[~2021-01-12 9:15 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-06 4:07 Dan Williams
2021-01-06 9:55 ` Michal Hocko
2021-01-12 9:15 ` Dan Williams [this message]
2021-01-06 9:56 ` David Hildenbrand
2021-01-06 10:42 ` Michal Hocko
2021-01-06 11:22 ` David Hildenbrand
2021-01-06 11:38 ` Michal Hocko
2021-01-06 20:02 ` Dan Williams
2021-01-07 9:15 ` David Hildenbrand
2021-01-12 9:18 ` Dan Williams
2021-01-12 9:44 ` David Hildenbrand
2021-01-12 20:52 ` Dan Williams
2021-01-06 10:04 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAPcyv4jc_+UvA=y7-AX-wi-nMgDmCiukfkPwTYBye8Es=fecgw@mail.gmail.com' \
--to=dan.j.williams@intel.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox