From: Vlastimil Babka <vbabka@suse.cz>
To: Yang Shi <shy828301@gmail.com>, Jann Horn <jannh@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>,
Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linux-MM <linux-mm@kvack.org>,
kernel list <linux-kernel@vger.kernel.org>,
"Kirill A . Shutemov" <kirill@shutemov.name>,
Jan Kara <jack@suse.cz>, stable <stable@vger.kernel.org>
Subject: Re: [PATCH v2] mm/gup: fix try_grab_compound_head() race with split_huge_page()
Date: Wed, 16 Jun 2021 19:27:05 +0200
Message-ID: <6d21f8cb-4b72-bdec-386c-684ddbcdada1@suse.cz>
In-Reply-To: <CAHbLzkomex+fgC8RyogXu-s5o2UrORMO6D2yTsSXW5Wo5z9WRA@mail.gmail.com>

On 6/16/21 1:10 AM, Yang Shi wrote:
> On Tue, Jun 15, 2021 at 5:10 AM Jann Horn <jannh@google.com> wrote:
>>
>> On Tue, Jun 15, 2021 at 8:37 AM John Hubbard <jhubbard@nvidia.com> wrote:
>> > On 6/14/21 6:20 PM, Jann Horn wrote:
>> > > try_grab_compound_head() is used to grab a reference to a page from
>> > > get_user_pages_fast(), which is only protected against concurrent
>> > > freeing of page tables (via local_irq_save()), but not against
>> > > concurrent TLB flushes, freeing of data pages, or splitting of compound
>> > > pages.
>> [...]
>> > Reviewed-by: John Hubbard <jhubbard@nvidia.com>
>>
>> Thanks!
>>
>> [...]
>> > > @@ -55,8 +72,23 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
>> > >  	if (WARN_ON_ONCE(page_ref_count(head) < 0))
>> > >  		return NULL;
>> > >  	if (unlikely(!page_cache_add_speculative(head, refs)))
>> > >  		return NULL;
>> > > +
>> > > +	/*
>> > > +	 * At this point we have a stable reference to the head page; but it
>> > > +	 * could be that between the compound_head() lookup and the refcount
>> > > +	 * increment, the compound page was split, in which case we'd end up
>> > > +	 * holding a reference on a page that has nothing to do with the page
>> > > +	 * we were given anymore.
>> > > +	 * So now that the head page is stable, recheck that the pages still
>> > > +	 * belong together.
>> > > +	 */
>> > > +	if (unlikely(compound_head(page) != head)) {
>> >
>> > I was just wondering about what all could happen here. Such as: page gets split,
>> > reallocated into a different-sized compound page, one that still has page pointing
>> > to head. I think that's OK, because we don't look at or change other huge page
>> > fields.
>> >
>> > But I thought I'd mention the idea in case anyone else has any clever ideas about
>> > how this simple check might be insufficient here. It seems fine to me, but I
>> > routinely lack enough imagination about concurrent operations. :)
>>
>> Hmmm... I think the scariest aspect here is probably the interaction
>> with concurrent allocation of a compound page on architectures with
>> store-store reordering (like ARM). *If* the page allocator handled
>> compound pages with lockless, non-atomic percpu freelists, I think it
>> might be possible that the zeroing of tail_page->compound_head in
>> put_page() could be reordered after the page has been freed,
>> reallocated and set to refcount 1 again?
>>
>> That shouldn't be possible at the moment, but it is still a bit scary.
>
> It might be possible after Mel's "mm/page_alloc: Allow high-order
> pages to be stored on the per-cpu lists" patch
> (https://patchwork.kernel.org/project/linux-mm/patch/20210611135753.GC30378@techsingularity.net/).

Those would be percpu indeed, but not "lockless, non-atomic", no? They are
protected by a local_lock.

>>
>>
>> I think the lockless page cache code also has to deal with somewhat
>> similar ordering concerns when it uses page_cache_get_speculative(),
>> e.g. in mapping_get_entry() - first it looks up a page pointer with
>> xas_load(), and any access to the page later on would be a _dependent
>> load_, but if the page then gets freed, reallocated, and inserted into
>> the page cache again before the refcount increment and the re-check
>> using xas_reload(), then there would be no data dependency from
>> xas_reload() to the following use of the page...
>>
>
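
For readers following the quoted hunk, here is a minimal user-space sketch of
the "take a speculative reference, then recheck" pattern it describes. This is
not the kernel code: struct toy_page, toy_ref_add_unless_zero(),
toy_try_get_head(), toy_split() and toy_race_hook are all hypothetical names
made up for illustration, C11 sequentially consistent atomics stand in for the
kernel's page refcount helpers, and the hook exists only so that a "split" can
be forced deterministically between the head lookup and the refcount increment.
It says nothing about the store-store reordering concern raised above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel layout. */
struct toy_page {
	atomic_int refcount;
	_Atomic(struct toy_page *) head;	/* points to itself if not a tail */
};

/* Hypothetical test hook, called between head lookup and refcount bump. */
static void (*toy_race_hook)(struct toy_page *page);

static void toy_page_init(struct toy_page *p, struct toy_page *head, int refs)
{
	assert(refs >= 0);
	atomic_init(&p->refcount, refs);
	atomic_init(&p->head, head);
}

/* Add refs to the count unless it is zero (i.e. the page is being freed). */
static bool toy_ref_add_unless_zero(struct toy_page *p, int refs)
{
	int old = atomic_load(&p->refcount);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&p->refcount, &old, old + refs));
	return true;
}

/* Simulated split: the former tail page becomes its own head. */
static void toy_split(struct toy_page *page)
{
	atomic_store(&page->head, page);
}

static struct toy_page *toy_try_get_head(struct toy_page *page, int refs)
{
	struct toy_page *head = atomic_load(&page->head);

	if (toy_race_hook)
		toy_race_hook(page);	/* window in which a split may happen */

	if (!toy_ref_add_unless_zero(head, refs))
		return NULL;

	/* The head is pinned now; recheck that page still belongs to it. */
	if (atomic_load(&page->head) != head) {
		atomic_fetch_sub(&head->refcount, refs);	/* undo and bail out */
		return NULL;
	}
	return head;
}

/* Returns 0 if both the stable and the raced lookup behave as described. */
static int toy_demo(void)
{
	struct toy_page head, tail;

	toy_page_init(&head, &head, 1);
	toy_page_init(&tail, &head, 0);

	toy_race_hook = NULL;
	if (toy_try_get_head(&tail, 1) != &head)
		return 1;
	if (atomic_load(&head.refcount) != 2)
		return 2;

	toy_race_hook = toy_split;	/* split lands inside the window */
	if (toy_try_get_head(&tail, 1) != NULL)
		return 3;
	if (atomic_load(&head.refcount) != 2)	/* speculative ref was dropped */
		return 4;
	return 0;
}
```

In the raced case the increment on the old head succeeds (it is still
allocated), and only the recheck notices that the pages no longer belong
together, which is why the reference has to be dropped again before returning.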