From: Michael Ellerman <mpe@ellerman.id.au>
To: Nicholas Piggin <npiggin@gmail.com>,
Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
linuxppc-dev@lists.ozlabs.org,
"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Subject: Re: [PATCH resend] powerpc/64s: fix page table fragment refcount race vs speculative references
Date: Tue, 31 Jul 2018 21:42:22 +1000
Message-ID: <87600vhbs1.fsf@concordia.ellerman.id.au>
In-Reply-To: <20180728023255.720d594c@roar.ozlabs.ibm.com>
Nicholas Piggin <npiggin@gmail.com> writes:
> On Fri, 27 Jul 2018 08:38:35 -0700
> Matthew Wilcox <willy@infradead.org> wrote:
>> On Sat, Jul 28, 2018 at 12:29:06AM +1000, Nicholas Piggin wrote:
>> > On Fri, 27 Jul 2018 06:41:56 -0700
>> > Matthew Wilcox <willy@infradead.org> wrote:
>> > > On Fri, Jul 27, 2018 at 09:48:17PM +1000, Nicholas Piggin wrote:
>> > > > The page table fragment allocator uses the main page refcount racily
>> > > > with respect to speculative references. A customer observed a BUG due
>> > > > to page table page refcount underflow in the fragment allocator. This
>> > > > can be caused by the fragment allocator set_page_count stomping on a
>> > > > speculative reference, and then the speculative failure handler
>> > > > decrements the new reference, and the underflow eventually pops when
>> > > > the page tables are freed.
>> > >
>> > > Oof. Can't you fix this instead by using page_ref_add() instead of
>> > > set_page_count()?
>> >
>> > It's ugly doing it that way. The problem is we have a page table
>> > destructor and that would be missed if the spec ref was the last
>> > put. In practice with RCU page table freeing maybe you can say
>> > there will be no spec ref there (unless something changes), but
>> > still it just seems much simpler doing this and avoiding any
>> > complexity or relying on other synchronization.
>>
>> I don't want to rely on the speculative reference not happening by the
>> time the page table is torn down; that's way too black-magic for me.
>> Another possibility would be to use, say, the top 16 bits of the
>> atomic for your counter and call the dtor once the atomic is below 64k.
>> I'm also thinking about overhauling the dtor system so it's not tied to
>> compound pages; anyone with a bit in page_type would be able to use it.
>> That way you'd always get your dtor called, even if the speculative
>> reference was the last one.
>
> Yeah we could look at doing either of those if necessary.
>
>> > > > Any objection to the struct page change to grab the arch specific
>> > > > page table page word for powerpc to use? If not, then this should
>> > > > go via powerpc tree because it's inconsequential for core mm.
>> > >
>> > > I want (eventually) to get to the point where every struct page carries
>> > > a pointer to the struct mm that it belongs to. It's good for debugging
>> > > as well as handling memory errors in page tables.
>> >
>> > That doesn't seem like it should be a problem, there's some spare
>> > words there for arch independent users.
>>
>> Could you take one of the spare words instead then? My intent was to
>> just take the 'x86 pgds only' comment off that member. _pt_pad_2 looks
>> ideal because it'll be initialised to 0 and you'll return it to 0 by
>> the time you're done.
>
> It doesn't matter for powerpc where the atomic_t goes, so I'm fine with
> moving it. But could you juggle the fields with your patch instead? I
> thought it would be nice to use this field, which has already been
> tested on x86 not to overlap with any other data, for a bug fix
> that'll have to be widely backported.

Can we come to a conclusion on this one?

As far as backporting goes pt_mm is new in 4.18-rc so the patch will
need to be manually backported anyway. But I agree with Nick we'd rather
use a slot that is known to be free for arch use.

cheers
Thread overview: 8+ messages
2018-07-27 11:48 Nicholas Piggin
2018-07-27 13:41 ` Matthew Wilcox
2018-07-27 14:29 ` Nicholas Piggin
2018-07-27 15:38 ` Matthew Wilcox
2018-07-27 16:32 ` Nicholas Piggin
2018-07-31 11:42 ` Michael Ellerman [this message]
2018-08-01 2:45 ` Nicholas Piggin
2018-08-08 14:26 ` [resend] " Michael Ellerman