Re: ptep_get_lockless() on 32-bit x86/mips/sh looks wrong

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Jason Gunthorpe <jgg@nvidia.com>
To: Jann Horn <jannh@google.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Christoph Hellwig <hch@lst.de>,
	kernel list <linux-kernel@vger.kernel.org>,
	David Hildenbrand <david@redhat.com>
Subject: Re: ptep_get_lockless() on 32-bit x86/mips/sh looks wrong
Date: Thu, 6 Oct 2022 13:15:34 -0300	[thread overview]
Message-ID: <Yz7/JtbinFmNpQMM@nvidia.com> (raw)
In-Reply-To: <CAG48ez1Oz4tT-N2Y=Zs6jumu=zOp7SQRZ=V2c+b5bT9P4retJA@mail.gmail.com>

On Thu, Oct 06, 2022 at 05:55:54PM +0200, Jann Horn wrote:

> > I think the argument here has nothing to do with IPIs, but is more a
> > statement on memory ordering.
> 
> The comment above the definition of ptep_get_lockless() claims: "it
> will not switch to a completely different present page without a TLB
> flush in between; something that we are blocking by holding interrupts
> off."

I was always skeptical of that argument..

> > So, it seems plausible this could be OK based only on atomics (I did
> > not check that the present bit is properly placed in the right
> > low/high). Do you see a way the atomics don't work out?
> 
> The race would be something like this, where A is one thread doing
> ptep_get_lockless() and B, C and D are other threads:
> 
> <PTE initially points to address 0x0001000100010000>
> A: read ptep->pte_low, sees low address half 0x00010000
> B: begins MADV_DONTNEED, removes the PTE but doesn't flush TLB yet
> C: page fault installs a new PTE pointing to address 0x0001000200020000
> A: read ptep->pte_high, sees high address half 0x00010002
> C: begins MADV_DONTNEED, removes the PTE but doesn't flush TLB yet
> D: page fault installs a new PTE pointing to address 0x0001000300010000
> A: re-read ptep->pte_low, sees low address half 0x00010000 matching
> the first one
> A: returns physical address 0x000100020x00010000, which was never
> actually in the PTE

Okay, that does seem plausible :(

> So it's not a problem with the memory ordering, it's just that it's
> not possible to atomically read a 64-bit PTE with 32-bit reads when
> the PTE can completely change under you - and ptep_get_lockless() was
> written under the assumption that this can't happen because of TLB
> flush IPIs.

It does seem broken then.

If the arch can't provide an atomci load, I suppose the "easy" way to
fix it would be to use a seqlock to protect it..

Though in practice the chances the PTE changes multiple times between
two loads might be sufficiently rare on already rare 32 bit high mem
systems, perhaps it isn't worth fixing..

Thanks,
Jason

     prev parent reply	other threads:[~2022-10-06 16:15 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-06 15:23 Jann Horn
2022-10-06 15:44 ` Jason Gunthorpe
2022-10-06 15:55   ` Jann Horn
2022-10-06 15:58     ` Jann Horn
2022-10-06 16:15     ` Jason Gunthorpe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yz7/JtbinFmNpQMM@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=david@redhat.com \
    --cc=hch@lst.de \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox