From: Michal Hocko <mhocko@suse.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Hugh Dickins <hughd@google.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 0/2] RFC: READ/WRITE_ONCE vma/mm cleanups
Date: Tue, 5 Mar 2019 14:00:58 +0100
Message-ID: <20190305130058.GH28468@dhcp22.suse.cz>
In-Reply-To: <20190304101209.klwojazhtr4s4reu@kshutemo-mobl1>

On Mon 04-03-19 13:12:10, Kirill A. Shutemov wrote:
> On Fri, Mar 01, 2019 at 11:54:52AM -0500, Andrea Arcangeli wrote:
> > Hello Kirill and Vlastimil,
> >
> > On Fri, Mar 01, 2019 at 02:04:38PM +0100, Vlastimil Babka wrote:
> > > On 3/1/19 10:37 AM, Kirill A. Shutemov wrote:
> > > > On Thu, Feb 28, 2019 at 10:55:48PM -0500, Andrea Arcangeli wrote:
> > > >> Hello,
> > > >>
> > > >> This was a well known issue for more than a decade, but until a few
> > > >> months ago we relied on the compiler to stick to atomic accesses and
> > > >> updates while walking and updating pagetables.
> > > >>
> > > >> However now the 64bit native_set_pte finally uses WRITE_ONCE and
> > > >> gup_pmd_range uses READ_ONCE as well.
> > > >>
> > > > >> This converts more racy VM places to avoid depending on the expected
> > > >> compiler behavior to achieve kernel runtime correctness.
> > > >>
> > > > >> It mostly guarantees that gcc does atomic updates at 64bit granularity
> > > > >> (practically not needed) and it also prevents gcc from emitting code that
> > > >> risks getting confused if the memory unexpectedly changes under it
> > > >> (unlikely to ever be needed).
> > > >>
> > > >> The list of vm_start/end/pgoff to update isn't complete, I covered the
> > > >> most obvious places, but before wasting too much time at doing a full
> > > >> audit I thought it was safer to post it and get some comment. More
> > > >> updates can be posted incrementally anyway.
> > > >
> > > > The intention is described well to my eyes.
> > > >
> > > > > Do I understand correctly that it's an attempt to get away with
> > > > > modifying vma's fields under down_read(mmap_sem)?
> >
> > The issue is that we already get away with it, but we do it without
> > READ/WRITE_ONCE. The patch should change nothing, it should only
> > reduce the dependency on the compiler to do what we expect.
>
> Yes, it is a pre-existing problem. And yes, the compiler may screw this up.
> The patch may reduce dependency on the compiler, but it doesn't mean it
> reduces the chance of a race.
>
> Consider your changes into __mm_populate() and populate_vma_page_range().
> You put READ_ONCE() in both functions. But populate_vma_page_range() gets
> called from __mm_populate(). Before your change the compiler may optimize the
> code and load from memory once for a field. With your changes the compiler
> will issue two loads.
>
> It *increases* chances of the race, not reduces them.
>
> The current locking scheme doesn't allow modifying VMA fields without
> down_write(mmap_sem).
>
> We do have hacks[1] that try to bypass the limitation, but AFAIK we never
> had a solid explanation why this should work. Sprinkling READ_ONCE()
> doesn't help with this, but makes it appear legitimate.

I do agree with Kirill here. Sprinkling {READ,WRITE}_ONCE around just
doesn't solve anything. I am pretty sure that people will not think
about it and we will end up in a similar half-covered situation in a few
years again. I would rather remove all those hacks and use a saner
locking scheme instead.

> [1] I believe we also touch vm_flags without proper locking to set/clear
> VM_LOCKED.
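To make Kirill's double-load point concrete, here is a rough userspace
sketch. It is not the kernel code: READ_ONCE() is approximated with a
volatile cast (relying on the gcc/clang __typeof__ extension), struct
vma_sketch, shrink_vma() and the racer hook are made-up names, and the
"concurrent" writer is simulated by a plain call placed between the two
loads so the effect is deterministic.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel macro: the volatile access forces a
 * real load at every use, so the compiler cannot merge two of them. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, cut-down VMA holding only the field under discussion. */
struct vma_sketch {
	unsigned long vm_end;
};

/* Hypothetical hook standing in for a writer that races with the reader. */
typedef void (*racer_fn)(struct vma_sketch *vma);

/* Simulated racing update: shrink the VMA by one 4 KiB page. */
static void shrink_vma(struct vma_sketch *vma)
{
	vma->vm_end -= 4096;
}

/* Callee in the style of populate_vma_page_range(): does its own load. */
static unsigned long callee_reload(struct vma_sketch *vma)
{
	return READ_ONCE(vma->vm_end);	/* second, independent load */
}

/* Caller in the style of __mm_populate(): one load here, another in the
 * callee. Returns nonzero when the two snapshots disagree. */
static int two_loads_disagree(struct vma_sketch *vma, racer_fn racer)
{
	unsigned long caller_end = READ_ONCE(vma->vm_end);	/* first load */

	if (racer)
		racer(vma);	/* window in which a writer may slip in */

	return caller_end != callee_reload(vma);
}

/* Alternative shape: the callee receives the caller's snapshot by value. */
static unsigned long callee_by_value(unsigned long vm_end)
{
	return vm_end;
}

/* Single load, then the value is passed down; caller and callee always
 * agree with each other, whatever the writer does in between. */
static int one_load_disagrees(struct vma_sketch *vma, racer_fn racer)
{
	unsigned long caller_end = READ_ONCE(vma->vm_end);	/* only load */

	if (racer)
		racer(vma);

	return caller_end != callee_by_value(caller_end);
}
```

With vm_end starting at 8192, two_loads_disagree(&vma, shrink_vma)
reports a mismatch, while one_load_disagrees(&vma, shrink_vma) does not.
Note that the single-snapshot variant only keeps caller and callee
consistent with each other. The snapshot can still be stale, so it does
nothing for the underlying locking problem.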
--
Michal Hocko
SUSE Labs