Re: [PATCH] mm: fix a data race in put_page()

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: John Hubbard <jhubbard@nvidia.com>
To: Marco Elver <elver@google.com>, Qian Cai <cai@lca.pw>
Cc: Jan Kara <jack@suse.cz>, David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>, <ira.weiny@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH] mm: fix a data race in put_page()
Date: Sat, 8 Feb 2020 17:44:38 -0800	[thread overview]
Message-ID: <5402183a-2372-b442-84d3-c28fb59fa7af@nvidia.com> (raw)
In-Reply-To: <CANpmjNPh0ZXt_t-cZGpM9nm3pzSsb4gzbpGVkhKKVOMdapxwMg@mail.gmail.com>

On 2/7/20 5:17 AM, Marco Elver wrote:
...
>> Yes. I'm grasping at straws now, but...what about the idiom that page_zonenum()
>> uses: a set of bits that are "always" (after a certain early point) read-only?
>> What are your thoughts on that?
> 
> Without annotations it's hard to tell. The problem is that the
> compiler can still emit a word-sized load, even if you're just
> checking 1 bit. The instrumentation emitted for KCSAN only cares about
> loads/stores, where access size is in number of bytes and not bits,
> since that's what the compiler has to emit.  So, strictly speaking
> these are data races: concurrent reads / writes where at least one
> access is plain.
> 
> With the above caveat out of the way, we already have the following
> defaults in KCSAN (after similar requests):
> 1. KCSAN ignores same-value stores, i.e. races with writes that appear
> to write the same value do not result in data race reports.
> 2. KCSAN does not demand aligned writes (including things like 'var++'
> if there are no concurrent writers) up to word size to be marked (with
> READ_ONCE etc.), assuming there is no way for the compiler to screw
> these up. [I still recommend writes to be marked though, if at all
> possible, because I'm still not entirely convinced it's always safe!]
> 
> So, because of (2), KCSAN will not complain if you have something like
> 'flags |= SOME_FLAG' (where the read is marked). Because of (1), it'll
> still complain about 'flags & SOME_FLAG' though, since the load is not
> marked, and only sees this is a word-sized access (assuming flags is a
> long) where a bit changed.
> 
> I don't think it's possible to easily convey to KCSAN which bits of an
> access you only care about, so that we could extend (1).   Ideas?


I'm drawing a blank as far as easy ways forward, so I'm just going accept
the compiler word-level constraints as a "given". I was hoping maybe you
had some magic available, just checking. :)


> 
>>>> A similar thing was brought up before, i.e., anything compared to zero is immune to load-tearing
>>>> issues, but it is rather difficult to implement it in the compiler, so it was settled to use data_race(),
>>>>
>>>> https://lore.kernel.org/lkml/CANpmjNN8J1oWtLPHTgCwbbtTuU_Js-8HD=cozW5cYkm8h-GTBg@mail.gmail.com/#r
>>>>
>>>
>>> Thanks for that link to the previous discussion, good context.
>>>
>>>>>
>>>>> b) Add a new, better-named macro to indicate what's going on. Initial bikeshed-able
>>>>>  candidates:
>>>>>
>>>>>     READ_RO_BITS()
>>>>>     READ_IMMUTABLE_BITS()
>>>>>     ...etc...
>>>>>
> 
> This could work, but 'READ_BITS()' is enough, if KCSAN's same-value
> filter is default on anyway.  Although my preference is also to avoid
> more macros if possible.


So it looks like we're probably stuck with having to annotate the code. Given
that, there is a balance between how many macros, and how much commenting. For
example, if there is a single macro (data_race, for example), then we'll need to
add comments for the various cases, explaining which data_race situation is 
happening.

That's still true, but to a lesser extent if more macros are added. In this case,
I suspect that READ_BITS() makes the commenting easier and shorter. So I'd tentatively
lead towards adding it, but what do others on the list think?



thanks,
-- 
John Hubbard
NVIDIA

> 
>>>> Actually, Linus might hate those kinds of complication rather than a simple data_race() macro,
>>>>
>>>> https://lore.kernel.org/linux-fsdevel/CAHk-=wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/
>>>>
>>>
>>> Another good link. However, my macros above haven't been proposed yet, and I'm perfectly
>>> comfortable proposing something that Linus *might* (or might not!) hate. No point in
>>> guessing about it, IMHO.
>>>
>>> If you want, I'll be happy to put on my flame suit and post a patchset proposing
>>> READ_IMMUTABLE_BITS() (or a better-named thing, if someone has another name idea).  :)
>>>
>>
>> BTW, the current comment said (note, it is called “access” which in this case it does read the whole word
>> rather than those 3 bits, even though it is only those bits are of interested for us),
>>
>> /*
>>  * data_race(): macro to document that accesses in an expression may conflict with
>>  * other concurrent accesses resulting in data races, but the resulting
>>  * behaviour is deemed safe regardless.
>>  *
>>  * This macro *does not* affect normal code generation, but is a hint to tooling
>>  * that data races here should be ignored.
>>  */
>>
>> Macro might have more to say.
> 
> I agree that data_race() would be the most suitable here, since
> technically it's still a data race. It just so happens that we end up
> "throwing away" most of the bits of the access, and just care about a
> few bits.
> 
> Thanks,
> -- Marco
>

next prev parent reply	other threads:[~2020-02-09  1:44 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-06 13:17 Qian Cai
2020-02-06 13:33 ` David Hildenbrand
2020-02-06 13:51   ` Qian Cai
2020-02-06 14:55   ` Jan Kara
2020-02-06 14:59     ` David Hildenbrand
2020-02-06 15:23     ` Qian Cai
2020-02-06 23:34       ` John Hubbard
2020-02-06 23:36         ` John Hubbard
2020-02-06 23:55         ` Qian Cai
2020-02-07  0:18         ` Qian Cai
2020-02-07  0:27           ` John Hubbard
2020-02-07  0:55             ` Qian Cai
2020-02-07 13:17               ` Marco Elver
2020-02-09  1:44                 ` John Hubbard [this message]
2020-02-09  3:10                   ` Qian Cai
2020-02-09  7:12                     ` John Hubbard
2020-02-10  7:48                       ` Marco Elver
2020-02-10 12:16                         ` Qian Cai
2020-02-10 12:58                           ` Marco Elver
2020-02-10 13:36                             ` Qian Cai
2020-02-10 13:38                               ` Marco Elver
2020-02-10 13:55                                 ` Qian Cai
2020-02-10 14:12                                   ` Marco Elver
2020-02-10 14:31                                     ` Qian Cai
2020-02-10 16:23                         ` Qian Cai
2020-02-10 16:33                           ` Marco Elver

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5402183a-2372-b442-84d3-c28fb59fa7af@nvidia.com \
    --to=jhubbard@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=cai@lca.pw \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=elver@google.com \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=paulmck@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox