From: Marco Elver <elver@google.com>
To: Qian Cai <cai@lca.pw>
Cc: John Hubbard <jhubbard@nvidia.com>, Jan Kara <jack@suse.cz>,
David Hildenbrand <david@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
ira.weiny@intel.com, Dan Williams <dan.j.williams@intel.com>,
Linux Memory Management List <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
kasan-dev <kasan-dev@googlegroups.com>
Subject: Re: [PATCH] mm: fix a data race in put_page()
Date: Mon, 10 Feb 2020 17:33:48 +0100 [thread overview]
Message-ID: <CANpmjNPzH6rCQT+Pe_atpm8D68Tt9ChMg276jECxOyNzTUS=NA@mail.gmail.com> (raw)
In-Reply-To: <1581351789.7365.32.camel@lca.pw>
On Mon, 10 Feb 2020 at 17:23, Qian Cai <cai@lca.pw> wrote:
>
> On Mon, 2020-02-10 at 08:48 +0100, Marco Elver wrote:
> > On Sun, 9 Feb 2020 at 08:15, John Hubbard <jhubbard@nvidia.com> wrote:
> > >
> > > On 2/8/20 7:10 PM, Qian Cai wrote:
> > > >
> > > >
> > > > > On Feb 8, 2020, at 8:44 PM, John Hubbard <jhubbard@nvidia.com> wrote:
> > > > >
> > > > > So it looks like we're probably stuck with having to annotate the code. Given
> > > > > that, there is a balance between how many macros, and how much commenting. For
> > > > > example, if there is a single macro (data_race, for example), then we'll need to
> > > > > add comments for the various cases, explaining which data_race situation is
> > > > > happening.
> > > >
> > > > On the other hand, it is perfect fine of not commenting on each data_race() that most of times, people could run git blame to learn more details. Actually, no maintainers from various of subsystems asked for commenting so far.
> > > >
> > >
> > > Well, maybe I'm looking at this wrong. I was thinking that one should attempt to
> > > understand the code on the screen, and that's generally best--but here, maybe
> > > "data_race" is just something that means "tool cruft", really. So mentally we
> > > would move toward visually filtering out the data_race "key word".
> >
> > One thing to note is that 'data_race()' points out concurrency, and
> > that somebody has deemed that the code won't break even with data
> > races. Somebody trying to understand or modify the code should ensure
> > this will still be the case. So, 'data_race()' isn't just tool cruft.
> > It's documentation for something that really isn't obvious from the
> > code alone.
> >
> > Whenever we see a READ_ONCE or other marked access it is obvious to
> > the reader that there are concurrent accesses happening. I'd argue
> > that for intentional data races, we should convey similar information,
> > to avoid breaking the code (of course KCSAN would tell you, but only
> > after the change was done). Even moreso, since changes to code
> > involving 'data_race()' will need re-verification that the data races
> > are still safe.
> >
> > > I really don't like it but at least there is a significant benefit from the tool
> > > that probably makes it worth the visual noise.
> > >
> > > Blue sky thoughts for The Far Future: It would be nice if the tools got a lot
> > > better--maybe in the direction of C language extensions, even if only used in
> > > this project at first.
> >
> > Still thinking about this. What we want to convey is that, while
> > there are races on the particular variable, nobody should be modifying
> > the bits here. Adding a READ_ONCE (or data_race()) would miss a
> > harmful race where somebody modifies these bits, so in principle I
> > agree. However, I think the tool can't automatically tell (even if we
> > had compiler extensions to give us the bits accessed) which bits we
> > care about, because we might have something like:
> >
> > int foo_bar = READ_ONCE(flags) >> FOO_BAR_SHIFT; // need the
> > READ_ONCE because of FOO bits
> > .. (foo_bar & FOO_MASK) .. // FOO bits can be modified concurrently
> > .. (foo_bar & BAR_MASK) .. // nobody should modify BAR bits
> > concurrently though !
> >
> > What we want is to assert that nobody touches a particular set of
> > bits. KCSAN has recently gotten ASSERT_EXCLUSIVE_{WRITER,ACCESS}
> > macros which help assert properties of concurrent code, where bugs
> > won't manifest as data races. Along those lines, I can see the value
> > in doing an exclusivity check on a bitmask of a variable.
> >
> > I don't know how much a READ_BITS macro could help, since it's
> > probably less ergonomic to have to say something like:
> > READ_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT) >> ZONES_PGSHIFT.
> >
> > Here is an alternative:
> >
> > Let's say KCSAN gives you this:
> > /* ... Assert that the bits set in mask are not written
> > concurrently; they may still be read concurrently.
> > The access that immediately follows is assumed to access those
> > bits and safe w.r.t. data races.
> >
> > For example, this may be used when certain bits of @flags may
> > only be modified when holding the appropriate lock,
> > but other bits may still be modified locklessly.
> > ...
> > */
> > #define ASSERT_EXCLUSIVE_BITS(flags, mask) ....
> >
> > Then we can write page_zonenum as follows:
> >
> > static inline enum zone_type page_zonenum(const struct page *page)
> > {
> > + ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
> > return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> > }
>
> Actually, it seems still need to write if I understand correctly,
>
> ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
> return data_race((page->flags >> ZONES_PGSHIFT) & ZONES_MASK);
No, I designed it so you won't need 'data_race()' if you don't want
to. I'll send the patches shortly.
> On the other hand, if you really worry about this thing could go wrong, it might
> be better of using READ_ONCE() at the first place where it will be more future-
> proof with the trade-off it might generate less efficient code optimization?
The READ_ONCE() I'd still advocate for, but KCSAN won't complain if
the pattern is as written above.
> Alternatively, is there a way to write this as this?
>
> return ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
It's an ASSERT, without KCSAN it should do nothing, so this is wrong.
Also, this won't work because you're no longer returning the same
value. I thought about this for READ_BITS, but you'd need (I wrote
this earlier in the thread that it likely won't be suitable):
READ_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT) >> ZONES_PGSHIFT
to get the equivalent result (notice this will result in a redundant
shift). Because we have all kinds of permutations and variants of how
to extract the same bits out of some flags, it's cleaner to have one
'ASSERT_EXCLUSIVE_BITS' and just give it the bits you care about.
Thanks,
-- Marco
> Kind of ugly but it probably cleaner.
>
> >
> > This will accomplish the following:
> > 1. The current code is not touched, and we do not have to verify that
> > the change is correct without KCSAN.
> > 2. We're not introducing a bunch of special macros to read bits in various ways.
> > 3. KCSAN will assume that the access is safe, and no data race report
> > is generated.
> > 4. If somebody modifies ZONES bits concurrently, KCSAN will tell you
> > about the race.
> > 5. We're documenting the code.
> >
> > Anything I missed?
> >
> > Thanks,
> > -- Marco
> >
> >
> >
> >
> >
> > > thanks,
> > > --
> > > John Hubbard
> > > NVIDIA
> > >
> > > > >
> > > > > That's still true, but to a lesser extent if more macros are added. In this case,
> > > > > I suspect that READ_BITS() makes the commenting easier and shorter. So I'd tentatively
> > > > > lead towards adding it, but what do others on the list think?
> > > >
> > > > Even read bits could be dangerous from data races and confusing at best, so I am not really sure what the value of introducing this new macro. People who like to understand it correctly still need to read the commit logs.
> > > >
> > > > This flags->zonenum is such a special case that I don’t really see it regularly for the last few weeks digging KCSAN reports, so even if it is worth adding READ_BITS(), there are more equally important macros need to be added together to be useful initially. For example, HARMLESS_COUNTERS(), READ_SINGLE_BIT(), READ_IMMUTATABLE_BITS() etc which Linus said exactly wanted to avoid.
> > > >
prev parent reply other threads:[~2020-02-10 16:34 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-06 13:17 Qian Cai
2020-02-06 13:33 ` David Hildenbrand
2020-02-06 13:51 ` Qian Cai
2020-02-06 14:55 ` Jan Kara
2020-02-06 14:59 ` David Hildenbrand
2020-02-06 15:23 ` Qian Cai
2020-02-06 23:34 ` John Hubbard
2020-02-06 23:36 ` John Hubbard
2020-02-06 23:55 ` Qian Cai
2020-02-07 0:18 ` Qian Cai
2020-02-07 0:27 ` John Hubbard
2020-02-07 0:55 ` Qian Cai
2020-02-07 13:17 ` Marco Elver
2020-02-09 1:44 ` John Hubbard
2020-02-09 3:10 ` Qian Cai
2020-02-09 7:12 ` John Hubbard
2020-02-10 7:48 ` Marco Elver
2020-02-10 12:16 ` Qian Cai
2020-02-10 12:58 ` Marco Elver
2020-02-10 13:36 ` Qian Cai
2020-02-10 13:38 ` Marco Elver
2020-02-10 13:55 ` Qian Cai
2020-02-10 14:12 ` Marco Elver
2020-02-10 14:31 ` Qian Cai
2020-02-10 16:23 ` Qian Cai
2020-02-10 16:33 ` Marco Elver [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CANpmjNPzH6rCQT+Pe_atpm8D68Tt9ChMg276jECxOyNzTUS=NA@mail.gmail.com' \
--to=elver@google.com \
--cc=akpm@linux-foundation.org \
--cc=cai@lca.pw \
--cc=dan.j.williams@intel.com \
--cc=david@redhat.com \
--cc=ira.weiny@intel.com \
--cc=jack@suse.cz \
--cc=jhubbard@nvidia.com \
--cc=kasan-dev@googlegroups.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=paulmck@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox