Re: [PATCH v6] hugetlb: Add hugetlb.*.numa_stat file

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Mina Almasry <almasrymina@google.com>
To: Marco Elver <elver@google.com>
Cc: Shakeel Butt <shakeelb@google.com>,
	paulmck@kernel.org,  Mike Kravetz <mike.kravetz@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Shuah Khan <shuah@kernel.org>,  Miaohe Lin <linmiaohe@huawei.com>,
	Oscar Salvador <osalvador@suse.de>,
	Michal Hocko <mhocko@suse.com>,
	 David Rientjes <rientjes@google.com>, Jue Wang <juew@google.com>,
	Yang Yao <ygyao@google.com>,  Joanna Li <joannali@google.com>,
	Cannon Matthews <cannonmatthews@google.com>,
	 Linux Memory Management List <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	 kasan-dev@googlegroups.com
Subject: Re: [PATCH v6] hugetlb: Add hugetlb.*.numa_stat file
Date: Tue, 16 Nov 2021 12:48:26 -0800	[thread overview]
Message-ID: <CAHS8izPV20pD8nKEsnEYicaCKLH7A+QTYphWRrtTqcppzoQAWg@mail.gmail.com> (raw)
In-Reply-To: <YZOeUAk8jqO7uiLd@elver.google.com>

On Tue, Nov 16, 2021 at 4:04 AM Marco Elver <elver@google.com> wrote:
>
> On Mon, Nov 15, 2021 at 11:59AM -0800, Shakeel Butt wrote:
> > On Mon, Nov 15, 2021 at 10:55 AM Mina Almasry <almasrymina@google.com> wrote:
> [...]
> > > Sorry I'm still a bit confused. READ_ONCE/WRITE_ONCE isn't documented
> > > to provide atomicity to the write or read, just prevents the compiler
> > > from re-ordering them. Is there something I'm missing, or is the
> > > suggestion to add READ_ONCE/WRITE_ONCE simply to supress the KCSAN
> > > warnings?
>
> It's actually the opposite: READ_ONCE/WRITE_ONCE provide very little
> ordering (modulo dependencies) guarantees, which includes ordering by
> compiler, but are supposed to provide atomicity (when used with properly
> aligned types up to word size [1]; see __READ_ONCE for non-atomic
> variant).
>
> Some more background...
>
> The warnings that KCSAN tells you about are "data races", which occur
> when you have conflicting concurrent accesses, one of which is "plain"
> and at least one write. I think [2] provides a reasonable summary of
> data races and why we should care.
>
> For Linux, our own memory model (LKMM) documents this [3], and says that
> as long as concurrent operations are marked (non-plain; e.g. *ONCE),
> there won't be any data races.
>
> There are multiple reasons why data races are undesirable, one of which
> is to avoid bad compiler transformations [4], because compilers are
> oblivious to concurrency otherwise.
>
> Why do marked operations avoid data races and prevent miscompiles?
> Among other things, because they should be executed atomically. If they
> weren't a lot of code would be buggy (there had been cases where the old
> READ_ONCE could be used on data larger than word size, which certainly
> weren't atomic, but this is no longer possible).
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/asm-generic/rwonce.h#n35
> [2] https://lwn.net/Articles/816850/#Why%20should%20we%20care%20about%20data%20races?
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/explanation.txt#n1920
> [4] https://lwn.net/Articles/793253/
>
> Some rules of thumb when to use which marking:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt
>
> In an ideal world, we'd have all intentionally concurrent accesses
> marked. As-is, KCSAN will find:
>
> A. Data race, where failure due to current compilers is unlikely
>    (supposedly "benign"); merely marking the accesses appropriately is
>    sufficient. Finding a crash for these will require a miscompilation,
>    but otherwise look "benign" at the C-language level.
>
> B. Race-condition bugs where the bug manifests as a data race, too --
>    simply marking things doesn't fix the problem. These are the types of
>    bugs where a data race would point out a more severe issue.
>
> Right now we have way too much of type (A), which means looking for (B)
> requires patience.
>
> > +Paul & Marco
> >
> > Let's ask the experts.
> >
> > We have a "unsigned long usage" variable that is updated within a lock
> > (hugetlb_lock) but is read without the lock.
> >
> > Q1) I think KCSAN will complain about it and READ_ONCE() in the
> > unlocked read path should be good enough to silent KCSAN. So, the
> > question is should we still use WRITE_ONCE() as well for usage within
> > hugetlb_lock?
>
> KCSAN's default config will forgive the lack of WRITE_ONCE().
> Technically it's still a data race (which KCSAN can find with a config
> change), but can be forgiven because compilers are less likely to cause
> trouble for writes (background: https://lwn.net/Articles/816854/ bit
> about "Unmarked writes (aligned and up to word size)...").
>
> I would mark both if feasible, as it clearly documents the fact the
> write can be read concurrently.
>
> > Q2) Second question is more about 64 bit archs breaking a 64 bit write
> > into two 32 bit writes. Is this a real issue? If yes, then the
> > combination of READ_ONCE()/WRITE_ONCE() are good enough for the given
> > use-case?
>
> Per above, probably unlikely, but allowed. WRITE_ONCE should prevent it,
> and at least relieve you to not worry about it (and shift the burden to
> WRITE_ONCE's implementation).
>

Thank you very much for the detailed response. I can add READ_ONCE()
at the no-lock read site, that is no issue.

However, for the writes that happen while holding the lock, the write
is like so:
+               h_cg->nodeinfo[page_to_nid(page)]->usage[idx] += nr_pages;

And like so:
+               h_cg->nodeinfo[page_to_nid(page)]->usage[idx] -= nr_pages;

I.e. they are increments/decrements. Sorry if I missed it but I can't
find an INC_ONCE(), and it seems wrong to me to do something like:

+               WRITE_ONCE(h_cg->nodeinfo[page_to_nid(page)]->usage[idx],
+
h_cg->nodeinfo[page_to_nid(page)] + nr_pages);

I know we're holding a lock anyway so there is no race, but to the
casual reader this looks wrong as there is a race between the fetch of
the value and the WRITE_ONCE(). What to do here? Seems to me the most
reasonable thing to do is just READ_ONCE() and leave the write plain?


> Thanks,
> -- Marco

next prev parent reply	other threads:[~2021-11-16 20:49 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-11  1:50 Mina Almasry
2021-11-11  2:04 ` Shakeel Butt
2021-11-11  2:36 ` Muchun Song
2021-11-11  2:59   ` Shakeel Butt
2021-11-12 23:36   ` Mike Kravetz
2021-11-13  2:44     ` Muchun Song
2021-11-13 14:48       ` Mina Almasry
2021-11-13 19:15         ` Shakeel Butt
2021-11-14 13:43           ` Muchun Song
2021-11-15 18:22             ` Mike Kravetz
2021-11-15 18:55               ` Mina Almasry
2021-11-15 19:59                 ` Shakeel Butt
2021-11-16 12:04                   ` Marco Elver
2021-11-16 20:48                     ` Mina Almasry [this message]
2021-11-16 20:59                       ` Shakeel Butt
2021-11-16 21:47                         ` Marco Elver
2021-11-16 21:53                           ` Mina Almasry
2021-11-17  5:54                       ` Muchun Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHS8izPV20pD8nKEsnEYicaCKLH7A+QTYphWRrtTqcppzoQAWg@mail.gmail.com \
    --to=almasrymina@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=cannonmatthews@google.com \
    --cc=elver@google.com \
    --cc=joannali@google.com \
    --cc=juew@google.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=osalvador@suse.de \
    --cc=paulmck@kernel.org \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=songmuchun@bytedance.com \
    --cc=ygyao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox