From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mina Almasry <almasrymina@google.com>
Date: Tue, 16 Nov 2021 12:48:26 -0800
Subject: Re: [PATCH v6] hugetlb: Add hugetlb.*.numa_stat file
To: Marco Elver
Cc: Shakeel Butt, paulmck@kernel.org, Mike Kravetz, Muchun Song,
	Andrew Morton, Shuah Khan, Miaohe Lin, Oscar Salvador, Michal Hocko,
	David Rientjes, Jue Wang, Yang Yao, Joanna Li, Cannon Matthews,
	Linux Memory Management List, LKML, kasan-dev@googlegroups.com
References: <20211111015037.4092956-1-almasrymina@google.com> <6887a91a-9ec8-e06e-4507-b2dff701a147@oracle.com>
Content-Type: text/plain; charset="UTF-8"
On Tue, Nov 16, 2021 at 4:04 AM Marco Elver wrote:
>
> On Mon, Nov 15, 2021 at 11:59AM -0800, Shakeel Butt wrote:
> > On Mon, Nov 15, 2021 at 10:55 AM Mina Almasry wrote:
> [...]
> > > Sorry, I'm still a bit confused. READ_ONCE/WRITE_ONCE isn't documented
> > > to provide atomicity to the write or read, just to prevent the compiler
> > > from re-ordering them. Is there something I'm missing, or is the
> > > suggestion to add READ_ONCE/WRITE_ONCE simply to suppress the KCSAN
> > > warnings?
>
> It's actually the opposite: READ_ONCE/WRITE_ONCE provide very few
> ordering guarantees (modulo dependencies), which includes ordering by
> the compiler, but they are supposed to provide atomicity (when used with
> properly aligned types up to word size [1]; see __READ_ONCE for the
> non-atomic variant).
>
> Some more background...
>
> The warnings that KCSAN tells you about are "data races", which occur
> when you have conflicting concurrent accesses, at least one of which is
> "plain" and at least one of which is a write. I think [2] provides a
> reasonable summary of data races and why we should care.
>
> For Linux, our own memory model (LKMM) documents this [3], and says that
> as long as concurrent operations are marked (non-plain; e.g. *ONCE),
> there won't be any data races.
>
> There are multiple reasons why data races are undesirable, one of which
> is to avoid bad compiler transformations [4], because compilers are
> otherwise oblivious to concurrency.
>
> Why do marked operations avoid data races and prevent miscompiles?
> Among other things, because they should be executed atomically. If they
> weren't, a lot of code would be buggy (there had been cases where the
> old READ_ONCE could be used on data larger than word size, which
> certainly wasn't atomic, but this is no longer possible).
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/asm-generic/rwonce.h#n35
> [2] https://lwn.net/Articles/816850/#Why%20should%20we%20care%20about%20data%20races?
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/explanation.txt#n1920
> [4] https://lwn.net/Articles/793253/
>
> Some rules of thumb on when to use which marking:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt
>
> In an ideal world, we'd have all intentionally concurrent accesses
> marked. As-is, KCSAN will find:
>
> A. Data races where failure due to current compilers is unlikely
>    (supposedly "benign"); merely marking the accesses appropriately is
>    sufficient. Finding a crash for these would require a miscompilation,
>    but otherwise they look "benign" at the C-language level.
>
> B. Race-condition bugs where the bug manifests as a data race, too --
>    simply marking things doesn't fix the problem. These are the types
>    of bugs where a data race points out a more severe issue.
>
> Right now we have way too much of type (A), which means looking for (B)
> requires patience.
>
> > +Paul & Marco
> >
> > Let's ask the experts.
> >
> > We have an "unsigned long usage" variable that is updated within a
> > lock (hugetlb_lock) but is read without the lock.
> >
> > Q1) I think KCSAN will complain about it, and READ_ONCE() in the
> > unlocked read path should be good enough to silence KCSAN. So, the
> > question is: should we still use WRITE_ONCE() as well for usage
> > within hugetlb_lock?
>
> KCSAN's default config will forgive the lack of WRITE_ONCE().
> Technically it's still a data race (which KCSAN can find with a config
> change), but it can be forgiven because compilers are less likely to
> cause trouble for writes (background: https://lwn.net/Articles/816854/,
> the bit about "Unmarked writes (aligned and up to word size)...").
>
> I would mark both if feasible, as it clearly documents the fact that
> the write can be read concurrently.
>
> > Q2) The second question is more about 64-bit archs breaking a 64-bit
> > write into two 32-bit writes. Is this a real issue? If yes, then is
> > the combination of READ_ONCE()/WRITE_ONCE() good enough for the
> > given use-case?
>
> Per above, probably unlikely, but allowed. WRITE_ONCE should prevent
> it, and at least relieve you of worrying about it (shifting the burden
> to WRITE_ONCE's implementation).
>

Thank you very much for the detailed response. I can add READ_ONCE() at
the no-lock read site, that is no issue. However, the writes that happen
while holding the lock look like so:

+ h_cg->nodeinfo[page_to_nid(page)]->usage[idx] += nr_pages;

And like so:

+ h_cg->nodeinfo[page_to_nid(page)]->usage[idx] -= nr_pages;

I.e. they are increments/decrements. Sorry if I missed it, but I can't
find an INC_ONCE(), and it seems wrong to me to do something like:

+ WRITE_ONCE(h_cg->nodeinfo[page_to_nid(page)]->usage[idx],
+            h_cg->nodeinfo[page_to_nid(page)]->usage[idx] + nr_pages);

I know we're holding a lock anyway so there is no race, but to the
casual reader this looks wrong, as there appears to be a race between
the fetch of the value and the WRITE_ONCE(). What to do here? It seems
to me the most reasonable thing is to just use READ_ONCE() and leave
the write plain?

> Thanks,
> -- Marco