From: Muchun Song <songmuchun@bytedance.com>
Date: Wed, 17 Nov 2021 13:54:25 +0800
Subject: Re: [PATCH v6] hugetlb: Add hugetlb.*.numa_stat file
To: Mina Almasry
Cc: Marco Elver, Shakeel Butt, "Paul E. McKenney", Mike Kravetz,
    Andrew Morton, Shuah Khan, Miaohe Lin, Oscar Salvador, Michal Hocko,
    David Rientjes, Jue Wang, Yang Yao, Joanna Li, Cannon Matthews,
    Linux Memory Management List, LKML, kasan-dev@googlegroups.com

On Wed, Nov 17, 2021 at 4:48 AM Mina Almasry wrote:
>
> On Tue, Nov 16, 2021 at 4:04 AM Marco Elver wrote:
> >
> > On Mon, Nov 15, 2021 at 11:59AM -0800, Shakeel Butt wrote:
> > > On Mon, Nov 15, 2021 at 10:55 AM Mina Almasry wrote:
> > [...]
> > > > Sorry, I'm still a bit confused. READ_ONCE/WRITE_ONCE isn't documented
> > > > to provide atomicity to the write or read; it just prevents the
> > > > compiler from re-ordering them. Is there something I'm missing, or is
> > > > the suggestion to add READ_ONCE/WRITE_ONCE simply to suppress the
> > > > KCSAN warnings?
> >
> > It's actually the opposite: READ_ONCE/WRITE_ONCE provide very few
> > ordering guarantees (modulo dependencies), which includes ordering by
> > the compiler, but they are supposed to provide atomicity (when used with
> > properly aligned types up to word size [1]; see __READ_ONCE for the
> > non-atomic variant).
> >
> > Some more background...
> >
> > The warnings that KCSAN tells you about are "data races", which occur
> > when you have conflicting concurrent accesses, at least one of which is
> > "plain" and at least one of which is a write. I think [2] provides a
> > reasonable summary of data races and why we should care.
> >
> > For Linux, our own memory model (LKMM) documents this [3], and says that
> > as long as concurrent operations are marked (non-plain; e.g. *ONCE),
> > there won't be any data races.
> >
> > There are multiple reasons why data races are undesirable, one of which
> > is to avoid bad compiler transformations [4], because compilers are
> > otherwise oblivious to concurrency.
> >
> > Why do marked operations avoid data races and prevent miscompiles?
> > Among other things, because they should be executed atomically. If they
> > weren't, a lot of code would be buggy (there had been cases where the
> > old READ_ONCE could be used on data larger than word size, which
> > certainly weren't atomic, but this is no longer possible).
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/asm-generic/rwonce.h#n35
> > [2] https://lwn.net/Articles/816850/#Why%20should%20we%20care%20about%20data%20races?
> > [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/explanation.txt#n1920
> > [4] https://lwn.net/Articles/793253/
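To make sure I follow the "plain vs. marked" distinction, a minimal
sketch of the pattern being discussed might look like the following
(the names here are made up for illustration and are not from the
patch):

	static DEFINE_SPINLOCK(stats_lock);
	static unsigned long stats_counter;

	/* Writer: updates are serialized by stats_lock. */
	void stats_set(unsigned long val)
	{
		spin_lock(&stats_lock);
		WRITE_ONCE(stats_counter, val);		/* marked write */
		spin_unlock(&stats_lock);
	}

	/* Lockless reader: must use a marked load. */
	unsigned long stats_read(void)
	{
		return READ_ONCE(stats_counter);
	}

If either of the two accesses were left plain, the pair would be a data
race by the definition above, since the writer and the lockless reader
conflict.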
> > Some rules of thumb on when to use which marking:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt
> >
> > In an ideal world, we'd have all intentionally concurrent accesses
> > marked. As-is, KCSAN will find:
> >
> > A. Data races where failure due to current compilers is unlikely
> >    (supposedly "benign"); merely marking the accesses appropriately is
> >    sufficient. Finding a crash for these would require a miscompilation;
> >    otherwise they look "benign" at the C-language level.
> >
> > B. Race-condition bugs where the bug manifests as a data race, too --
> >    simply marking things doesn't fix the problem. These are the types of
> >    bugs where a data race would point out a more severe issue.
> >
> > Right now we have way too much of type (A), which means looking for (B)
> > requires patience.
> >
> > > +Paul & Marco
> > >
> > > Let's ask the experts.
> > >
> > > We have an "unsigned long usage" variable that is updated within a lock
> > > (hugetlb_lock) but is read without the lock.
> > >
> > > Q1) I think KCSAN will complain about it, and READ_ONCE() in the
> > > unlocked read path should be good enough to silence KCSAN. So the
> > > question is: should we still use WRITE_ONCE() as well for usage within
> > > hugetlb_lock?
> >
> > KCSAN's default config will forgive the lack of WRITE_ONCE().
> > Technically it's still a data race (which KCSAN can find with a config
> > change), but it can be forgiven because compilers are less likely to
> > cause trouble for writes (background: https://lwn.net/Articles/816854/,
> > the bit about "Unmarked writes (aligned and up to word size)...").
> >
> > I would mark both if feasible, as it clearly documents the fact that the
> > write can be read concurrently.
> >
> > > Q2) The second question is more about 64-bit archs breaking a 64-bit
> > > write into two 32-bit writes. Is this a real issue? If yes, is the
> > > combination of READ_ONCE()/WRITE_ONCE() good enough for the given
> > > use-case?
> >
> > Per above, probably unlikely, but allowed. WRITE_ONCE should prevent it,
> > and at least relieve you from worrying about it (and shift the burden to
> > WRITE_ONCE's implementation).
>
> Thank you very much for the detailed response. I can add READ_ONCE()
> at the no-lock read site, that is no issue.
>
> However, for the writes that happen while holding the lock, the write
> is like so:
> + h_cg->nodeinfo[page_to_nid(page)]->usage[idx] += nr_pages;
>
> And like so:
> + h_cg->nodeinfo[page_to_nid(page)]->usage[idx] -= nr_pages;
>
> I.e. they are increments/decrements. Sorry if I missed it, but I can't
> find an INC_ONCE(), and it seems wrong to me to do something like:
>
> + WRITE_ONCE(h_cg->nodeinfo[page_to_nid(page)]->usage[idx],
> +            h_cg->nodeinfo[page_to_nid(page)] + nr_pages);

How about using a local variable to cache
h_cg->nodeinfo[page_to_nid(page)]->usage[idx], like the following:

	long usage = h_cg->nodeinfo[page_to_nid(page)]->usage[idx];

	usage += nr_pages;
	WRITE_ONCE(h_cg->nodeinfo[page_to_nid(page)]->usage[idx], usage);

Does this look more comfortable?
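Or, if the open-coded form above still reads awkwardly at the call
sites, it could be wrapped in a small helper. This is just a sketch:
there is no such helper in the kernel today, the name is made up, and
only the usage field and nr_pages come from the patch:

	/* Caller holds hugetlb_lock; lockless readers use READ_ONCE(). */
	static inline void usage_add_once(unsigned long *usage, long delta)
	{
		WRITE_ONCE(*usage, *usage + delta);
	}

Then the increment site becomes

	usage_add_once(&h_cg->nodeinfo[page_to_nid(page)]->usage[idx], nr_pages);

and the decrement site would be analogous with a negative delta.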
> I know we're holding a lock anyway so there is no race, but to the
> casual reader this looks wrong, as there is a race between the fetch of
> the value and the WRITE_ONCE(). What to do here? Seems to me the most

It's not an issue, because the fetch is a read operation and the path
that reads the stat file is also a read operation; both are "plain"
reads, and two reads do not constitute a data race.

> reasonable thing to do is just READ_ONCE() and leave the write plain?

I suggest using WRITE_ONCE() here and READ_ONCE() on the read side.
See the sketch at the end of this mail for what I mean.

Thanks.

> >
> > Thanks,
> > -- Marco
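Here is the sketch I mentioned above. It is only meant to illustrate the
WRITE_ONCE()/READ_ONCE() pairing; the function names are made up, and only
the usage field, hugetlb_lock and nr_pages come from the patch under
discussion:

	/* Update side, serialized by hugetlb_lock. */
	static void hugetlb_cgroup_update_usage(struct hugetlb_cgroup *h_cg,
						int idx, int nid, long delta)
	{
		unsigned long usage;

		lockdep_assert_held(&hugetlb_lock);
		usage = h_cg->nodeinfo[nid]->usage[idx];
		WRITE_ONCE(h_cg->nodeinfo[nid]->usage[idx], usage + delta);
	}

	/* numa_stat read side, no hugetlb_lock taken. */
	static unsigned long hugetlb_cgroup_read_usage(struct hugetlb_cgroup *h_cg,
						       int idx, int nid)
	{
		return READ_ONCE(h_cg->nodeinfo[nid]->usage[idx]);
	}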