From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F62EC433EF for ; Tue, 16 Nov 2021 21:54:24 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 477E761B39 for ; Tue, 16 Nov 2021 21:54:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 477E761B39 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id E15636B0071; Tue, 16 Nov 2021 16:54:13 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id DC4C16B0072; Tue, 16 Nov 2021 16:54:13 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CB3686B0073; Tue, 16 Nov 2021 16:54:13 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0191.hostedemail.com [216.40.44.191]) by kanga.kvack.org (Postfix) with ESMTP id B9FEB6B0071 for ; Tue, 16 Nov 2021 16:54:13 -0500 (EST) Received: from smtpin31.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 840B3852E3 for ; Tue, 16 Nov 2021 21:54:03 +0000 (UTC) X-FDA: 78816146766.31.D91EB4C Received: from mail-il1-f171.google.com (mail-il1-f171.google.com [209.85.166.171]) by imf26.hostedemail.com (Postfix) with ESMTP id D0C3220019CF for ; Tue, 16 Nov 2021 21:54:03 +0000 (UTC) Received: by mail-il1-f171.google.com with SMTP id l8so662261ilv.3 for ; Tue, 16 Nov 2021 13:54:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=NZVUIohL2barCI4V/DC/WSWlEx32HYD+S0UuqJH4pmw=; b=J14Gg3v0wDj1dDeKycL5ynNxtXEerxkkN0j2Hk4bq7P9Wl+j1YEna1LEGecZ8O+iWE bsIcWfI5/ShCKxbLwmT1uq7eYdWNYTPLJMHXAyhYPfKntfLUIRCsZt/cNDSrz+Uq6fs/ DunVGV7JGzpyoKOeve2e9TuS/4+CVZNUY0QbB6FKpMNQzjQI+kJlWqQxSK72hbJSGPnV McEXyUFrPCevFYAIqcjd+ssJ/LzvNvdfdFxsHroZkwMAywzDbnEEKsxFENsTrVTJ8TFP HPob0jJmXWavIzdGZwCzwEKka+Nru9gXM5xO5S+rajbA9kyYTJKU5GIM8AePoQUQgtf1 wAYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=NZVUIohL2barCI4V/DC/WSWlEx32HYD+S0UuqJH4pmw=; b=shQJtpCplZZfgeg8ezUKzW4zWn8U62BM3cfDOvMi5avSYd+k/bEZvfJdjUuzpM4cVu 3NEuuzd8e6Gy9/3lxIBjO1xEljptqlkzNJwqvtVhSSGDINspqV4570SfVdYWhGf6Ymoa MylK9Ykrs0154ELE27AvZ+yJ3tNw88IO4KSuGRqXsFYV72B40sca34If3AEDz72wi0mc +GQV9rJmNuXM+K9n90I/QuQdxC192JjnrG4C3uWbSja6yyVUOF0bfpgCGjgh9O61U+M4 LNbyU3ts7QetTsgly5paxgljIj0f8hs2i1on8j80Erlds0KYxCRk7rfyZW4LJ9ckCvtm XATQ== X-Gm-Message-State: AOAM531mg9gpcncZow/JXkrcf3MR4qwQFbyLyXdarzKECzPu2vs5csoF FzAwB12Dj3WKoDrIm2vVZsMoy7JmWiKFztoVGfkrtQ== X-Google-Smtp-Source: ABdhPJz2MEtLIXeGRj2hnQVHakndrY2gWgC2prN5Fr4jZksKOoPobpJVp6WCTqdVcwTdFXmtBWMnU3soxLdG4kaLi+c= X-Received: by 2002:a05:6e02:1c46:: with SMTP id d6mr6886998ilg.79.1637099642366; Tue, 16 Nov 2021 13:54:02 -0800 (PST) MIME-Version: 1.0 References: <6887a91a-9ec8-e06e-4507-b2dff701a147@oracle.com> In-Reply-To: From: Mina Almasry Date: Tue, 16 Nov 2021 13:53:51 -0800 Message-ID: Subject: Re: [PATCH v6] hugetlb: Add hugetlb.*.numa_stat file To: Marco Elver Cc: Shakeel Butt , paulmck@kernel.org, Mike Kravetz , Muchun Song , Andrew Morton , Shuah Khan , Miaohe Lin , Oscar Salvador , Michal Hocko , David Rientjes , Jue Wang , Yang Yao , Joanna Li , Cannon Matthews , Linux Memory Management List , LKML , kasan-dev@googlegroups.com Content-Type: text/plain; charset="UTF-8" Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=J14Gg3v0; spf=pass (imf26.hostedemail.com: domain of almasrymina@google.com designates 209.85.166.171 as permitted sender) smtp.mailfrom=almasrymina@google.com; dmarc=pass (policy=reject) header.from=google.com X-Stat-Signature: i49suct5yatbicw43q3a8wp8jjg53qzg X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: D0C3220019CF X-HE-Tag: 1637099643-538999 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Nov 16, 2021 at 1:48 PM Marco Elver wrote: > > On Tue, Nov 16, 2021 at 12:59PM -0800, Shakeel Butt wrote: > > On Tue, Nov 16, 2021 at 12:48 PM Mina Almasry wrote: > [...] > > > > Per above, probably unlikely, but allowed. WRITE_ONCE should prevent it, > > > > and at least relieve you to not worry about it (and shift the burden to > > > > WRITE_ONCE's implementation). > > > > > > > > > > Thank you very much for the detailed response. I can add READ_ONCE() > > > at the no-lock read site, that is no issue. > > > > > > However, for the writes that happen while holding the lock, the write > > > is like so: > > > + h_cg->nodeinfo[page_to_nid(page)]->usage[idx] += nr_pages; > > > > > > And like so: > > > + h_cg->nodeinfo[page_to_nid(page)]->usage[idx] -= nr_pages; > > > > > > I.e. they are increments/decrements. Sorry if I missed it but I can't > > > find an INC_ONCE(), and it seems wrong to me to do something like: > > > > > > + WRITE_ONCE(h_cg->nodeinfo[page_to_nid(page)]->usage[idx], > > > + > > > h_cg->nodeinfo[page_to_nid(page)] + nr_pages); > > From what I gather there are no concurrent writers, right? > > WRITE_ONCE(a, a + X) is perfectly fine. What it says is that you can > have concurrent readers here, but no concurrent writers (and KCSAN will > still check that). Maybe we need a more convenient macro for this idiom > one day.. > > Though I think for something like > > h_cg->nodeinfo[page_to_nid(page)]->usage[idx] += nr_pages; > > it seems there wants to be an temporary long* so that you could write > WRITE_ONCE(*usage, *usage + nr_pages) or something. > Ah, perfect, OK I can do this, and maybe add a comment explaining that we don't have concurrent writers. > > > I know we're holding a lock anyway so there is no race, but to the > > > casual reader this looks wrong as there is a race between the fetch of > > > the value and the WRITE_ONCE(). What to do here? Seems to me the most > > > reasonable thing to do is just READ_ONCE() and leave the write plain? > > > > > > > > > > How about atomic_long_t? > > That would work of course; if this is very hot path code it might be > excessive if you don't have concurrent writers. > > Looking at the patch in more detail, the counter is a stat counter that > can be read from a stat file, correct? Hypothetically, what would happen > if the reader of 'usage' reads approximate values? > > If the answer is "nothing", then this could classify as an entirely > "benign" data race and you could only use data_race() on the reader and > leave the writers unmarked using normal +=/-=. Check if it fits > "Data-Racy Reads for Approximate Diagnostics" [1]. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt#n74 Thank you very much for your quick responses. I think if the usage returns a garbage/approximate value once in a while people will notice and I can see it causing issues. I think it's worth doing it 'properly' here. I'll upload another version with these changes.