From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17DFEC433EF for ; Tue, 17 May 2022 19:28:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 373FB6B0072; Tue, 17 May 2022 15:28:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 322DD6B0073; Tue, 17 May 2022 15:28:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1C4C66B0074; Tue, 17 May 2022 15:28:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 105256B0072 for ; Tue, 17 May 2022 15:28:29 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id CD46E1205AA for ; Tue, 17 May 2022 19:28:28 +0000 (UTC) X-FDA: 79476221496.23.E081606 Received: from mail-qt1-f171.google.com (mail-qt1-f171.google.com [209.85.160.171]) by imf01.hostedemail.com (Postfix) with ESMTP id 3883F400BA for ; Tue, 17 May 2022 19:28:13 +0000 (UTC) Received: by mail-qt1-f171.google.com with SMTP id x22so134922qto.2 for ; Tue, 17 May 2022 12:28:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ziepe.ca; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=T+GTnHqdgmyXOm9RkzWehTMYO1EHztDKBHLGlaNOOrk=; b=kQ25g3ZuZJkfW9Uvs/URM4ncmDrq6i0piKeLO7/nWk0OvMMj4CNIcNIbNxqIwugSF8 rtDHDiUWwuKQe5I2FGGjr1cfLxJSjxy4lzzZ6NOfPzQUxtwXF3qxduuzeZ0nu2kuVNJL CHmvtctmC3GEY1CrimqTyybUbnHxYCK1tHjMGfOhCgtED6jdXvujJhEXRGt0nXXxZyzf QqQHSWf8Yn0VKEqzNDzMcxQszBPpcSfrxHwi58uNqxFfRPCBuPPfpyX4SOEV/NTxEewK 4lJpIzWcxzX7C7dHvoh6nK5YeN5B6L2d0NDAXty2hRuoQceuXW1H2XEl7j7r42Ddkxzc 5bSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=T+GTnHqdgmyXOm9RkzWehTMYO1EHztDKBHLGlaNOOrk=; b=djSlBANzNnIexYBqOM35dNknKobCugVcagQ1wOIqr8DtuVWZ/eCRZpXKtKM0kSfIr0 JMrSIZFuuLeAbDRRwo+0HKeuo/FP4q9UQdFc5EzlEz2Se7Ga/WK+rseNzUooQO2hQVAL JQauXsfMZRdmflp6zQgehgNRU89XCjoWUPh0EDa6kEjxHWsHt287XoXhLnimazfeN8SW icKg2OHoDLWXmOSOLWCleTP+F/kjfcgP5Ko0sFMaOaQYptXrrPJ1BD3/LHrg67wjgHu6 V52Gtrbs3LULzsi5oZ43tYU8umxfNhMx1soOJKUDKB8sttirj2RqFhPrqXRSfzlTs+6i xalA== X-Gm-Message-State: AOAM531yKRcRn3gEL5BdVafNhfe6pqQ98mhHm/Bn1mZdRcZSIEGZrJ5L B4kQ2zkLTk75+FHx7QzwSke9rg== X-Google-Smtp-Source: ABdhPJxwA9mNXuOulavrU7Eibv9h7BDB+IkcxLIqfNM1emBRCPvM2g7PLBMn9dePNbP/MWUY5k/8wA== X-Received: by 2002:ac8:5913:0:b0:2f3:a402:1054 with SMTP id 19-20020ac85913000000b002f3a4021054mr21594800qty.513.1652815707493; Tue, 17 May 2022 12:28:27 -0700 (PDT) Received: from ziepe.ca (hlfxns017vw-142-162-113-129.dhcp-dynamic.fibreop.ns.bellaliant.net. [142.162.113.129]) by smtp.gmail.com with ESMTPSA id x124-20020a37ae82000000b0069fc13ce1eesm22395qke.31.2022.05.17.12.28.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 May 2022 12:28:26 -0700 (PDT) Received: from jgg by mlx with local (Exim 4.94) (envelope-from ) id 1nr2rq-0089si-04; Tue, 17 May 2022 16:28:26 -0300 Date: Tue, 17 May 2022 16:28:25 -0300 From: Jason Gunthorpe To: John Hubbard Cc: Minchan Kim , "Paul E. McKenney" , Andrew Morton , linux-mm , LKML , John Dias , David Hildenbrand Subject: Re: [PATCH v4] mm: fix is_pinnable_page against on cma page Message-ID: <20220517192825.GM63055@ziepe.ca> References: <20220512002207.GJ1790663@paulmck-ThinkPad-P17-Gen-1> <20220512004949.GK1790663@paulmck-ThinkPad-P17-Gen-1> <0accce46-fac6-cdfb-db7f-d08396bf9d35@nvidia.com> <20220517140049.GF63055@ziepe.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=ziepe.ca header.s=google header.b=kQ25g3Zu; dmarc=none; spf=pass (imf01.hostedemail.com: domain of jgg@ziepe.ca designates 209.85.160.171 as permitted sender) smtp.mailfrom=jgg@ziepe.ca X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 3883F400BA X-Rspam-User: X-Stat-Signature: dyftut76zht5bg7tmorfenwnydk57ezx X-HE-Tag: 1652815693-248273 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000026, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, May 17, 2022 at 11:12:37AM -0700, John Hubbard wrote: > On 5/17/22 07:00, Jason Gunthorpe wrote: > > > > It does change the generated code slightly. I don't know if this will > > > > affect performance here or not. But just for completeness, here you go: > > > > > > > > free_one_page() originally has this (just showing the changed parts): > > > > > > > > mov 0x8(%rdx,%rax,8),%rbx > > > > and $0x3f,%ecx > > > > shr %cl,%rbx > > > > and $0x7, > > > > > > > > > > > > And after applying this diff: > > > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > index 0e42038382c1..df1f8e9a294f 100644 > > > > +++ b/mm/page_alloc.c > > > > @@ -482,7 +482,7 @@ unsigned long __get_pfnblock_flags_mask(const struct > > > > page *page, > > > > word_bitidx = bitidx / BITS_PER_LONG; > > > > bitidx &= (BITS_PER_LONG-1); > > > > > > > > - word = bitmap[word_bitidx]; > > > > + word = READ_ONCE(bitmap[word_bitidx]); > > > > return (word >> bitidx) & mask; > > > > } > > > > > > > > > > > > ...it now does an extra memory dereference: > > > > > > > > lea 0x8(%rdx,%rax,8),%rax > > > > and $0x3f,%ecx > > > > mov (%rax),%rbx > > > > shr %cl,%rbx > > > > and $0x7,%ebx > > > > Where is the extra memory reference? 'lea' is not a memory reference, > > it is just some maths? > > If you compare this to the snippet above, you'll see that there is > an extra mov statement, and that one dereferences a pointer from > %rax: > > mov (%rax),%rbx That is the same move as: mov 0x8(%rdx,%rax,8),%rbx Except that the EA calculation was done in advance and stored in rax. lea isn't a memory reference, it is just computing the pointer value that 0x8(%rdx,%rax,8) represents. ie the lea computes %rax = %rdx + %rax*8 + 6 Which is then fed into the mov. Maybe it is an optimization to allow one pipe to do the shr and an other to the EA - IDK, seems like a random thing for the compiler to do. > > IMHO putting a READ_ONCE on something that is not a memory load from > > shared data is nonsense - if a simple == has a stability risk then so > > does the '(word >> bitidx) & mask'. > > Doing something like this: > > int __x = y(); > int x = READ_ONCE(__x); > > is just awful! I agree. Really, y() should handle any barriers, because > otherwise it really does look pointless, and people reading the code > need something that is clearer. My first reaction was that this was > pointless and wrong, and it turns out that that's only about 80% true: > as long as LTO-of-the-future doesn't arrive, and as long as no one > refactors y() to be inline. It is always pointless, because look at the whole flow: y(): word = bitmap[word_bitidx]; return (word >> bitidx) & mask; int __x = y() int x = READ_ONCE(__x) The requirement here is for a coherent read of bitmap[word_bitidx] that is not mangled by multi-load, load tearing or other theoritcal compiler problems. Stated more clearly if bitmap[] has values {A,B,C} then the result of all this in x should always be {A',B',C'} according to the given maths, never, ever an impossible value D. The nature of the compiler problems we are trying to prevent is that the compiler my take a computation that seems deterministic and corrupt it somehow by making an assumption that the memory is stable when it is not. For instance == may not work right if the compiler decides to do: if (READ_ONCE(a) == 1 && READ_ONCE(a) == 1) This also means that >> and & could malfunction. So when LTO expands and inlines everything and you get this: if (READ_ONCE((bitmap[word_bitidx] >> bitidx) & mask) == MIGRATE_CMA) It is still not safe because the >> and & could have multi-loaded or torn the read in some way that it is no longer a sane calculation. (for instance imagine that the compiler decides to read the value byte-at-a-time without READ_ONCE) ie the memory may have had A racing turning into C but x computed into D not A'. Paul can correct me, but I understand we do not have a list of allowed operations that are exempted from the READ_ONCE() requirement. ie it is not just conditional branching that requires READ_ONCE(). This is why READ_ONCE() must always be on the memory load, because the point is to sanitize away the uncertainty that comes with an unlocked read of unstable memory contents. READ_ONCE() samples the value in memory, and removes all tearing, multiload, etc "instability" that may effect down stream computations. In this way down stream compulations become reliable. Jason