Message-ID: <9ce3aaaa-71a6-5a81-16a3-36e6763feb91@redhat.com>
Date: Tue, 30 Aug 2022 20:53:06 +0200
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
To: Jason Gunthorpe
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 Andrew Morton, Mel Gorman, John Hubbard,
 "Matthew Wilcox (Oracle)", Andrea Arcangeli, Hugh Dickins, Peter Xu
Subject: Re: [PATCH v1 2/3] mm/gup: use gup_can_follow_protnone() also in
 GUP-fast
References: <20220825164659.89824-1-david@redhat.com>
 <20220825164659.89824-3-david@redhat.com>
 <1892f6de-fd22-0e8b-3ff6-4c8641e1c68e@redhat.com>
 <2e20c90d-4d1f-dd83-aa63-9d8d17021263@redhat.com>
Content-Type: text/plain; charset=UTF-8
On 30.08.22 20:45, Jason Gunthorpe wrote:
> On Tue, Aug 30, 2022 at 08:23:52PM +0200, David Hildenbrand wrote:
>> ... and looking into the details of TLB flush and GUP-fast interaction
>> nowadays, that case is no longer relevant. A TLB flush is no longer
>> sufficient to stop concurrent GUP-fast ever since we introduced generic
>> RCU GUP-fast.
> 
> Yes, we've had RCU GUP fast for a while, and it is more widely used
> now, IIRC.
> 
> It has been a bit, but if I remember, GUP fast in RCU mode worked on a
> few principles:
> 
>  - The PTE page must not be freed without RCU
>  - The PTE page content must be convertible to a struct page using the
>    usual rules (e.g. PTE special)
>  - That struct page refcount may go from 0->1 inside the RCU
>  - In the case the refcount goes from 0->1 there must be sufficient
>    barriers such that GUP fast observing the refcount of 1 will also
>    observe the PTE entry has changed.
>    ie before the refcount is dropped in the zap it has to clear the
>    PTE entry, the refcount decr has to be a 'release' and the refcount
>    incr in gup fast has to be an 'acquire'.
>  - The rest of the system must tolerate speculative refcount
>    increments from GUP on any random page
> 
> The basic idea being that if GUP fast obtains a valid reference on a
> page *and* the PTE entry has not changed then everything is fine.
> 
> The tricks with TLB invalidation are just a "poor man's" RCU, and
> arguably these days aren't really needed since I think we could make
> everything use real RCU always without penalty if we really wanted.
> 
> Today we can create a unique 'struct pagetable_page' as Matthew has
> been doing in other places that guarantees an rcu_head is always
> available for every page used in a page table. Using that we could
> drop the code in the TLB flusher that allocates memory for the
> rcu_head and hopes for the best. (Or is the common struct page
> rcu_head even already guaranteed to exist for page table pages
> nowadays?)
> 
> IMHO that is the main reason we still have the non-RCU mode at all..

Good, I managed to attract the attention of someone who understands
that machinery :)

While validating whether the GUP-fast and PageAnonExclusive code work
correctly, I started looking at the whole RCU GUP-fast machinery. I do
have a patch to improve PageAnonExclusive clearing (I think we're
missing memory barriers to make it work as expected in every possible
case), but I also eventually stumbled over a more generic issue that
might need memory barriers.

Any thoughts on whether I am missing something, or whether this is
actually missing memory barriers?
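To convince myself, I modelled the two conceptual sequences in plain C.
This is just a user-space sketch, not kernel code: sequential
consistency stands in for the real barriers (the TLB flush on the
write_protect_page() side, the proposed smp_mb__after_atomic() on the
GUP-fast side), every name in it is invented, and GUP's recheck+back-off
is collapsed into one atomic step for simplicity. It enumerates all 20
interleavings of the two 3-step sequences and checks that the two sides
can never both succeed:

```c
/*
 * Toy model of mm/ksm.c:write_protect_page() racing with GUP-fast,
 * assuming sequential consistency (i.e. all barriers in place).
 * All names are made up for this sketch.
 */
enum { PTE_CLEARED = 0, PTE_MAPPED = 1 };

struct model {
	int pte;      /* the page-table entry */
	int refcount; /* page refcount; 1 == only the mapping */
	int gup_snap; /* PTE value GUP-fast read in step g1 */
	int gup_ok;   /* GUP-fast kept a reference */
	int ksm_ok;   /* KSM saw no unknown references and proceeded */
};

/* GUP-fast: g1 read PTE, g2 grab reference, g3 recheck the PTE. */
static void gup_step(struct model *m, int step)
{
	switch (step) {
	case 0:
		m->gup_snap = m->pte;
		break;
	case 1:
		if (m->gup_snap == PTE_MAPPED)
			m->refcount++;
		break;
	case 2:
		/* Recheck + back-off modelled as one atomic step. */
		if (m->gup_snap == PTE_MAPPED) {
			if (m->pte == m->gup_snap)
				m->gup_ok = 1;  /* keeps the reference */
			else
				m->refcount--;  /* back off */
		}
		break;
	}
}

/* KSM: k1 clear PTE, k2 barrier (implicit under SC), k3 check refs. */
static void ksm_step(struct model *m, int step)
{
	switch (step) {
	case 0:
		m->pte = PTE_CLEARED;
		break;
	case 1:
		break; /* barrier */
	case 2:
		if (m->refcount == 1)
			m->ksm_ok = 1;
		break;
	}
}

static int popcount6(int mask)
{
	int i, n = 0;

	for (i = 0; i < 6; i++)
		n += (mask >> i) & 1;
	return n;
}

/* Enumerate all C(6,3) = 20 interleavings; return 1 if the invariant
 * "at least one side backs off" holds in every one of them. */
static int check_all_interleavings(void)
{
	int mask, i;

	for (mask = 0; mask < 64; mask++) {
		struct model m = { PTE_MAPPED, 1, 0, 0, 0 };
		int g = 0, k = 0;

		if (popcount6(mask) != 3)
			continue;
		for (i = 0; i < 6; i++) {
			if (mask & (1 << i))
				gup_step(&m, g++);
			else
				ksm_step(&m, k++);
		}
		if (m.gup_ok && m.ksm_ok)
			return 0;
	}
	return 1;
}
```

The model only shows that the ordering contract is sufficient under
sequential consistency; the whole point of the patch below is that
without a barrier between g2 and g3, real hardware may order g3's
re-read before g2's increment becomes visible, which reintroduces the
"both succeed" outcome the model rules out.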
>From ce8c941c11d1f60cea87a3e4d941041dc6b79900 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Mon, 29 Aug 2022 16:57:07 +0200
Subject: [PATCH] mm/gup: update refcount+pincount before testing if the PTE
 changed

mm/ksm.c:write_protect_page() has to make sure that no unknown
references to a mapped page exist and that no additional ones with write
permissions are possible -- unknown references could have write
permissions and modify the page afterwards.

Conceptually, mm/ksm.c:write_protect_page() consists of:
 (1) Clear/invalidate PTE
 (2) Check if there are unknown references; back off if so.
 (3) Update PTE (e.g., map it R/O)

Conceptually, GUP-fast code consists of:
 (1) Read the PTE
 (2) Increment refcount/pincount of the mapped page
 (3) Check if the PTE changed by re-reading it; back off if so.

To make sure GUP-fast won't be able to grab additional references after
clearing the PTE, but will properly detect the change and back off, we
need a memory barrier between updating the refcount/pincount and
checking if the PTE changed.

try_grab_folio() doesn't necessarily imply a memory barrier, so add an
explicit smp_mb__after_atomic() after the atomic RMW operation to
increment the refcount and pincount.

ptep_clear_flush(), used to clear the PTE and flush the TLB, should
imply a memory barrier for flushing the TLB, so don't add another one
for now.

PageAnonExclusive handling requires further care and will be handled
separately.

Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/gup.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index 5abdaf487460..0008b808f484 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2392,6 +2392,14 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 			goto pte_unmap;
 		}
 
+		/*
+		 * Update refcount/pincount before testing for changed PTE.
+		 * This is required for code like mm/ksm.c:write_protect_page()
+		 * that wants to make sure that a page has no unknown
+		 * references after clearing the PTE.
+		 */
+		smp_mb__after_atomic();
+
 		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
 			gup_put_folio(folio, 1, flags);
 			goto pte_unmap;
@@ -2577,6 +2585,9 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 	if (!folio)
 		return 0;
 
+	/* See gup_pte_range(). */
+	smp_mb__after_atomic();
+
 	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
 		gup_put_folio(folio, refs, flags);
 		return 0;
@@ -2643,6 +2654,9 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 	if (!folio)
 		return 0;
 
+	/* See gup_pte_range(). */
+	smp_mb__after_atomic();
+
 	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
 		gup_put_folio(folio, refs, flags);
 		return 0;
@@ -2683,6 +2697,9 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 	if (!folio)
 		return 0;
 
+	/* See gup_pte_range(). */
+	smp_mb__after_atomic();
+
 	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
 		gup_put_folio(folio, refs, flags);
 		return 0;
-- 
2.37.1


-- 
Thanks,

David / dhildenb