From: Huacai Chen
Date: Mon, 19 Dec 2022 09:40:31 +0800
Subject: Re: [PATCH mm-unstable RFC 00/26] mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Hugh Dickins, John Hubbard,
  Jason Gunthorpe, Mike Rapoport, Yang Shi, Vlastimil Babka, Nadav Amit,
  Andrea Arcangeli, Peter Xu, linux-mm@kvack.org, x86@kernel.org,
  linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org,
  linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org,
  linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org,
  loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org,
  linux-mips@vger.kernel.org, openrisc@lists.librecores.org,
  linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
  linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
  linux-sh@vger.kernel.org, sparclinux@vger.kernel.org,
  linux-um@lists.infradead.org, linux-xtensa@linux-xtensa.org, Albert Ou,
  Anton Ivanov, Borislav Petkov, Brian Cain, Christophe Leroy, Chris Zankel,
  Dave Hansen, "David S. Miller", Dinh Nguyen, Geert Uytterhoeven,
  Greg Ungerer, Guo Ren, Helge Deller, "H. Peter Anvin", Ingo Molnar,
  Ivan Kokshaysky, "James E.J. Bottomley", Johannes Berg, Matt Turner,
  Max Filippov, Michael Ellerman, Michal Simek, Nicholas Piggin,
  Palmer Dabbelt, Paul Walmsley, Richard Henderson, Richard Weinberger,
  Rich Felker, Russell King, Stafford Horne, Stefan Kristiansson,
  Thomas Bogendoerfer, Thomas Gleixner, Vineet Gupta, WANG Xuerui,
  Yoshinori Sato
References: <20221206144730.163732-1-david@redhat.com>

On Sun, Dec 18, 2022 at 5:59 PM David Hildenbrand wrote:
>
> On 18.12.22 04:32, Huacai Chen wrote:
> > Hi, David,
> >
> > What is the opposite of exclusive here? Shared or inclusive? I prefer
> > pte_swp_mkshared() or pte_swp_mkinclusive() rather than
> > pte_swp_clear_exclusive(). Existing examples: dirty/clean, young/old
> > ...
>
> Hi Huacai,
>
> thanks for having a look!
>
> Please note that this series doesn't add these primitives but merely
> implements them on all remaining architectures.
>
> That said, the semantics are "exclusive" vs. "maybe shared", not
> "exclusive" vs. "shared" or something else. It would have to be
> pte_swp_mkmaybe_shared().
>
>
> Note that this naming matches the way we handle the other
> pte_swp_ flags we have, namely:
>
> pte_swp_mksoft_dirty()
> pte_swp_soft_dirty()
> pte_swp_clear_soft_dirty()
>
> and
>
> pte_swp_mkuffd_wp()
> pte_swp_uffd_wp()
> pte_swp_clear_uffd_wp()
>
>
> For example, we also (thankfully) didn't call it pte_mksoft_clean().
> Grepping for "pte_swp.*soft_dirty" gives you the full picture.
>
> Thanks!

OK, got it.

Huacai
>
> David
>
> >
> > Huacai
> >
> > On Tue, Dec 6, 2022 at 10:48 PM David Hildenbrand wrote:
> >>
> >> This is the follow-up to [1]:
> >> [PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of
> >> anonymous pages
> >>
> >> After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on the most prominent
> >> enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
> >> remaining architectures that support swap PTEs.
> >>
> >> This makes sure that exclusive anonymous pages will stay exclusive, even
> >> after they were swapped out -- for example, making GUP R/W FOLL_GET of
> >> anonymous pages reliable. Details can be found in [1].
> >>
> >> This primarily fixes remaining known O_DIRECT memory corruptions that can
> >> happen on concurrent swapout, whereby we can lose DMA reads to a page
> >> (modifying the user page by writing to it).
> >>
> >> To verify, there are two test cases (requiring swap space, obviously):
> >> (1) The O_DIRECT+swapout test case [2] from Andrea. This test case tries
> >>     to trigger a race condition.
> >> (2) My vmsplice() test case [3] that tries to detect if the exclusive
> >>     marker was lost during swapout, not relying on a race condition.
> >>
> >>
> >> For example, on 32bit x86 (with and without PAE), my test case fails
> >> without these patches:
> >>   $ ./test_swp_exclusive
> >>   FAIL: page was replaced during COW
> >> But succeeds with these patches:
> >>   $ ./test_swp_exclusive
> >>   PASS: page was not replaced during COW
> >>
> >>
> >> Why implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all architectures, even
> >> the ones where swap support might be in a questionable state? This is the
> >> first step towards removing "readable_exclusive" migration entries, and
> >> instead using pte_swp_exclusive() also with (readable) migration entries
> >> (as suggested by Peter). The only missing piece for that is
> >> supporting pmd_swp_exclusive() on relevant architectures with THP
> >> migration support.
> >>
> >> As all relevant architectures now implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE,
> >> we can drop __HAVE_ARCH_PTE_SWP_EXCLUSIVE in the last patch.
> >>
> >>
> >> RFC because some of the swap PTE layouts are really tricky and I really
> >> need some feedback related to deciphering these layouts and "using yet
> >> unused PTE bits in swap PTEs". I tried cross-compiling all relevant setups
> >> (phew, I might only miss some power/nohash variants), but only tested on
> >> x86 so far.
> >>
> >> CCing arch maintainers only on this cover letter and on the respective
> >> patch(es).
> >>
> >>
> >> [1] https://lkml.kernel.org/r/20220329164329.208407-1-david@redhat.com
> >> [2] https://gitlab.com/aarcange/kernel-testcases-for-v5.11/-/blob/main/page_count_do_wp_page-swap.c
> >> [3] https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/test_swp_exclusive.c
> >>
> >> David Hildenbrand (26):
> >>   mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks
> >>   alpha/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   arc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   arm/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   csky/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   hexagon/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   ia64/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   loongarch/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   m68k/mm: remove dummy __swp definitions for nommu
> >>   m68k/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   mips/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   nios2/mm: refactor swap PTE layout
> >>   nios2/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   openrisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   powerpc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit book3s
> >>   powerpc/nohash/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   riscv/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   sh/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit
> >>   sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 64bit
> >>   um/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   x86/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE also on 32bit
> >>   xtensa/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>   mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> >>
> >>  arch/alpha/include/asm/pgtable.h              | 40 ++++++++-
> >>  arch/arc/include/asm/pgtable-bits-arcv2.h     | 26 +++++-
> >>  arch/arm/include/asm/pgtable-2level.h         |  3 +
> >>  arch/arm/include/asm/pgtable-3level.h         |  3 +
> >>  arch/arm/include/asm/pgtable.h                | 34 ++++++--
> >>  arch/arm64/include/asm/pgtable.h              |  1 -
> >>  arch/csky/abiv1/inc/abi/pgtable-bits.h        | 13 ++-
> >>  arch/csky/abiv2/inc/abi/pgtable-bits.h        | 19 ++--
> >>  arch/csky/include/asm/pgtable.h               | 17 ++++
> >>  arch/hexagon/include/asm/pgtable.h            | 36 ++++++--
> >>  arch/ia64/include/asm/pgtable.h               | 31 ++++++-
> >>  arch/loongarch/include/asm/pgtable-bits.h     |  4 +
> >>  arch/loongarch/include/asm/pgtable.h          | 38 +++++++-
> >>  arch/m68k/include/asm/mcf_pgtable.h           | 35 +++++++-
> >>  arch/m68k/include/asm/motorola_pgtable.h      | 37 +++++++-
> >>  arch/m68k/include/asm/pgtable_no.h            |  6 --
> >>  arch/m68k/include/asm/sun3_pgtable.h          | 38 +++++++-
> >>  arch/microblaze/include/asm/pgtable.h         | 44 +++++++---
> >>  arch/mips/include/asm/pgtable-32.h            | 86 ++++++++++++++++---
> >>  arch/mips/include/asm/pgtable-64.h            | 23 ++++-
> >>  arch/mips/include/asm/pgtable.h               | 35 ++++++++
> >>  arch/nios2/include/asm/pgtable-bits.h         |  3 +
> >>  arch/nios2/include/asm/pgtable.h              | 37 ++++++--
> >>  arch/openrisc/include/asm/pgtable.h           | 40 +++++++--
> >>  arch/parisc/include/asm/pgtable.h             | 40 ++++++++-
> >>  arch/powerpc/include/asm/book3s/32/pgtable.h  | 37 ++++++--
> >>  arch/powerpc/include/asm/book3s/64/pgtable.h  |  1 -
> >>  arch/powerpc/include/asm/nohash/32/pgtable.h  | 22 +++--
> >>  arch/powerpc/include/asm/nohash/32/pte-40x.h  |  6 +-
> >>  arch/powerpc/include/asm/nohash/32/pte-44x.h  | 18 +---
> >>  arch/powerpc/include/asm/nohash/32/pte-85xx.h |  4 +-
> >>  arch/powerpc/include/asm/nohash/64/pgtable.h  | 24 +++++-
> >>  arch/powerpc/include/asm/nohash/pgtable.h     | 15 ++++
> >>  arch/powerpc/include/asm/nohash/pte-e500.h    |  1 -
> >>  arch/riscv/include/asm/pgtable-bits.h         |  3 +
> >>  arch/riscv/include/asm/pgtable.h              | 28 ++++--
> >>  arch/s390/include/asm/pgtable.h               |  1 -
> >>  arch/sh/include/asm/pgtable_32.h              | 53 +++++++++---
> >>  arch/sparc/include/asm/pgtable_32.h           | 26 +++++-
> >>  arch/sparc/include/asm/pgtable_64.h           | 37 +++++++-
> >>  arch/sparc/include/asm/pgtsrmmu.h             | 14 +--
> >>  arch/um/include/asm/pgtable.h                 | 36 +++++++-
> >>  arch/x86/include/asm/pgtable-2level.h         | 26 ++++--
> >>  arch/x86/include/asm/pgtable-3level.h         | 26 +++++-
> >>  arch/x86/include/asm/pgtable.h                |  3 -
> >>  arch/xtensa/include/asm/pgtable.h             | 31 +++++--
> >>  include/linux/pgtable.h                       | 29 -------
> >>  mm/debug_vm_pgtable.c                         | 25 +++++-
> >>  mm/memory.c                                   |  4 -
> >>  mm/rmap.c                                     | 11 ---
> >>  50 files changed, 943 insertions(+), 227 deletions(-)
> >>
> >> --
> >> 2.38.1
> >>
> >
>
> --
> Thanks,
>
> David / dhildenb
>