From: Huacai Chen
Date: Sun, 18 Dec 2022 11:32:59 +0800
Subject: Re: [PATCH mm-unstable RFC 00/26] mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Hugh Dickins,
 John Hubbard, Jason Gunthorpe, Mike Rapoport, Yang Shi, Vlastimil Babka,
 Nadav Amit, Andrea Arcangeli, Peter Xu, linux-mm@kvack.org,
 x86@kernel.org, linux-alpha@vger.kernel.org,
 linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
 linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org,
 linux-ia64@vger.kernel.org, loongarch@lists.linux.dev,
 linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org,
 openrisc@lists.librecores.org, linux-parisc@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
 linux-s390@vger.kernel.org, linux-sh@vger.kernel.org,
 sparclinux@vger.kernel.org, linux-um@lists.infradead.org,
 linux-xtensa@linux-xtensa.org, Albert Ou, Anton Ivanov, Borislav Petkov,
 Brian Cain, Christophe Leroy, Chris Zankel, Dave Hansen,
 "David S. Miller", Dinh Nguyen, Geert Uytterhoeven, Greg Ungerer,
 Guo Ren, Helge Deller, "H. Peter Anvin", Ingo Molnar, Ivan Kokshaysky,
 "James E.J. Bottomley", Johannes Berg, Matt Turner, Max Filippov,
 Michael Ellerman, Michal Simek, Nicholas Piggin, Palmer Dabbelt,
 Paul Walmsley, Richard Henderson, Richard Weinberger, Rich Felker,
 Russell King, Stafford Horne, Stefan Kristiansson, Thomas Bogendoerfer,
 Thomas Gleixner, Vineet Gupta, WANG Xuerui, Yoshinori Sato
References: <20221206144730.163732-1-david@redhat.com>
In-Reply-To: <20221206144730.163732-1-david@redhat.com>

Hi, David,

What is the opposite of exclusive here? Shared or inclusive?
I prefer pte_swp_mkshared() or pte_swp_mkinclusive() rather than
pte_swp_clear_exclusive(). Existing examples: dirty/clean, young/old ...

Huacai

On Tue, Dec 6, 2022 at 10:48 PM David Hildenbrand wrote:
>
> This is the follow-up on [1]:
>     [PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of
>     anonymous pages
>
> After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on most prominent
> enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
> remaining architectures that support swap PTEs.
>
> This makes sure that exclusive anonymous pages will stay exclusive, even
> after they were swapped out -- for example, making GUP R/W FOLL_GET of
> anonymous pages reliable. Details can be found in [1].
>
> This primarily fixes remaining known O_DIRECT memory corruptions that can
> happen on concurrent swapout, whereby we can lose DMA reads to a page
> (modifying the user page by writing to it).
>
> To verify, there are two test cases (requiring swap space, obviously):
> (1) The O_DIRECT+swapout test case [2] from Andrea. This test case tries
>     triggering a race condition.
> (2) My vmsplice() test case [3] that tries to detect if the exclusive
>     marker was lost during swapout, not relying on a race condition.
>
>
> For example, on 32bit x86 (with and without PAE), my test case fails
> without these patches:
>     $ ./test_swp_exclusive
>     FAIL: page was replaced during COW
> But succeeds with these patches:
>     $ ./test_swp_exclusive
>     PASS: page was not replaced during COW
>
>
> Why implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all architectures, even
> the ones where swap support might be in a questionable state? This is the
> first step towards removing "readable_exclusive" migration entries, and
> instead using pte_swp_exclusive() also with (readable) migration entries
> instead (as suggested by Peter). The only missing piece for that is
> supporting pmd_swp_exclusive() on relevant architectures with THP
> migration support.
>
> As all relevant architectures now implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE,
> we can drop __HAVE_ARCH_PTE_SWP_EXCLUSIVE in the last patch.
>
>
> RFC because some of the swap PTE layouts are really tricky and I really
> need some feedback related to deciphering these layouts and "using yet
> unused PTE bits in swap PTEs". I tried cross-compiling all relevant setups
> (phew, I might only miss some power/nohash variants), but only tested on
> x86 so far.
>
> CCing arch maintainers only on this cover letter and on the respective
> patch(es).
>
>
> [1] https://lkml.kernel.org/r/20220329164329.208407-1-david@redhat.com
> [2] https://gitlab.com/aarcange/kernel-testcases-for-v5.11/-/blob/main/page_count_do_wp_page-swap.c
> [3] https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/test_swp_exclusive.c
>
> David Hildenbrand (26):
>   mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks
>   alpha/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   arc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   arm/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   csky/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   hexagon/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   ia64/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   loongarch/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   m68k/mm: remove dummy __swp definitions for nommu
>   m68k/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   mips/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   nios2/mm: refactor swap PTE layout
>   nios2/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   openrisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   powerpc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit book3s
>   powerpc/nohash/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   riscv/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   sh/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit
>   sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 64bit
>   um/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   x86/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE also on 32bit
>   xtensa/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>   mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>
>  arch/alpha/include/asm/pgtable.h              | 40 ++++++++-
>  arch/arc/include/asm/pgtable-bits-arcv2.h     | 26 +++++-
>  arch/arm/include/asm/pgtable-2level.h         |  3 +
>  arch/arm/include/asm/pgtable-3level.h         |  3 +
>  arch/arm/include/asm/pgtable.h                | 34 ++++++--
>  arch/arm64/include/asm/pgtable.h              |  1 -
>  arch/csky/abiv1/inc/abi/pgtable-bits.h        | 13 ++-
>  arch/csky/abiv2/inc/abi/pgtable-bits.h        | 19 ++--
>  arch/csky/include/asm/pgtable.h               | 17 ++++
>  arch/hexagon/include/asm/pgtable.h            | 36 ++++++--
>  arch/ia64/include/asm/pgtable.h               | 31 ++++++-
>  arch/loongarch/include/asm/pgtable-bits.h     |  4 +
>  arch/loongarch/include/asm/pgtable.h          | 38 +++++++-
>  arch/m68k/include/asm/mcf_pgtable.h           | 35 +++++++-
>  arch/m68k/include/asm/motorola_pgtable.h      | 37 +++++++-
>  arch/m68k/include/asm/pgtable_no.h            |  6 --
>  arch/m68k/include/asm/sun3_pgtable.h          | 38 +++++++-
>  arch/microblaze/include/asm/pgtable.h         | 44 +++++++---
>  arch/mips/include/asm/pgtable-32.h            | 86 ++++++++++++++++---
>  arch/mips/include/asm/pgtable-64.h            | 23 ++++-
>  arch/mips/include/asm/pgtable.h               | 35 ++++++++
>  arch/nios2/include/asm/pgtable-bits.h         |  3 +
>  arch/nios2/include/asm/pgtable.h              | 37 ++++++--
>  arch/openrisc/include/asm/pgtable.h           | 40 +++++++--
>  arch/parisc/include/asm/pgtable.h             | 40 ++++++++-
>  arch/powerpc/include/asm/book3s/32/pgtable.h  | 37 ++++++--
>  arch/powerpc/include/asm/book3s/64/pgtable.h  |  1 -
>  arch/powerpc/include/asm/nohash/32/pgtable.h  | 22 +++--
>  arch/powerpc/include/asm/nohash/32/pte-40x.h  |  6 +-
>  arch/powerpc/include/asm/nohash/32/pte-44x.h  | 18 +---
>  arch/powerpc/include/asm/nohash/32/pte-85xx.h |  4 +-
>  arch/powerpc/include/asm/nohash/64/pgtable.h  | 24 +++++-
>  arch/powerpc/include/asm/nohash/pgtable.h     | 15 ++++
>  arch/powerpc/include/asm/nohash/pte-e500.h    |  1 -
>  arch/riscv/include/asm/pgtable-bits.h         |  3 +
>  arch/riscv/include/asm/pgtable.h              | 28 ++++--
>  arch/s390/include/asm/pgtable.h               |  1 -
>  arch/sh/include/asm/pgtable_32.h              | 53 +++++++++---
>  arch/sparc/include/asm/pgtable_32.h           | 26 +++++-
>  arch/sparc/include/asm/pgtable_64.h           | 37 +++++++-
>  arch/sparc/include/asm/pgtsrmmu.h             | 14 +--
>  arch/um/include/asm/pgtable.h                 | 36 +++++++-
>  arch/x86/include/asm/pgtable-2level.h         | 26 ++++--
>  arch/x86/include/asm/pgtable-3level.h         | 26 +++++-
>  arch/x86/include/asm/pgtable.h                |  3 -
>  arch/xtensa/include/asm/pgtable.h             | 31 +++++--
>  include/linux/pgtable.h                       | 29 -------
>  mm/debug_vm_pgtable.c                         | 25 +++++-
>  mm/memory.c                                   |  4 -
>  mm/rmap.c                                     | 11 ---
>  50 files changed, 943 insertions(+), 227 deletions(-)
>
> --
> 2.38.1
>
>
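
To make the naming question concrete, here is a minimal userspace sketch of the three helpers being discussed. It is not taken from the series: the `pte_t` here is a toy type, `TOY_SWP_EXCLUSIVE` is a hypothetical bit position picked only for illustration (every architecture picks its own spare bit in the swap PTE layout), and `pte_swp_mkshared()` is just the alternative name proposed in the reply, written here as an alias for `pte_swp_clear_exclusive()`.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the kernel's pte_t; the real type is per-architecture. */
typedef struct { uint64_t val; } pte_t;

/* Hypothetical spare software bit in a swap PTE (assumption for this sketch). */
#define TOY_SWP_EXCLUSIVE (UINT64_C(1) << 2)

/* Test whether the swap PTE carries the exclusive marker. */
static inline int pte_swp_exclusive(pte_t pte)
{
	return (pte.val & TOY_SWP_EXCLUSIVE) != 0;
}

/* Set the exclusive marker, leaving all other bits untouched. */
static inline pte_t pte_swp_mkexclusive(pte_t pte)
{
	pte.val |= TOY_SWP_EXCLUSIVE;
	return pte;
}

/* Clear the exclusive marker, leaving all other bits untouched. */
static inline pte_t pte_swp_clear_exclusive(pte_t pte)
{
	pte.val &= ~TOY_SWP_EXCLUSIVE;
	return pte;
}

/* Name suggested in the reply; behaves exactly like clear_exclusive here. */
static inline pte_t pte_swp_mkshared(pte_t pte)
{
	return pte_swp_clear_exclusive(pte);
}

/* Sanity check: set-then-clear must restore the original swap PTE value. */
static inline void toy_swp_roundtrip_check(pte_t pte)
{
	pte_t tmp = pte_swp_clear_exclusive(pte_swp_mkexclusive(pte));

	assert(tmp.val == (pte.val & ~TOY_SWP_EXCLUSIVE));
}
```

Whichever name is chosen, the operation is the same single-bit clear; the choice only affects whether the API reads like the existing dirty/clean and young/old pairs.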