From: Axel Rasmussen
Date: Mon, 10 Jul 2023 14:59:55 -0700
Subject: Re: [PATCH v4 1/8] mm: make PTE_MARKER_SWAPIN_ERROR more general
To: Andrew Morton
Cc: Alexander Viro, Brian Geffon, Christian Brauner, David Hildenbrand, Gaosheng Cui, Huang Ying, Hugh Dickins, James Houghton, "Jan Alexander Steffens (heftig)", Jiaqi Yan, Jonathan Corbet, Kefeng Wang, "Liam R. Howlett", Miaohe Lin, Mike Kravetz, "Mike Rapoport (IBM)", Muchun Song, Nadav Amit, Naoya Horiguchi, Peter Xu, Ryan Roberts, Shuah Khan, Suleiman Souhlal, Suren Baghdasaryan, "T.J. Alumbaugh", Yu Zhao, ZhangPeng, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org
References: <20230707215540.2324998-1-axelrasmussen@google.com> <20230707215540.2324998-2-axelrasmussen@google.com> <20230708180850.bc938ab49fbfb38b83c367c8@linux-foundation.org>

On Mon, Jul 10, 2023 at 10:19 AM Axel Rasmussen
wrote:
>
> On Sat, Jul 8, 2023 at 6:08 PM Andrew Morton wrote:
> >
> > On Fri, 7 Jul 2023 14:55:33 -0700 Axel Rasmussen wrote:
> >
> > > Future patches will re-use PTE_MARKER_SWAPIN_ERROR to implement
> > > UFFDIO_POISON, so make some various preparations for that:
> > >
> > > First, rename it to just PTE_MARKER_POISONED. The "SWAPIN" can be
> > > confusing since we're going to re-use it for something not really
> > > related to swap. This can be particularly confusing for things like
> > > hugetlbfs, which doesn't support swap whatsoever. Also rename some
> > > various helper functions.
> > >
> > > Next, fix pte marker copying for hugetlbfs. Previously, it would WARN on
> > > seeing a PTE_MARKER_SWAPIN_ERROR, since hugetlbfs doesn't support swap.
> > > But, since we're going to re-use it, we want it to go ahead and copy it
> > > just like non-hugetlbfs memory does today. Since the code to do this is
> > > more complicated now, pull it out into a helper which can be re-used in
> > > both places. While we're at it, also make it slightly more explicit in
> > > its handling of e.g. uffd wp markers.
> > >
> > > For non-hugetlbfs page faults, instead of returning VM_FAULT_SIGBUS for
> > > an error entry, return VM_FAULT_HWPOISON. For most cases this change
> > > doesn't matter, e.g. a userspace program would receive a SIGBUS either
> > > way. But for UFFDIO_POISON, this change will let KVM guests get an MCE
> > > out of the box, instead of giving a SIGBUS to the hypervisor and
> > > requiring it to somehow inject an MCE.
> > >
> > > Finally, for hugetlbfs faults, handle PTE_MARKER_POISONED, and return
> > > VM_FAULT_HWPOISON_LARGE in such cases. Note that this can't happen today
> > > because the lack of swap support means we'll never end up with such a
> > > PTE anyway, but this behavior will be needed once such entries *can*
> > > show up via UFFDIO_POISON.
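
(Illustration only, not code from the patch: a rough userspace sketch of the fault-code choice the commit message above describes for a poisoned marker once the change is applied. The type name, the `SK_` constants and their values, and the function name are all hypothetical stand-ins chosen for this example; the kernel's real hugetlbfs path may carry more information than is modelled here.)

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's fault result bits.
 * The names and numeric values are assumptions for illustration. */
typedef unsigned int vm_fault_sketch_t;
#define SK_VM_FAULT_SIGBUS         0x0002u /* what non-hugetlbfs returned before */
#define SK_VM_FAULT_HWPOISON       0x0010u
#define SK_VM_FAULT_HWPOISON_LARGE 0x0020u

/* Fault result for a poisoned marker as described in the commit message:
 * hugetlbfs faults report the "large" variant, everything else reports
 * plain HWPOISON instead of the earlier SIGBUS. */
static vm_fault_sketch_t poisoned_marker_fault(bool is_hugetlb)
{
	return is_hugetlb ? SK_VM_FAULT_HWPOISON_LARGE : SK_VM_FAULT_HWPOISON;
}
```

In this sketch `poisoned_marker_fault(false)` gives `SK_VM_FAULT_HWPOISON` and `poisoned_marker_fault(true)` gives `SK_VM_FAULT_HWPOISON_LARGE`; neither path returns `SK_VM_FAULT_SIGBUS` any more.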
> > >
> > > --- a/include/linux/mm_inline.h
> > > +++ b/include/linux/mm_inline.h
> > > @@ -523,6 +523,25 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
> > >  	return atomic_read(&mm->tlb_flush_pending) > 1;
> > >  }
> > >
> > > +/*
> > > + * Computes the pte marker to copy from the given source entry into dst_vma.
> > > + * If no marker should be copied, returns 0.
> > > + * The caller should insert a new pte created with make_pte_marker().
> > > + */
> > > +static inline pte_marker copy_pte_marker(
> > > +		swp_entry_t entry, struct vm_area_struct *dst_vma)
> > > +{
> > > +	pte_marker srcm = pte_marker_get(entry);
> > > +	/* Always copy error entries. */
> > > +	pte_marker dstm = srcm & PTE_MARKER_POISONED;
> > > +
> > > +	/* Only copy PTE markers if UFFD register matches. */
> > > +	if ((srcm & PTE_MARKER_UFFD_WP) && userfaultfd_wp(dst_vma))
> > > +		dstm |= PTE_MARKER_UFFD_WP;
> > > +
> > > +	return dstm;
> > > +}
> >
> > Breaks the build with CONFIG_MMU=n (arm allnoconfig). pte_marker isn't
> > defined.
> >
> > I'll slap #ifdef CONFIG_MMU around this function, but probably something more
> > fine-grained could be used, like CONFIG_PTE_MARKER_UFFD_WP. Please
> > consider.
>
> Whoops, sorry about this. This function "ought" to be in
> include/linux/swapops.h where it would be inside a #ifdef CONFIG_MMU
> anyway, but it can't be because it uses userfaultfd_wp() so there'd be
> a circular include. I think just wrapping it in CONFIG_MMU is the
> right way.
>
> But, this has also made me realize we need to not advertise
> UFFDIO_POISON as supported unless we have CONFIG_MMU. I don't want
> HAVE_ARCH_USERFAULTFD_WP for that, because it's only enabled on
> x86_64, whereas I want to support at least arm64 as well. I don't see
> a strong reason not to just use CONFIG_MMU for this too; this feature
> depends on the API in swapops.h, which uses that ifdef, so I don't see
> a lot of value out of creating a new but equivalent config option.
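
(Illustration only: a minimal userspace sketch of the marker-copy rule in the quoted hunk, so the bit logic can be tried outside the kernel. The bit positions, `struct vma_stub`, its `uffd_wp_registered` field, and the `_sketch` function name are assumptions made for this example; in the patch the helper takes a `swp_entry_t` and asks `userfaultfd_wp(dst_vma)`.)

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the kernel's real definitions live elsewhere
 * and the bit positions here are assumed for illustration. */
typedef unsigned long pte_marker;
#define PTE_MARKER_UFFD_WP  (1UL << 0)
#define PTE_MARKER_POISONED (1UL << 1)

/* Stub VMA: one flag stands in for what userfaultfd_wp(vma) would report. */
struct vma_stub {
	bool uffd_wp_registered;
};

/* Same decision as the quoted copy_pte_marker(), on an already-decoded
 * source marker: the poison bit is always carried over, the uffd-wp bit
 * only when the destination VMA is registered for write-protect. */
static pte_marker copy_pte_marker_sketch(pte_marker srcm,
					 const struct vma_stub *dst_vma)
{
	pte_marker dstm = srcm & PTE_MARKER_POISONED;

	if ((srcm & PTE_MARKER_UFFD_WP) && dst_vma->uffd_wp_registered)
		dstm |= PTE_MARKER_UFFD_WP;

	return dstm;
}
```

With a source marker of `PTE_MARKER_POISONED | PTE_MARKER_UFFD_WP`, a destination stub with `uffd_wp_registered == false` keeps only `PTE_MARKER_POISONED`, while one with `uffd_wp_registered == true` keeps both bits. A return value of 0 means there is nothing to copy.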

Actually, I'm being silly. CONFIG_USERFAULTFD depends on CONFIG_MMU, so
we don't need to worry about most of this. Andrew's fix to just wrap
the helper in CONFIG_MMU is enough.

>
> I'll make the needed changes (and also address Peter's comment above)
> and send out a v5.
>
> >
> > btw, both copy_pte_marker() and pte_install_uffd_wp_if_needed() look
> > far too large to justify inlining. Please review the desirability of
> > this.

As far as inlining goes, I'm not opposed to un-inlining this; I was
mainly copying that pattern from existing helpers in swapops.h. One
question is, if it weren't inline, where should it go? There is no
mm/swapops.c, which I would say is otherwise the proper place for it,
and I don't see any other good place for the functions to go. The one
I'm introducing isn't userfaultfd-specific, so userfaultfd.c seems wrong.
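
(Illustration only: a small standalone sketch of why wrapping the helper in `#ifdef CONFIG_MMU` avoids the build break. Here `CONFIG_MMU` is defined by hand to stand in for the Kconfig symbol, the bit position is assumed, and `keep_poison_bit()` is a hypothetical, reduced version of the helper. The point is only that the type and the function that uses it appear and disappear together.)

```c
/* Stand-in for the Kconfig symbol. In a real kernel build this comes from
 * the configuration; remove this line to mimic CONFIG_MMU=n. */
#define CONFIG_MMU 1

#ifdef CONFIG_MMU
/* Only available when the MMU code is built (assumed for this sketch). */
typedef unsigned long pte_marker;
#define PTE_MARKER_POISONED (1UL << 1)

/* Hypothetical reduced helper: because it sits inside the same guard as
 * the type it uses, a build without CONFIG_MMU never sees it and so never
 * trips over an undefined pte_marker. */
static inline pte_marker keep_poison_bit(pte_marker srcm)
{
	return srcm & PTE_MARKER_POISONED;
}
#endif /* CONFIG_MMU */
```

If the helper were moved out of line instead, the same guard would presumably need to cover both its declaration and its definition; which file that definition would live in is the open question raised above, and this sketch does not try to answer it.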