From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6967AC71147 for ; Thu, 8 Dec 2022 16:44:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 016238E0005; Thu, 8 Dec 2022 11:44:45 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id EE13A8E0001; Thu, 8 Dec 2022 11:44:44 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D5B998E0005; Thu, 8 Dec 2022 11:44:44 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id C12EF8E0001 for ; Thu, 8 Dec 2022 11:44:44 -0500 (EST) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 7F043120DD4 for ; Thu, 8 Dec 2022 16:44:44 +0000 (UTC) X-FDA: 80219712888.07.ED8C5BE Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf12.hostedemail.com (Postfix) with ESMTP id 836E740019 for ; Thu, 8 Dec 2022 16:44:42 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=SHhdHvCi; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf12.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1670517882; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=1l8k42ZbsEJYEYSTw8FtE3/FUSY7dX6qzGCzbPDDRj8=; b=W9o11c5JufWfybvsKAJou48YNx+J8E+NPl1WEP5LIzuOTwDvARCoWo60sh5xAun7nMCm75 pAdF/Melr1bDei2Kwl9FNPORvyJX8m10ZPMT8sxbomaD1i3LgvAs+rrHCptuzuA5dUkr/T z5elmHrpEEm/tjDyzbyFwHYW9a3UPfM= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=SHhdHvCi; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf12.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1670517882; a=rsa-sha256; cv=none; b=w61jHocTpUVs18YbNcXIk6XCsgPgAUGAl+pI3qt4F5kYZHOtJPCQC7gfYR8dUnfpZwLY8r FbRQH1Duc2raUoSfMECV+s+DRa367/1ZAQ88e1qUIWZ8nOYRqNqzGrV8kp2UlvN5DBraWv uK2F7+l+7dfsuVAmT71h2Ot9gT+lpJY= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1670517882; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1l8k42ZbsEJYEYSTw8FtE3/FUSY7dX6qzGCzbPDDRj8=; b=SHhdHvCit0/hpQInCZECSpdH24SpQ1U+nQ4ixE8f2M9gcvRg2jBLUTsA11laqNAA77V2OR ODnk7SjfdKrP2Hm7gZtOgGRMUt6XOmoLAilM8LSK+tXTn8Oxmp+FlwRPa2tUJGldjSevgN oAXdXydwinN/fAqfO9pdjv81toZK3UM= Received: from mail-wm1-f69.google.com (mail-wm1-f69.google.com [209.85.128.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-528-6RpWqlNoMDuomAzmO7eQcA-1; Thu, 08 Dec 2022 11:44:38 -0500 X-MC-Unique: 6RpWqlNoMDuomAzmO7eQcA-1 Received: by mail-wm1-f69.google.com with SMTP id j2-20020a05600c1c0200b003cf7397fc9bso1056624wms.5 for ; Thu, 08 Dec 2022 08:44:38 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:subject:organization:from :content-language:references:cc:to:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=1l8k42ZbsEJYEYSTw8FtE3/FUSY7dX6qzGCzbPDDRj8=; b=UtAOt6LLeW6Cybh/XKtkp13nmEztvL2yLy+Wl2VodCaC618Y9XqpcnhxNPE74wB6sv pRiTEpFJNRJcOsOqCdMXZnGYlJghNOJEAaKBf0xzP6THYu17ts+k1qeiFd966dYfTyKf JZ5mzBVjolfLLy3njjDIk9WkO4ZihVXNybU7rLP8/dOsnja6S9J4nc3MfCLPU4afpx1T sj6WoIKxbzNUPjYPj5iYj9vQUda/43vvapGaytxPthkNsiaA5b/zQHQ8YAECqPQKmJVk iiYBClQenPqgVDPOGN/aIGXLUhh2KT5vE4hwpwKc01TgLGX0Jg45OYeQyS8Gk6JqBBdD fIzQ== X-Gm-Message-State: ANoB5pnmoifR3YdjS5LLz+B49ZbRdZYSsE8HIleYO1moM8m+zDey6Zyg v4BhJF6c7+do+8uHujbw1Hhu9XfH2GOtTXxObb7ZZoJuBENSCMaHq5kdZoUx56OlLxUuY/5bXFi jAXU84hn80gw= X-Received: by 2002:a05:600c:4f05:b0:3d0:3d33:a629 with SMTP id l5-20020a05600c4f0500b003d03d33a629mr44320914wmq.126.1670517877765; Thu, 08 Dec 2022 08:44:37 -0800 (PST) X-Google-Smtp-Source: AA0mqf5r/uL+uiuGQyKClT4NrDXeEElwlcukfDdB27dVx2wb+66kgFv3zStJgj2Nnh6eiYXwJLX/zQ== X-Received: by 2002:a05:600c:4f05:b0:3d0:3d33:a629 with SMTP id l5-20020a05600c4f0500b003d03d33a629mr44320901wmq.126.1670517877479; Thu, 08 Dec 2022 08:44:37 -0800 (PST) Received: from ?IPV6:2003:cb:c704:6c00:f39d:87c2:dd6c:4e98? (p200300cbc7046c00f39d87c2dd6c4e98.dip0.t-ipconnect.de. [2003:cb:c704:6c00:f39d:87c2:dd6c:4e98]) by smtp.gmail.com with ESMTPSA id s11-20020a5d424b000000b00241cfe6e286sm23250692wrr.98.2022.12.08.08.44.36 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 08 Dec 2022 08:44:37 -0800 (PST) Message-ID: Date: Thu, 8 Dec 2022 17:44:35 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.5.0 To: Peter Xu Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Ives van Hoorne , stable@vger.kernel.org, Andrew Morton , Hugh Dickins , Alistair Popple , Mike Rapoport , Nadav Amit , Andrea Arcangeli References: <20221208114137.35035-1-david@redhat.com> From: David Hildenbrand Organization: Red Hat Subject: Re: [PATCH v1] mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 836E740019 X-Stat-Signature: 7pu5cumx9zj5qcqsfegsx4ah35ex1ofw X-HE-Tag: 1670517882-536349 X-HE-Meta: U2FsdGVkX19zDf3+YzdL3tYhXNFrfML2qsXEW1YM9XhHy6TsTs6TvT2IBrSY+r8SHwKskwOf3byGI9LBoLxfUcEqrZPUstv1/j7d8Uqnybw6yXyu6raxJlN5rER2azAXDVSOVaquAhx6cEZYoh5kDko5rIFQBkBu4u+b64b8lAYNWRFfq2LPqbUF2rmkK1Np4g8IR7Fad5HnC/VIL0Z1TSOG7zd7/MJ6gm6DkRN2jCiN5rg1YzOCwv2SH7dEgVfl/+dDKyl81N5WV1P/3QC5ObwcTvbKxvIUOdWZMX005nIQqeVhLm8xRKgInsnv6oiSha4jLGU05WpqflxMNjNhucIoGB5eV7Ovn/47qasjJhLihKW+D+JkHPpLe9yV3flhbvQ4UXY6idFDZlhcV37u5O5sO6CbvSiTZJ+QC1YsYElAu9V5t5Q69u7qxmGJARzI2iMmkX7rBoMZxOWQaihqX4CuHBRHy3zScjFSIplm2qH/CCNlPvszmSNxRBMqgL++WZsG84cMBqhYPWp9bRktU2IIs5Sne21MY0/MeCY8BNqiaMVD6SFF+OeCV2SnW0ZgHbywKrtbaT6jrlYZY2O9anDr/rIcFsOBYd3TtocwnPRra3+bJi8YwaF98HySHCRulRieoIdw+MVUzYhXMA2ivGL8PFfTfHP1enuJKci0rvWqkgydxSb0qWBAXxzGqWczBU6UUouon7OUaUTGjW2YryH7PsI8aW0j4KTVVokLeForNaoSyZkVNZclij/OrUvIMaCpw32+kBQofmm6/o5x5jb6bppg0GK4R4nTy8choKQD3VQTnYk+capjloURHbmtHV6u+UycGRJ13s86v9BIIacegcRPpCrWXe3wdQU0XCakFCOkV+EV3pkqt828dexjxPQLbpyPy43dEqihy/AUKrstNJuPagR+gOnRs8Q5Uuv+BSzHcLVyzO1UAzxoVbbNj0dRHKkC9v/d1/X+A+L LEdXXSDb Wp8mDsntFCKSo7LrHKEDfEcwCRbwcHqtuPP59ES9SgBJIMH+z3d+pC2t7diRQFVzhB5eY3ntAwCHoDqt8DOCV7UO8VebvD/h8R0S0JHGv8jYm2Mg1IPCaijyl92b9MGbG3xCKYDDGZ2sWBheCZ/kTOJ6aUJ54+qYac0v9TrFUS+/uKOdYMeNVSXdjulxEQPM/6+RoCeWas7sPYBFOF1ZKAOhC8g== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 08.12.22 17:29, Peter Xu wrote: > On Thu, Dec 08, 2022 at 12:41:37PM +0100, David Hildenbrand wrote: >> Currently, we don't enable writenotify when enabling userfaultfd-wp on >> a shared writable mapping (for now only shmem and hugetlb). The consequence >> is that vma->vm_page_prot will still include write permissions, to be set >> as default for all PTEs that get remapped (e.g., mprotect(), NUMA hinting, >> page migration, ...). >> >> So far, vma->vm_page_prot is assumed to be a safe default, meaning that >> we only add permissions (e.g., mkwrite) but not remove permissions (e.g., >> wrprotect). For example, when enabling softdirty tracking, we enable >> writenotify. With uffd-wp on shared mappings, that changed. More details >> on vma->vm_page_prot semantics were summarized in [1]. >> >> This is problematic for uffd-wp: we'd have to manually check for >> a uffd-wp PTEs/PMDs and manually write-protect PTEs/PMDs, which is error >> prone. Prone to such issues is any code that uses vma->vm_page_prot to set >> PTE permissions: primarily pte_modify() and mk_pte(). >> >> Instead, let's enable writenotify such that PTEs/PMDs/... will be mapped >> write-protected as default and we will only allow selected PTEs that are >> definitely safe to be mapped without write-protection (see >> can_change_pte_writable()) to be writable. In the future, we might want >> to enable write-bit recovery -- e.g., can_change_pte_writable() -- at >> more locations, for example, also when removing uffd-wp protection. >> >> This fixes two known cases: >> >> (a) remove_migration_pte() mapping uffd-wp'ed PTEs writable, resulting >> in uffd-wp not triggering on write access. >> (b) do_numa_page() / do_huge_pmd_numa_page() mapping uffd-wp'ed PTEs/PMDs >> writable, resulting in uffd-wp not triggering on write access. >> >> Note that do_numa_page() / do_huge_pmd_numa_page() can be reached even >> without NUMA hinting (which currently doesn't seem to be applicable to >> shmem), for example, by using uffd-wp with a PROT_WRITE shmem VMA. >> On such a VMA, userfaultfd-wp is currently non-functional. >> >> Note that when enabling userfaultfd-wp, there is no need to walk page >> tables to enforce the new default protection for the PTEs: we know that >> they cannot be uffd-wp'ed yet, because that can only happen after >> enabling uffd-wp for the VMA in general. >> >> Also note that this makes mprotect() on ranges with uffd-wp'ed PTEs not >> accidentally set the write bit -- which would result in uffd-wp not >> triggering on later write access. This commit makes uffd-wp on shmem behave >> just like uffd-wp on anonymous memory (iow, less special) in that regard, >> even though, mixing mprotect with uffd-wp is controversial. >> >> [1] https://lkml.kernel.org/r/92173bad-caa3-6b43-9d1e-9a471fdbc184@redhat.com >> >> Reported-by: Ives van Hoorne >> Debugged-by: Peter Xu >> Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs") >> Cc: stable@vger.kernel.org >> Cc: Andrew Morton >> Cc: Hugh Dickins >> Cc: Alistair Popple >> Cc: Mike Rapoport >> Cc: Nadav Amit >> Cc: Andrea Arcangeli >> Signed-off-by: David Hildenbrand > > Acked-by: Peter Xu > > One trivial nit. > >> --- >> >> As discussed in [2], this is supposed to replace the fix by Peter: >> [PATCH v3 1/2] mm/migrate: Fix read-only page got writable when recover >> pte >> >> This survives vm/selftests and my reproducers: >> * migrating pages that are uffd-wp'ed using mbind() on a machine with 2 >> NUMA nodes >> * Using a PROT_WRITE mapping with uffd-wp >> * Using a PROT_READ|PROT_WRITE mapping with uffd-wp'ed pages and >> mprotect()'ing it PROT_WRITE >> * Using a PROT_READ|PROT_WRITE mapping with uffd-wp'ed pages and >> temporarily mprotect()'ing it PROT_READ >> >> uffd-wp properly triggers in all cases. On v8.1-rc8, all mre reproducers >> fail. >> >> It would be good to get some more testing feedback and review. >> >> [2] https://lkml.kernel.org/r/20221202122748.113774-1-david@redhat.com >> >> --- >> fs/userfaultfd.c | 28 ++++++++++++++++++++++------ >> mm/mmap.c | 4 ++++ >> 2 files changed, 26 insertions(+), 6 deletions(-) >> >> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c >> index 98ac37e34e3d..fb0733f2e623 100644 >> --- a/fs/userfaultfd.c >> +++ b/fs/userfaultfd.c >> @@ -108,6 +108,21 @@ static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx) >> return ctx->features & UFFD_FEATURE_INITIALIZED; >> } >> >> +static void userfaultfd_set_vm_flags(struct vm_area_struct *vma, >> + vm_flags_t flags) >> +{ >> + const bool uffd_wp = !!((vma->vm_flags | flags) & VM_UFFD_WP); > > IIUC this can be "uffd_wp_changed" then switch "|" to "^". But not a hot > path at all, so shouldn't matter a lot. Yes, let's do that (we can also remove the !! here): This hunk will be: diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 98ac37e34e3d..a988485ada05 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -108,6 +108,21 @@ static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx) return ctx->features & UFFD_FEATURE_INITIALIZED; } +static void userfaultfd_set_vm_flags(struct vm_area_struct *vma, + vm_flags_t flags) +{ + const bool uffd_wp_changed = (vma->vm_flags ^ flags) & VM_UFFD_WP; + + vma->vm_flags = flags; + /* + * For shared mappings, we want to enable writenotify while + * userfaultfd-wp is enabled (see vma_wants_writenotify()). We'll simply + * recalculate vma->vm_page_prot whenever userfaultfd-wp changes. + */ + if ((vma->vm_flags & VM_SHARED) && uffd_wp_changed) + vma_set_page_prot(vma); +} + I'll wait for some more (+retest) before I resend tomorrow. Thanks! -- Thanks, David / dhildenb