From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, nvdimm@lists.linux.dev,
    David Hildenbrand, Andrew Morton, Juergen Gross, Stefano Stabellini,
    Oleksandr Tyshchenko, Dan Williams, Alistair Popple, Matthew Wilcox,
    Jan Kara, Alexander Viro, Christian Brauner, Zi Yan, Baolin Wang,
    Lorenzo Stoakes,
Howlett" , Nico Pache , Ryan Roberts , Dev Jain , Barry Song , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Jann Horn , Pedro Falcato Subject: [PATCH RFC 11/14] mm: remove "horrible special case to handle copy-on-write behaviour" Date: Tue, 17 Jun 2025 17:43:42 +0200 Message-ID: <20250617154345.2494405-12-david@redhat.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250617154345.2494405-1-david@redhat.com> References: <20250617154345.2494405-1-david@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: tyWgbhKwV6epSHK71qsQPOvt5307M2ldLujomXjxbzQ_1750175054 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: 8bit content-type: text/plain; charset="US-ASCII"; x-default=true X-Rspamd-Server: rspam03 X-Stat-Signature: d9aqo6suz4685ooiz4pqgjhseytuegdq X-Rspam-User: X-Rspamd-Queue-Id: F3EE6100004 X-HE-Tag: 1750175056-363564 X-HE-Meta: U2FsdGVkX1/ukqal772LqJdcKAQB/PjP6IfvvMXhVa5cxPDmQ3ioQASodXIoZio/v7tWY/shjZB8ePiHxxXBklL9nESBTD32lSlKXrqK45hoSRLI+8E0DXIx82E9wTJMCxqEtRdeDVwYwLdrR/7aWF0VKJQiTvwsMtmD1CyfzPDUf/XSv59dN80ainkLAu+7PmXfZdZP0QF6N5kY/TtumcR9ddaDLo1PkXouS5+CR5gi2Gs63bXnfJ6lSkyNnGCrc9yEUKH70OzK5iZFkLP88s/GWAWiByqBbthtKLzCdIDyLtcQEk377LdvqMTFiIecA/xJmkDwYj5JbuD4mP5MiZOQJdnQRIlHIhpbVriR4AMh4zDJeiu43Qm0ldgaoa6FAej3E2cyokqyXhftpPBaLF8WBjmg3Thy7yaaT+ow9BDHK9yrAM8vQa89nfLZLRXFEhwb9oksFXB7xtFPpEoCTTGSYuLAYyk2XK2t31NV+j93vo8MC9IRTnDBh+Rcwm+qIBLMouKUQuP51XlrzeT8c+SGKKDHZeoZwlNvZh0H9t8MeawKz2y+PPGIQbCgYDQ1ls1FN+8LASnngSqeDARhgZCYblspSxmeKIsS3OcJ88FVlerSmiWuUgM+E70GKhtn5eXUGIYLkUMVRFl6+dPrDbpDoTdL5cAJRqHEdLWwdCnUsm6kinMSjs3hx4rbpPiB7jZYj0GaCPU4An13l7rMDpkJhpsYZAsA1I4pB5g156KFAp4D6g0VM0sxm1kyLLIZEoZUG9sWeUs2e6bSuBB6ELusZ5J/Id0NpR+ennrP3vold8+uHQ8eFGvDgxhgtxcOiqxSwFHDbG0D3Gi93vI24Mv6hARiBI7AGu+0WGYYKlOBmGOaxhzosEbJaad7FqUuOHZElP6vTjXEXRP4z0wl/RsJyNdNJMJ4ZuLmSJ+CkhkIUOHNyJq9GJIwOkCH3qeQRMzWUmQ03qMpYXaESjr jcHE4/40 fZFFSMUF+pvXRFk/VUATl66pjHN9TVJTmgl0sDBI0wmNckIwPrECc7lYjgD4EfRHS1toagQc9ZGIGrSxKklhnDN4atjJ6fFWUWAlQt0Z8KDiIDl+31JNxtxdz/K/gxSYWageT2jyYJl8IMw1ipq9V1d9TN3G9sHETdKZ53SmdbTN2fqkqDTUbHexoZQUM7ZtM8rCDLWchIwSwC7KUfdZogZFsnRPw5IYMaS7uewjobiw6b4Vo4QvPrPq0EBVO2V9IGQFEdm17BV8IkqThvgqzKaw0KLXSQv7uzglDZ6HLBml3dyG8w8xnLK0dbvxdCo7qRmek+3kg8ciMknmoEhGINAFmUsdI0GwJKtU87CufXwGSU/FJ9HULVOv0jeIUY9KLASHcwaRIcC92zF6kiy5OMZR581V3yo/N02gVytax+LD6Ue+K0VXDu1rQ59lEreI1BVz5WUXv+iJ9A02SRqYAswm8NuH+tmBimUVZ3thfPkw3ItM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Let's make the kernel a bit less horrible, by removing the linearity requirement in CoW PFNMAP mappings with !CONFIG_ARCH_HAS_PTE_SPECIAL. In particular, stop messing with vma->vm_pgoff in weird ways. Simply lookup in applicable (i.e., CoW PFNMAP) mappings whether we have an anon folio. Nobody should ever try mapping anon folios using PFNs, that just screams for other possible issues. To be sure, let's sanity-check when inserting PFNs. Are they really required? Probably not, but it's a good safety net at least for now. The runtime overhead should be limited: there is nothing to do for !CoW mappings (common case), and archs that care about performance (i.e., GUP-fast) should be supporting CONFIG_ARCH_HAS_PTE_SPECIAL either way. 
Signed-off-by: David Hildenbrand
---
 include/linux/mm.h |  16 ++++++
 mm/huge_memory.c   |  16 +++++-
 mm/memory.c        | 118 +++++++++++++++++++++++++--------------------
 3 files changed, 96 insertions(+), 54 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 98a606908307b..3f52871becd3f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2339,6 +2339,22 @@ static inline bool can_do_mlock(void) { return false; }
 extern int user_shm_lock(size_t, struct ucounts *);
 extern void user_shm_unlock(size_t, struct ucounts *);
 
+#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
+static inline struct page *vm_pfnmap_normal_page_pfn(struct vm_area_struct *vma,
+		unsigned long pfn)
+{
+	/*
+	 * We don't identify normal pages using PFNs. So if we reach
+	 * this point, it's just for sanity checks that don't apply with
+	 * pte_special() etc.
+	 */
+	return NULL;
+}
+#else
+struct page *vm_pfnmap_normal_page_pfn(struct vm_area_struct *vma,
+		unsigned long pfn);
+#endif
+
 struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
 		pte_t pte);
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8f03cd4e40397..67220c30e7818 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1479,7 +1479,13 @@ vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, unsigned long pfn,
 	BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)));
 	BUG_ON((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) ==
 						(VM_PFNMAP|VM_MIXEDMAP));
-	BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
+
+	/*
+	 * Refuse this pfn if we could mistake it as a refcounted folio
+	 * in a CoW mapping later in vm_normal_page_pmd().
+	 */
+	if ((vma->vm_flags & VM_PFNMAP) && vm_pfnmap_normal_page_pfn(vma, pfn))
+		return VM_FAULT_SIGBUS;
 
 	pfnmap_setup_cachemode_pfn(pfn, &pgprot);
 
@@ -1587,7 +1593,13 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, unsigned long pfn,
 	BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)));
 	BUG_ON((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) ==
 						(VM_PFNMAP|VM_MIXEDMAP));
-	BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
+
+	/*
+	 * Refuse this pfn if we could mistake it as a refcounted folio
+	 * in a CoW mapping later in vm_normal_page_pud().
+	 */
+	if ((vma->vm_flags & VM_PFNMAP) && vm_pfnmap_normal_page_pfn(vma, pfn))
+		return VM_FAULT_SIGBUS;
 
 	pfnmap_setup_cachemode_pfn(pfn, &pgprot);
 
diff --git a/mm/memory.c b/mm/memory.c
index 3d3fa01cd217e..ace9c59e97181 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -536,9 +536,35 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
 }
 
+#ifndef CONFIG_ARCH_HAS_PTE_SPECIAL
+struct page *vm_pfnmap_normal_page_pfn(struct vm_area_struct *vma,
+		unsigned long pfn)
+{
+	struct folio *folio;
+	struct page *page;
+
+	VM_WARN_ON_ONCE(!(vma->vm_flags & VM_PFNMAP));
+
+	/*
+	 * If we have a CoW mapping and spot an anon folio, then it can
+	 * only be due to CoW: the page is "normal".
+	 */
+	if (likely(!is_cow_mapping(vma->vm_flags)))
+		return NULL;
+	if (likely(!pfn_valid(pfn)))
+		return NULL;
+
+	page = pfn_to_page(pfn);
+	folio = page_folio(page);
+	if (folio_test_slab(folio) || !folio_test_anon(folio))
+		return NULL;
+	return page;
+}
+#endif /* !CONFIG_ARCH_HAS_PTE_SPECIAL */
+
 /* Called only if the page table entry is not marked special. */
 static inline struct page *vm_normal_page_pfn(struct vm_area_struct *vma,
-		unsigned long addr, unsigned long pfn)
+		unsigned long pfn)
 {
 	/*
 	 * With CONFIG_ARCH_HAS_PTE_SPECIAL, any special page table mappings
@@ -553,13 +579,8 @@ static inline struct page *vm_normal_page_pfn(struct vm_area_struct *vma,
 		if (!pfn_valid(pfn))
 			return NULL;
 	} else {
-		unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
-		/* Only CoW'ed anon folios are "normal". */
-		if (pfn == vma->vm_pgoff + off)
-			return NULL;
-		if (!is_cow_mapping(vma->vm_flags))
-			return NULL;
+		return vm_pfnmap_normal_page_pfn(vma, pfn);
 	}
 }
 
@@ -589,30 +610,19 @@
  * (such as GUP) can still identify these mappings and work with the
  * underlying "struct page".
  *
- * There are 2 broad cases. Firstly, an architecture may define a pte_special()
- * pte bit, in which case this function is trivial. Secondly, an architecture
- * may not have a spare pte bit, which requires a more complicated scheme,
- * described below.
+ * An architecture may support pte_special() to distinguish "special"
+ * from "normal" mappings more efficiently, and even without the VMA at hand.
+ * For example, in order to support GUP-fast, whereby we don't have the VMA
+ * available when walking the page tables, support for pte_special() is
+ * crucial.
+ *
+ * If an architecture does not support pte_special(), this function is less
+ * trivial and more expensive in some cases.
  *
  * A raw VM_PFNMAP mapping (ie. one that is not COWed) is always considered a
  * special mapping (even if there are underlying and valid "struct pages").
  * COWed pages of a VM_PFNMAP are always normal.
 *
- * The way we recognize COWed pages within VM_PFNMAP mappings is through the
- * rules set up by "remap_pfn_range()": the vma will have the VM_PFNMAP bit
- * set, and the vm_pgoff will point to the first PFN mapped: thus every special
- * mapping will always honor the rule
- *
- *	pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
- *
- * And for normal mappings this is false.
- *
- * This restricts such mappings to be a linear translation from virtual address
- * to pfn. To get around this restriction, we allow arbitrary mappings so long
- * as the vma is not a COW mapping; in that case, we know that all ptes are
- * special (because none can have been COWed).
- *
- *
  * In order to support COW of arbitrary special mappings, we have VM_MIXEDMAP.
  *
  * VM_MIXEDMAP mappings can likewise contain memory with or without "struct
@@ -621,10 +631,7 @@ static inline struct page *vm_normal_page_pfn(struct vm_area_struct *vma,
  * folios) are refcounted and considered normal pages by the VM.
  *
  * The disadvantage is that pages are refcounted (which can be slower and
- * simply not an option for some PFNMAP users). The advantage is that we
- * don't have to follow the strict linearity rule of PFNMAP mappings in
- * order to support COWable mappings.
- *
+ * simply not an option for some PFNMAP users).
  */
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 		pte_t pte)
 {
@@ -642,7 +649,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 		print_bad_pte(vma, addr, pte, NULL);
 		return NULL;
 	}
-	return vm_normal_page_pfn(vma, addr, pfn);
+	return vm_normal_page_pfn(vma, pfn);
 }
 
 struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
@@ -666,7 +673,7 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 			   !is_huge_zero_pfn(pfn));
 		return NULL;
 	}
-	return vm_normal_page_pfn(vma, addr, pfn);
+	return vm_normal_page_pfn(vma, pfn);
 }
 
 struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
@@ -2422,6 +2429,13 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 	pte_t *pte, entry;
 	spinlock_t *ptl;
 
+	/*
+	 * Refuse this pfn if we could mistake it as a refcounted folio
+	 * in a CoW mapping later in vm_normal_page().
+	 */
+	if ((vma->vm_flags & VM_PFNMAP) && vm_pfnmap_normal_page_pfn(vma, pfn))
+		return VM_FAULT_SIGBUS;
+
 	pte = get_locked_pte(mm, addr, &ptl);
 	if (!pte)
 		return VM_FAULT_OOM;
@@ -2511,7 +2525,6 @@ vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
 	BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)));
 	BUG_ON((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) ==
 						(VM_PFNMAP|VM_MIXEDMAP));
-	BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
 	BUG_ON((vma->vm_flags & VM_MIXEDMAP) && pfn_valid(pfn));
 
 	if (addr < vma->vm_start || addr >= vma->vm_end)
@@ -2656,10 +2669,11 @@ vm_fault_t vmf_insert_mixed_mkwrite(struct vm_area_struct *vma,
  * mappings are removed. any references to nonexistent pages results
  * in null mappings (currently treated as "copy-on-access")
  */
-static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
+static int remap_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			unsigned long addr, unsigned long end,
 			unsigned long pfn, pgprot_t prot)
 {
+	struct mm_struct *mm = vma->vm_mm;
 	pte_t *pte, *mapped_pte;
 	spinlock_t *ptl;
 	int err = 0;
@@ -2674,6 +2688,14 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
 			err = -EACCES;
 			break;
 		}
+		/*
+		 * Refuse this pfn if we could mistake it as a refcounted folio
+		 * in a CoW mapping later in vm_normal_page().
+		 */
+		if (vm_pfnmap_normal_page_pfn(vma, pfn)) {
+			err = -EINVAL;
+			break;
+		}
 		set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
 		pfn++;
 	} while (pte++, addr += PAGE_SIZE, addr != end);
@@ -2682,10 +2704,11 @@
 	return err;
 }
 
-static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
+static inline int remap_pmd_range(struct vm_area_struct *vma, pud_t *pud,
 			unsigned long addr, unsigned long end,
 			unsigned long pfn, pgprot_t prot)
 {
+	struct mm_struct *mm = vma->vm_mm;
 	pmd_t *pmd;
 	unsigned long next;
 	int err;
@@ -2697,7 +2720,7 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
 	VM_BUG_ON(pmd_trans_huge(*pmd));
 	do {
 		next = pmd_addr_end(addr, end);
-		err = remap_pte_range(mm, pmd, addr, next,
+		err = remap_pte_range(vma, pmd, addr, next,
 				pfn + (addr >> PAGE_SHIFT), prot);
 		if (err)
 			return err;
@@ -2705,10 +2728,11 @@
 	return 0;
 }
 
-static inline int remap_pud_range(struct mm_struct *mm, p4d_t *p4d,
+static inline int remap_pud_range(struct vm_area_struct *vma, p4d_t *p4d,
 			unsigned long addr, unsigned long end,
 			unsigned long pfn, pgprot_t prot)
 {
+	struct mm_struct *mm = vma->vm_mm;
 	pud_t *pud;
 	unsigned long next;
 	int err;
@@ -2719,7 +2743,7 @@ static inline int remap_pud_range(struct mm_struct *mm, p4d_t *p4d,
 		return -ENOMEM;
 	do {
 		next = pud_addr_end(addr, end);
-		err = remap_pmd_range(mm, pud, addr, next,
+		err = remap_pmd_range(vma, pud, addr, next,
 				pfn + (addr >> PAGE_SHIFT), prot);
 		if (err)
 			return err;
@@ -2727,10 +2751,11 @@
 	return 0;
 }
 
-static inline int remap_p4d_range(struct mm_struct *mm, pgd_t *pgd,
+static inline int remap_p4d_range(struct vm_area_struct *vma, pgd_t *pgd,
 			unsigned long addr, unsigned long end,
 			unsigned long pfn, pgprot_t prot)
 {
+	struct mm_struct *mm = vma->vm_mm;
 	p4d_t *p4d;
 	unsigned long next;
 	int err;
@@ -2741,7 +2766,7 @@ static inline int remap_p4d_range(struct mm_struct *mm, pgd_t *pgd,
 		return -ENOMEM;
 	do {
 		next = p4d_addr_end(addr, end);
-		err = remap_pud_range(mm, p4d, addr, next,
+		err = remap_pud_range(vma, p4d, addr, next,
 				pfn + (addr >> PAGE_SHIFT), prot);
 		if (err)
 			return err;
@@ -2773,18 +2798,7 @@ static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long ad
 	 *	Disable vma merging and expanding with mremap().
 	 * VM_DONTDUMP
 	 *	Omit vma from core dump, even when VM_IO turned off.
-	 *
-	 * There's a horrible special case to handle copy-on-write
-	 * behaviour that some programs depend on. We mark the "original"
-	 * un-COW'ed pages by matching them up with "vma->vm_pgoff".
-	 * See vm_normal_page() for details.
 	 */
-	if (is_cow_mapping(vma->vm_flags)) {
-		if (addr != vma->vm_start || end != vma->vm_end)
-			return -EINVAL;
-		vma->vm_pgoff = pfn;
-	}
-
 	vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
 
 	BUG_ON(addr >= end);
@@ -2793,7 +2807,7 @@ static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long ad
 	flush_cache_range(vma, addr, end);
 	do {
 		next = pgd_addr_end(addr, end);
-		err = remap_p4d_range(mm, pgd, addr, next,
+		err = remap_p4d_range(vma, pgd, addr, next,
 				pfn + (addr >> PAGE_SHIFT), prot);
 		if (err)
 			return err;
-- 
2.49.0
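
Note for reviewers: the removed linearity rule and its replacement boil
down to the following comparison (a simplified sketch, not the literal
kernel code; the helper names are hypothetical and not part of the
patch):

  /* Before: CoW'ed pages identified via the linearity rule. */
  static bool pfn_was_cowed_old(struct vm_area_struct *vma,
  			      unsigned long addr, unsigned long pfn)
  {
  	unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;

  	/* Un-CoWed (special) PFNs satisfy pfn == vma->vm_pgoff + off. */
  	return is_cow_mapping(vma->vm_flags) && pfn != vma->vm_pgoff + off;
  }

  /* After: any anon folio reachable via the PFN must stem from CoW. */
  static bool pfn_was_cowed_new(struct vm_area_struct *vma, unsigned long pfn)
  {
  	struct folio *folio;

  	if (!is_cow_mapping(vma->vm_flags) || !pfn_valid(pfn))
  		return false;
  	folio = page_folio(pfn_to_page(pfn));
  	return !folio_test_slab(folio) && folio_test_anon(folio);
  }

The old scheme needs the virtual address and forces remap_pfn_range()
to be linear; the new one only needs the PFN, at the price of a
pfn_valid() lookup on the !CONFIG_ARCH_HAS_PTE_SPECIAL slow path.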