From: Yu Zhao
Date: Mon, 3 Jul 2023 19:35:41 -0600
Subject: Re: [PATCH v2 4/5] mm: FLEXIBLE_THP for improved performance
To: Ryan Roberts
Cc: Andrew Morton, Matthew Wilcox, "Kirill A. Shutemov", Yin Fengwei,
 David Hildenbrand, Catalin Marinas, Will Deacon, Anshuman Khandual,
 Yang Shi, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <20230703135330.1865927-5-ryan.roberts@arm.com>
References: <20230703135330.1865927-1-ryan.roberts@arm.com>
 <20230703135330.1865927-5-ryan.roberts@arm.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="0000000000003ae8c205ff9f4fb5"
--0000000000003ae8c205ff9f4fb5
Content-Type: text/plain; charset="UTF-8"

On Mon, Jul 3, 2023 at 7:53 AM Ryan Roberts wrote:
>
> Introduce the FLEXIBLE_THP feature, which allows anonymous memory to be
> allocated in large folios of a specified order. All pages of the large
> folio are pte-mapped during the same page fault, significantly reducing
> the number of page faults. The number of per-page operations (e.g. ref
> counting, rmap management, lru list management) is also significantly
> reduced since those ops now become per-folio.
>
> The new behaviour is hidden behind the new FLEXIBLE_THP Kconfig, which
> defaults to disabled for now; there is a long list of todos to make
> FLEXIBLE_THP robust with existing features (e.g. compaction, mlock, some
> madvise ops, etc). These items will be tackled in subsequent patches.
>
> When enabled, the preferred folio order is as returned by
> arch_wants_pte_order(), which may be overridden by the arch as it sees
> fit. Some architectures (e.g. arm64) can coalesce TLB entries if a
> contiguous set of ptes map physically contiguous, naturally aligned
> memory, so this mechanism allows the architecture to optimize as
> required.
>
> If the preferred order can't be used (e.g. because the folio would
> breach the bounds of the vma, or because ptes in the region are already
> mapped) then we fall back to a suitable lower order.
>
> Signed-off-by: Ryan Roberts
> ---
>  mm/Kconfig  |  10 ++++
>  mm/memory.c | 168 ++++++++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 165 insertions(+), 13 deletions(-)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 7672a22647b4..1c06b2c0a24e 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -822,6 +822,16 @@ config READ_ONLY_THP_FOR_FS
>           support of file THPs will be developed in the next few release
>           cycles.
>
> +config FLEXIBLE_THP
> +       bool "Flexible order THP"
> +       depends on TRANSPARENT_HUGEPAGE
> +       default n

The default value is already N.

> +       help
> +         Use large (bigger than order-0) folios to back anonymous memory where
> +         possible, even if the order of the folio is smaller than the PMD
> +         order. This reduces the number of page faults, as well as other
> +         per-page overheads, to improve performance for many workloads.
> +
>  endif # TRANSPARENT_HUGEPAGE
>
>  #
> diff --git a/mm/memory.c b/mm/memory.c
> index fb30f7523550..abe2ea94f3f5 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3001,6 +3001,116 @@ static vm_fault_t fault_dirty_shared_page(struct vm_fault *vmf)
>         return 0;
>  }
>
> +#ifdef CONFIG_FLEXIBLE_THP
> +/*
> + * Allocates, zeros and returns a folio of the requested order for use as
> + * anonymous memory.
> + */
> +static struct folio *alloc_anon_folio(struct vm_area_struct *vma,
> +                                     unsigned long addr, int order)
> +{
> +       gfp_t gfp;
> +       struct folio *folio;
> +
> +       if (order == 0)
> +               return vma_alloc_zeroed_movable_folio(vma, addr);
> +
> +       gfp = vma_thp_gfp_mask(vma);
> +       folio = vma_alloc_folio(gfp, order, vma, addr, true);
> +       if (folio)
> +               clear_huge_page(&folio->page, addr, folio_nr_pages(folio));
> +
> +       return folio;
> +}
> +
> +/*
> + * Preferred folio order to allocate for anonymous memory.
> + */
> +#define max_anon_folio_order(vma)      arch_wants_pte_order(vma)
> +#else
> +#define alloc_anon_folio(vma, addr, order) \
> +                       vma_alloc_zeroed_movable_folio(vma, addr)
> +#define max_anon_folio_order(vma)      0
> +#endif
> +
> +/*
> + * Returns index of first pte that is not none, or nr if all are none.
> + */
> +static inline int check_ptes_none(pte_t *pte, int nr)
> +{
> +       int i;
> +
> +       for (i = 0; i < nr; i++) {
> +               if (!pte_none(ptep_get(pte++)))
> +                       return i;
> +       }
> +
> +       return nr;
> +}
> +
> +static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order)
> +{
> +       /*
> +        * The aim here is to determine what size of folio we should allocate
> +        * for this fault. Factors include:
> +        * - Order must not be higher than `order` upon entry
> +        * - Folio must be naturally aligned within VA space
> +        * - Folio must be fully contained inside one pmd entry
> +        * - Folio must not breach boundaries of vma
> +        * - Folio must not overlap any non-none ptes
> +        *
> +        * Additionally, we do not allow order-1 since this breaks assumptions
> +        * elsewhere in the mm; THP pages must be at least order-2 (since they
> +        * store state up to the 3rd struct page subpage), and these pages must
> +        * be THP in order to correctly use pre-existing THP infrastructure such
> +        * as folio_split().
> +        *
> +        * Note that the caller may or may not choose to lock the pte. If
> +        * unlocked, the result is racy and the user must re-check any overlap
> +        * with non-none ptes under the lock.
> +        */
> +
> +       struct vm_area_struct *vma = vmf->vma;
> +       int nr;
> +       unsigned long addr;
> +       pte_t *pte;
> +       pte_t *first_set = NULL;
> +       int ret;
> +
> +       order = min(order, PMD_SHIFT - PAGE_SHIFT);
> +
> +       for (; order > 1; order--) {

I'm not sure how we can justify this policy. As an initial step, it'd
be a lot easier to sell if we only considered the order of
arch_wants_pte_order() and the order 0.

> +               nr = 1 << order;
> +               addr = ALIGN_DOWN(vmf->address, nr << PAGE_SHIFT);
> +               pte = vmf->pte - ((vmf->address - addr) >> PAGE_SHIFT);
> +
> +               /* Check vma bounds. */
> +               if (addr < vma->vm_start ||
> +                   addr + (nr << PAGE_SHIFT) > vma->vm_end)
> +                       continue;
> +
> +               /* Ptes covered by order already known to be none. */
> +               if (pte + nr <= first_set)
> +                       break;
> +
> +               /* Already found set pte in range covered by order. */
> +               if (pte <= first_set)
> +                       continue;
> +
> +               /* Need to check if all the ptes are none. */
> +               ret = check_ptes_none(pte, nr);
> +               if (ret == nr)
> +                       break;
> +
> +               first_set = pte + ret;
> +       }
> +
> +       if (order == 1)
> +               order = 0;
> +
> +       return order;
> +}

Everything above can be simplified into two helpers:
vmf_pte_range_changed() and alloc_anon_folio() (or whatever names you
prefer). Details below.
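To make that concrete, one rough shape the allocation-side helper could
take, anticipating the details spelled out below (only the preferred
order and order 0 are tried, uffd-wp falls back to a single page, and
vmf->address is updated on success). This is only a sketch reusing
check_ptes_none(), arch_wants_pte_order() and the allocation calls
already in this patch; all names and details are up for grabs:

  static struct folio *alloc_anon_folio(struct vm_fault *vmf)
  {
          struct vm_area_struct *vma = vmf->vma;
          int order = arch_wants_pte_order(vma);
          unsigned long addr;
          struct folio *folio;
          pte_t *pte;

          /* uffd-wp needs per-pte tracking, so stick to a single page. */
          if (vmf_orig_pte_uffd_wp(vmf))
                  goto fallback;

          /* Only consider the preferred order and order 0. */
          addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
          if (order <= 1 || addr < vma->vm_start ||
              addr + (PAGE_SIZE << order) > vma->vm_end)
                  goto fallback;

          /* Racy check that the ptes are none; re-checked under the ptl. */
          pte = pte_offset_map(vmf->pmd, addr);
          if (!pte)
                  goto fallback;
          if (check_ptes_none(pte, 1 << order) != (1 << order)) {
                  pte_unmap(pte);
                  goto fallback;
          }
          pte_unmap(pte);

          folio = vma_alloc_folio(vma_thp_gfp_mask(vma), order, vma, addr, true);
          if (folio) {
                  clear_huge_page(&folio->page, addr, 1 << order);
                  /* Any later race is caught under the ptl and retried. */
                  vmf->address = addr;
                  return folio;
          }
  fallback:
          return vma_alloc_zeroed_movable_folio(vma, vmf->address);
  }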
>  /*
>   * Handle write page faults for pages that can be reused in the current vma
>   *
> @@ -3073,7 +3183,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>                 goto oom;
>
>         if (is_zero_pfn(pte_pfn(vmf->orig_pte))) {
> -               new_folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
> +               new_folio = alloc_anon_folio(vma, vmf->address, 0);

This seems unnecessary for now. Later on, we could fill in an aligned
area with multiple write-protected zero pages during a read fault and
then replace them with a large folio here.

>                 if (!new_folio)
>                         goto oom;
>         } else {
> @@ -4040,6 +4150,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>         struct folio *folio;
>         vm_fault_t ret = 0;
>         pte_t entry;
> +       int order;
> +       int pgcount;
> +       unsigned long addr;
>
>         /* File mapping without ->vm_ops ? */
>         if (vma->vm_flags & VM_SHARED)
> @@ -4081,24 +4194,51 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>                         pte_unmap_unlock(vmf->pte, vmf->ptl);
>                         return handle_userfault(vmf, VM_UFFD_MISSING);
>                 }
> -               goto setpte;
> +               if (uffd_wp)
> +                       entry = pte_mkuffd_wp(entry);
> +               set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
> +
> +               /* No need to invalidate - it was non-present before */
> +               update_mmu_cache(vma, vmf->address, vmf->pte);
> +               goto unlock;
> +       }

Not really needed IMO. Details below.

===

> +       /*
> +        * If allocating a large folio, determine the biggest suitable order for
> +        * the VMA (e.g. it must not exceed the VMA's bounds, it must not
> +        * overlap with any populated PTEs, etc). We are not under the ptl here
> +        * so we will need to re-check that we are not overlapping any populated
> +        * PTEs once we have the lock.
> +        */
> +       order = uffd_wp ? 0 : max_anon_folio_order(vma);
> +       if (order > 0) {
> +               vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
> +               order = calc_anon_folio_order_alloc(vmf, order);
> +               pte_unmap(vmf->pte);
>         }

===

The section above together with the section below should be wrapped in
a helper.

> -       /* Allocate our own private page. */
> +       /* Allocate our own private folio. */
>         if (unlikely(anon_vma_prepare(vma)))
>                 goto oom;

===

> -       folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
> +       folio = alloc_anon_folio(vma, vmf->address, order);
> +       if (!folio && order > 0) {
> +               order = 0;
> +               folio = alloc_anon_folio(vma, vmf->address, order);
> +       }

===

One helper returns a folio of order arch_wants_pte_order(), or order 0
if it fails to allocate that order, e.g.,

  folio = alloc_anon_folio(vmf);

And if vmf_orig_pte_uffd_wp(vmf) is true, the helper allocates order 0
regardless of arch_wants_pte_order(). Upon success, it can update
vmf->address, since if we run into a race with another PF, we exit the
fault handler and retry anyway.

>         if (!folio)
>                 goto oom;
>
> +       pgcount = 1 << order;
> +       addr = ALIGN_DOWN(vmf->address, pgcount << PAGE_SHIFT);

As shown above, the helper already updates vmf->address. And mm/ never
used pgcount before -- the convention is nr_pages = folio_nr_pages().

>         if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL))
>                 goto oom_free_page;
>         folio_throttle_swaprate(folio, GFP_KERNEL);
>
>         /*
>          * The memory barrier inside __folio_mark_uptodate makes sure that
> -        * preceding stores to the page contents become visible before
> -        * the set_pte_at() write.
> +        * preceding stores to the folio contents become visible before
> +        * the set_ptes() write.

We don't have set_ptes() yet.

>          */
>         __folio_mark_uptodate(folio);
>
> @@ -4107,11 +4247,12 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>         if (vma->vm_flags & VM_WRITE)
>                 entry = pte_mkwrite(pte_mkdirty(entry));
>
> -       vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
> -                       &vmf->ptl);
> +       vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
>         if (vmf_pte_changed(vmf)) {
>                 update_mmu_tlb(vma, vmf->address, vmf->pte);
>                 goto release;
> +       } else if (order > 0 && check_ptes_none(vmf->pte, pgcount) != pgcount) {
> +               goto release;
>         }

Need a new helper:

  if (vmf_pte_range_changed(vmf, nr_pages)) {
          for (i = 0; i < nr_pages; i++)
                  update_mmu_tlb(vma, vmf->address + PAGE_SIZE * i, vmf->pte + i);
          goto release;
  }

(It should be fine to call update_mmu_tlb() even if it's not really
necessary.)
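For reference, a minimal sketch of what such a vmf_pte_range_changed()
helper could look like, assuming it only needs to extend the existing
vmf_pte_changed() check across nr_pages consecutive ptes (the name and
exact semantics are placeholders):

  static bool vmf_pte_range_changed(struct vm_fault *vmf, int nr_pages)
  {
          int i;

          if (nr_pages == 1)
                  return vmf_pte_changed(vmf);

          /* A large folio is only installed over ptes that are still none. */
          for (i = 0; i < nr_pages; i++) {
                  if (!pte_none(ptep_get(vmf->pte + i)))
                          return true;
          }

          return false;
  }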
>         ret = check_stable_address_space(vma->vm_mm);
> @@ -4125,16 +4266,17 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>                 return handle_userfault(vmf, VM_UFFD_MISSING);
>         }
>
> -       inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
> -       folio_add_new_anon_rmap(folio, vma, vmf->address);
> +       folio_ref_add(folio, pgcount - 1);
> +       add_mm_counter(vma->vm_mm, MM_ANONPAGES, pgcount);
> +       folio_add_new_anon_rmap(folio, vma, addr);
>         folio_add_lru_vma(folio, vma);
> -setpte:
> +
>         if (uffd_wp)
>                 entry = pte_mkuffd_wp(entry);
> -       set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
> +       set_ptes(vma->vm_mm, addr, vmf->pte, entry, pgcount);

We would have to do it one by one for now.

>         /* No need to invalidate - it was non-present before */
> -       update_mmu_cache(vma, vmf->address, vmf->pte);
> +       update_mmu_cache_range(vma, addr, vmf->pte, pgcount);

Ditto.

How about this (by moving mk_pte() and its friends here):

  ...
          folio_add_lru_vma(folio, vma);

          for (i = 0; i < nr_pages; i++) {
                  entry = mk_pte(folio_page(folio, i), vma->vm_page_prot);
                  entry = pte_sw_mkyoung(entry);
                  if (vma->vm_flags & VM_WRITE)
                          entry = pte_mkwrite(pte_mkdirty(entry));
  setpte:
                  if (uffd_wp)
                          entry = pte_mkuffd_wp(entry);
                  set_pte_at(vma->vm_mm, vmf->address + PAGE_SIZE * i, vmf->pte + i, entry);

                  /* No need to invalidate - it was non-present before */
                  update_mmu_cache(vma, vmf->address + PAGE_SIZE * i, vmf->pte + i);
          }

>  unlock:
>         pte_unmap_unlock(vmf->pte, vmf->ptl);
>         return ret;

Attaching a small patch in case anything above is not clear. Please
take a look. Thanks.

--0000000000003ae8c205ff9f4fb5
Content-Type: text/x-patch; charset="US-ASCII"; name="anon_folios.patch"
Content-Disposition: attachment; filename="anon_folios.patch"
Content-Transfer-Encoding: base64
Content-ID:
X-Attachment-Id: f_ljnm9ju40

ZGlmZiAtLWdpdCBhL21tL21lbW9yeS5jIGIvbW0vbWVtb3J5LmMKaW5kZXggNDBhMjY5NDU3Yzhi
Li4wNGZkYjg1MjlmNjggMTAwNjQ0Ci0tLSBhL21tL21lbW9yeS5jCisrKyBiL21tL21lbW9yeS5j
CkBAIC00MDYzLDYgKzQwNjMsOCBAQCB2bV9mYXVsdF90IGRvX3N3YXBfcGFnZShzdHJ1Y3Qgdm1f
ZmF1bHQgKnZtZikKICAqLwogc3RhdGljIHZtX2ZhdWx0X3QgZG9fYW5vbnltb3VzX3BhZ2Uoc3Ry
dWN0IHZtX2ZhdWx0ICp2bWYpCiB7CisJaW50IGkgPSAwOworCWludCBucl9wYWdlcyA9IDE7CiAJ
Ym9vbCB1ZmZkX3dwID0gdm1mX29yaWdfcHRlX3VmZmRfd3Aodm1mKTsKIAlzdHJ1Y3Qgdm1fYXJl
YV9zdHJ1Y3QgKnZtYSA9IHZtZi0+dm1hOwogCXN0cnVjdCBmb2xpbyAqZm9saW87CkBAIC00MTA3
LDEwICs0MTA5LDEyIEBAIHN0YXRpYyB2bV9mYXVsdF90IGRvX2Fub255bW91c19wYWdlKHN0cnVj
dCB2bV9mYXVsdCAqdm1mKQogCS8qIEFsbG9jYXRlIG91ciBvd24gcHJpdmF0ZSBwYWdlLiAqLwog
CWlmICh1bmxpa2VseShhbm9uX3ZtYV9wcmVwYXJlKHZtYSkpKQogCQlnb3RvIG9vbTsKLQlmb2xp
byA9IHZtYV9hbGxvY196ZXJvZWRfbW92YWJsZV9mb2xpbyh2bWEsIHZtZi0+YWRkcmVzcyk7CisJ
Zm9saW8gPSBhbGxvY19hbm9uX2ZvbGlvKHZtZik7IC8vIHVwZGF0ZXMgdm1mLT5hZGRyZXNzIGFj
Y29yZGluZ2x5CiAJaWYgKCFmb2xpbykKIAkJZ290byBvb207CiAKKwlucl9wYWdlcyA9IGZvbGlv
X25yX3BhZ2VzKGZvbGlvKTsKKwogCWlmIChtZW1fY2dyb3VwX2NoYXJnZShmb2xpbywgdm1hLT52
bV9tbSwgR0ZQX0tFUk5FTCkpCiAJCWdvdG8gb29tX2ZyZWVfcGFnZTsKIAlmb2xpb190aHJvdHRs
ZV9zd2FwcmF0ZShmb2xpbywgR0ZQX0tFUk5FTCk7CkBAIC00MTIyLDE3ICs0MTI2LDEzIEBAIHN0
YXRpYyB2bV9mYXVsdF90IGRvX2Fub255bW91c19wYWdlKHN0cnVjdCB2bV9mYXVsdCAqdm1mKQog
CSAqLwogCV9fZm9saW9fbWFya191cHRvZGF0ZShmb2xpbyk7CiAKLQllbnRyeSA9IG1rX3B0ZSgm
Zm9saW8tPnBhZ2UsIHZtYS0+dm1fcGFnZV9wcm90KTsKLQllbnRyeSA9IHB0ZV9zd19ta3lvdW5n
KGVudHJ5KTsKLQlpZiAodm1hLT52bV9mbGFncyAmIFZNX1dSSVRFKQotCQllbnRyeSA9IHB0ZV9t
a3dyaXRlKHB0ZV9ta2RpcnR5KGVudHJ5KSk7Ci0KIAl2bWYtPnB0ZSA9IHB0ZV9vZmZzZXRfbWFw
X2xvY2sodm1hLT52bV9tbSwgdm1mLT5wbWQsIHZtZi0+YWRkcmVzcywKIAkJCSZ2bWYtPnB0bCk7
CiAJaWYgKCF2bWYtPnB0ZSkKIAkJZ290byByZWxlYXNlOwotCWlmICh2bWZfcHRlX2NoYW5nZWQo dm1mKSkgewotCQl1cGRhdGVfbW11X3RsYih2bWEsIHZtZi0+YWRkcmVzcywgdm1mLT5wdGUpOwor CWlmICh2bWZfcHRlX3JhbmdlX2NoYW5nZWQodm1mLCBucl9wYWdlcykpIHsKKwkJZm9yIChpID0g MDsgaSA8IG5yX3BhZ2VzOyBpKyspCisJCQl1cGRhdGVfbW11X3RsYih2bWEsIHZtZi0+YWRkcmVz cyArIFBBR0VfU0laRSAqIGksIHZtZi0+cHRlICsgaSk7CiAJCWdvdG8gcmVsZWFzZTsKIAl9CiAK QEAgLTQxNDcsMTYgKzQxNDcsMjQgQEAgc3RhdGljIHZtX2ZhdWx0X3QgZG9fYW5vbnltb3VzX3Bh Z2Uoc3RydWN0IHZtX2ZhdWx0ICp2bWYpCiAJCXJldHVybiBoYW5kbGVfdXNlcmZhdWx0KHZtZiwg Vk1fVUZGRF9NSVNTSU5HKTsKIAl9CiAKLQlpbmNfbW1fY291bnRlcih2bWEtPnZtX21tLCBNTV9B Tk9OUEFHRVMpOworCWZvbGlvX3JlZl9hZGQoZm9saW8sIG5yX3BhZ2VzIC0gMSk7CisJYWRkX21t X2NvdW50ZXIodm1hLT52bV9tbSwgTU1fQU5PTlBBR0VTLCBucl9wYWdlcyk7CiAJZm9saW9fYWRk X25ld19hbm9uX3JtYXAoZm9saW8sIHZtYSwgdm1mLT5hZGRyZXNzKTsKIAlmb2xpb19hZGRfbHJ1 X3ZtYShmb2xpbywgdm1hKTsKKworCWZvciAoaSA9IDA7IGkgPCBucl9wYWdlczsgaSsrKSB7CisJ CWVudHJ5ID0gbWtfcHRlKGZvbGlvX3BhZ2UoZm9saW8sIGkpLCB2bWEtPnZtX3BhZ2VfcHJvdCk7 CisJCWVudHJ5ID0gcHRlX3N3X21reW91bmcoZW50cnkpOworCQlpZiAodm1hLT52bV9mbGFncyAm IFZNX1dSSVRFKQorCQkJZW50cnkgPSBwdGVfbWt3cml0ZShwdGVfbWtkaXJ0eShlbnRyeSkpOwog c2V0cHRlOgotCWlmICh1ZmZkX3dwKQotCQllbnRyeSA9IHB0ZV9ta3VmZmRfd3AoZW50cnkpOwot CXNldF9wdGVfYXQodm1hLT52bV9tbSwgdm1mLT5hZGRyZXNzLCB2bWYtPnB0ZSwgZW50cnkpOwor CQlpZiAodWZmZF93cCkKKwkJCWVudHJ5ID0gcHRlX21rdWZmZF93cChlbnRyeSk7CisJCXNldF9w dGVfYXQodm1hLT52bV9tbSwgdm1mLT5hZGRyZXNzICsgUEFHRV9TSVpFICogaSwgdm1mLT5wdGUg KyBpLCBlbnRyeSk7CiAKLQkvKiBObyBuZWVkIHRvIGludmFsaWRhdGUgLSBpdCB3YXMgbm9uLXBy ZXNlbnQgYmVmb3JlICovCi0JdXBkYXRlX21tdV9jYWNoZSh2bWEsIHZtZi0+YWRkcmVzcywgdm1m LT5wdGUpOworCQkvKiBObyBuZWVkIHRvIGludmFsaWRhdGUgLSBpdCB3YXMgbm9uLXByZXNlbnQg YmVmb3JlICovCisJCXVwZGF0ZV9tbXVfY2FjaGUodm1hLCB2bWYtPmFkZHJlc3MgKyBQQUdFX1NJ WkUgKiBpLCB2bWYtPnB0ZSArIGkpOworCX0KIHVubG9jazoKIAlpZiAodm1mLT5wdGUpCiAJCXB0 ZV91bm1hcF91bmxvY2sodm1mLT5wdGUsIHZtZi0+cHRsKTsK --0000000000003ae8c205ff9f4fb5--