Message-ID: <6dd5cdf8-400e-8378-22be-994f0ada5cc2@arm.com>
Date: Wed, 15 Mar 2023 15:26:14 +0000
From: Ryan Roberts
To: "Matthew Wilcox (Oracle)", linux-arch@vger.kernel.org
Cc: Yin Fengwei, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
In-Reply-To: <20230315051444.3229621-36-willy@infradead.org>
References: <20230315051444.3229621-1-willy@infradead.org>
 <20230315051444.3229621-36-willy@infradead.org>
On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote:
> From: Yin Fengwei
> 
> set_pte_range() allows to setup page table entries for a specific
> range. It takes advantage of batched rmap update for large folio.
> It now takes care of calling update_mmu_cache_range().
> 
> Signed-off-by: Yin Fengwei
> Signed-off-by: Matthew Wilcox (Oracle)
> ---
>  Documentation/filesystems/locking.rst |  2 +-
>  include/linux/mm.h                    |  3 ++-
>  mm/filemap.c                          |  3 +--
>  mm/memory.c                           | 27 +++++++++++++++------------
>  4 files changed, 19 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index 7de7a7272a5e..922886fefb7f 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -663,7 +663,7 @@ locked. The VM will unlock the page.
>  Filesystem should find and map pages associated with offsets from "start_pgoff"
>  till "end_pgoff". ->map_pages() is called with page table locked and must
>  not block. If it's not possible to reach a page without blocking,
> -filesystem should skip it. Filesystem should use do_set_pte() to setup
> +filesystem should skip it. Filesystem should use set_pte_range() to setup
>  page table entry. Pointer to entry associated with the page is passed in
>  "pte" field in vm_fault structure. Pointers to entries for other offsets
>  should be calculated relative to "pte".
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ee755bb4e1c1..81788c985a8c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1299,7 +1299,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>  }
>  
>  vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
> +void set_pte_range(struct vm_fault *vmf, struct folio *folio,
> +		struct page *page, unsigned int nr, unsigned long addr);
>  
>  vm_fault_t finish_fault(struct vm_fault *vmf);
>  vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 6e2b0778db45..e2317623dcbf 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3504,8 +3504,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>  		ret = VM_FAULT_NOPAGE;
>  
>  		ref_count++;
> -		do_set_pte(vmf, page, addr);
> -		update_mmu_cache(vma, addr, vmf->pte);
> +		set_pte_range(vmf, folio, page, 1, addr);
>  	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
>  
>  	/* Restore the vmf->pte */
> diff --git a/mm/memory.c b/mm/memory.c
> index 6aa21e8f3753..9a654802f104 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4274,7 +4274,8 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>  }
>  #endif
>  
> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
> +void set_pte_range(struct vm_fault *vmf, struct folio *folio,
> +		struct page *page, unsigned int nr, unsigned long addr)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
>  	bool uffd_wp = vmf_orig_pte_uffd_wp(vmf);
> @@ -4282,7 +4283,7 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>  	bool prefault = vmf->address != addr;

I think you are changing behavior here - is this intentional? Previously this
would be evaluated per page; now it's evaluated once for the whole range.
The intention below is that directly faulted pages are mapped young and
prefaulted pages are mapped old. But now a whole range will be mapped the same.

Thanks,
Ryan

>  	pte_t entry;
>  
> -	flush_icache_page(vma, page);
> +	flush_icache_pages(vma, page, nr);
>  	entry = mk_pte(page, vma->vm_page_prot);
>  
>  	if (prefault && arch_wants_old_prefaulted_pte())
> @@ -4296,14 +4297,18 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>  		entry = pte_mkuffd_wp(entry);
>  	/* copy-on-write page */
>  	if (write && !(vma->vm_flags & VM_SHARED)) {
> -		inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
> -		page_add_new_anon_rmap(page, vma, addr);
> -		lru_cache_add_inactive_or_unevictable(page, vma);
> +		add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr);
> +		VM_BUG_ON_FOLIO(nr != 1, folio);
> +		folio_add_new_anon_rmap(folio, vma, addr);
> +		folio_add_lru_vma(folio, vma);
>  	} else {
> -		inc_mm_counter(vma->vm_mm, mm_counter_file(page));
> -		page_add_file_rmap(page, vma, false);
> +		add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
> +		folio_add_file_rmap_range(folio, page, nr, vma, false);
>  	}
> -	set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
> +	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
> +
> +	/* no need to invalidate: a not-present page won't be cached */
> +	update_mmu_cache_range(vma, addr, vmf->pte, nr);
>  }
>  
>  static bool vmf_pte_changed(struct vm_fault *vmf)
> @@ -4376,11 +4381,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  
>  	/* Re-check under ptl */
>  	if (likely(!vmf_pte_changed(vmf))) {
> -		do_set_pte(vmf, page, vmf->address);
> -
> -		/* no need to invalidate: a not-present page won't be cached */
> -		update_mmu_cache(vma, vmf->address, vmf->pte);
> +		struct folio *folio = page_folio(page);
>  
> +		set_pte_range(vmf, folio, page, 1, vmf->address);
> 		ret = 0;
>  	} else {
> 		update_mmu_tlb(vma, vmf->address, vmf->pte);