From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <97489e94-ea4e-40a3-9e56-d5f7d1219e81@arm.com>
Date: Tue, 12 Dec 2023 11:57:44 +0000
From: Ryan Roberts <ryan.roberts@arm.com>
Subject: Re: [PATCH v3 02/15] mm: Batch-clear PTE ranges during zap_pte_range()
To: Alistair Popple
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier,
 Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
 Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
 Vincenzo Frascino, Andrew Morton, Anshuman Khandual, Matthew Wilcox,
 Yu Zhao, Mark Rutland, David Hildenbrand, Kefeng Wang, John Hubbard,
 Zi Yan, Barry Song <21cnbao@gmail.com>, Yang Shi,
 linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20231204105440.61448-1-ryan.roberts@arm.com>
 <20231204105440.61448-3-ryan.roberts@arm.com>
 <87h6kta3ap.fsf@nvdebian.thelocal>
In-Reply-To: <87h6kta3ap.fsf@nvdebian.thelocal>

On 08/12/2023 01:30, Alistair Popple wrote:
> 
> Ryan Roberts writes:
> 
>> Convert zap_pte_range() to clear a set of ptes in a batch. A given batch
>> maps a physically contiguous block of memory, all belonging to the same
>> folio. This will likely improve performance by a tiny amount due to
>> removing duplicate calls to mark the folio dirty and accessed. And also
>> provides us with a future opportunity to batch the rmap removal.
>>
>> However, the primary motivation for this change is to reduce the number
>> of tlb maintenance operations that the arm64 backend has to perform
>> during exit and other syscalls that cause zap_pte_range() (e.g. munmap,
>> madvise(DONTNEED), etc.), as it is about to add transparent support for
>> the "contiguous bit" in its ptes. By clearing ptes using the new
>> clear_ptes() API, the backend doesn't have to perform an expensive
>> unfold operation when a PTE being cleared is part of a contpte block.
>> Instead it can just clear the whole block immediately.
>>
>> This change addresses the core-mm refactoring only, and introduces
>> clear_ptes() with a default implementation that calls
>> ptep_get_and_clear_full() for each pte in the range. Note that this API
>> returns the pte at the beginning of the batch, but with the dirty and
>> young bits set if ANY of the ptes in the cleared batch had those bits
>> set; this information is applied to the folio by the core-mm. Given the
>> batch is garranteed to cover only a single folio, collapsing this state
> 
> Nit: s/garranteed/guaranteed/
> 
>> does not lose any useful information.
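(For reference, since the include/linux/pgtable.h hunk isn't quoted here:
the generic default described above boils down to roughly the below. This
is a sketch from memory, not the actual hunk; the exact signature, naming
and placement in the patch may differ.)

/*
 * Sketch of the generic fallback: clear nr ptes, which must all map the
 * same folio, and return the first pte with the dirty and young bits
 * collapsed in from every pte in the batch.
 */
static inline pte_t clear_ptes(struct mm_struct *mm, unsigned long addr,
			       pte_t *ptep, int full, unsigned int nr)
{
	pte_t pte, tmp;
	unsigned int i;

	pte = ptep_get_and_clear_full(mm, addr, ptep, full);

	for (i = 1; i < nr; i++) {
		addr += PAGE_SIZE;
		ptep++;
		tmp = ptep_get_and_clear_full(mm, addr, ptep, full);

		if (pte_dirty(tmp))
			pte = pte_mkdirty(pte);
		if (pte_young(tmp))
			pte = pte_mkyoung(pte);
	}

	return pte;
}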
>>
>> A separate change will implement clear_ptes() in the arm64 backend to
>> realize the performance improvement as part of the work to enable
>> contpte mappings.
>>
>> Signed-off-by: Ryan Roberts
>> ---
>>  include/asm-generic/tlb.h |  9 ++++++
>>  include/linux/pgtable.h   | 26 ++++++++++++++++
>>  mm/memory.c               | 63 ++++++++++++++++++++++++++-------------
>>  mm/mmu_gather.c           | 14 +++++++++
>>  4 files changed, 92 insertions(+), 20 deletions(-)
> 
> 
>> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
>> index 4f559f4ddd21..57b4d5f0dfa4 100644
>> --- a/mm/mmu_gather.c
>> +++ b/mm/mmu_gather.c
>> @@ -47,6 +47,20 @@ static bool tlb_next_batch(struct mmu_gather *tlb)
>>  	return true;
>>  }
>>  
>> +unsigned int tlb_get_guaranteed_space(struct mmu_gather *tlb)
>> +{
>> +	struct mmu_gather_batch *batch = tlb->active;
>> +	unsigned int nr_next = 0;
>> +
>> +	/* Allocate next batch so we can guarantee at least one batch. */
>> +	if (tlb_next_batch(tlb)) {
>> +		tlb->active = batch;
> 
> Rather than calling tlb_next_batch(tlb) and then undoing some of what it
> does I think it would be clearer to factor out the allocation part of
> tlb_next_batch(tlb) into a separate function (eg. tlb_alloc_batch) that
> you can call from both here and tlb_next_batch().

As per my email against patch 1, I have some perf regressions to iron out
for microbenchmarks; one issue is that this code forces the allocation of
a page for a batch even when we are only modifying a single pte (which
would previously fit in the embedded batch). So I've renamed this function
to tlb_reserve_space(int nr). If it already has enough room, it will just
return immediately. Else it will keep calling tlb_next_batch() in a loop
until space has been allocated. Then after the loop we set tlb->active
back to the original batch.

Given the new potential need to loop a couple of times, and the need to
build up that linked list, I think it works nicely without refactoring
tlb_next_batch().
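In case it's useful, this is roughly the shape I'm playing with; sketch
only, untested, and the exact signature and return convention may well
change by the next version:

/*
 * Sketch: make sure the gather can absorb at least nr entries before
 * the caller starts zapping. Returns how many entries are guaranteed,
 * which can fall short of nr if a batch allocation fails.
 */
static unsigned int tlb_reserve_space(struct mmu_gather *tlb, unsigned int nr)
{
	struct mmu_gather_batch *batch = tlb->active;
	unsigned int nr_avail = batch->max - batch->nr;

	/* Fast path: the active batch already has room; don't allocate. */
	if (nr_avail >= nr)
		return nr_avail;

	/* Chain on further batches until nr entries are covered. */
	while (nr_avail < nr) {
		if (!tlb_next_batch(tlb))
			break;
		nr_avail += tlb->active->max;
	}

	/* Carry on filling from the original batch; extras stay linked. */
	tlb->active = batch;

	return nr_avail;
}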
> 
> Otherwise I think this overall direction looks better than trying to
> play funny games in the arch layer as it's much clearer what's going on
> to core-mm code.
> 
>  - Alistair
> 
>> +		nr_next = batch->next->max;
>> +	}
>> +
>> +	return batch->max - batch->nr + nr_next;
>> +}
>> +
>>  #ifdef CONFIG_SMP
>>  static void tlb_flush_rmap_batch(struct mmu_gather_batch *batch, struct vm_area_struct *vma)
>>  {
> 