Date: Fri, 8 Nov 2024 15:38:41 +0800
Subject: Re: [PATCH v2 6/7] x86: mm: free page table pages by RCU instead of semi RCU
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: Jann Horn
Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, david@redhat.com, hughd@google.com, willy@infradead.org, mgorman@suse.de, muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org, zokeefe@google.com, rientjes@google.com, peterx@redhat.com, catalin.marinas@arm.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org
Hi Jann,

On 2024/11/8 06:39, Jann Horn wrote:
> +x86 MM maintainers - x86@kernel.org was already cc'ed, but I don't
> know if that is enough for them to see it, and I haven't seen them
> comment on this series yet; I think you need an ack from them for this
> change.

Yes, thanks to Jann for cc-ing the x86 MM maintainers; I look forward to their feedback!
>
> On Thu, Oct 31, 2024 at 9:14 AM Qi Zheng wrote:
>> Now, if CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, the page table pages
>> will be freed by semi RCU, that is:
>>
>> - batch table freeing: asynchronous free by RCU
>> - single table freeing: IPI + synchronous free
>>
>> In this way, the page table can be traversed locklessly by disabling IRQs
>> in paths such as fast GUP. But this is not enough to free the empty PTE
>> page table pages in paths other than the munmap and exit_mmap paths,
>> because IPI cannot be synchronized with rcu_read_lock() in
>> pte_offset_map{_lock}().
>>
>> In preparation for supporting empty PTE page table page reclamation,
>> let single table freeing also use RCU like batch table freeing. Then we
>> can also use pte_offset_map() etc. to prevent PTE pages from being freed.
>
> I applied your series locally and followed the page table freeing path
> that the reclaim feature would use on x86-64. Looks like it goes like
> this with the series applied:

Yes.

>
> free_pte
>   pte_free_tlb
>     __pte_free_tlb
>       ___pte_free_tlb
>         paravirt_tlb_remove_table
>           tlb_remove_table [!CONFIG_PARAVIRT, Xen PV, Hyper-V, KVM]
>             [no-free-memory slowpath:]
>             tlb_table_invalidate
>             tlb_remove_table_one
>               tlb_remove_table_sync_one [does IPI for GUP-fast]

^ It seems that this step can be omitted when CONFIG_PT_RECLAIM is
enabled, because GUP-fast disables IRQs, which can also serve as the
RCU critical section.

>               __tlb_remove_table_one [frees via RCU]
>             [fastpath:]
>             tlb_table_flush
>             tlb_remove_table_free [frees via RCU]
>           native_tlb_remove_table [CONFIG_PARAVIRT on native]
>             tlb_remove_table [see above]
>
> Basically, the only remaining case in which
> paravirt_tlb_remove_table() does not use tlb_remove_table() with RCU
> delay is !CONFIG_PARAVIRT && !CONFIG_PT_RECLAIM. Given that
> CONFIG_PT_RECLAIM is defined as "default y" when supported, I guess
> that means X86's direct page table freeing path will almost never be
> used?
> If it stays that way and the X86 folks don't see a performance
> impact from using RCU to free tables on munmap() / process exit, I
> guess we might want to get rid of the direct page table freeing path
> on x86 at some point to simplify things...

In theory, using RCU to asynchronously free PTE pages should make the
munmap() / process exit paths faster. I can try to grab some data.

>
> (That simplification might also help prepare for Intel Remote Action
> Request, if that is a thing people want...)

If so, even better!

>
>> Like pte_free_defer(), we can also safely use ptdesc->pt_rcu_head to free
>> the page table pages:
>>
>> - The pt_rcu_head is unioned with pt_list and pmd_huge_pte.
>>
>> - For pt_list, it is used to manage the PGD page in x86. Fortunately,
>>   tlb_remove_table() will not be used to free PGD pages, so it is safe
>>   to use pt_rcu_head.
>>
>> - For pmd_huge_pte, we will do zap_deposited_table() before freeing the
>>   PMD page, so it is also safe.
>
> Please also update the comments on "struct ptdesc" accordingly.

OK, will do.

>
>> Signed-off-by: Qi Zheng
>> ---
>>  arch/x86/include/asm/tlb.h | 19 +++++++++++++++++++
>>  arch/x86/kernel/paravirt.c |  7 +++++++
>>  arch/x86/mm/pgtable.c      | 10 +++++++++-
>>  mm/mmu_gather.c            |  9 ++++++++-
>>  4 files changed, 43 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
>> index 580636cdc257b..e223b53a8b190 100644
>> --- a/arch/x86/include/asm/tlb.h
>> +++ b/arch/x86/include/asm/tlb.h
>> @@ -34,4 +34,23 @@ static inline void __tlb_remove_table(void *table)
>>         free_page_and_swap_cache(table);
>>  }
>>
>> +#ifdef CONFIG_PT_RECLAIM
>> +static inline void __tlb_remove_table_one_rcu(struct rcu_head *head)
>> +{
>> +       struct page *page;
>> +
>> +       page = container_of(head, struct page, rcu_head);
>> +       free_page_and_swap_cache(page);
>> +}
>
> Why free_page_and_swap_cache()? Page tables shouldn't have swap cache,
> so I think something like put_page() would do the job.
Ah, I just did the same thing as __tlb_remove_table(). But I have the
same doubt as you: why does __tlb_remove_table() call
free_page_and_swap_cache() instead of put_page()?

Thanks,
Qi

>
>> +static inline void __tlb_remove_table_one(void *table)
>> +{
>> +       struct page *page;
>> +
>> +       page = table;
>> +       call_rcu(&page->rcu_head, __tlb_remove_table_one_rcu);
>> +}
>> +#define __tlb_remove_table_one __tlb_remove_table_one
>> +#endif /* CONFIG_PT_RECLAIM */
>> +
>>  #endif /* _ASM_X86_TLB_H */
>> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
>> index fec3815335558..89688921ea62e 100644
>> --- a/arch/x86/kernel/paravirt.c
>> +++ b/arch/x86/kernel/paravirt.c
>> @@ -59,10 +59,17 @@ void __init native_pv_lock_init(void)
>>                 static_branch_enable(&virt_spin_lock_key);
>>  }
>>
>> +#ifndef CONFIG_PT_RECLAIM
>>  static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
>>  {
>>         tlb_remove_page(tlb, table);
>>  }
>> +#else
>> +static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
>> +{
>> +       tlb_remove_table(tlb, table);
>> +}
>> +#endif
>>
>>  struct static_key paravirt_steal_enabled;
>>  struct static_key paravirt_steal_rq_enabled;
>> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
>> index 5745a354a241c..69a357b15974a 100644
>> --- a/arch/x86/mm/pgtable.c
>> +++ b/arch/x86/mm/pgtable.c
>> @@ -19,12 +19,20 @@ EXPORT_SYMBOL(physical_mask);
>>  #endif
>>
>>  #ifndef CONFIG_PARAVIRT
>> +#ifndef CONFIG_PT_RECLAIM
>>  static inline
>>  void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
>>  {
>>         tlb_remove_page(tlb, table);
>>  }
>> -#endif
>> +#else
>> +static inline
>> +void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
>> +{
>> +       tlb_remove_table(tlb, table);
>> +}
>> +#endif /* !CONFIG_PT_RECLAIM */
>> +#endif /* !CONFIG_PARAVIRT */
>>
>>  gfp_t __userpte_alloc_gfp = GFP_PGTABLE_USER | PGTABLE_HIGHMEM;
>>
>> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
>> index 99b3e9408aa0f..d948479ca09e6 100644
>> --- a/mm/mmu_gather.c
>> +++ b/mm/mmu_gather.c
>> @@ -311,10 +311,17 @@ static inline void tlb_table_invalidate(struct mmu_gather *tlb)
>>         }
>>  }
>>
>> +#ifndef __tlb_remove_table_one
>> +static inline void __tlb_remove_table_one(void *table)
>> +{
>> +       __tlb_remove_table(table);
>> +}
>> +#endif
>> +
>>  static void tlb_remove_table_one(void *table)
>>  {
>>         tlb_remove_table_sync_one();
>> -       __tlb_remove_table(table);
>> +       __tlb_remove_table_one(table);
>>  }
>>
>>  static void tlb_table_flush(struct mmu_gather *tlb)
>> --
>> 2.20.1
>>