From: Qi Zheng
Date: Thu, 27 Feb 2025 14:58:50 +0800
Subject: Re: CONFIG_PT_RECLAIM
To: Johannes Weiner, peterz@infradead.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Message-ID: <6281ffc9-398e-44b9-a95c-2527004e09b7@bytedance.com>
In-Reply-To: <20250227060820.GC110982@cmpxchg.org>
References: <20250226183013.GB1042@cmpxchg.org> <20250227060820.GC110982@cmpxchg.org>

Hi Johannes,

On 2/27/25 2:08 PM, Johannes Weiner wrote:
> On Thu, Feb 27, 2025 at 11:04:51AM +0800, Qi Zheng wrote:
>> Hi Johannes,
>>
>> On 2/27/25 2:30 AM, Johannes Weiner wrote:
>>> Does PT_RECLAIM need to be configurable by the user?
>>
>> The PT_RECLAIM will select MMU_GATHER_RCU_TABLE_FREE, but not all archs
>> support MMU_GATHER_RCU_TABLE_FREE, and even before Rik's a37259732a7dc
>> ("x86/mm: Make MMU_GATHER_RCU_TABLE_FREE unconditional"), x86 only
>> supports MMU_GATHER_RCU_TABLE_FREE in the case of PARAVIRT.
>>
>> Therefore, PT_RECLAIM also implies the meaning of enabling
>> MMU_GATHER_RCU_TABLE_FREE, so I made it user-configurable. And I just
>> thought that as a new feature, it would be better to give users the
>> ability to turn it on and off.
>
> New *features*, yes - something that has a significant enough cost
> that clearly not all users want to pay for the benefits.

Got it.

>
> But it's hard to imagine anybody would WANT to keep the page tables
> around if they madvised away all the pages inside of them. It's a
> great optimization, what would be a reason to opt out?

OK, now I think it makes sense to change it to 'def_bool y'.

>
>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>> index 2761098dbc1a..99383c93db33 100644
>>> --- a/mm/Kconfig
>>> +++ b/mm/Kconfig
>>> @@ -1309,16 +1309,9 @@ config ARCH_SUPPORTS_PT_RECLAIM
>>>  	def_bool n
>>>
>>>  config PT_RECLAIM
>>> -	bool "reclaim empty user page table pages"
>>> -	default y
>>> +	def_bool y
>>>  	depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP
>>>  	select MMU_GATHER_RCU_TABLE_FREE
>>> -	help
>>> -	  Try to reclaim empty user page table pages in paths other than munmap
>>> -	  and exit_mmap path.
>>> -
>>> -	  Note: now only empty user PTE page table pages will be reclaimed.
>>> -
>>
>> Maybe keep the help information?
>
> I don't find it very helpful :( Which "other paths?" It doesn't
> explain any pros and cons, and why anybody might choose to enable or
> disable it. The Note repeats what's in the sentence before it.

Sorry about that. :(

>
> Maybe I'm missing something. Could this not just be an #ifdef block
> inside mm/madvise.c, instead of living inside a new file with two new
> config symbols?
>
> #ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
> ...
> #endif
>
> Is there an arch-specific feature that it requires besides
> MMU_GATHER_RCU_TABLE_FREE such that only x86 supports it now?

No, it only needs MMU_GATHER_RCU_TABLE_FREE.

>
> And why *does* it require MMU_GATHER_RCU_TABLE_FREE?

Because in the madvise(MADV_DONTNEED) path, mmu_gather is already used to
batch TLB flushes and free physical pages. It is a better choice to free
PTE pages in this way as well. And because PT_RECLAIM needs RCU, we need
MMU_GATHER_RCU_TABLE_FREE to make pte_free_tlb() free PTE pages through
RCU. Of course, we also need to modify __tlb_remove_table_one() to make
it use RCU as well.

>
> Documentation/mm/process_addrs.rst explains why you need rcu, but
> there is pte_free_defer() that THP was using long before x86 needed
> MMU_GATHER_RCU_TABLE_FREE. It seems to me if you could use that, this
> feature would also work fine on architectures that do not generally
> need RCU for flush & frees otherwise. So is the main issue that there

As mentioned above, we want to flush and free in batches, so we don't
use pte_free_defer().

> just isn't an explicitly deferred variant of pte_free_tlb()?

pte_free_defer() seems to have been adapted to all archs, so I wonder
whether all archs can support MMU_GATHER_RCU_TABLE_FREE, so that
pte_free_tlb() will always use RCU to free PTE pages. Maybe I missed
something.

+Peter.

>
> If so, this is a fairly non-obvious dependency that should be
> documented. It would help somebody trying to port this to a !RCU
> mmu_gather arch.
>
> And I apologize if all this was discussed before. But if it was, the
> conclusions should be in the changelog or in code comments. This is a
> very delicate synchronization scheme that I think deserves explicit
> documentation somewhere.
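
For readers following along, the userspace side of the case discussed
above ("madvised away all the pages inside of them") can be sketched
roughly as below. This is only an illustration, not code from the patch:
the helper name dontneed_zeroes_region() is made up, and the program only
shows the documented madvise(2) behaviour for private anonymous memory
(pages read back as zero after MADV_DONTNEED). Whether the kernel then
frees the now-empty PTE pages depends on PT_RECLAIM and the architecture,
and this code cannot observe that.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, for illustration only.
 * Maps `len` bytes of private anonymous memory, touches every page,
 * drops the pages with MADV_DONTNEED and checks they read back as zero.
 * Returns 1 if all bytes are zero, 0 if not, -1 if a syscall failed.
 */
int dontneed_zeroes_region(size_t len)
{
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Fault in every page so the page tables get populated. */
	memset(p, 0xAB, len);

	/* Drop all pages; the PTE tables covering them become empty. */
	if (madvise(p, len, MADV_DONTNEED) != 0) {
		munmap(p, len);
		return -1;
	}

	/* Private anonymous memory is zero-filled on the next access. */
	int ok = 1;
	for (size_t i = 0; i < len; i++) {
		if (p[i] != 0) {
			ok = 0;
			break;
		}
	}

	munmap(p, len);
	return ok;
}
```

Calling it with a length such as 4UL << 20 (4 MiB) guarantees that, with
4 KiB pages on x86-64 where one PTE table maps 2 MiB, at least one PTE
table is fully covered by the range regardless of how the mapping is
aligned.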