From: Gutierrez Asier
Date: Thu, 1 May 2025 22:36:23 +0300
Subject: Re: [RFC PATCH 0/4] mm, bpf: BPF based THP adjustment
To: Zi Yan, Johannes Weiner
CC: Yafang Shao, "Liam R. Howlett", David Hildenbrand, Baolin Wang,
 Lorenzo Stoakes, Nico Pache, Ryan Roberts, Dev Jain, Michal Hocko
Message-ID: <6850ac3f-af96-4cc6-9dd0-926dd3a022c9@huawei-partners.com>
In-Reply-To: <84DE7C0C-DA49-4E4F-9F66-E07567665A53@nvidia.com>
References: <20250429024139.34365-1-laoar.shao@gmail.com>
 <42ECBC51-E695-4480-A055-36D08FE61C12@nvidia.com>
 <8F000270-A724-4536-B69E-C22701522B89@nvidia.com>
 <20250430174521.GC2020@cmpxchg.org>
 <84DE7C0C-DA49-4E4F-9F66-E07567665A53@nvidia.com>

On 4/30/2025 8:53 PM, Zi Yan wrote:
> On 30 Apr 2025, at 13:45, Johannes Weiner wrote:
>
>> On Thu, May 01, 2025 at 12:06:31AM +0800, Yafang Shao wrote:
>>>>>> If it isn't, can you state why?
>>>>>>
>>>>>> The main difference is that you are saying it's in a container that you
>>>>>> don't control. Your plan is to violate the control the internal
>>>>>> applications have over THP because you know better. I'm not sure how
>>>>>> people might feel about you messing with workloads,
>>>>>
>>>>> It’s not a mess. They have the option to deploy their services on
>>>>> dedicated servers, but they would need to pay more for that choice.
>>>>> This is a two-way decision.
>>>>
>>>> This implies you want a container-level way of controlling the setting
>>>> and not a system service-level?
>>>
>>> Right. We want to control the THP per container.
>>
>> This does strike me as a reasonable usecase.
>>
>> I think there is consensus that in the long-term we want this stuff to
>> just work and truly be transparent to userspace.
>>
>> In the short-to-medium term, however, there are still quite a few
>> caveats. thp=always can significantly increase the memory footprint of
>> sparse virtual regions. Huge allocations are not as cheap and reliable
>> as we would like them to be, which for real production systems means
>> having to make workload-specific choices and tradeoffs.
>>
>> There is ongoing work in these areas, but we do have a bit of a
>> chicken-and-egg problem: on the one hand, huge page adoption is slow
>> due to limitations in how they can be deployed. For example, we can't
>> do thp=always on a DC node that runs arbitrary combinations of jobs
>> from a wide array of services. Some might benefit, some might hurt.
>>
>> Yet, it's much easier to improve the kernel based on exactly such
>> production experience and data from real-world usecases. We can't
>> improve the THP shrinker if we can't run THP.
>>
>> So I don't see it as overriding whoever wrote the software running
>> inside the container. They don't know, and they shouldn't have to care
>> about page sizes. It's about letting admins and kernel teams get
>> started on using and experimenting with this stuff, given the very
>> real constraints right now, so we can get the feedback necessary to
>> improve the situation.
>
> Since you think it is reasonable to control THP at container-level,
> namely per-cgroup, should we reconsider cgroup-based THP control[1]?
> (Asier cc'd)
>
> In this patchset, Yafang uses BPF to adjust THP global configs based
> on VMA, which does not look like a good approach to me. WDYT?
>
>
> [1] https://lore.kernel.org/linux-mm/20241030083311.965933-1-gutierrez.asier@huawei-partners.com/
>
> --
> Best Regards,
> Yan, Zi

Hi,

I believe cgroup is a better approach for containers, since it can be
easily integrated with the user space stack, such as containerd and
Kubernetes, which use cgroups to control system resources.

However, as I pointed out earlier, the approach I suggested has some
flaws:
1. Potential pollution of cgroup with a large number of knobs
2. Requires configuration by the admin

Ideally, as Matthew W. mentioned, there should be an automatic system.

Anyway, regarding containers, I believe cgroup is a good approach,
given that the admin or the container management system uses cgroups
to set up the containers.

-- 
Asier Gutierrez
Huawei