From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20250429024139.34365-1-laoar.shao@gmail.com> <42ECBC51-E695-4480-A055-36D08FE61C12@nvidia.com> <8F000270-A724-4536-B69E-C22701522B89@nvidia.com> <20250430174521.GC2020@cmpxchg.org> <84DE7C0C-DA49-4E4F-9F66-E07567665A53@nvidia.com> <6850ac3f-af96-4cc6-9dd0-926dd3a022c9@huawei-partners.com> <88dd89b9-b2a2-47f7-bc53-1b85004e71da@huawei-partners.com>
In-Reply-To: <88dd89b9-b2a2-47f7-bc53-1b85004e71da@huawei-partners.com>
From: Yafang Shao <laoar.shao@gmail.com>
Date: Mon, 5 May 2025 17:38:36 +0800
Subject: Re: [RFC PATCH 0/4] mm, bpf: BPF based THP adjustment
To: Gutierrez Asier
Cc: Zi Yan, Johannes Weiner, "Liam R. Howlett", akpm@linux-foundation.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, David Hildenbrand, Baolin Wang, Lorenzo Stoakes, Nico Pache, Ryan Roberts, Dev Jain, bpf@vger.kernel.org, linux-mm@kvack.org, Michal Hocko
Content-Type: text/plain; charset="UTF-8"
On Mon, May 5, 2025 at 5:11 PM Gutierrez Asier wrote:
>
>
>
> On 5/2/2025 8:48 AM, Yafang Shao wrote:
> > On Fri, May 2, 2025 at 3:36 AM Gutierrez Asier wrote:
> >>
> >>
> >> On 4/30/2025 8:53 PM, Zi Yan wrote:
> >>> On 30 Apr 2025, at 13:45, Johannes Weiner wrote:
> >>>
> >>>> On Thu, May 01, 2025 at 12:06:31AM +0800, Yafang Shao wrote:
> >>>>>>>> If it isn't, can you state why?
> >>>>>>>>
> >>>>>>>> The main difference is that you are saying it's in a container that you
> >>>>>>>> don't control. Your plan is to violate the control the internal
> >>>>>>>> applications have over THP because you know better. I'm not sure how
> >>>>>>>> people might feel about you messing with workloads,
> >>>>>>>
> >>>>>>> It's not a mess. They have the option to deploy their services on
> >>>>>>> dedicated servers, but they would need to pay more for that choice.
> >>>>>>> This is a two-way decision.
> >>>>>>
> >>>>>> This implies you want a container-level way of controlling the setting
> >>>>>> and not a system service-level?
> >>>>>
> >>>>> Right. We want to control the THP per container.
> >>>>
> >>>> This does strike me as a reasonable usecase.
> >>>>
> >>>> I think there is consensus that in the long term we want this stuff to
> >>>> just work and truly be transparent to userspace.
> >>>>
> >>>> In the short-to-medium term, however, there are still quite a few
> >>>> caveats. thp=always can significantly increase the memory footprint of
> >>>> sparse virtual regions. Huge allocations are not as cheap and reliable
> >>>> as we would like them to be, which for real production systems means
> >>>> having to make workload-specific choices and tradeoffs.
> >>>>
> >>>> There is ongoing work in these areas, but we do have a bit of a
> >>>> chicken-and-egg problem: on the one hand, huge page adoption is slow
> >>>> due to limitations in how they can be deployed. For example, we can't
> >>>> do thp=always on a DC node that runs arbitrary combinations of jobs
> >>>> from a wide array of services. Some might benefit, some might hurt.
> >>>>
> >>>> Yet, it's much easier to improve the kernel based on exactly such
> >>>> production experience and data from real-world usecases. We can't
> >>>> improve the THP shrinker if we can't run THP.
> >>>>
> >>>> So I don't see it as overriding whoever wrote the software running
> >>>> inside the container. They don't know, and they shouldn't have to care
> >>>> about page sizes. It's about letting admins and kernel teams get
> >>>> started on using and experimenting with this stuff, given the very
> >>>> real constraints right now, so we can get the feedback necessary to
> >>>> improve the situation.
> >>>
> >>> Since you think it is reasonable to control THP at the container level,
> >>> namely per-cgroup, should we reconsider cgroup-based THP control[1]?
> >>> (Asier cc'd)
> >>>
> >>> In this patchset, Yafang uses BPF to adjust the THP global configs based
> >>> on the VMA, which does not look like a good approach to me. WDYT?
> >>>
> >>>
> >>> [1] https://lore.kernel.org/linux-mm/20241030083311.965933-1-gutierrez.asier@huawei-partners.com/
> >>>
> >>> --
> >>> Best Regards,
> >>> Yan, Zi
> >>
> >> Hi,
> >>
> >> I believe cgroup is a better approach for containers, since this
> >> approach can be easily integrated with the user-space stack like
> >> containerd and Kubernetes, which use cgroups to control system resources.
> >
> > The integration of BPF with containerd and Kubernetes is emerging as a
> > clear trend.
>
> No, eBPF is not used for resource management; it is mainly used by the
> network stack (CNI), monitoring, and security.

This is the most well-known use case of BPF in Kubernetes, thanks to Cilium.

> All the resource
> management by Kubernetes is done using cgroups.

The landscape has shifted. As Johannes (the memcg maintainer) noted[0],
"Cgroups are for nested trees dividing up resources. They're not a good
fit for arbitrary, non-hierarchical policy settings."

[0]. https://lore.kernel.org/linux-mm/20250430175954.GD2020@cmpxchg.org/

> You are very unlikely
> to convince the Kubernetes community to manage memory resources using
> eBPF.

Kubernetes already natively supports this capability. As documented in
the Container Lifecycle Hooks guide[1], you can easily load BPF programs
as plugins using these hooks. This is exactly the approach we've
successfully implemented in our production environments.

[1]. https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/

--
Regards
Yafang
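P.S. For readers unfamiliar with lifecycle hooks, here is a minimal
sketch of the pattern described in [1]. This is not our actual
production setup; the image name, BPF object file, and pin path are
illustrative assumptions. A pre-built BPF object is loaded and pinned
when the container starts, and removed when it stops:

```yaml
# Sketch only: image, object file, and pin path are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: thp-tuned-workload
spec:
  containers:
  - name: app
    image: example.com/app:latest        # hypothetical image
    lifecycle:
      postStart:
        exec:
          # Load and pin the BPF policy program at container start.
          command:
          - /bin/sh
          - -c
          - bpftool prog load /opt/bpf/thp_policy.o /sys/fs/bpf/thp_policy
      preStop:
        exec:
          # Drop the pinned program when the container is stopped.
          command: ["/bin/sh", "-c", "rm -f /sys/fs/bpf/thp_policy"]
```

The hook runs in the container's own namespace, so the orchestrator
needs no THP-specific knowledge; the policy ships with the workload.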