From: Yafang Shao
Date: Wed, 30 Apr 2025 23:37:07 +0800
Subject: Re: [RFC PATCH 0/4] mm, bpf: BPF based THP adjustment
To: "Liam R. Howlett", Zi Yan, Yafang Shao, akpm@linux-foundation.org,
    ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
    David Hildenbrand, Baolin Wang, Lorenzo Stoakes, Nico Pache,
    Ryan Roberts, Dev Jain, bpf@vger.kernel.org, linux-mm@kvack.org,
    Johannes Weiner, Michal Hocko
References: <20250429024139.34365-1-laoar.shao@gmail.com>
    <42ECBC51-E695-4480-A055-36D08FE61C12@nvidia.com>
    <8F000270-A724-4536-B69E-C22701522B89@nvidia.com>

On Wed, Apr 30, 2025 at 11:21 PM Liam R. Howlett wrote:
>
> * Zi Yan [250430 11:01]:
>
> ...
>
> > >>>>> Since multiple services run on a single host in a containerized environment,
> > >>>>> enabling THP globally is not ideal. Previously, we set THP to madvise,
> > >>>>> allowing selected services to opt in via MADV_HUGEPAGE. However, this
> > >>>>> approach had limitation:
> > >>>>>
> > >>>>> - Some services inadvertently used madvise(MADV_HUGEPAGE) through
> > >>>>> third-party libraries, bypassing our restrictions.
> > >>>>
> > >>>> Basically, you want more precise control of THP enablement and the
> > >>>> ability of overriding madvise() from userspace.
> > >>>>
> > >>>> In terms of overriding madvise(), do you have any concrete example of
> > >>>> these third-party libraries? madvise() users are supposed to know what
> > >>>> they are doing, so I wonder why they are causing trouble in your
> > >>>> environment.
> > >>>
> > >>> To my knowledge, jemalloc [0] supports THP.
> > >>> Applications using jemalloc typically rely on its default
> > >>> configurations rather than explicitly enabling or disabling THP. If
> > >>> the system is configured with THP=madvise, these applications may
> > >>> automatically leverage THP where appropriate
> > >>>
> > >>> [0]. https://github.com/jemalloc/jemalloc
> > >>
> > >> It sounds like a userspace issue. For jemalloc, if applications require
> > >> it, can't you replace the jemalloc with a one compiled with --disable-thp
> > >> to work around the issue?
> > >
> > > That's not the issue this patchset is trying to address or work
> > > around. I believe we should focus on the actual problem it's meant to
> > > solve.
> > >
> > > By the way, you might not raise this question if you were managing a
> > > large fleet of servers. We're a platform provider, but we don't
> > > maintain all the packages ourselves. Users make their own choices
> > > based on their specific requirements. It's not a feasible solution for
> > > us to develop and maintain every package.
> >
> > Basically, user wants to use THP, but as a service provider, you think
> > differently, so want to override userspace choice. Am I getting it right?
>
> Who is the platform provider in question? It makes me uneasy to have
> such claims from an @gmail account with current world events..

It's a small company based in China called PDD, if that information is helpful.

>
> ...
>
> > >>>
> > >>> I chose not to include this in the self-tests to avoid the complexity
> > >>> of setting up cgroups for testing purposes. However, in patch #4 of
> > >>> this series, I've included a simpler example demonstrating task-level
> > >>> control.
> > >>
> > >> For task-level control, why not using prctl(PR_SET_THP_DISABLE)?
> > >
> > > You'll need to modify the user-space code, and again, this likely
> > > wouldn't be a concern if you were managing a large fleet of servers.
> > >
> > >>
> > >>> For service-level control, we could potentially utilize BPF task local
> > >>> storage as an alternative approach.
> > >>
> > >> +cgroup people
> > >>
> > >> For service-level control, there was a proposal of adding cgroup based
> > >> THP control[1]. You might need a strong use case to convince people.
> > >>
> > >> [1] https://lore.kernel.org/linux-mm/20241030083311.965933-1-gutierrez.asier@huawei-partners.com/
> > >
> > > Thanks for the reference. I've reviewed the related discussion, and if
> > > I understand correctly, the proposal was rejected by the maintainers.
>
> More of the point is why it was rejected. Why is your motive different?
>
> >
> > I wonder why your approach is better than the cgroup based THP control proposal.
>
> I think Matthew's response in that thread is pretty clear and still
> relevant.

Are you referring to
https://lore.kernel.org/linux-mm/ZyT7QebITxOKNi_c@casper.infradead.org/
or
https://lore.kernel.org/linux-mm/ZyIxRExcJvKKv4JW@casper.infradead.org/ ?
If it's the latter, then this patchset aims to make sysadmins' lives easier.

> If it isn't, can you state why?
>
> The main difference is that you are saying it's in a container that you
> don't control. Your plan is to violate the control the internal
> applications have over THP because you know better. I'm not sure how
> people might feel about you messing with workloads,

It's not a mess. They have the option to deploy their services on
dedicated servers, but they would need to pay more for that choice.
This is a two-way decision.

> but beyond that, you
> are fundamentally fixing things at a sysadmin level because programmers
> have made errors.

No, they're not making mistakes; they simply focus on the
implementation details of their own services and don't find it
worthwhile to dive into kernel internals. Their services run
perfectly well with or without THP.

> You state as much in the cover letter, yes?

I'll try to explain it in more detail in the next version if that
would be helpful.

-- 
Regards
Yafang