From: Gutierrez Asier
Date: Mon, 26 May 2025 23:30:38 +0300
Subject: Re: [RFC PATCH v2 0/5] mm, bpf: BPF based THP adjustment
To: David Hildenbrand, "Liam R. Howlett", Yafang Shao
References: <20250520060504.20251-1-laoar.shao@gmail.com>
 <7d8a9a5c-e0ef-4e36-9e1d-1ef8e853aed4@redhat.com>
 <3b792576-6189-4f53-b47f-95875181a656@redhat.com>
In-Reply-To: <3b792576-6189-4f53-b47f-95875181a656@redhat.com>

On 5/26/2025 7:51 PM, David Hildenbrand wrote:
> On 26.05.25 17:54, Liam R. Howlett wrote:
>> * Liam R. Howlett [250526 10:54]:
>>> * David Hildenbrand [250526 06:49]:
>>>> On 26.05.25 11:37, Yafang Shao wrote:
>>>>> On Mon, May 26, 2025 at 4:14 PM David Hildenbrand wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Let’s summarize the current state of the discussion and identify how
>>>>>>> to move forward.
>>>>>>>
>>>>>>> - Global-Only Control is Not Viable
>>>>>>> We all seem to agree that a global-only control for THP is unwise. In
>>>>>>> practice, some workloads benefit from THP while others do not, so a
>>>>>>> one-size-fits-all approach doesn’t work.
>>>>>>>
>>>>>>> - Should We Use "Always" or "Madvise"?
>>>>>>> I suspect no one would choose 'always' in its current state. ;)
>>>>>>
>>>>>> IIRC, RHEL9 has the default set to "always" for a long time.
>>>>>
>>>>> good to know.
>>>>>
>>>>>>
>>>>>> I guess it really depends on how different the workloads are that you
>>>>>> are running on the same machine.
>>>>>
>>>>> Correct. If we want to enable THP for specific workloads without
>>>>> modifying the kernel, we must isolate them on dedicated servers.
>>>>> However, this approach wastes resources and is not an acceptable
>>>>> solution.
>>>>>
>>>>>>
>>>>>>> Both Lorenzo and David propose relying on the madvise mode. However,
>>>>>>> since madvise is an unprivileged userspace mechanism, any user can
>>>>>>> freely adjust their THP policy. This makes fine-grained control
>>>>>>> impossible without breaking userspace compatibility, an undesirable
>>>>>>> tradeoff.
>>>>>>
>>>>>> If required, we could look into a "sealing" mechanism, that would
>>>>>> essentially lock modification attempts performed by the process (i.e.,
>>>>>> MADV_HUGEPAGE).
>>>>>
>>>>> If we don’t introduce a new THP mode and instead rely solely on
>>>>> madvise, the "sealing" mechanism could either violate the intended
>>>>> semantics of madvise(), or simply break madvise() entirely, right?
>>>>
>>>> We would have to be a bit careful, yes.
>>>>
>>>> Errors from MADV_HUGEPAGE/MADV_NOHUGEPAGE are often ignored, because these
>>>> options also fail with -EINVAL on kernels without THP support.
>>>>
>>>> Ignoring MADV_NOHUGEPAGE can be problematic with userfaultfd.
>>>>
>>>> What you likely really want to do is seal when you configured
>>>> MADV_NOHUGEPAGE to be the default, and fail MADV_HUGEPAGE later.
>>
>> I am also not entirely sure how sealing a non-existing vma would work.
>> We'd have to seal the default flags, but sealing is one way and this
>> surely shouldn't be one way?
>
> You probably have mseal() in mind. Just like we wouldn't be using
> madvise(), we also wouldn't be using mseal().
>
> It could be a simple mctrl()/whatever option/flag to set the default and
> no longer allow changing the default and per-VMA flags, unless
> CAP_SYS_ADMIN or sth like that.
>

This isn't really TRANSPARENT Huge Pages, since we will require the
application to determine which memory range will be mapped with huge pages.

-- 
Asier Gutierrez
Huawei