From: Usama Arif
Date: Thu, 8 May 2025 17:35:10 +0100
Subject: Re: [PATCH 0/1] prctl: allow overriding system THP policy to always
To: David Hildenbrand, Andrew Morton, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, linux-kernel@vger.kernel.org, kernel-team@meta.com
References: <20250507141132.2773275-1-usamaarif642@gmail.com> <3b5d929f-ec2f-4444-825f-81e71f804033@redhat.com>
In-Reply-To: <3b5d929f-ec2f-4444-825f-81e71f804033@redhat.com>

On 08/05/2025 12:06, David Hildenbrand wrote:
> On 07.05.25 16:00, Usama Arif wrote:
>> Allowing override of global THP policy per process allows workloads
>> that have shown to benefit from hugepages to do so, without regressing
>> workloads that wouldn't benefit.
>> This will allow such types of
>> workloads to be run/stacked on the same machine.
>>
>> It also helps in rolling out hugepages in hyperscaler configurations
>> for workloads that benefit from them, where a single THP policy is
>> likely to be used across the entire fleet, and prctl will help override it.
>>
>> An advantage of doing it via prctl vs creating a cgroup specific
>> option (like /sys/fs/cgroup/test/memory.transparent_hugepage.enabled) is
>> that this will work even when there are no cgroups present, and my
>> understanding is there is a strong preference of cgroups controls being
>> hierarchical which usually means them having a numerical value.
>>
>>
>> The output and code of test program is below:
>>
>> [root@vm4 vmuser]# echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
>> [root@vm4 vmuser]# echo inherit > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>> [root@vm4 vmuser]# ./a.out
>> Default THP setting:
>> THP is not set to 'always'.
>> PR_SET_THP_ALWAYS = 1
>> THP is set to 'always'.
>> PR_SET_THP_ALWAYS = 0
>> THP is not set to 'always'.
>
> Some quick feedback:
>
> (1) The "always" in PR_SET_THP_ALWAYS does not look future proof. Why wouldn't someone want to force-disable them for a process (-> "never") or set it to some other new mode ("-> defer" that is currently on the list).

Yes, I agree with this. I think there are three possible ways forward,
which I outlined in [1] in reply to Zi Yan. I like the flags2 approach,
but let me know what you think.

[1] https://lore.kernel.org/all/9ed673ad-764f-4f46-84a7-ef98b19d22ec@gmail.com/

>
> (2) In your example, is the toggle specific to 2M THP or the global toggle ...? Unclear. And that makes this interface also suboptimal.

In this approach you would override the folio sizes that are set to
inherit, and I think that's the right approach. So if you have, e.g., 2M and
16K set to inherit and the global setting is madvise, doing
PR_SET_THP_ALWAYS would change those folio sizes to always.
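To make the inherit behaviour described above concrete, here is a rough userspace model of the mask computation. It is only a sketch, not kernel code: `struct thp_config`, `enum global_policy` and `allowed_orders()` are made-up names for illustration, and the model assumes the per-size "always" mask is the starting point, as in thp_vma_allowable_orders().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum global_policy { GLOBAL_NEVER, GLOBAL_MADVISE, GLOBAL_ALWAYS };

struct thp_config {
	enum global_policy global;    /* models .../transparent_hugepage/enabled */
	unsigned long orders_always;  /* per-size entries set to "always" */
	unsigned long orders_madvise; /* per-size entries set to "madvise" */
	unsigned long orders_inherit; /* per-size entries set to "inherit" */
};

/*
 * Return the bitmask of folio orders allowed for one VMA.
 * vma_hugepage models VM_HUGEPAGE on the VMA; proc_always models the
 * proposed per-process override (MMF_THP_ALWAYS in the v1 diff).
 */
static unsigned long allowed_orders(const struct thp_config *cfg,
				    bool vma_hugepage, bool proc_always)
{
	unsigned long mask;

	assert(cfg != NULL);
	mask = cfg->orders_always;
	if (vma_hugepage)
		mask |= cfg->orders_madvise;
	/* "inherit" sizes follow the global policy, or the process override. */
	if (cfg->global == GLOBAL_ALWAYS ||
	    (vma_hugepage && cfg->global != GLOBAL_NEVER) ||
	    proc_always)
		mask |= cfg->orders_inherit;
	return mask;
}
```

With the global policy at madvise and two sizes set to inherit, a VMA without VM_HUGEPAGE gets an empty mask by default, and both inherit sizes once the per-process override is on. Sizes explicitly set to never or madvise are not affected by the override in this model.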

>
> (3) I'm a bit concerned about interaction with per-VMA settings (the one we already have, and order-specific ones that people were discussing). It's going to be a mess if we have global, per-process, per-vma and then some other policies (ebpf? whatever else?) on top.
>
>
> The low-hanging fruit would be a per-process toggle that only controls the existing per-VMA toggle: for example, with the semantics that
>
> (1) All new (applicable) VMAs start with VM_HUGEPAGE
> (2) All existing (applicable) VMAs that are *not* VM_NOHUGEPAGE become VM_HUGEPAGE.
>
>
> We did something similar with PR_SET_MEMORY_MERGE.
>

So for this, you mean the prctl command would do a for_each_vma walk and
set VM_HUGEPAGE to implement point 2? For point 1, I think we would still
need extra mm->flags, i.e. MMF_VM_THP_MADVISE/DEFER/ALWAYS/NEVER.

I think it would have the same effect as what this patch is trying to do,
but it would be more expensive in terms of both code changes and the cost
of the actual call, as you now have to walk all VMAs. On the other hand,
you won't need the diff below from v1.

I do feel the current approach in the patch is simpler, though. But if your
suggestion in point 3 is better in terms of code maintainability, I'm happy
to switch to it in v2.

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2f190c90192d..0587dc4b8e2d 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -293,7 +293,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 		if (vm_flags & VM_HUGEPAGE)
 			mask |= READ_ONCE(huge_anon_orders_madvise);
 		if (hugepage_global_always() ||
-		    ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
+		    ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()) ||
+		    test_bit(MMF_THP_ALWAYS, &vma->vm_mm->flags))
 			mask |= READ_ONCE(huge_anon_orders_inherit);
 
 		orders &= mask;
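
For comparison, the per-VMA alternative (points 1 and 2 in the quoted suggestion) could be modelled roughly like this. Again, this is a simplified userspace sketch under my own assumptions: `struct model_vma`, `struct model_mm`, the `MODEL_VM_*` bits and both helpers are hypothetical, "applicable" VMAs are not modelled, and `thp_default_hugepage` stands in for whatever extra mm->flags bit point 1 would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VM_HUGEPAGE / VM_NOHUGEPAGE. */
#define MODEL_VM_HUGEPAGE	0x1UL
#define MODEL_VM_NOHUGEPAGE	0x2UL

struct model_vma {
	unsigned long flags;
};

struct model_mm {
	struct model_vma *vmas;
	size_t nr_vmas;
	bool thp_default_hugepage; /* models the extra per-mm flag for point 1 */
};

/*
 * Point 2: walk all existing VMAs and mark those that are not
 * VM_NOHUGEPAGE as VM_HUGEPAGE. Returns how many VMAs were changed.
 * The walk is the extra cost compared to a single per-mm flag test.
 */
static size_t model_enable_thp(struct model_mm *mm)
{
	size_t changed = 0;

	assert(mm != NULL);
	for (size_t i = 0; i < mm->nr_vmas; i++) {
		struct model_vma *vma = &mm->vmas[i];

		if (vma->flags & (MODEL_VM_NOHUGEPAGE | MODEL_VM_HUGEPAGE))
			continue; /* opted out, or already set */
		vma->flags |= MODEL_VM_HUGEPAGE;
		changed++;
	}
	mm->thp_default_hugepage = true;
	return changed;
}

/* Point 1: new VMAs start with VM_HUGEPAGE once the toggle is on. */
static unsigned long model_new_vma_flags(const struct model_mm *mm,
					 unsigned long requested)
{
	assert(mm != NULL);
	if (mm->thp_default_hugepage && !(requested & MODEL_VM_NOHUGEPAGE))
		requested |= MODEL_VM_HUGEPAGE;
	return requested;
}
```

The model shows why both pieces are needed: the walk only covers VMAs that exist at the time of the call, so a per-mm bit is still required for mappings created afterwards.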