From mboxrd@z Thu Jan 1 00:00:00 1970
From: Usama Arif
To: Andrew Morton, david@redhat.com, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, shakeel.butt@linux.dev, riel@surriel.com,
	ziy@nvidia.com, laoar.shao@gmail.com, baolin.wang@linux.alibaba.com,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com,
	ryan.roberts@arm.com, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH 1/6] prctl: introduce PR_THP_POLICY_DEFAULT_HUGE for the process
Date: Thu, 15 May 2025 14:33:30 +0100
Message-ID: <20250515133519.2779639-2-usamaarif642@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250515133519.2779639-1-usamaarif642@gmail.com>
References: <20250515133519.2779639-1-usamaarif642@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

PR_THP_POLICY_DEFAULT_HUGE is set via the new PR_SET_THP_POLICY prctl.
It sets the MMF2_THP_VMA_DEFAULT_HUGE process flag, which makes
VM_HUGEPAGE the default for new VMAs. The call also marks every
existing VMA that is not VM_NOHUGEPAGE as VM_HUGEPAGE. The policy is
inherited across fork+exec.

On systems where the global policy is "madvise", this effectively gives
the process THP "always" behaviour. When different types of workloads
are stacked on the same machine, workloads that benefit from always
having hugepages can opt in without regressing those that don't.

Signed-off-by: Usama Arif
---
 include/linux/huge_mm.h                       |  3 ++
 include/linux/mm_types.h                      | 11 +++++++
 include/uapi/linux/prctl.h                    |  4 +++
 kernel/fork.c                                 |  1 +
 kernel/sys.c                                  | 21 ++++++++++++
 mm/huge_memory.c                              | 32 +++++++++++++++++++
 mm/vma.c                                      |  2 ++
 tools/include/uapi/linux/prctl.h              |  4 +++
 .../trace/beauty/include/uapi/linux/prctl.h   |  4 +++
 9 files changed, 82 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2f190c90192d..e652ad9ddbbd 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -260,6 +260,9 @@ static inline unsigned long thp_vma_suitable_orders(struct vm_area_struct *vma,
 	return orders;
 }
 
+void vma_set_thp_policy(struct vm_area_struct *vma);
+void process_vmas_thp_default_huge(struct mm_struct *mm);
+
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 unsigned long vm_flags,
 					 unsigned long tva_flags,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e76bade9ebb1..2fe93965e761 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1066,6 +1066,7 @@ struct mm_struct {
 		mm_context_t context;
 
 		unsigned long flags; /* Must use atomic bitops to access */
+		unsigned long flags2;
 
 #ifdef CONFIG_AIO
 		spinlock_t			ioctx_lock;
@@ -1744,6 +1745,11 @@ enum {
 				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
 				 MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK)
 
+#define MMF2_THP_VMA_DEFAULT_HUGE		0
+#define MMF2_THP_VMA_DEFAULT_HUGE_MASK		(1 << MMF2_THP_VMA_DEFAULT_HUGE)
+
+#define MMF2_INIT_MASK		(MMF2_THP_VMA_DEFAULT_HUGE_MASK)
+
 static inline unsigned long mmf_init_flags(unsigned long flags)
 {
 	if (flags & (1UL << MMF_HAS_MDWE_NO_INHERIT))
@@ -1752,4 +1758,9 @@ static inline unsigned long mmf_init_flags(unsigned long flags)
 	return flags & MMF_INIT_MASK;
 }
 
+static inline unsigned long mmf2_init_flags(unsigned long flags)
+{
+	return flags & MMF2_INIT_MASK;
+}
+
 #endif /* _LINUX_MM_TYPES_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 15c18ef4eb11..325c72f40a93 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -364,4 +364,8 @@ struct prctl_mm_map {
 # define PR_TIMER_CREATE_RESTORE_IDS_ON		1
 # define PR_TIMER_CREATE_RESTORE_IDS_GET	2
 
+#define PR_SET_THP_POLICY		78
+#define PR_GET_THP_POLICY		79
+#define PR_THP_POLICY_DEFAULT_HUGE	0
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 9e4616dacd82..6e5f4a8869dc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1054,6 +1054,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 
 	if (current->mm) {
 		mm->flags = mmf_init_flags(current->mm->flags);
+		mm->flags2 = mmf2_init_flags(current->mm->flags2);
 		mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
 	} else {
 		mm->flags = default_dump_filter;
diff --git a/kernel/sys.c b/kernel/sys.c
index c434968e9f5d..1115f258f253 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2658,6 +2658,27 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			clear_bit(MMF_DISABLE_THP, &me->mm->flags);
 		mmap_write_unlock(me->mm);
 		break;
+	case PR_GET_THP_POLICY:
+		if (arg2 || arg3 || arg4 || arg5)
+			return -EINVAL;
+		if (!!test_bit(MMF2_THP_VMA_DEFAULT_HUGE, &me->mm->flags2))
+			error = PR_THP_POLICY_DEFAULT_HUGE;
+		break;
+	case PR_SET_THP_POLICY:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		if (mmap_write_lock_killable(me->mm))
+			return -EINTR;
+		switch (arg2) {
+		case PR_THP_POLICY_DEFAULT_HUGE:
+			process_vmas_thp_default_huge(me->mm);
+			break;
+		default:
+			error = -EINVAL;
+			break;
+		}
+		mmap_write_unlock(me->mm);
+		break;
 	case PR_MPX_ENABLE_MANAGEMENT:
 	case PR_MPX_DISABLE_MANAGEMENT:
 		/* No longer implemented: */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2780a12b25f0..64f66d5295e8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -98,6 +98,38 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
 }
 
+void vma_set_thp_policy(struct vm_area_struct *vma)
+{
+	struct mm_struct *mm = vma->vm_mm;
+
+	if (test_bit(MMF2_THP_VMA_DEFAULT_HUGE, &mm->flags2))
+		vm_flags_set(vma, VM_HUGEPAGE);
+}
+
+static void vmas_thp_default_huge(struct mm_struct *mm)
+{
+	struct vm_area_struct *vma;
+	unsigned long vm_flags;
+
+	VMA_ITERATOR(vmi, mm, 0);
+	for_each_vma(vmi, vma) {
+		vm_flags = vma->vm_flags;
+		if (vm_flags & VM_NOHUGEPAGE)
+			continue;
+		vm_flags_set(vma, VM_HUGEPAGE);
+	}
+}
+
+void process_vmas_thp_default_huge(struct mm_struct *mm)
+{
+	if (test_bit(MMF2_THP_VMA_DEFAULT_HUGE, &mm->flags2))
+		return;
+
+	set_bit(MMF2_THP_VMA_DEFAULT_HUGE, &mm->flags2);
+	vmas_thp_default_huge(mm);
+}
+
+
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 unsigned long vm_flags,
 					 unsigned long tva_flags,
diff --git a/mm/vma.c b/mm/vma.c
index 1f2634b29568..101b19c96803 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2476,6 +2476,7 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
 	if (!vma_is_anonymous(vma))
 		khugepaged_enter_vma(vma, map->flags);
 	ksm_add_vma(vma);
+	vma_set_thp_policy(vma);
 	*vmap = vma;
 	return 0;
 
@@ -2705,6 +2706,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	mm->map_count++;
 	validate_mm(mm);
 	ksm_add_vma(vma);
+	vma_set_thp_policy(vma);
 out:
 	perf_event_mmap(vma);
 	mm->total_vm += len >> PAGE_SHIFT;
diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
index 35791791a879..f5945ebfe3f2 100644
--- a/tools/include/uapi/linux/prctl.h
+++ b/tools/include/uapi/linux/prctl.h
@@ -328,4 +328,8 @@ struct prctl_mm_map {
 # define PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC	0x10 /* Clear the aspect on exec */
 # define PR_PPC_DEXCR_CTRL_MASK		0x1f
 
+#define PR_SET_THP_POLICY		78
+#define PR_GET_THP_POLICY		79
+#define PR_THP_POLICY_DEFAULT_HUGE	0
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/tools/perf/trace/beauty/include/uapi/linux/prctl.h b/tools/perf/trace/beauty/include/uapi/linux/prctl.h
index 15c18ef4eb11..325c72f40a93 100644
--- a/tools/perf/trace/beauty/include/uapi/linux/prctl.h
+++ b/tools/perf/trace/beauty/include/uapi/linux/prctl.h
@@ -364,4 +364,8 @@ struct prctl_mm_map {
 # define PR_TIMER_CREATE_RESTORE_IDS_ON		1
 # define PR_TIMER_CREATE_RESTORE_IDS_GET	2
 
+#define PR_SET_THP_POLICY		78
+#define PR_GET_THP_POLICY		79
+#define PR_THP_POLICY_DEFAULT_HUGE	0
+
 #endif /* _LINUX_PRCTL_H */
-- 
2.47.1
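
For illustration only, not part of the patch: a minimal userspace sketch
of how a process might opt in through the new prctl. It assumes a kernel
carrying this series. The three constants are copied from the prctl.h
hunks above; on any other kernel those option numbers may be unassigned
or mean something different, in which case the calls are expected to
fail. The helper names (thp_policy_set_default_huge, thp_policy_get,
thp_policy_demo) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>

/*
 * Values copied from this patch's include/uapi/linux/prctl.h hunk.
 * ASSUMPTION: they only mean "THP policy" on a kernel carrying this
 * series; elsewhere the numbers may be unassigned or used differently.
 */
#ifndef PR_SET_THP_POLICY
#define PR_SET_THP_POLICY		78
#endif
#ifndef PR_GET_THP_POLICY
#define PR_GET_THP_POLICY		79
#endif
#ifndef PR_THP_POLICY_DEFAULT_HUGE
#define PR_THP_POLICY_DEFAULT_HUGE	0
#endif

/* Hypothetical helper: returns 0 on success, -errno on failure. */
static int thp_policy_set_default_huge(void)
{
	/* The handler rejects non-zero arg3..arg5 with -EINVAL. */
	if (prctl(PR_SET_THP_POLICY,
		  (unsigned long)PR_THP_POLICY_DEFAULT_HUGE,
		  0UL, 0UL, 0UL) == -1)
		return -errno;
	return 0;
}

/* Hypothetical helper: returns the reported value (>= 0) or -errno. */
static int thp_policy_get(void)
{
	/* The handler rejects any non-zero argument with -EINVAL. */
	int ret = prctl(PR_GET_THP_POLICY, 0UL, 0UL, 0UL, 0UL);

	return ret == -1 ? -errno : ret;
}

/* Hypothetical helper: opt in, then print what the kernel reports. */
static int thp_policy_demo(void)
{
	int ret = thp_policy_set_default_huge();

	if (ret < 0) {
		fprintf(stderr, "PR_SET_THP_POLICY failed: %d\n", ret);
		return ret;
	}

	/*
	 * PR_THP_POLICY_DEFAULT_HUGE is 0 in this patch, so a return of 0
	 * here does not by itself prove that the flag is set.
	 */
	ret = thp_policy_get();
	printf("PR_GET_THP_POLICY returned %d\n", ret);
	return ret;
}
```

Since the commit message says the policy is inherited across fork+exec,
a small launcher could call thp_policy_demo() (or just the set helper)
before exec'ing the real workload, rather than modifying the workload
itself.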