From: Usama Arif
To: Andrew Morton, david@redhat.com, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, shakeel.butt@linux.dev, riel@surriel.com,
 ziy@nvidia.com, laoar.shao@gmail.com, baolin.wang@linux.alibaba.com,
 lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com,
 ryan.roberts@arm.com, vbabka@suse.cz, jannh@google.com, Arnd Bergmann,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 kernel-team@meta.com, Usama Arif
Subject: [PATCH v3 2/7] prctl: introduce PR_DEFAULT_MADV_HUGEPAGE for the process
Date: Mon, 19 May 2025 23:29:54 +0100
Message-ID: <20250519223307.3601786-3-usamaarif642@gmail.com>
In-Reply-To: <20250519223307.3601786-1-usamaarif642@gmail.com>
References: <20250519223307.3601786-1-usamaarif642@gmail.com>
X-Mailer: git-send-email 2.47.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

PR_DEFAULT_MADV_HUGEPAGE is set via the new PR_SET_THP_POLICY prctl. It
has two effects:
- It sets VM_HUGEPAGE and clears VM_NOHUGEPAGE in the default VMA flags
  (mm->def_flags), so every new VMA will be considered for hugepages.
- It iterates through every existing VMA in the process and calls
  hugepage_madvise() on it with the MADV_HUGEPAGE policy.

The policy is inherited across fork+exec.

This effectively allows setting MADV_HUGEPAGE on the entire process.
In an environment where different types of workloads run on the same
machine, this lets workloads that benefit from always having hugepages
get them, without regressing the workloads that don't.
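For illustration, here is a minimal userspace sketch of how a process might opt in and then check the result. It assumes the option values proposed in this patch (PR_SET_THP_POLICY = 78, PR_DEFAULT_MADV_HUGEPAGE = 0). Installed headers may not define them, or may assign them differently. On a kernel without this patch, option 78 may be rejected or may mean something else, so treat the call as best-effort. The helper names set_default_madv_hugepage(), vmflags_has() and mapping_has_hg() are hypothetical. The check relies on the "hg" mnemonic that /proc/<pid>/smaps prints in VmFlags for VM_HUGEPAGE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/* Option values proposed in this patch (assumption: not yet in the
 * installed uapi headers). */
#ifndef PR_SET_THP_POLICY
#define PR_SET_THP_POLICY 78
#endif
#ifndef PR_DEFAULT_MADV_HUGEPAGE
#define PR_DEFAULT_MADV_HUGEPAGE 0
#endif

/* Hypothetical helper: opt the calling process into
 * PR_DEFAULT_MADV_HUGEPAGE. Returns 0 on success or -errno on failure.
 * Unused arguments are passed as unsigned long zeros, as prctl(2) expects. */
static int set_default_madv_hugepage(void)
{
	if (prctl(PR_SET_THP_POLICY, (unsigned long)PR_DEFAULT_MADV_HUGEPAGE,
		  0UL, 0UL, 0UL) == 0)
		return 0;
	return -errno;
}

/* Report whether a "VmFlags:" line from /proc/<pid>/smaps contains the
 * two-letter mnemonic flag, e.g. "hg" for VM_HUGEPAGE. */
static bool vmflags_has(const char *line, const char *flag)
{
	size_t n = strlen(flag);
	const char *p;

	assert(n > 0);
	if (strncmp(line, "VmFlags:", 8) != 0)
		return false;
	for (p = line + 8; *p; ) {
		while (*p == ' ')
			p++;
		/* Match whole tokens only. */
		if (strncmp(p, flag, n) == 0 &&
		    (p[n] == ' ' || p[n] == '\n' || p[n] == '\0'))
			return true;
		while (*p && *p != ' ')
			p++;
	}
	return false;
}

/* Find the smaps entry whose address range contains addr, and report
 * whether its VmFlags include "hg". Returns 1 or 0, or -1 if no such
 * entry (or no VmFlags line) could be read. */
static int mapping_has_hg(const void *addr)
{
	char line[8192];
	bool in_target = false;
	int ret = -1;
	FILE *f = fopen("/proc/self/smaps", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		unsigned long start, end;

		/* Entry headers look like "7f12a0000000-7f12a0400000 rw-p ..." */
		if (sscanf(line, "%lx-%lx ", &start, &end) == 2) {
			in_target = (uintptr_t)addr >= start &&
				    (uintptr_t)addr < end;
			continue;
		}
		if (in_target && strncmp(line, "VmFlags:", 8) == 0) {
			ret = vmflags_has(line, "hg");
			break;
		}
	}
	fclose(f);
	return ret;
}
```

A launcher could call set_default_madv_hugepage() and then exec the workload. Because the policy is inherited across fork+exec, a fresh anonymous mapping in the exec'd program should then report 1 from mapping_has_hg() on a kernel carrying this patch.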
Signed-off-by: Usama Arif
---
 include/linux/huge_mm.h                     |  1 +
 include/linux/mm.h                          |  2 +-
 include/linux/mm_types.h                    |  4 ++-
 include/uapi/linux/prctl.h                  |  4 +++
 kernel/sys.c                                | 29 +++++++++++++++++++
 mm/huge_memory.c                            | 13 +++++++++
 tools/include/uapi/linux/prctl.h            |  4 +++
 .../trace/beauty/include/uapi/linux/prctl.h |  4 +++
 8 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 23580a43787c..b24a2e0ae642 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -431,6 +431,7 @@ change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			__split_huge_pud(__vma, __pud, __address);	\
 	}  while (0)
 
+void process_default_madv_hugepage(struct mm_struct *mm, int advice);
 int hugepage_set_vmflags(unsigned long *vm_flags, int advice);
 int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags,
 		     int advice);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 43748c8f3454..436f4588bce8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -466,7 +466,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_NO_KHUGEPAGED (VM_SPECIAL | VM_HUGETLB)
 
 /* This mask defines which mm->def_flags a process can inherit its parent */
-#define VM_INIT_DEF_MASK	VM_NOHUGEPAGE
+#define VM_INIT_DEF_MASK	(VM_HUGEPAGE | VM_NOHUGEPAGE)
 
 /* This mask represents all the VMA flag bits used by mlock */
 #define VM_LOCKED_MASK	(VM_LOCKED | VM_LOCKONFAULT)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e76bade9ebb1..f1836b7c5704 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1703,6 +1703,7 @@ enum {
 					/* leave room for more dump flags */
 #define MMF_VM_MERGEABLE	16	/* KSM may merge identical pages */
 #define MMF_VM_HUGEPAGE		17	/* set when mm is available for khugepaged */
+#define MMF_VM_HUGEPAGE_MASK	(1 << MMF_VM_HUGEPAGE)
 
 /*
  * This one-shot flag is dropped due to necessity of changing exe once again
@@ -1742,7 +1743,8 @@ enum {
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
 				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
-				 MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK)
+				 MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK |\
+				 MMF_VM_HUGEPAGE_MASK)
 
 static inline unsigned long mmf_init_flags(unsigned long flags)
 {
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 15c18ef4eb11..15aaa4db5ff8 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -364,4 +364,8 @@ struct prctl_mm_map {
 # define PR_TIMER_CREATE_RESTORE_IDS_ON		1
 # define PR_TIMER_CREATE_RESTORE_IDS_GET	2
 
+#define PR_SET_THP_POLICY		78
+#define PR_GET_THP_POLICY		79
+#define PR_DEFAULT_MADV_HUGEPAGE	0
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index c434968e9f5d..74397ace62f3 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2474,6 +2474,7 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
 	struct task_struct *me = current;
+	struct mm_struct *mm = me->mm;
 	unsigned char comm[sizeof(me->comm)];
 	long error;
 
@@ -2658,6 +2659,34 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			clear_bit(MMF_DISABLE_THP, &me->mm->flags);
 		mmap_write_unlock(me->mm);
 		break;
+	case PR_GET_THP_POLICY:
+		if (arg2 || arg3 || arg4 || arg5)
+			return -EINVAL;
+		if (mmap_write_lock_killable(mm))
+			return -EINTR;
+		if (mm->def_flags & VM_HUGEPAGE)
+			error = PR_DEFAULT_MADV_HUGEPAGE;
+		mmap_write_unlock(mm);
+		break;
+	case PR_SET_THP_POLICY:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		if (mmap_write_lock_killable(mm))
+			return -EINTR;
+		switch (arg2) {
+		case PR_DEFAULT_MADV_HUGEPAGE:
+			error = hugepage_global_enabled() ? 0 : -EPERM;
+			if (!error)
+				error = hugepage_set_vmflags(&mm->def_flags, MADV_HUGEPAGE);
+			if (!error)
+				process_default_madv_hugepage(mm, MADV_HUGEPAGE);
+			break;
+		default:
+			error = -EINVAL;
+			break;
+		}
+		mmap_write_unlock(mm);
+		break;
 	case PR_MPX_ENABLE_MANAGEMENT:
 	case PR_MPX_DISABLE_MANAGEMENT:
 		/* No longer implemented: */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2780a12b25f0..72806fe772b5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -98,6 +98,19 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
 }
 
+void process_default_madv_hugepage(struct mm_struct *mm, int advice)
+{
+	VMA_ITERATOR(vmi, mm, 0);
+	struct vm_area_struct *vma;
+	unsigned long vm_flags;
+
+	mmap_assert_write_locked(mm);
+	for_each_vma(vmi, vma) {
+		vm_flags = vma->vm_flags;
+		hugepage_madvise(vma, &vm_flags, advice);
+	}
+}
+
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 unsigned long vm_flags,
 					 unsigned long tva_flags,
diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
index 35791791a879..f5945ebfe3f2 100644
--- a/tools/include/uapi/linux/prctl.h
+++ b/tools/include/uapi/linux/prctl.h
@@ -328,4 +328,8 @@ struct prctl_mm_map {
 # define PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC	0x10 /* Clear the aspect on exec */
 # define PR_PPC_DEXCR_CTRL_MASK		0x1f
 
+#define PR_SET_THP_POLICY		78
+#define PR_GET_THP_POLICY		79
+#define PR_DEFAULT_MADV_HUGEPAGE	0
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/tools/perf/trace/beauty/include/uapi/linux/prctl.h b/tools/perf/trace/beauty/include/uapi/linux/prctl.h
index 15c18ef4eb11..325c72f40a93 100644
--- a/tools/perf/trace/beauty/include/uapi/linux/prctl.h
+++ b/tools/perf/trace/beauty/include/uapi/linux/prctl.h
@@ -364,4 +364,8 @@ struct prctl_mm_map {
 # define PR_TIMER_CREATE_RESTORE_IDS_ON		1
 # define PR_TIMER_CREATE_RESTORE_IDS_GET	2
 
+#define PR_SET_THP_POLICY		78
+#define PR_GET_THP_POLICY		79
+#define PR_DEFAULT_MADV_HUGEPAGE	0
+
 #endif /* _LINUX_PRCTL_H */
-- 
2.47.1
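The inheritance part of this patch rests on how a new mm's def_flags are derived. In kernel/fork.c, mm_init() computes them as current->mm->def_flags & VM_INIT_DEF_MASK, and both fork and exec go through that path. The userspace model below is a sketch showing why VM_INIT_DEF_MASK has to be widened. The VM_* values are illustrative copies of kernel-internal bits from include/linux/mm.h (an assumption, not a stable ABI), and model_madv_hugepage() and model_child_def_flags() are hypothetical names. model_madv_hugepage() reproduces only the MADV_HUGEPAGE flag update done by hugepage_madvise().

```c
#include <assert.h>

/* Illustrative copies of kernel-internal vm_flags bits (assumption). */
#define VM_LOCKED     0x00002000UL
#define VM_HUGEPAGE   0x20000000UL
#define VM_NOHUGEPAGE 0x40000000UL

/* Which parent def_flags survive into a new mm, before and after the patch. */
#define OLD_VM_INIT_DEF_MASK (VM_NOHUGEPAGE)
#define NEW_VM_INIT_DEF_MASK (VM_HUGEPAGE | VM_NOHUGEPAGE)

static_assert((VM_HUGEPAGE & VM_NOHUGEPAGE) == 0, "hint bits must not overlap");

/* Model of the MADV_HUGEPAGE flag update: drop the "no hugepage" hint
 * and set the "hugepage" hint. */
static void model_madv_hugepage(unsigned long *flags)
{
	*flags &= ~VM_NOHUGEPAGE;
	*flags |= VM_HUGEPAGE;
}

/* Model of mm_init(): the child's def_flags are the parent's, filtered
 * through the inheritance mask. */
static unsigned long model_child_def_flags(unsigned long parent_def_flags,
					   unsigned long init_def_mask)
{
	return parent_def_flags & init_def_mask;
}
```

Starting from def_flags of VM_LOCKED | VM_NOHUGEPAGE, applying the MADV_HUGEPAGE update yields VM_LOCKED | VM_HUGEPAGE. With the old mask, a child inherits nothing and silently loses the policy. With the widened mask, it keeps VM_HUGEPAGE. VM_LOCKED (which mlockall(MCL_FUTURE) places in def_flags) is still dropped, as before.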