From: Usama Arif
To: Andrew Morton, david@redhat.com, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, shakeel.butt@linux.dev, riel@surriel.com,
    ziy@nvidia.com, laoar.shao@gmail.com, baolin.wang@linux.alibaba.com,
    lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com,
    ryan.roberts@arm.com, vbabka@suse.cz, jannh@google.com, Arnd Bergmann,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    kernel-team@meta.com, Usama Arif
Subject: [PATCH v3 0/7] prctl: introduce PR_SET/GET_THP_POLICY
Date: Mon, 19 May 2025 23:29:52 +0100
Message-ID: <20250519223307.3601786-1-usamaarif642@gmail.com>
X-Mailer: git-send-email 2.47.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

This
series allows changing the THP policy of a process according to the
value set in arg2. All of the policies below are inherited across
fork+exec:

- PR_DEFAULT_MADV_HUGEPAGE: This will set VM_HUGEPAGE and clear
  VM_NOHUGEPAGE in the default VMA flags. It will also iterate through
  every VMA in the process and call hugepage_madvise on it with the
  MADV_HUGEPAGE policy. This effectively sets MADV_HUGEPAGE on the
  entire process. In an environment where different types of workloads
  run on the same machine, this allows workloads that benefit from
  always having hugepages to do so, without regressing those that
  don't.

- PR_DEFAULT_MADV_NOHUGEPAGE: This will set VM_NOHUGEPAGE and clear
  VM_HUGEPAGE in the default VMA flags. It will also iterate through
  every VMA in the process and call hugepage_madvise on it with the
  MADV_NOHUGEPAGE policy. This effectively sets MADV_NOHUGEPAGE on the
  entire process. In an environment where different types of workloads
  run on the same machine, this allows workloads that benefit from
  having hugepages on an madvise basis only to do so, without
  regressing those that benefit from always having hugepages.

- PR_THP_POLICY_SYSTEM: This will reset (clear) both VM_HUGEPAGE and
  VM_NOHUGEPAGE in the default VMA flags of the process.

In hyperscalers, we have a single THP policy for the entire fleet.
We have different types of workloads (e.g. AI/compute/databases/etc)
running on a single server. Some of these workloads will benefit from
always getting THPs at fault (or collapsed by khugepaged), while others
will benefit from getting them only on madvise.

This series is useful for 2 use cases:

1) global system policy = madvise, while we want some workloads to get
   THPs at fault and by khugepaged :- some processes (e.g. AI workloads)
   benefit from getting THPs at fault (and collapsed by khugepaged).
   Other workloads like databases will regress (either a performance
   regression, or they are completely memory bound and even a very
   slight increase in memory will cause them to OOM). What these
   patches do is allow setting
   prctl(PR_SET_THP_POLICY, PR_DEFAULT_MADV_HUGEPAGE) on the AI
   workloads. (This is how workloads are deployed in our
   (Meta's/Facebook) fleet at this moment.)

2) global system policy = always, while we want some workloads to get
   THPs only on madvise basis :- Same reason as 1). What these patches
   do is allow setting
   prctl(PR_SET_THP_POLICY, PR_DEFAULT_MADV_NOHUGEPAGE) on the database
   workloads. (We hope this is us (Meta) in the near future: if a
   majority of workloads show that they benefit from "always", we flip
   the default host setting to "always" across the fleet, and workloads
   that regress can opt out and be "madvise". New services will then be
   tested with "always" by default. "always" is also the default
   defconfig option upstream, so I would imagine others face this as
   well.)

v2->v3: (Thanks Lorenzo for all the below feedback!)
v2: https://lore.kernel.org/all/20250515133519.2779639-1-usamaarif642@gmail.com/
- no more flags2.
- no more MMF2_...
- renamed policy to PR_DEFAULT_MADV_(NO)HUGEPAGE
- mmap_write_lock_killable acquired in PR_GET_THP_POLICY
- mmap_write lock fixed in PR_SET_THP_POLICY
- mmap assert check in process_default_madv_hugepage
- check if hugepage_global_enabled is enabled in the call and account
  for s390
- set mm->def_flags VM_HUGEPAGE and VM_NOHUGEPAGE according to the
  policy in the way done by madvise(). I believe VMA merging will not
  be broken in this way.
- process_default_madv_hugepage function that does for_each_vma and
  calls hugepage_madvise.

v1->v2:
- change from modifying the THP decision making for the process, to
  modifying VMA flags only. This prevents further complicating the
  logic used to determine THP order (Thanks David!)
- change from using a prctl per policy change to just using
  PR_SET_THP_POLICY and arg2 to set the policy. (Zi Yan)
- Introduce PR_THP_POLICY_DEFAULT_NOHUGE and
  PR_THP_POLICY_DEFAULT_SYSTEM
- Add selftests and documentation.

Usama Arif (7):
  mm: khugepaged: extract vm flag setting outside of hugepage_madvise
  prctl: introduce PR_DEFAULT_MADV_HUGEPAGE for the process
  prctl: introduce PR_DEFAULT_MADV_NOHUGEPAGE for the process
  prctl: introduce PR_THP_POLICY_SYSTEM for the process
  selftests: prctl: introduce tests for PR_DEFAULT_MADV_NOHUGEPAGE
  selftests: prctl: introduce tests for PR_THP_POLICY_DEFAULT_HUGE
  docs: transhuge: document process level THP controls

 Documentation/admin-guide/mm/transhuge.rst    |  42 +++
 include/linux/huge_mm.h                       |   2 +
 include/linux/mm.h                            |   2 +-
 include/linux/mm_types.h                      |   4 +-
 include/uapi/linux/prctl.h                    |   6 +
 kernel/sys.c                                  |  53 ++++
 mm/huge_memory.c                              |  13 +
 mm/khugepaged.c                               |  26 +-
 tools/include/uapi/linux/prctl.h              |   6 +
 .../trace/beauty/include/uapi/linux/prctl.h   |   6 +
 tools/testing/selftests/prctl/Makefile        |   2 +-
 tools/testing/selftests/prctl/thp_policy.c    | 286 ++++++++++++++++++
 12 files changed, 436 insertions(+), 12 deletions(-)
 create mode 100644 tools/testing/selftests/prctl/thp_policy.c

-- 
2.47.1
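
As a reading aid for the three policies described in this cover letter, here is a minimal userspace model of how each one is said to change the default VMA flags. It is a sketch and not code from the series: the bit values, the `ThpPolicy` enum and the `apply_policy` helper are all hypothetical names chosen for illustration, and the model deliberately leaves out the per-VMA hugepage_madvise walk and any locking.

```python
from enum import Enum

# Illustrative stand-in bits only; these are NOT the kernel's values.
VM_HUGEPAGE = 0x1
VM_NOHUGEPAGE = 0x2


class ThpPolicy(Enum):
    """Hypothetical model of the three arg2 policies in the cover letter."""
    DEFAULT_MADV_HUGEPAGE = "hugepage"
    DEFAULT_MADV_NOHUGEPAGE = "nohugepage"
    SYSTEM = "system"


def apply_policy(def_flags: int, policy: ThpPolicy) -> int:
    """Return the new default VMA flags after applying a policy.

    Only the two THP bits are touched; every other bit is preserved.
    """
    if policy is ThpPolicy.DEFAULT_MADV_HUGEPAGE:
        # Set VM_HUGEPAGE, clear VM_NOHUGEPAGE.
        return (def_flags & ~VM_NOHUGEPAGE) | VM_HUGEPAGE
    if policy is ThpPolicy.DEFAULT_MADV_NOHUGEPAGE:
        # Set VM_NOHUGEPAGE, clear VM_HUGEPAGE.
        return (def_flags & ~VM_HUGEPAGE) | VM_NOHUGEPAGE
    if policy is ThpPolicy.SYSTEM:
        # Clear both bits so the system-wide setting applies again.
        return def_flags & ~(VM_HUGEPAGE | VM_NOHUGEPAGE)
    raise ValueError(f"unknown policy: {policy!r}")


def inherit_on_fork(parent_def_flags: int) -> int:
    """Model the 'inherited during fork+exec' statement: the child starts
    with the parent's default flags unchanged."""
    return parent_def_flags
```

In this model the two THP bits are never set at the same time after a policy is applied, which matches the set-one, clear-the-other wording used for the two MADV policies.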