Message-ID: <279d29ad-cbd6-4a0e-b904-0a19326334d1@gmail.com>
Date: Wed, 7 May 2025 17:09:30 +0100
Subject: Re: [PATCH 0/1] prctl: allow overriding system THP policy to always
To: Zi Yan
Cc: Andrew Morton, david@redhat.com, linux-mm@kvack.org, hannes@cmpxchg.org,
    shakeel.butt@linux.dev, riel@surriel.com, baolin.wang@linux.alibaba.com,
    lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com,
    ryan.roberts@arm.com, linux-kernel@vger.kernel.org, kernel-team@meta.com,
    Yafang Shao
References: <20250507141132.2773275-1-usamaarif642@gmail.com>
    <293530AA-1AB7-4FA0-AF40-3A8464DC0198@nvidia.com>
    <96eccc48-b632-40b7-9797-1b0780ea59cd@gmail.com>
    <8E3EC5A4-4387-4839-926F-3655188C20F4@nvidia.com>
From: Usama Arif
In-Reply-To: <8E3EC5A4-4387-4839-926F-3655188C20F4@nvidia.com>

On 07/05/2025 16:57, Zi Yan wrote:
> On 7 May 2025, at 11:12, Usama Arif wrote:
>
>> On 07/05/2025 15:57, Zi Yan wrote:
>>> +Yafang, who is also looking at changing THP config at cgroup/container level.
>>>
>>> On 7 May 2025, at 10:00, Usama Arif wrote:
>>>
>>>> Allowing override of global THP policy per process allows workloads
>>>> that have shown to benefit from hugepages to do so, without regressing
>>>> workloads that wouldn't benefit. This will allow such types of
>>>> workloads to be run/stacked on the same machine.
>>>>
>>>> It also helps in rolling out hugepages in hyperscaler configurations
>>>> for workloads that benefit from them, where a single THP policy is
>>>> likely to be used across the entire fleet, and prctl will help override it.
>>>>
>>>> An advantage of doing it via prctl vs creating a cgroup specific
>>>> option (like /sys/fs/cgroup/test/memory.transparent_hugepage.enabled) is
>>>> that this will work even when there are no cgroups present, and my
>>>> understanding is there is a strong preference of cgroups controls being
>>>> hierarchical which usually means them having a numerical value.
>>>
>>> Hi Usama,
>>>
>>> Do you mind giving an example on how to change THP policy for a set of
>>> processes running in a container (under a cgroup)?
>>
>> Hi Zi,
>>
>> In our case, we create the processes in the cgroup via systemd. The way we will enable THP=always
>> for processes in a cgroup is in the same way we enable KSM for the cgroup.
>> The change in systemd would be very similar to the line in [1], where we would set prctl PR_SET_THP_ALWAYS
>> in exec-invoke.
>> This is at the start of the process, but you would already know at the start of the process
>> whether you want THP=always for it or not.
>>
>> [1] https://github.com/systemd/systemd/blob/2e72d3efafa88c1cb4d9b28dd4ade7c6ab7be29a/src/core/exec-invoke.c#L5045
>
> You also need to add a new systemd.directives, e.g., MemoryTHP, to
> pass the THP enablement or disablement info from a systemd config file.
> And if you find those processes do not benefit from using THPs,
> you can just change the new "MemoryTHP" config and restart the processes.
>
> Am I getting it? Thanks.
>

Yes, that's right. This would be exactly the same as what we (Meta) do
for KSM: add MemoryTHP, similar to MemoryKSM [1]. If MemoryTHP is set,
ExecContext->memory_thp would be set, similar to memory_ksm [2], and when
that is set, the prctl would be called at exec_invoke of the process [3].

The systemd changes should be quite simple to do.

[1] https://github.com/systemd/systemd/blob/2e72d3efafa88c1cb4d9b28dd4ade7c6ab7be29a/man/systemd.exec.xml#L1978
[2] https://github.com/systemd/systemd/blob/2e72d3efafa88c1cb4d9b28dd4ade7c6ab7be29a/src/core/dbus-execute.c#L2151
[3] https://github.com/systemd/systemd/blob/2e72d3efafa88c1cb4d9b28dd4ade7c6ab7be29a/src/core/exec-invoke.c#L5045

>>>
>>> Yafang mentioned that the prctl approach would require restarting all running
>>> services[1] and other inflexibilities, so he proposed to use BPF to change THP
>>> policy[2]. I wonder if Yafang's issues also apply to your case and if you
>>> have a solution to them.
>>>
>>> Thanks.
>>>
>>> [1] https://lore.kernel.org/linux-mm/CALOAHbCXMi2GaZdHJaNLXxGsJf-hkDTrztsQiceaBcJ8d8p3cA@mail.gmail.com/
>>> [2] https://lore.kernel.org/linux-mm/20250429024139.34365-1-laoar.shao@gmail.com/
>>>>
>>>>
>>>> The output and code of test program is below:
>>>>
>>>> [root@vm4 vmuser]# echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
>>>> [root@vm4 vmuser]# echo inherit > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>>>> [root@vm4 vmuser]# ./a.out
>>>> Default THP setting:
>>>> THP is not set to 'always'.
>>>> PR_SET_THP_ALWAYS = 1
>>>> THP is set to 'always'.
>>>> PR_SET_THP_ALWAYS = 0
>>>> THP is not set to 'always'.
>>>>
>>>>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>> #include <string.h>
>>>> #include <unistd.h>
>>>> #include <sys/mman.h>
>>>> #include <sys/prctl.h>
>>>>
>>>> #define PR_SET_THP_ALWAYS 78
>>>> #define SIZE (12 * (2 * 1024 * 1024)) // 24 MB
>>>>
>>>> void check_smaps(void) {
>>>>     FILE *file = fopen("/proc/self/smaps", "r");
>>>>     if (!file) {
>>>>         perror("fopen");
>>>>         return;
>>>>     }
>>>>
>>>>     char line[256];
>>>>     int is_hugepage = 0;
>>>>     while (fgets(line, sizeof(line), file)) {
>>>>         // if (strstr(line, "AnonHugePages:"))
>>>>         //     printf("%s\n", line);
>>>>         if (strstr(line, "AnonHugePages:") && strstr(line, "24576 kB"))
>>>>         {
>>>>             // printf("%s\n", line);
>>>>             is_hugepage = 1;
>>>>             break;
>>>>         }
>>>>     }
>>>>     fclose(file);
>>>>     if (is_hugepage) {
>>>>         printf("THP is set to 'always'.\n");
>>>>     } else {
>>>>         printf("THP is not set to 'always'.\n");
>>>>     }
>>>> }
>>>>
>>>> void test_mmap_thp(void) {
>>>>     char *buffer = (char *)mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
>>>>                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>>     if (buffer == MAP_FAILED) {
>>>>         perror("mmap");
>>>>         return;
>>>>     }
>>>>     // Touch the memory to ensure it's allocated
>>>>     memset(buffer, 0, SIZE);
>>>>     check_smaps();
>>>>     munmap(buffer, SIZE);
>>>> }
>>>>
>>>> int main() {
>>>>     printf("Default THP setting: \n");
>>>>     test_mmap_thp();
>>>>     printf("PR_SET_THP_ALWAYS = 1 \n");
>>>>     prctl(PR_SET_THP_ALWAYS, 1, NULL, NULL, NULL);
>>>>     test_mmap_thp();
>>>>     printf("PR_SET_THP_ALWAYS = 0 \n");
>>>>     prctl(PR_SET_THP_ALWAYS, 0, NULL, NULL, NULL);
>>>>     test_mmap_thp();
>>>>
>>>>     return 0;
>>>> }
>>>>
>>>>
>>>> Usama Arif (1):
>>>>   prctl: allow overriding system THP policy to always per process
>>>>
>>>>  include/linux/huge_mm.h                          |  3 ++-
>>>>  include/linux/mm_types.h                         |  7 ++-----
>>>>  include/uapi/linux/prctl.h                       |  3 +++
>>>>  kernel/sys.c                                     | 16 ++++++++++++++++
>>>>  tools/include/uapi/linux/prctl.h                 |  3 +++
>>>>  .../perf/trace/beauty/include/uapi/linux/prctl.h |  3 +++
>>>>  6 files changed, 29 insertions(+), 6 deletions(-)
>>>>
>>>> --
>>>> 2.47.1
>>>
>>>
>>> --
>>> Best Regards,
>>> Yan, Zi
>
>
> --
> Best Regards,
> Yan, Zi
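
For illustration only, the flow described in the reply (a MemoryTHP setting
filling ExecContext->memory_thp, which then triggers the prctl at exec-invoke
time) could look roughly like the sketch below. This is not systemd code.
struct exec_context_sketch, parse_memory_thp() and apply_memory_thp() are
hypothetical names, the "yes"/"no" values are an assumption, and
PR_SET_THP_ALWAYS with value 78 is taken from the test program quoted in this
thread rather than from a released kernel header.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>

/* Value taken from the test program quoted in this thread. It is not a
 * mainline constant and only has this meaning on a kernel carrying the
 * proposed patch. */
#ifndef PR_SET_THP_ALWAYS
#define PR_SET_THP_ALWAYS 78
#endif

/* Hypothetical stand-in for the field the thread calls
 * ExecContext->memory_thp: -1 = not configured, 0 = off, 1 = on. */
struct exec_context_sketch {
    int memory_thp;
};

/* Parse the value of a hypothetical "MemoryTHP=" unit setting.
 * Only "yes" and "no" are recognised in this sketch; anything else
 * (including NULL) is treated as "not configured". */
int parse_memory_thp(const char *value)
{
    if (value == NULL)
        return -1;
    if (strcmp(value, "yes") == 0)
        return 1;
    if (strcmp(value, "no") == 0)
        return 0;
    return -1;
}

/* Intended to run in the child just before the service binary is
 * executed. Returns 0 on success or when nothing is configured, and a
 * negative errno value if the prctl fails, for example on a kernel
 * that does not implement this option. */
int apply_memory_thp(const struct exec_context_sketch *ctx)
{
    assert(ctx != NULL);

    if (ctx->memory_thp < 0)
        return 0; /* leave the system-wide THP policy untouched */

    if (prctl(PR_SET_THP_ALWAYS, (unsigned long)ctx->memory_thp,
              0UL, 0UL, 0UL) < 0)
        return -errno;

    return 0;
}
```

A caller would fill memory_thp from parse_memory_thp() and call
apply_memory_thp() in the child before exec. Whether a failure should abort
the service start or only be logged is a policy choice this sketch leaves
open.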