From: Yafang Shao
Date: Wed, 30 Apr 2025 10:33:33 +0800
Subject: Re: [RFC PATCH 0/4] mm, bpf: BPF based THP adjustment
References: <20250429024139.34365-1-laoar.shao@gmail.com>
To: Zi Yan
Cc: akpm@linux-foundation.org, ast@kernel.org, daniel@iogearbox.net,
    andrii@kernel.org, David Hildenbrand, Baolin Wang, Lorenzo Stoakes,
    "Liam R.
    Howlett", Nico Pache, Ryan Roberts, Dev Jain, bpf@vger.kernel.org,
    linux-mm@kvack.org

On Tue, Apr 29, 2025 at 11:09 PM Zi Yan wrote:
>
> Hi Yafang,
>
> We recently added a new THP entry in MAINTAINERS file[1], do you mind ccing
> people there in your next version? (I added them here)
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/tree/MAINTAINERS?h=mm-everything#n15589

Thanks for your reminder.
I will add the maintainers and reviewers in the next version.

>
> On Mon Apr 28, 2025 at 10:41 PM EDT, Yafang Shao wrote:
> > In our container environment, we aim to enable THP selectively -- allowing
> > specific services to use it while restricting others. This approach is
> > driven by the following considerations:
> >
> > 1. Memory Fragmentation
> >    THP can lead to increased memory fragmentation, so we want to limit its
> >    use across services.
> > 2. Performance Impact
> >    Some services see no benefit from THP, making its usage unnecessary.
> > 3. Performance Gains
> >    Certain workloads, such as machine learning services, experience
> >    significant performance improvements with THP, so we enable it for them
> >    specifically.
> >
> > Since multiple services run on a single host in a containerized environment,
> > enabling THP globally is not ideal. Previously, we set THP to madvise,
> > allowing selected services to opt in via MADV_HUGEPAGE. However, this
> > approach had limitation:
> >
> > - Some services inadvertently used madvise(MADV_HUGEPAGE) through
> >   third-party libraries, bypassing our restrictions.
>
> Basically, you want more precise control of THP enablement and the
> ability of overriding madvise() from userspace.
>
> In terms of overriding madvise(), do you have any concrete example of
> these third-party libraries? madvise() users are supposed to know what
> they are doing, so I wonder why they are causing trouble in your
> environment.

To my knowledge, jemalloc [0] supports THP.
Applications using jemalloc typically rely on its default
configurations rather than explicitly enabling or disabling THP.
If the system is configured with THP=madvise, these applications may
automatically leverage THP where appropriate.

[0]. https://github.com/jemalloc/jemalloc

>
> >
> > To address this issue, we initially hooked the __x64_sys_madvise() syscall,
> > which is error-injectable, to blacklist unwanted services. While this
> > worked, it was error-prone and ineffective for services needing always mode,
> > as modifying their code to use madvise was impractical.
> >
> > To achieve finer-grained control, we introduced an fmod_ret-based solution.
> > Now, we dynamically adjust THP settings per service by hooking
> > hugepage_global_{enabled,always}() via BPF. This allows us to set THP to
> > enable or disable on a per-service basis without global impact.
>
> hugepage_global_*() are whole system knobs. How did you use it to
> achieve per-service control? In terms of per-service, does it mean
> you need per-memcg group (I assume each service has its own memcg) THP
> configuration?

With this new BPF hook, we can manage THP behavior either per-service
or per-memory. In our use case, we've chosen memcg-based control for
finer-grained management. Below is a simplified example of our
implementation:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* Keyed by the cgroup ID of each memcg that is allowed to use THP. */
struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 4096); /* usually there won't be too many cgroups */
        __type(key, u64);
        __type(value, u32);
        __uint(map_flags, BPF_F_NO_PREALLOC);
} thp_whitelist SEC(".maps");

SEC("fmod_ret/mm_bpf_thp_vma_allowable")
int BPF_PROG(thp_vma_allowable, struct vm_area_struct *vma)
{
        struct css_set *cgroups;
        struct mm_struct *mm;
        struct cgroup *cgroup;
        struct task_struct *p;
        u64 cgrp_id;

        if (!vma)
                return 0;

        mm = vma->vm_mm;
        if (!mm)
                return 0;

        /* mm->owner only exists with CONFIG_MEMCG and may be NULL. */
        p = mm->owner;
        if (!p)
                return 0;

        cgroups = p->cgroups;
        cgroup = cgroups->subsys[memory_cgrp_id]->cgroup;
        cgrp_id = cgroup->kn->id;

        /* Allow the tasks in the thp_whitelist to use THP.
         */
        if (bpf_map_lookup_elem(&thp_whitelist, &cgrp_id))
                return 1;

        return 0;
}

I chose not to include this in the self-tests to avoid the complexity
of setting up cgroups for testing purposes. However, in patch #4 of
this series, I've included a simpler example demonstrating task-level
control. For service-level control, we could potentially utilize BPF
task local storage as an alternative approach.

-- 
Regards
Yafang
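
For reference, here is a minimal userspace sketch of the
madvise(MADV_HUGEPAGE) opt-in path discussed earlier in this mail, i.e.
roughly what an allocator library might do on an application's behalf
when THP is set to madvise. The helper names (thp_round_up,
alloc_thp_hinted) and the 2 MiB huge page size are assumptions made for
illustration; this is not code from jemalloc or from this series.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

#ifndef MADV_HUGEPAGE
#define MADV_HUGEPAGE 14 /* Linux value; fallback for older headers */
#endif

/* Assumption: PMD-sized huge pages are 2 MiB (the x86-64 default). */
#define HUGE_ALIGN ((size_t)2 << 20)

/* Round len up to the next multiple of HUGE_ALIGN. */
static size_t thp_round_up(size_t len)
{
        return (len + HUGE_ALIGN - 1) & ~(HUGE_ALIGN - 1);
}

/*
 * Hypothetical helper: allocate a huge-page-aligned buffer and ask the
 * kernel to back it with THP. Returns NULL if the allocation fails.
 * *hint_err is set to 0 if the hint was accepted, else to the errno
 * reported by madvise(); the buffer is usable either way.
 */
static void *alloc_thp_hinted(size_t len, int *hint_err)
{
        void *buf = NULL;
        size_t size = thp_round_up(len);

        assert(hint_err != NULL);
        if (size == 0 || posix_memalign(&buf, HUGE_ALIGN, size) != 0)
                return NULL;

        *hint_err = madvise(buf, size, MADV_HUGEPAGE) ? errno : 0;
        return buf;
}
```

An application linking such a library never calls madvise() itself,
which is why the opt-in can go unnoticed by the service owner. If
hint_err comes back as EINVAL, that may indicate a kernel built without
THP support.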