References: <20240628104926.34209-1-libang.li@antgroup.com>
 <4b38db15-0716-4ffb-a38b-bd6250eb93da@arm.com>
 <4d54880e-03f4-460a-94b9-e21b8ad13119@linux.alibaba.com>
 <516aa6b3-617c-4642-b12b-0c5f5b33d1c9@arm.com>
 <597ac51e-3f27-4606-8647-395bb4e60df4@redhat.com>
 <6f68fb9d-3039-4e38-bc08-44948a1dae4d@arm.com>
 <992cdbf9-80df-4a91-aea6-f16789c5afd7@redhat.com>
 <2e0a1554-d24f-4d0d-860b-0c2cf05eb8da@arm.com>
 <06c74db8-4d10-4a41-9a05-776f8dca7189@redhat.com>
 <429f2873-8532-4cc8-b0e1-1c3de9f224d9@arm.com>
 <7a0bbe69-1e3d-4263-b206-da007791a5c4@redhat.com>
 <2450e4f8-236f-49ce-8bd3-b30a6d8c5e57@arm.com>
In-Reply-To: <2450e4f8-236f-49ce-8bd3-b30a6d8c5e57@arm.com>
From: Yang Shi
Date: Wed, 3 Jul 2024 09:08:01 -0700
Subject: Re: [PATCH] support "THPeligible" semantics for mTHP with anonymous shmem
To: Ryan Roberts
Cc: David Hildenbrand, Baolin Wang, Bang Li, hughd@google.com,
 akpm@linux-foundation.org, wangkefeng.wang@huawei.com, ziy@nvidia.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Tue, Jul 2, 2024 at 1:24 AM Ryan Roberts wrote:
>
> On 01/07/2024 19:20, Yang Shi wrote:
> > On Mon, Jul 1, 2024 at 3:23 AM David Hildenbrand wrote:
> >>
> >> On 01.07.24 12:16, Ryan Roberts wrote:
> >>> On 01/07/2024 10:17, David Hildenbrand wrote:
> >>>> On 01.07.24 11:14, Ryan Roberts wrote:
> >>>>> On 01/07/2024 09:57, David Hildenbrand wrote:
> >>>>>> On 01.07.24 10:50, Ryan Roberts wrote:
> >>>>>>> On 01/07/2024 09:48, David Hildenbrand wrote:
> >>>>>>>> On 01.07.24 10:40, Ryan Roberts wrote:
> >>>>>>>>> On 01/07/2024 09:33, Baolin Wang wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 2024/7/1 15:55, Ryan Roberts wrote:
> >>>>>>>>>>> On 28/06/2024 11:49, Bang Li wrote:
> >>>>>>>>>>>> After the commit 7fb1b252afb5 ("mm: shmem: add mTHP support for
> >>>>>>>>>>>> anonymous shmem"), we can configure different policies through
> >>>>>>>>>>>> the multi-size THP sysfs interface for anonymous shmem. But
> >>>>>>>>>>>> currently "THPeligible" indicates only whether the mapping is
> >>>>>>>>>>>> eligible for allocating THP-pages as well as the THP is PMD
> >>>>>>>>>>>> mappable or not for anonymous shmem, we need to support semantics
> >>>>>>>>>>>> for mTHP with anonymous shmem similar to those for mTHP with
> >>>>>>>>>>>> anonymous memory.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Signed-off-by: Bang Li
> >>>>>>>>>>>> ---
> >>>>>>>>>>>>  fs/proc/task_mmu.c      | 10 +++++++---
> >>>>>>>>>>>>  include/linux/huge_mm.h | 11 +++++++++++
> >>>>>>>>>>>>  mm/shmem.c              |  9 +--------
> >>>>>>>>>>>>  3 files changed, 19 insertions(+), 11 deletions(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> >>>>>>>>>>>> index 93fb2c61b154..09b5db356886 100644
> >>>>>>>>>>>> --- a/fs/proc/task_mmu.c
> >>>>>>>>>>>> +++ b/fs/proc/task_mmu.c
> >>>>>>>>>>>> @@ -870,6 +870,7 @@ static int show_smap(struct seq_file *m, void *v)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>      struct vm_area_struct *vma = v;
> >>>>>>>>>>>>      struct mem_size_stats mss = {};
> >>>>>>>>>>>> +    bool thp_eligible;
> >>>>>>>>>>>>      smap_gather_stats(vma, &mss, 0);
> >>>>>>>>>>>> @@ -882,9 +883,12 @@ static int show_smap(struct seq_file *m, void *v)
> >>>>>>>>>>>>      __show_smap(m, &mss, false);
> >>>>>>>>>>>> -    seq_printf(m, "THPeligible: %8u\n",
> >>>>>>>>>>>> -           !!thp_vma_allowable_orders(vma, vma->vm_flags,
> >>>>>>>>>>>> -               TVA_SMAPS | TVA_ENFORCE_SYSFS, THP_ORDERS_ALL));
> >>>>>>>>>>>> +    thp_eligible = !!thp_vma_allowable_orders(vma, vma->vm_flags,
> >>>>>>>>>>>> +               TVA_SMAPS | TVA_ENFORCE_SYSFS, THP_ORDERS_ALL);
> >>>>>>>>>>>> +    if (vma_is_anon_shmem(vma))
> >>>>>>>>>>>> +        thp_eligible = !!shmem_allowable_huge_orders(file_inode(vma->vm_file),
> >>>>>>>>>>>> +                   vma, vma->vm_pgoff, thp_eligible);
> >>>>>>>>>>>
> >>>>>>>>>>> Afraid I haven't been following the shmem mTHP support work as much as I
> >>>>>>>>>>> would have liked, but is there a reason why we need a separate function
> >>>>>>>>>>> for shmem?
> >>>>>>>>>>
> >>>>>>>>>> Since shmem_allowable_huge_orders() only uses shmem specific logic to
> >>>>>>>>>> determine if huge orders are allowable, there is no need to complicate
> >>>>>>>>>> the thp_vma_allowable_orders() function by adding more shmem related
> >>>>>>>>>> logic, making it more bloated. In my view, providing a dedicated helper
> >>>>>>>>>> shmem_allowable_huge_orders(), specifically for shmem, simplifies the
> >>>>>>>>>> logic.
> >>>>>>>>>
> >>>>>>>>> My point was really that a single interface (thp_vma_allowable_orders)
> >>>>>>>>> should be used to get this information. I have no strong opinion on how
> >>>>>>>>> the implementation of that interface looks. What you suggest below seems
> >>>>>>>>> perfectly reasonable to me.
> >>>>>>>>
> >>>>>>>> Right. thp_vma_allowable_orders() might require some care as discussed in
> >>>>>>>> other context (cleanly separate dax and shmem handling/orders). But that
> >>>>>>>> would be follow-up cleanups.
> >>>>>>>
> >>>>>>> Are you planning to do that, or do you want me to send a patch?
> >>>>>>
> >>>>>> I'm planning on looking into some details, especially the interaction with
> >>>>>> large folios in the pagecache. I'll let you know once I have a better idea
> >>>>>> what actually should be done :)
> >>>>>
> >>>>> OK great - I'll scrub it from my todo list... really getting things done
> >>>>> today :)
> >>>>
> >>>> Resolved the khugepaged thingy already? :P
> >>>>
> >>>> [khugepaged not active when only enabling the sub-size via the 2M folder IIRC]
> >>>
> >>> Hmm... baby brain?
> >>
> >> :)
> >>
> >> I think I only mentioned it in a private mail at some point.
> >>
> >>>
> >>> Sorry about that. I've been a bit useless lately. For some reason it wasn't
> >>> on my list, but it's there now. Will prioritise it, because I agree it's not
> >>> good.
> >>
> >>
> >> IIRC, if you do
> >>
> >> echo never > /sys/kernel/mm/transparent_hugepage/enabled
> >> echo always > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
> >>
> >> khugepaged will not get activated.
> >
> > khugepaged is controlled by the top level knob.
>
> What do you mean by "top level knob"? I assume
> /sys/kernel/mm/transparent_hugepage/enabled ?

Yes.

>
> If so, that's not really a thing in its own right; it's just the legacy
> PMD-size THP control, and we only take any notice of it if a per-size control
> is set to "inherit". So if we have:
>
> # echo always > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>
> Then by design, /sys/kernel/mm/transparent_hugepage/enabled should be ignored.
>
> > But the above setting
> > sounds confusing, can we disable the top level knob, but enable it on
> > a per-order basis? TBH, it sounds weird and doesn't make too much
> > sense to me.
>
> Well that's the design and that's how it's documented. It's done this way for
> back-compat. All controls are now per-size. But at boot, we default all
> per-size controls to "never" except for the PMD-sized control, which is
> defaulted to "inherit". That way, an unenlightened user-space can still
> control PMD-sized THP via the legacy (top-level) control. But enlightened apps
> can directly control per-size.

OK, good to know.

>
> I'm not sure how your way would work, because you would have 2 controls
> competing to do the same thing?

I don't see how they compete if they are 2-level knobs. And I fail
to see how it achieves back-compat. For example, memcached reads
/sys/kernel/mm/transparent_hugepage/enabled to determine whether it
should manage memory in huge page (2M) granularity.
If the settings are:

# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# echo always > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled

memcached will manage memory in 4K granularity even though 2M THP is
actually enabled, unless memcached also checks the per-order knobs.

If we use 2-level mode, memcached doesn't need to check the per-order
settings at all in order to know whether THP is enabled or not. And it
actually doesn't care about which orders are enabled; it assumes the THP
size is 2M (or PMD size). Even if 2M is not enabled but lower orders
are, memcached can still fully utilize the mTHP, since the memory
chunks managed by memcached are still 2M aligned in this setting. So
unenlightened applications can still work well.

Jemalloc should do a similar thing if I remember correctly.

>
> >
> >>
> >> --
> >> Cheers,
> >>
> >> David / dhildenb
> >>
> >>
>
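To make the disagreement concrete, here is a rough user-space sketch of the resolution rule as Ryan describes it: the per-size control wins unless it is set to "inherit", in which case the top-level control applies. This is illustration only, not kernel code and not what memcached does. The helper names (selected_value(), effective_policy(), read_line()) and the THP_DEMO_MAIN macro are made up for this example. It assumes the sysfs files mark the active value with square brackets and that PMD size is 2M, so the per-size directory is hugepages-2048kB.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Paths taken from the thread; "2048kB" assumes PMD size is 2M. */
#define THP_TOP "/sys/kernel/mm/transparent_hugepage/enabled"
#define THP_PMD "/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled"

/*
 * Copy the bracketed (currently selected) token out of a sysfs string such
 * as "always [madvise] never". Returns 0 on success, -1 if there is no
 * well-formed [token] or it does not fit in out.
 */
int selected_value(const char *text, char *out, size_t outlen)
{
    const char *open, *close;
    size_t len;

    assert(text && out && outlen > 0);
    open = strchr(text, '[');
    close = open ? strchr(open, ']') : NULL;
    if (!close)
        return -1;
    len = (size_t)(close - open) - 1;
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/*
 * Hypothetical helper: resolve the policy for one size. The per-size value
 * is used as-is unless it is "inherit", in which case the top-level value
 * is used instead.
 */
int effective_policy(const char *top_text, const char *size_text,
                     char *out, size_t outlen)
{
    const char *src = size_text;
    char sel[32];

    if (selected_value(size_text, sel, sizeof(sel)) != 0)
        return -1;
    if (strcmp(sel, "inherit") == 0)
        src = top_text;
    return selected_value(src, out, outlen);
}

/* Read the first line of a file into buf. Returns 0 on success. */
int read_line(const char *path, char *buf, size_t buflen)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = fgets(buf, (int)buflen, f) != NULL;
    fclose(f);
    return ok ? 0 : -1;
}

#ifdef THP_DEMO_MAIN
/* Build with -DTHP_DEMO_MAIN to query the live system. */
int main(void)
{
    char top[128], pmd[128], legacy[32], eff[32];

    if (read_line(THP_TOP, top, sizeof(top)) ||
        read_line(THP_PMD, pmd, sizeof(pmd)) ||
        selected_value(top, legacy, sizeof(legacy)) ||
        effective_policy(top, pmd, eff, sizeof(eff))) {
        fprintf(stderr, "could not read the THP controls\n");
        return 1;
    }
    printf("top-level says: %s, PMD-size effective: %s\n", legacy, eff);
    return 0;
}
#endif
```

With the settings from the example above (top-level "never", hugepages-2048kB "always"), effective_policy() resolves to "always", while an application that only looks at the top-level file sees "never". That mismatch is the back-compat concern raised in this mail.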