From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A056AC67861 for ; Fri, 5 Apr 2024 10:51:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 32F1D6B0156; Fri, 5 Apr 2024 06:51:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2DEFA6B0157; Fri, 5 Apr 2024 06:51:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 181216B0158; Fri, 5 Apr 2024 06:51:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id EF58D6B0156 for ; Fri, 5 Apr 2024 06:51:46 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 8C233161110 for ; Fri, 5 Apr 2024 10:51:46 +0000 (UTC) X-FDA: 81975162612.22.133AC02 Received: from mail-qk1-f180.google.com (mail-qk1-f180.google.com [209.85.222.180]) by imf30.hostedemail.com (Postfix) with ESMTP id E81328001C for ; Fri, 5 Apr 2024 10:51:44 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="fw1p/Cki"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf30.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.222.180 as permitted sender) smtp.mailfrom=21cnbao@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1712314304; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=peV9nuoyWG5aQfWFhx61nJUZnBgUK1pfdvLppGafH4k=; b=8hLXo5IJ9CCOhXEnWhssTXpM62O9Z/p0j2doU0ls0R1G2AiiYQ3hatPjj5peYx0TdD6N/I NGZYgqcU1/A9GziX584Hcd9saVBfCMaGjob8rebyhG/ZELfm5llrWJBDPk5n1RFE6h4PqC N8NxiaJh/0bITveccA8N/XxUwqM02tU= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="fw1p/Cki"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf30.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.222.180 as permitted sender) smtp.mailfrom=21cnbao@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1712314304; a=rsa-sha256; cv=none; b=Y7KcArWFGGbI+Str/tjw+PF0aXXXrh1JFJP+HIGLj/kkif8Ojc3cPe4uqU243TG7IQQT4c O6uOCEnuBt+XhAF6G9VXsx6baA6s8XgX4g8K6aQQWfVIK6A4WwI18ODsdkxvxpHma3EIgJ 3epHvCYqR+TInIqMQObEdY6uJi4ZLM8= Received: by mail-qk1-f180.google.com with SMTP id af79cd13be357-78be3087929so127361285a.1 for ; Fri, 05 Apr 2024 03:51:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1712314304; x=1712919104; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=peV9nuoyWG5aQfWFhx61nJUZnBgUK1pfdvLppGafH4k=; b=fw1p/CkilGU0tSxxu/52OZrobNKmgT79rXVs5Rd+r+XdqsCPVDiqdSVSkAPEgwWSbi gmzECqNyG1Dqz50tnbZX4SUmW1l3bbnrdD69HN5Fw+gJH+lbZM/cO589/m7e9loztzOx 8UOlYz1WdVVIV+jvk99iT/2eZ209BzJfBwNKRDgBJIiOome9m3+T7bMJp3xEkg3DEwLK Z8D2ELdbKZWeQie2WbQrnMCW8NOodSavJ2OH+kzVTyQeAgI6hHQmKF3l3Kg0Rl6UaXNW Hu+GXdNAe5hMTp5C0Oi6pqN/7ASYVIs8+VXSbJ4SgYAty2iOYdwzZm6N/YRyGWSvxPAE 0b6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712314304; x=1712919104; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=peV9nuoyWG5aQfWFhx61nJUZnBgUK1pfdvLppGafH4k=; b=vnD/8v7J9hRmhD21+DDfGWjugoHO+9uflq6wE25w/CK5YvBF5cjCujKjgV8nNw5vVc N5ay6CuKpKk5GupO97nl/IIL0+A5ZKDwNWu6c0I+WLHP1qFwUQYW4jEiSvMa6Uvvgl5C KoysKvwPKvfV0bW3b78hunAQFhFm37956GlCu1L1CcbVTQeSVyqBnq1803PgrxGGv6Dq HEywgJr68MpUopG217T3HV5/bvSPADVl6APY9oE9qWgicnBi+SHnRi46aQ+a0bxS30Nr 4K1AKpAh6u0GOpMEY72+3/yJbnccoJXZBieAL0Xnb/JkTioQB9stUdWjwK1V5DVmuHCx lcVQ== X-Forwarded-Encrypted: i=1; AJvYcCXaWnDnjZDXr5M2O07YPPKuvKtvL2op8AjauUnsYjEyKQIAsYweEF19CB6cDE4yrTRwJ+fQcmIcuXxQ1TozHZj+o0A= X-Gm-Message-State: AOJu0YyhEDJPZnCCjT4xOGpCKaSmmjCqr8/HtJq1RKCUL6D1O6TqTj6x NMVxcPu3t2WLNn663/LmvS9CiM2wGK6n6Ji7Ps/QQG5jNmFadSYYCgNtDOSevgKNfsNsx6/RHe5 A3MIGiiT6pzYYm9nEmc2YRu/tT/FRXAZX+DA= X-Google-Smtp-Source: AGHT+IGmuHCIwWZJHMjeA8cREkt/KEr3Kk2FV9aOKXhI9IN5fJL4qgsXwpRnPRIfuJRm5TnPTwmHMAlR5DLifjL+QjM= X-Received: by 2002:a05:6214:2021:b0:699:3bd2:9624 with SMTP id 1-20020a056214202100b006993bd29624mr1118230qvf.55.1712314303839; Fri, 05 Apr 2024 03:51:43 -0700 (PDT) MIME-Version: 1.0 References: <20240403035502.71356-1-21cnbao@gmail.com> <30392471-71f9-4eb1-8855-d9c12499346f@redhat.com> <38eee542-8567-4c42-ad63-778b4a20c153@redhat.com> In-Reply-To: From: Barry Song <21cnbao@gmail.com> Date: Fri, 5 Apr 2024 23:51:32 +1300 Message-ID: Subject: Re: [PATCH v3] mm: add per-order mTHP alloc_success and alloc_fail counters To: David Hildenbrand Cc: Ryan Roberts , akpm@linux-foundation.org, linux-mm@kvack.org, cerasuolodomenico@gmail.com, chrisl@kernel.org, kasong@tencent.com, peterx@redhat.com, surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org, yosryahmed@google.com, yuzhao@google.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: E81328001C X-Stat-Signature: ipyebgt8r1n7rkxot5jwyyraacht6ej4 X-HE-Tag: 1712314304-713654 X-HE-Meta: U2FsdGVkX1+M0ygYy56CwESm2lv22WB0rE/9zWsduIA1lpn0BxgWlxWIp+05kdW63KUHgZhOHud70XouXskf8uTpLY214RVP1DrdrAgT/fvS/1IaA1Xu+43E3gvJFJH0GLVmqI9nGBPEUq71OGNj2dLrAG6GEBN2sfeGF6wsuLtXg7RJpZVCw4n75oanjsaGvyKrd5biClxl+Dnmx9z22WrzTHeT4al3UDfDbNnOPiWgNv4x4xLc5h2EgHuRCeNLDE6qq9wpX4y5Cf20P4k3KwFwoKeP6Ld52/YpNcNi+//l/zSb0bAXaEvMvfaHLjhTk1SBHTumVkcao4UdS4SkgiwQ/yVEkZNEsZxS3ppN0mv7G/n3jrsH1oRkOifCLPm3RZihfdhi5LKQXA1byxVzFpYr10GWtA/0nNOtlFw/MDPAXg9qa9R/1067ZDY1m9d+VZQgY9hLVmK/x4shoPiSvTJy3SfcFsfPuTpSrRhRHIdQqwNeHryQXTmIGJvD2QG/drCQfKYODAjxRYn58uAwwvxvBBY6MIlA8vrSA315RvQGXxhRXTp5+A2FsjQr8OOyf/ULP7fbZmVbE9VRtLlAEX2S1svWuSMvdTliOjjmGWc9uPgnWnOiUzu2JO7Evv5Ihzc462X5mq7hGSmwf33NOKPAKyxfIqLyvXK5gHN3CfaiDipGSQdyXqZd0wyxO9ry7p19B+BSXYqOb+mZGK/igTXyBzpnytKXnQlUS5rfqcHCuU22rExDAPJh2HHE8fDz2L55sOMB4++uJYCdvVfpoYrqLgWgbqArQ/6VEOdCwlNbpwDHfySqBDhqes/QFGFG7JXtcAFfij2T6wJov3+M8akRb6jLhLZrTGdNiZ1gTYYqzwSY4G0kqLhNnn3UtNkfxGWIJPo/OQWOJhtnrAKlFwC+15pjFOP+ODzSQZjd9lM52BT7kqwMg7Q5AycB6LEOrIWVENNUiDtgHLG8tcD rMdPgMpO /8twapUif37hlFh4aIRYRmk04JHnGv8c8uMf/YyoVH677RWjix3HPnEd8Im5oMWqHSXpgM/TCT2XeCoxKmoYMe5sKRw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Apr 5, 2024 at 11:15=E2=80=AFPM David Hildenbrand wrote: > > On 05.04.24 11:24, Barry Song wrote: > > On Fri, Apr 5, 2024 at 10:04=E2=80=AFPM David Hildenbrand wrote: > >> > >> On 05.04.24 09:21, Ryan Roberts wrote: > >>> On 05/04/2024 07:29, David Hildenbrand wrote: > >>>> On 05.04.24 06:01, Barry Song wrote: > >>>>> On Fri, Apr 5, 2024 at 3:57=E2=80=AFPM Barry Song <21cnbao@gmail.co= m> wrote: > >>>>>> > >>>>>> On Fri, Apr 5, 2024 at 4:31=E2=80=AFAM David Hildenbrand wrote: > >>>>>>> > >>>>>>> On 04.04.24 09:21, Ryan Roberts wrote: > >>>>>>>> On 03/04/2024 22:00, Barry Song wrote: > >>>>>>>>> On Thu, Apr 4, 2024 at 12:48=E2=80=AFAM Ryan Roberts wrote: > >>>>>>>>>> > >>>>>>>>>> On 03/04/2024 09:22, David Hildenbrand wrote: > >>>>>>>>>>> On 03.04.24 05:55, Barry Song wrote: > >>>>>>>>>>>> From: Barry Song > >>>>>>>>>>>> > >>>>>>>>>>>> Profiling a system blindly with mTHP has become challenging = due > >>>>>>>>>>>> to the lack of visibility into its operations. Presenting th= e > >>>>>>>>>>>> success rate of mTHP allocations appears to be pressing need= . > >>>>>>>>>>>> > >>>>>>>>>>>> Recently, I've been experiencing significant difficulty debu= gging > >>>>>>>>>>>> performance improvements and regressions without these figur= es. > >>>>>>>>>>>> It's crucial for us to understand the true effectiveness of > >>>>>>>>>>>> mTHP in real-world scenarios, especially in systems with > >>>>>>>>>>>> fragmented memory. > >>>>>>>>>>>> > >>>>>>>>>>>> This patch sets up the framework for per-order mTHP counters= , > >>>>>>>>>>>> starting with the introduction of anon_alloc_success and > >>>>>>>>>>>> anon_alloc_fail counters. Incorporating additional counters > >>>>>>>>>>>> should now be straightforward as well. > >>>>>>>>>>>> > >>>>>>>>>>>> Signed-off-by: Barry Song > >>>>>>>>>>>> --- > >>>>>>>>>>>> -v3: > >>>>>>>>>>>> * save some memory as order-0 and order-1 can't be THP= , Ryan; > >>>>>>>>>>>> * rename to anon_alloc as right now we only support an= on to address > >>>>>>>>>>>> David's comment; > >>>>>>>>>>>> * drop a redundant "else", Ryan > >>>>>>>>>>>> > >>>>>>>>>>>> include/linux/huge_mm.h | 18 ++++++++++++++ > >>>>>>>>>>>> mm/huge_memory.c | 54 +++++++++++++++++++++++++= ++++++++++++++++ > >>>>>>>>>>>> mm/memory.c | 2 ++ > >>>>>>>>>>>> 3 files changed, 74 insertions(+) > >>>>>>>>>>>> > >>>>>>>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm= .h > >>>>>>>>>>>> index e896ca4760f6..5e9af6be9537 100644 > >>>>>>>>>>>> --- a/include/linux/huge_mm.h > >>>>>>>>>>>> +++ b/include/linux/huge_mm.h > >>>>>>>>>>>> @@ -70,6 +70,7 @@ extern struct kobj_attribute shmem_enabled= _attr; > >>>>>>>>>>>> * (which is a limitation of the THP implementation). > >>>>>>>>>>>> */ > >>>>>>>>>>>> #define THP_ORDERS_ALL_ANON ((BIT(PMD_ORDER + 1) - = 1) & ~(BIT(0) > >>>>>>>>>>>> | BIT(1))) > >>>>>>>>>>>> +#define THP_MIN_ORDER 2 > >>>>>>>>>>>> /* > >>>>>>>>>>>> * Mask of all large folio orders supported for file T= HP. > >>>>>>>>>>>> @@ -264,6 +265,23 @@ unsigned long thp_vma_allowable_orders(= struct > >>>>>>>>>>>> vm_area_struct *vma, > >>>>>>>>>>>> enforce_sysfs, orders); > >>>>>>>>>>>> } > >>>>>>>>>>>> +enum thp_event_item { > >>>>>>>>>>>> + THP_ANON_ALLOC_SUCCESS, > >>>>>>>>>>>> + THP_ANON_ALLOC_FAIL, > >>>>>>>>>>>> + NR_THP_EVENT_ITEMS > >>>>>>>>>>>> +}; > >>>>>>>>>>> > >>>>>>>>>>> Maybe use a prefix that resembles matches the enum name and i= s "obviously" > >>>>>>>>>>> different to the ones in vm_event_item.h, like > >>>>>>>>>>> > >>>>>>>>>>> enum thp_event { > >>>>>>>>>>> THP_EVENT_ANON_ALLOC_SUCCESS, > >>>>>>>>>>> THP_EVENT_ANON_ALLOC_FAIL, > >>>>>>>>>>> __THP_EVENT_COUNT, > >>>>>>>>>>> }; > >>>>>>>>>> > >>>>>>>>>> FWIW, I'd personally replace "event" with "stat". For me "even= t" only ever > >>>>>>>>>> increments, but "stat" can increment and decrement. An event i= s a type of > >>>>>>>>>> stat. > >>>>>>>>>> > >>>>>>>>>> You are only adding events for now, but we have identified a n= eed for > >>>>>>>>>> inc/dec > >>>>>>>>>> stats that will be added in future. > >>>>>>>>> > >>>>>>>>> What about the below? > >>>>>>>>> > >>>>>>>>> enum thp_stat { > >>>>>> > >>>>>> It seems we still need to use enum thp_stat_item rather than thp_s= tat. > >>>>>> This follows > >>>>>> enum zone_stat_item > >>>>>> enum numa_stat_item > >>>>>> enum node_stat_item > >>>> > >>>> > >>>> Right, we can do that. > >>>> > >>>> [...] > >>>> > >>>>>> ok. > >>>>> > >>>>> Hi David, Ryan, > >>>>> > >>>>> I've named everything as follows. Please let me know if you have an= y further > >>>>> suggestions before I send the updated version :-) > >>>>> > >>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > >>>>> index e896ca4760f6..cc13fa14aa32 100644 > >>>>> --- a/include/linux/huge_mm.h > >>>>> +++ b/include/linux/huge_mm.h > >>>>> @@ -264,6 +264,23 @@ unsigned long thp_vma_allowable_orders(struct > >>>>> vm_area_struct *vma, > >>>>> enforce_sysfs, orders)= ; > >>>>> } > >>>>> > >>>>> +enum thp_stat_item { > >>>>> + THP_STAT_ANON_ALLOC, > >>>>> + THP_STAT_ANON_ALLOC_FALLBACK, > >>>>> + __THP_STAT_COUNT > >>>>> +}; > >>>>> + > >>>> > >>>> > >>>> Final bikeshedding: so we want to call all of that MTHP_, mthp_ etc,= to make it > >>>> clearer that this is per mTHP size? > >>> > >>> Perviously we concluded that mTHP was only for commit logs, and in th= e longer > >>> term we wanted THP to mean the full set of sizes like it does for hug= etlb. > >>> Neither "mTHP" nor "multi-size THP" currently exist in the code base. > >> > >> True, but some indication that these are "per THP size" might be reaso= nable. > > > > It seems quite reasonable, especially considering the existence of THP = counters. > > Additionally, there's even an overlap between THP counters and mTHP cou= nters, > > specifically for PMD_ORDER. This complexity could potentially confuse > > readers of the code. > > > > I'm perfectly fine with using "M/m" as a prefix, as long as sysfs > > hasn't exposed it. > > Moreover, "M/m" remains a preferable prefix given that "PER_SIZE" is > > overly long, > > and people are already familiar with the concept of mTHP. > > I think the important part is to not use "mTHP" in any of our ABI. As we > expose these stats per THP size, using mTHP internally should be ok. But > I'm happy to hear alternatives that express "per THP size" in a better wa= y. Indeed, as long as we maintain stability in the node names, the designation of "mTHP" remains merely an implementation detail. Therefore, in version 4, I've opted to utilize "mthp" for these macros and variables. Once we discov= er a more suitable approach, we can proceed with additional renaming as necessary :-) At present, we've implemented mthp allocation within PF and are executing non-split swapout. It's crucial now for us to assess their effectiveness an= d operational scope. > > -- > Cheers, > > David / dhildenb > Thanks Barry