From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 030A5CD11C2 for ; Fri, 5 Apr 2024 09:08:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 918FC6B00B4; Fri, 5 Apr 2024 05:08:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8C98E6B0127; Fri, 5 Apr 2024 05:08:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7B85A6B012B; Fri, 5 Apr 2024 05:08:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 5DC916B00B4 for ; Fri, 5 Apr 2024 05:08:31 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 253DBC0AD7 for ; Fri, 5 Apr 2024 09:08:31 +0000 (UTC) X-FDA: 81974902422.06.C2EA341 Received: from mail-ua1-f44.google.com (mail-ua1-f44.google.com [209.85.222.44]) by imf18.hostedemail.com (Postfix) with ESMTP id 44B511C000D for ; Fri, 5 Apr 2024 09:08:29 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=gwHZTg8b; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf18.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.222.44 as permitted sender) smtp.mailfrom=21cnbao@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1712308109; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=mrCGMHv0pPpZMIY8b8bscOGw5TyMzpNvgeSfbWdflNQ=; b=3W56swFFy9FgWBucUOEivnXf7wP3hM7bXlmndbhjh+hjQ2/ntsSKWl/uuxLJ8GGp3YFhdb VzyQvBi+gJINi+oHqQkjuIsdExqQ7ZCClUtUSTwmh84ecfdWR+pE3/JVFGEfdD5klwD9Jl 7qmwF3ZXaR1jCb3W6HJOVIce8FzE6CM= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=gwHZTg8b; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf18.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.222.44 as permitted sender) smtp.mailfrom=21cnbao@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1712308109; a=rsa-sha256; cv=none; b=0FcDOd5+vqVM1h1HLfWtTJpXR7wMgwQfElmTOBEhoXv1ma0TSPPkwvTCNWdL2dmVJdFafD TlSinAWKX/5lGNP2Ji2+7/UM29CfZuIddhZModTd2f5kMVrQHCqRntEmYVVG+/skomvF2s pFtpPe70/2FHx5wdSrSL7Y9aeNAsMlU= Received: by mail-ua1-f44.google.com with SMTP id a1e0cc1a2514c-7e04e70c372so695127241.0 for ; Fri, 05 Apr 2024 02:08:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1712308108; x=1712912908; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=mrCGMHv0pPpZMIY8b8bscOGw5TyMzpNvgeSfbWdflNQ=; b=gwHZTg8bdcJg3jp7KjKt6x6orTof6TORZVzu+/sdLmguTUqlp4qHpezWsGUJ/2UE8e UYjJJPNxb7KH39/MWUnrKMYI+QcxvA7jKWhLFcRGxezjZt8+J/7v2/RZHEhyvW1OziDv YIoP9jZvJ3Y7KPdLI3BSYR7yUtXWDoGLUHq5zG4G50RVQwYnHzCqR+ETV6lttjpEsi3z mO5BPRDkpI8EOKRlN2rANBpijGy6v5qUS2gVlPXXUXkjA6ssbOWrwRRWGuANPQJlQFbf koXTS/nhuaUr669ai6kxQvoVJqGE9oDTvLADyaNw8vzGhQJsU76ePrpBVlIrSuFDoO+o EAng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712308108; x=1712912908; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mrCGMHv0pPpZMIY8b8bscOGw5TyMzpNvgeSfbWdflNQ=; b=oRfQdEqNWuIzwQiIbCKWRJj+bYn1hw7tg/blw+XvZtZrgZV7iOyVJqIrknevw8GGrr dLAkBYHppQqqndpBQJ+L7x9slmv36SSpl1PJL/pbKmOxRXF6AZp+T1aqI+XVxgrjPonl KOjVPXamoz+IQKofurYieFU1lzohPxwpX8HW8fcnQ5+qeSmTUNb0JBryNUf59yM1Oi2v cr3bL5JWZ7u+2zAcJl40YYsRkgA7SLfIU9VD8L/XVmlq+/C1jD+cVqQ6IHr73l/KOH7o PncEFloA5EyE64COLvxjpXlzqwTOa2CLPXWKn2RYd1FRAC5AouwA/ZhvdoGvDSZZhAkH PXOQ== X-Forwarded-Encrypted: i=1; AJvYcCUWTXe8KPGxDXhLYg/5aWI3o192IgWvphyy2/Sb5cJ8oBJ0lbzyPWm+FVs24n9fZ7CHVSqm8fts3ymlj6ZM4mSc8q0= X-Gm-Message-State: AOJu0Ywos/20DLTcjup53SBZT5kBMIlDfsdD6Y6ROr6AsjdIDOQhSj7G l9PewEZWBqUaNnlNjfcZ9EkL/1U7GUDsEASuo9zknXip84/aFO6qYl6VtLlgs0vumLk6RASY48m U/uzMSpB8dL2plMJrjwwaI2Q5nps= X-Google-Smtp-Source: AGHT+IFvrCH38h/cwZqbvr8CJzy3cdh867nhzlKhAyd29aJmoe1CCqguZn/e+J4Txo9B5AyhEvHuWmAsK6X7RO/q2Ts= X-Received: by 2002:a1f:de01:0:b0:4d3:b326:5ae8 with SMTP id v1-20020a1fde01000000b004d3b3265ae8mr741189vkg.14.1712308108134; Fri, 05 Apr 2024 02:08:28 -0700 (PDT) MIME-Version: 1.0 References: <20240403035502.71356-1-21cnbao@gmail.com> <30392471-71f9-4eb1-8855-d9c12499346f@redhat.com> <3e5e87b1-b26b-4a93-ab56-9f23bd56a02a@arm.com> In-Reply-To: <3e5e87b1-b26b-4a93-ab56-9f23bd56a02a@arm.com> From: Barry Song <21cnbao@gmail.com> Date: Fri, 5 Apr 2024 22:08:16 +1300 Message-ID: Subject: Re: [PATCH v3] mm: add per-order mTHP alloc_success and alloc_fail counters To: Ryan Roberts Cc: David Hildenbrand , akpm@linux-foundation.org, linux-mm@kvack.org, cerasuolodomenico@gmail.com, chrisl@kernel.org, kasong@tencent.com, peterx@redhat.com, surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org, yosryahmed@google.com, yuzhao@google.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 44B511C000D X-Stat-Signature: 7cmieokq4e8a5daspnx69qt3zyo976yr X-Rspam-User: X-HE-Tag: 1712308109-610326 X-HE-Meta: U2FsdGVkX1+eDD0jr2msiDU++BegQo+9TNDA8UxlSCdmY5WGFbXxtUhrOyGv8bUexjnnJ7m7D8G5f3yIW6vVI8IyhGUt3Zg57gqX/8LMimIuKtKTt9zt2OoLlkY6bFsu9UDTgj6390hmN42erIpX7mxUjiJXIv5/LhtQ9US7l5d0TjyaHFNLkhL11g2eBbPum81kmv09krQ886/T73fjYi8aU/Yq2bzIiC94Garq+gI3tpWXP1zb8vOTpMsn5QL2nJwnJ8SNzUiizIn6Zq2QhzeDmOH+beejQZyVMLzMcekwZoj+KDJj590yFQzhqHCDupHamb78DoSiX3YAhlVgJz8nQKhCc/IcD2S0Hxcv3LYJyvzr31K/9MD8XbzA+bVsSuDOnfYSQrlBNXaLpMtNulMTCyiCbnW17i0huogsyJoXaOpmE6vZlonhVxu70HLY4Oghj0v6+w5buAdVGtIbdqJIwt6ANYeOsBiyES+2ffNbEB6oW4jxXtATECujCDUKA6X3QneKq+Ig3OiFbAiXBS3izbm2wtnBbHTC6ZzSkfdtcvRdQMOr1waK73SID1I266g1mc9EJ79ebFJglPwajlzkxZ2BOLhiXDrf/vNfdH+ofxpWDCT8rxpgHpzhfbbjRy6KUNA51B9LyqJapPgV5K/xNHiYznNDy2tVntzoxmveQh8NnaZeISFs1cyEGyRHQJTCeEqub5TkDytMuD2z+9hs7fVao2XNns2HY5Awz2Nyool7aEyPswQDyBIvysF8vLyvU8cet5J+JuzyxC27GF5A4fAGpHUOMzRPjOTwdoab3tbv7zvQ8EisHGXiwFnXmMvwiJloUExs3QBwFE5Et0xI8PuUf9onz0/B0tqLXZwX8Ae6Q6QtgF/xyrONpBPBcf//yLhzW7vu0VRqMJ75EtYzcBZgC07QxjDlkCdRFSIiiTEddiJFReGR5YaUTNKXtDMM5Qa950cb0rDFwTw wV/MCSI9 PL3CMXRv5dPZNUKXXFLpc0AY2xWpQNLOTohr4zYXGvZvYSKRkf8YFQ3mtu0pUTfRjOGbpaPKt9/al5y0Ie1mxz7BC6D/25fH4ChVIoCLiLFhbd72d5WCDk/7VdxqTCscFwt/i/YjErGXfC9pjldRVtuBhVmmu6ubYloRJMerWcNa7o1O5CUWkm0wOS8LAHj1ucdP+KJbNZkPUqokpV05b6MkbYGD0lXE7ZJApOoBZPjmLUMMA6DMisV/np3d5iAVfc6eDurzR9urlxYUu/YfVGTomjVPUGbh/dxFB4bxKhOWFTjDl3+JcglpUzxsGhE7mJx9EUnwEpcszgFyNvoM0cIv2zKGfjcVJPJNxu/2/JwbUpyeFXVdQM0EnwazD710ubCPiH0Kepy1wo5IybDju35HxHZFBPODqgEHjFydy48Dgn/onTrWz9pXH/ho6RNjjhSc1uRY6TpFWBZd/g0mdlJkcMZ6nhiBbbyAK3Oamiizm8eLPvM2xh7o58ALnjbGwgA6TG2uLk/kLDjiS+I67hJh++bR1dP9c+F09aRvcV9wlq5yHHnPN8vYh+3s2e1KU+07a X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Apr 5, 2024 at 8:18=E2=80=AFPM Ryan Roberts = wrote: > > On 05/04/2024 05:01, Barry Song wrote: > > On Fri, Apr 5, 2024 at 3:57=E2=80=AFPM Barry Song <21cnbao@gmail.com> w= rote: > >> > >> On Fri, Apr 5, 2024 at 4:31=E2=80=AFAM David Hildenbrand wrote: > >>> > >>> On 04.04.24 09:21, Ryan Roberts wrote: > >>>> On 03/04/2024 22:00, Barry Song wrote: > >>>>> On Thu, Apr 4, 2024 at 12:48=E2=80=AFAM Ryan Roberts wrote: > >>>>>> > >>>>>> On 03/04/2024 09:22, David Hildenbrand wrote: > >>>>>>> On 03.04.24 05:55, Barry Song wrote: > >>>>>>>> From: Barry Song > >>>>>>>> > >>>>>>>> Profiling a system blindly with mTHP has become challenging due > >>>>>>>> to the lack of visibility into its operations. Presenting the > >>>>>>>> success rate of mTHP allocations appears to be pressing need. > >>>>>>>> > >>>>>>>> Recently, I've been experiencing significant difficulty debuggin= g > >>>>>>>> performance improvements and regressions without these figures. > >>>>>>>> It's crucial for us to understand the true effectiveness of > >>>>>>>> mTHP in real-world scenarios, especially in systems with > >>>>>>>> fragmented memory. > >>>>>>>> > >>>>>>>> This patch sets up the framework for per-order mTHP counters, > >>>>>>>> starting with the introduction of anon_alloc_success and > >>>>>>>> anon_alloc_fail counters. Incorporating additional counters > >>>>>>>> should now be straightforward as well. > >>>>>>>> > >>>>>>>> Signed-off-by: Barry Song > >>>>>>>> --- > >>>>>>>> -v3: > >>>>>>>> * save some memory as order-0 and order-1 can't be THP, Ryan; > >>>>>>>> * rename to anon_alloc as right now we only support anon to a= ddress > >>>>>>>> David's comment; > >>>>>>>> * drop a redundant "else", Ryan > >>>>>>>> > >>>>>>>> include/linux/huge_mm.h | 18 ++++++++++++++ > >>>>>>>> mm/huge_memory.c | 54 ++++++++++++++++++++++++++++++++= +++++++++ > >>>>>>>> mm/memory.c | 2 ++ > >>>>>>>> 3 files changed, 74 insertions(+) > >>>>>>>> > >>>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > >>>>>>>> index e896ca4760f6..5e9af6be9537 100644 > >>>>>>>> --- a/include/linux/huge_mm.h > >>>>>>>> +++ b/include/linux/huge_mm.h > >>>>>>>> @@ -70,6 +70,7 @@ extern struct kobj_attribute shmem_enabled_att= r; > >>>>>>>> * (which is a limitation of the THP implementation). > >>>>>>>> */ > >>>>>>>> #define THP_ORDERS_ALL_ANON ((BIT(PMD_ORDER + 1) - 1) & ~(= BIT(0) | BIT(1))) > >>>>>>>> +#define THP_MIN_ORDER 2 > >>>>>>>> /* > >>>>>>>> * Mask of all large folio orders supported for file THP. > >>>>>>>> @@ -264,6 +265,23 @@ unsigned long thp_vma_allowable_orders(stru= ct > >>>>>>>> vm_area_struct *vma, > >>>>>>>> enforce_sysfs, orders); > >>>>>>>> } > >>>>>>>> +enum thp_event_item { > >>>>>>>> + THP_ANON_ALLOC_SUCCESS, > >>>>>>>> + THP_ANON_ALLOC_FAIL, > >>>>>>>> + NR_THP_EVENT_ITEMS > >>>>>>>> +}; > >>>>>>> > >>>>>>> Maybe use a prefix that resembles matches the enum name and is "o= bviously" > >>>>>>> different to the ones in vm_event_item.h, like > >>>>>>> > >>>>>>> enum thp_event { > >>>>>>> THP_EVENT_ANON_ALLOC_SUCCESS, > >>>>>>> THP_EVENT_ANON_ALLOC_FAIL, > >>>>>>> __THP_EVENT_COUNT, > >>>>>>> }; > >>>>>> > >>>>>> FWIW, I'd personally replace "event" with "stat". For me "event" o= nly ever > >>>>>> increments, but "stat" can increment and decrement. An event is a = type of stat. > >>>>>> > >>>>>> You are only adding events for now, but we have identified a need = for inc/dec > >>>>>> stats that will be added in future. > >>>>> > >>>>> What about the below? > >>>>> > >>>>> enum thp_stat { > >> > >> It seems we still need to use enum thp_stat_item rather than thp_stat. > >> This follows > >> enum zone_stat_item > >> enum numa_stat_item > >> enum node_stat_item > >> > >> And most importantly, the below looks much better > >> > >> enum thp_stat_item { > >> THP_STAT_ANON_ALLOC, > >> THP_STAT_ANON_ALLOC_FALLBACK, > >> __THP_STAT_COUNT > >> }; > >> > >> struct thp_state { > >> unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT]; > >> }; > >> > >> DECLARE_PER_CPU(struct thp_state, thp_states); > >> > >> than > >> > >> enum thp_stat { > >> THP_STAT_ANON_ALLOC, > >> THP_STAT_ANON_ALLOC_FALLBACK, > >> __THP_STAT_COUNT > >> }; > >> > >> struct thp_state { > >> unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT]; > >> }; > >> > >>>>> THP_EVENT_ANON_ALLOC, > >>>>> THP_EVENT_ANON_ALLOC_FALLBACK, > >>>>> THP_EVENT_SWPOUT, > >>>>> THP_EVENT_SWPOUT_FALLBACK, > >>>>> ... > >>>>> THP_NR_ANON_PAGES, > >>>>> THP_NR_FILE_PAGES, > >>>> > >>>> I find this ambiguous; Is it the number of THPs or the number of bas= e pages? > >>>> > >>>> I think David made the point about incorporating the enum name into = the labels > >>>> too, so that there can be no namespace confusion. How about: > >>>> > >>>> __ > >>>> > >>>> So: > >>>> > >>>> THP_STAT_EV_ANON_ALLOC > >>>> THP_STAT_EV_ANON_ALLOC_FALLBACK > >>>> THP_STAT_EV_ANON_PARTIAL > >>>> THP_STAT_EV_SWPOUT > >>>> THP_STAT_EV_SWPOUT_FALLBACK > >>>> ... > >>>> THP_STAT_NR_ANON > >>>> THP_STAT_NR_FILE > >>>> ... > >>>> __THP_STAT_COUNT > >>> > >>> I'd even drop the "EV". "NR_ANON" vs "ANON_ALLOC" etc. is expressive = enough. > >> > >> ok. > > > > Hi David, Ryan, > > > > I've named everything as follows. Please let me know if you have any fu= rther > > suggestions before I send the updated version :-) > > Please treat all my comments below as optional - I'm just stating my pref= erence! > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > > index e896ca4760f6..cc13fa14aa32 100644 > > --- a/include/linux/huge_mm.h > > +++ b/include/linux/huge_mm.h > > @@ -264,6 +264,23 @@ unsigned long thp_vma_allowable_orders(struct > > vm_area_struct *vma, > > enforce_sysfs, orders); > > } > > > > +enum thp_stat_item { > > + THP_STAT_ANON_ALLOC, > > + THP_STAT_ANON_ALLOC_FALLBACK, > > + __THP_STAT_COUNT > > +}; > > + > > +struct thp_state { > > + unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT]; > > Why using "state" here? I think "stats" is more appropriate? (as in short= for > "statistics") i.e. `struct thp_stats` and `unsigned long stats[][]`. Sounds good. > > > +}; > > + > > +DECLARE_PER_CPU(struct thp_state, thp_states); > > + > > +static inline void count_thp_state(int order, enum thp_stat_item item) > > +{ > > + this_cpu_inc(thp_states.state[order][item]); > > +} > > + > > #define transparent_hugepage_use_zero_page() = \ > > (transparent_hugepage_flags & = \ > > (1< > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > index 9d4b2fbf6872..e704b4408181 100644 > > --- a/mm/huge_memory.c > > +++ b/mm/huge_memory.c > > @@ -526,6 +526,46 @@ static const struct kobj_type thpsize_ktype =3D { > > .sysfs_ops =3D &kobj_sysfs_ops, > > }; > > > > +DEFINE_PER_CPU(struct thp_state, thp_states) =3D {{{0}}}; > > Does this need to be explicitly zeroed? Won't that be the default initial= state, > just like for regular globals? Perhaps PER_CPU is special? I really don't know. uninitialized data is usually put in bss section so ke= rnel will memset them to 0. but uninitialized per-cpu data is still in .data..pe= rcpu section and I can't find where memset is done in this area as this section can also contain initialized per-cpu data. Will compilers fill zero for uninitialized per-cpu data automatically so kernel just thinks they are initialized 0 data? anyway, I copied the code from vmstat DEFINE_PER_CPU(struct vm_event_state, vm_event_states) =3D {{0}}; > > > + > > +static unsigned long sum_thp_states(int order, enum thp_stat_item item= ) > > Again, I'd call it sum_thp_stats(). sounds good. > > > +{ > > + unsigned long sum =3D 0; > > + int cpu; > > + > > + for_each_online_cpu(cpu) { > > + struct thp_state *this =3D &per_cpu(thp_states, cpu); > > + > > + sum +=3D this->state[order][item]; > > + } > > + > > + return sum; > > +} > > + > > +#define THP_STATE_ATTR(_name, _index) = \ > > And THP_STATS_ATTR(); they are going to live in the "stats" directory aft= er all. sounds good. > > > +static ssize_t _name##_show(struct kobject *kobj, = \ > > + struct kobj_attribute *attr, char *buf) = \ > > +{ = \ > > + int order =3D to_thpsize(kobj)->order; = \ > > + = \ > > + return sysfs_emit(buf, "%lu\n", sum_thp_states(order, _index));= \ > > +} = \ > > +static struct kobj_attribute _name##_attr =3D __ATTR_RO(_name) > > + > > +THP_STATE_ATTR(anon_alloc, THP_STAT_ANON_ALLOC); > > +THP_STATE_ATTR(anon_alloc_fallback, THP_STAT_ANON_ALLOC_FALLBACK); > > > >> > >>> > >>> -- > >>> Cheers, > >>> > >>> David / dhildenb > > > > Thanks > > Barry >