From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C9401C67861 for ; Fri, 5 Apr 2024 07:19:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 28BE96B010C; Fri, 5 Apr 2024 03:19:02 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 23C806B010D; Fri, 5 Apr 2024 03:19:02 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 104896B010E; Fri, 5 Apr 2024 03:19:02 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id E6B2F6B010C for ; Fri, 5 Apr 2024 03:19:01 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 62EF641112 for ; Fri, 5 Apr 2024 07:19:01 +0000 (UTC) X-FDA: 81974626482.11.D8A55A4 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf23.hostedemail.com (Postfix) with ESMTP id 9AF3714000C for ; Fri, 5 Apr 2024 07:18:58 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf23.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1712301539; a=rsa-sha256; cv=none; b=Xe9ru8ff142hDjIMt3G3mlMLTNgRnVzcsszQgsOZH9QNT7aEFwyVXYoscuk8LL59joCn5r pyjYYFBCTmGBeUlFaf/Lux3Doz8elhu+ZMD9p1GWY85UiDoVnQxxeDOBORgPYZO1IaUnCH gfiPko+kyq2X1WcrKE6i4RDKAsxKRcI= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf23.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1712301539; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=83/rrYfSjLeLJA1ibbGAtMY4F3wV2I37Ej8iMPCsWVE=; b=veQkUCXM6KaajIDoeXT6x9hsoyT3kXlXe4khvjsCVHKWHxHkXZN2OUxz5I9PH73zduKVVk 0ni0I/VNXLqgsvzYh4u4sRr9RZsLKMyIIiUY2lU80VdEPDmo71Pu0nEtZJSWKhFT3NE7zs AuMfuXyBQWdoy/onmTPniRcvi2vVgOk= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2EB5DFEC; Fri, 5 Apr 2024 00:19:28 -0700 (PDT) Received: from [10.57.74.169] (unknown [10.57.74.169]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id E15303F64C; Fri, 5 Apr 2024 00:18:55 -0700 (PDT) Message-ID: <3e5e87b1-b26b-4a93-ab56-9f23bd56a02a@arm.com> Date: Fri, 5 Apr 2024 08:18:54 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3] mm: add per-order mTHP alloc_success and alloc_fail counters To: Barry Song <21cnbao@gmail.com>, David Hildenbrand Cc: akpm@linux-foundation.org, linux-mm@kvack.org, cerasuolodomenico@gmail.com, chrisl@kernel.org, kasong@tencent.com, peterx@redhat.com, surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org, yosryahmed@google.com, yuzhao@google.com References: <20240403035502.71356-1-21cnbao@gmail.com> <30392471-71f9-4eb1-8855-d9c12499346f@redhat.com> Content-Language: en-GB From: Ryan Roberts In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 9AF3714000C X-Stat-Signature: us5hrkg5rhbkqg9hsdew5pw3hswxwcc1 X-Rspam-User: X-HE-Tag: 1712301538-944095 X-HE-Meta: U2FsdGVkX1/HkkCogjuvDOtGGjzFggMwjqvdGkubk2XIur6XAVFBqpNyIEN+3BteS3/RYZ8+7b2fETesbbnQETNuNbc2OJwtzZyKIbEs4BcG3OiQA5dJ1Kaq/b6tO7P2bn1GmRXCQgejpKEbrqNEJfJ8HM8GtdChk537Nl+laqHI4pwMioihsDBpgsDt5r755RDry19hm5cQWXiXjJpH+Mt/ox7N84im4a7D82KZxw8mKXVRqKtnW7rXSH6C5+0sfvWOCyOMKqSLmExWB35yhZptW9FLNt0h9Q5qaP9d2UzJUJPV6S0Ep4k4o3LyDpyZRfwPhgbhtDlTUTUQPB9jcU6mNu8SErZzwKDSMqhdttx4u1hiOkMjC2PdqFdX/dTgJJpx1tzElI9PIbtSGq26FnlpVbSQQ7NoxSPecfU8tPhNqZpiGFt9dlwCRv7tCwag8oJ4Il13z5Lhrh7qwAg+smOObr0KIUq7NsEaY3Sbs6zPvfyKo8BwB67v+HgTQfkDqyZg1Nndxtodo6COvbPP9xXTQ/bBejmgkwIJ+yRlMOqMsOBBH3y987aypOH9DJqi2xJsBy77qBIwdKCxCvsAQ6ZW7B4RAi4i56K7Kv86OXsND/N95KY3EKBBsmo+/bqp6+S89QPul/V7uZENKZpWyDpyz7O/aCTcY2UFE1MCHF7HhJBY4aIUpUnPt2ly7PVHoaEiGt7BrnYzFzpUnRNtcfNJnkceMVzuVO6fSxqk9sWcDcpcGXTzECDcfOh3vv2VFKJdOrrTQAwfp/40fr7g3m+0I12GLoiNZgjBomdWjPXXerHLtdEWudCK6eFqXafeR24R8UFmsyCDTWTH1X9B4Em0n/sZF4aXs0M7RBYdrXy4Scc5oNRHV3F1VGzS8u+Fd5Zfs6EhLrU= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 05/04/2024 05:01, Barry Song wrote: > On Fri, Apr 5, 2024 at 3:57 PM Barry Song <21cnbao@gmail.com> wrote: >> >> On Fri, Apr 5, 2024 at 4:31 AM David Hildenbrand wrote: >>> >>> On 04.04.24 09:21, Ryan Roberts wrote: >>>> On 03/04/2024 22:00, Barry Song wrote: >>>>> On Thu, Apr 4, 2024 at 12:48 AM Ryan Roberts wrote: >>>>>> >>>>>> On 03/04/2024 09:22, David Hildenbrand wrote: >>>>>>> On 03.04.24 05:55, Barry Song wrote: >>>>>>>> From: Barry Song >>>>>>>> >>>>>>>> Profiling a system blindly with mTHP has become challenging due >>>>>>>> to the lack of visibility into its operations. Presenting the >>>>>>>> success rate of mTHP allocations appears to be pressing need. >>>>>>>> >>>>>>>> Recently, I've been experiencing significant difficulty debugging >>>>>>>> performance improvements and regressions without these figures. >>>>>>>> It's crucial for us to understand the true effectiveness of >>>>>>>> mTHP in real-world scenarios, especially in systems with >>>>>>>> fragmented memory. >>>>>>>> >>>>>>>> This patch sets up the framework for per-order mTHP counters, >>>>>>>> starting with the introduction of anon_alloc_success and >>>>>>>> anon_alloc_fail counters. Incorporating additional counters >>>>>>>> should now be straightforward as well. >>>>>>>> >>>>>>>> Signed-off-by: Barry Song >>>>>>>> --- >>>>>>>> -v3: >>>>>>>> * save some memory as order-0 and order-1 can't be THP, Ryan; >>>>>>>> * rename to anon_alloc as right now we only support anon to address >>>>>>>> David's comment; >>>>>>>> * drop a redundant "else", Ryan >>>>>>>> >>>>>>>> include/linux/huge_mm.h | 18 ++++++++++++++ >>>>>>>> mm/huge_memory.c | 54 +++++++++++++++++++++++++++++++++++++++++ >>>>>>>> mm/memory.c | 2 ++ >>>>>>>> 3 files changed, 74 insertions(+) >>>>>>>> >>>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >>>>>>>> index e896ca4760f6..5e9af6be9537 100644 >>>>>>>> --- a/include/linux/huge_mm.h >>>>>>>> +++ b/include/linux/huge_mm.h >>>>>>>> @@ -70,6 +70,7 @@ extern struct kobj_attribute shmem_enabled_attr; >>>>>>>> * (which is a limitation of the THP implementation). >>>>>>>> */ >>>>>>>> #define THP_ORDERS_ALL_ANON ((BIT(PMD_ORDER + 1) - 1) & ~(BIT(0) | BIT(1))) >>>>>>>> +#define THP_MIN_ORDER 2 >>>>>>>> /* >>>>>>>> * Mask of all large folio orders supported for file THP. >>>>>>>> @@ -264,6 +265,23 @@ unsigned long thp_vma_allowable_orders(struct >>>>>>>> vm_area_struct *vma, >>>>>>>> enforce_sysfs, orders); >>>>>>>> } >>>>>>>> +enum thp_event_item { >>>>>>>> + THP_ANON_ALLOC_SUCCESS, >>>>>>>> + THP_ANON_ALLOC_FAIL, >>>>>>>> + NR_THP_EVENT_ITEMS >>>>>>>> +}; >>>>>>> >>>>>>> Maybe use a prefix that resembles matches the enum name and is "obviously" >>>>>>> different to the ones in vm_event_item.h, like >>>>>>> >>>>>>> enum thp_event { >>>>>>> THP_EVENT_ANON_ALLOC_SUCCESS, >>>>>>> THP_EVENT_ANON_ALLOC_FAIL, >>>>>>> __THP_EVENT_COUNT, >>>>>>> }; >>>>>> >>>>>> FWIW, I'd personally replace "event" with "stat". For me "event" only ever >>>>>> increments, but "stat" can increment and decrement. An event is a type of stat. >>>>>> >>>>>> You are only adding events for now, but we have identified a need for inc/dec >>>>>> stats that will be added in future. >>>>> >>>>> What about the below? >>>>> >>>>> enum thp_stat { >> >> It seems we still need to use enum thp_stat_item rather than thp_stat. >> This follows >> enum zone_stat_item >> enum numa_stat_item >> enum node_stat_item >> >> And most importantly, the below looks much better >> >> enum thp_stat_item { >> THP_STAT_ANON_ALLOC, >> THP_STAT_ANON_ALLOC_FALLBACK, >> __THP_STAT_COUNT >> }; >> >> struct thp_state { >> unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT]; >> }; >> >> DECLARE_PER_CPU(struct thp_state, thp_states); >> >> than >> >> enum thp_stat { >> THP_STAT_ANON_ALLOC, >> THP_STAT_ANON_ALLOC_FALLBACK, >> __THP_STAT_COUNT >> }; >> >> struct thp_state { >> unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT]; >> }; >> >>>>> THP_EVENT_ANON_ALLOC, >>>>> THP_EVENT_ANON_ALLOC_FALLBACK, >>>>> THP_EVENT_SWPOUT, >>>>> THP_EVENT_SWPOUT_FALLBACK, >>>>> ... >>>>> THP_NR_ANON_PAGES, >>>>> THP_NR_FILE_PAGES, >>>> >>>> I find this ambiguous; Is it the number of THPs or the number of base pages? >>>> >>>> I think David made the point about incorporating the enum name into the labels >>>> too, so that there can be no namespace confusion. How about: >>>> >>>> __ >>>> >>>> So: >>>> >>>> THP_STAT_EV_ANON_ALLOC >>>> THP_STAT_EV_ANON_ALLOC_FALLBACK >>>> THP_STAT_EV_ANON_PARTIAL >>>> THP_STAT_EV_SWPOUT >>>> THP_STAT_EV_SWPOUT_FALLBACK >>>> ... >>>> THP_STAT_NR_ANON >>>> THP_STAT_NR_FILE >>>> ... >>>> __THP_STAT_COUNT >>> >>> I'd even drop the "EV". "NR_ANON" vs "ANON_ALLOC" etc. is expressive enough. >> >> ok. > > Hi David, Ryan, > > I've named everything as follows. Please let me know if you have any further > suggestions before I send the updated version :-) Please treat all my comments below as optional - I'm just stating my preference! > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > index e896ca4760f6..cc13fa14aa32 100644 > --- a/include/linux/huge_mm.h > +++ b/include/linux/huge_mm.h > @@ -264,6 +264,23 @@ unsigned long thp_vma_allowable_orders(struct > vm_area_struct *vma, > enforce_sysfs, orders); > } > > +enum thp_stat_item { > + THP_STAT_ANON_ALLOC, > + THP_STAT_ANON_ALLOC_FALLBACK, > + __THP_STAT_COUNT > +}; > + > +struct thp_state { > + unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT]; Why using "state" here? I think "stats" is more appropriate? (as in short for "statistics") i.e. `struct thp_stats` and `unsigned long stats[][]`. > +}; > + > +DECLARE_PER_CPU(struct thp_state, thp_states); > + > +static inline void count_thp_state(int order, enum thp_stat_item item) > +{ > + this_cpu_inc(thp_states.state[order][item]); > +} > + > #define transparent_hugepage_use_zero_page() \ > (transparent_hugepage_flags & \ > (1< diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 9d4b2fbf6872..e704b4408181 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -526,6 +526,46 @@ static const struct kobj_type thpsize_ktype = { > .sysfs_ops = &kobj_sysfs_ops, > }; > > +DEFINE_PER_CPU(struct thp_state, thp_states) = {{{0}}}; Does this need to be explicitly zeroed? Won't that be the default initial state, just like for regular globals? Perhaps PER_CPU is special? > + > +static unsigned long sum_thp_states(int order, enum thp_stat_item item) Again, I'd call it sum_thp_stats(). > +{ > + unsigned long sum = 0; > + int cpu; > + > + for_each_online_cpu(cpu) { > + struct thp_state *this = &per_cpu(thp_states, cpu); > + > + sum += this->state[order][item]; > + } > + > + return sum; > +} > + > +#define THP_STATE_ATTR(_name, _index) \ And THP_STATS_ATTR(); they are going to live in the "stats" directory after all. > +static ssize_t _name##_show(struct kobject *kobj, \ > + struct kobj_attribute *attr, char *buf) \ > +{ \ > + int order = to_thpsize(kobj)->order; \ > + \ > + return sysfs_emit(buf, "%lu\n", sum_thp_states(order, _index)); \ > +} \ > +static struct kobj_attribute _name##_attr = __ATTR_RO(_name) > + > +THP_STATE_ATTR(anon_alloc, THP_STAT_ANON_ALLOC); > +THP_STATE_ATTR(anon_alloc_fallback, THP_STAT_ANON_ALLOC_FALLBACK); > >> >>> >>> -- >>> Cheers, >>> >>> David / dhildenb > > Thanks > Barry