From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID:
Date: Tue, 2 Apr 2024 09:58:26 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v2] mm: add per-order mTHP alloc_success and alloc_fail counters
Content-Language: en-GB
To: Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, linux-mm@kvack.org
Cc: willy@infradead.org, david@redhat.com, cerasuolodomenico@gmail.com,
 kasong@tencent.com, surenb@google.com, v-songbaohua@oppo.com,
 yosryahmed@google.com, yuzhao@google.com, chrisl@kernel.org, peterx@redhat.com
References: <20240328095139.143374-1-21cnbao@gmail.com>
From: Ryan Roberts
In-Reply-To: <20240328095139.143374-1-21cnbao@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
On 28/03/2024 09:51, Barry Song wrote:
> From: Barry Song
>
> Profiling a system blindly with mTHP has become challenging due
> to the lack of visibility into its operations. Presenting the
> success rate of mTHP allocations appears to be a pressing need.
>
> Recently, I've been experiencing significant difficulty debugging
> performance improvements and regressions without these figures.
> It's crucial for us to understand the true effectiveness of
> mTHP in real-world scenarios, especially in systems with
> fragmented memory.
>
> This patch sets up the framework for per-order mTHP counters,
> starting with the introduction of alloc_success and alloc_fail
> counters. Incorporating additional counters should now be
> straightforward as well.
>
> The initial two unsigned longs for each event are unused, given
> that order-0 and order-1 are not mTHP. Nonetheless, this refinement
> improves code clarity.

Overall, this approach looks good to me - thanks for doing this!

>
> Signed-off-by: Barry Song
> ---
> -v2:
>  * move to sysfs and provide per-order counters; David, Ryan, Willy
> -v1:
>  https://lore.kernel.org/linux-mm/20240326030103.50678-1-21cnbao@gmail.com/
>
>  include/linux/huge_mm.h | 17 +++++++++++++
>  mm/huge_memory.c        | 54 +++++++++++++++++++++++++++++++++++++++++
>  mm/memory.c             |  3 +++
>  3 files changed, 74 insertions(+)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index e896ca4760f6..27fa26a22a8f 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -264,6 +264,23 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>  					  enforce_sysfs, orders);
>  }
>
> +enum thp_event_item {
> +	THP_ALLOC_SUCCESS,
> +	THP_ALLOC_FAIL,
> +	NR_THP_EVENT_ITEMS
> +};
> +
> +struct thp_event_state {
> +	unsigned long event[PMD_ORDER + 1][NR_THP_EVENT_ITEMS];
> +};
> +
> +DECLARE_PER_CPU(struct thp_event_state, thp_event_states);
> +
> +static inline void count_thp_event(int order, enum thp_event_item item)

We previously concluded that we will likely want a "currently allocated
THPs" counter, which will need both increment and decrement. So perhaps
"thp_event_inc()" (and an eventual "thp_event_dec()") are more
appropriate names? And perhaps "event" isn't the right description
either, since it implies always counting upwards (to me at least) -
although I guess you have borrowed from the vmstat pattern?
That file has some instantaneous counters, so how does it decrement?

> +{
> +	this_cpu_inc(thp_event_states.event[order][item]);

Could we declare the array as [PMD_ORDER - 1] and index with
[order - 2] here? That would save 16 bytes per CPU (8K on a system with
512 CPUs). Possibly with an order_index() helper, since the *_show()
functions will need this too.

> +}
> +
>  #define transparent_hugepage_use_zero_page()				\
>  	(transparent_hugepage_flags &					\
>  	 (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1683de78c313..addd093d8410 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -526,6 +526,52 @@ static const struct kobj_type thpsize_ktype = {
>  	.sysfs_ops = &kobj_sysfs_ops,
>  };
>
> +DEFINE_PER_CPU(struct thp_event_state, thp_event_states) = {{{0}}};
> +
> +static unsigned long sum_thp_events(int order, enum thp_event_item item)
> +{
> +	unsigned long sum = 0;
> +	int cpu;
> +
> +	for_each_online_cpu(cpu) {
> +		struct thp_event_state *this = &per_cpu(thp_event_states, cpu);
> +
> +		sum += this->event[order][item];
> +	}
> +
> +	return sum;
> +}
> +
> +static ssize_t alloc_success_show(struct kobject *kobj,
> +				  struct kobj_attribute *attr, char *buf)
> +{
> +	int order = to_thpsize(kobj)->order;
> +
> +	return sysfs_emit(buf, "%lu\n", sum_thp_events(order, THP_ALLOC_SUCCESS));
> +}
> +
> +static ssize_t alloc_fail_show(struct kobject *kobj,
> +			       struct kobj_attribute *attr, char *buf)
> +{
> +	int order = to_thpsize(kobj)->order;
> +
> +	return sysfs_emit(buf, "%lu\n", sum_thp_events(order, THP_ALLOC_FAIL));
> +}
> +
> +static struct kobj_attribute alloc_success_attr = __ATTR_RO(alloc_success);
> +static struct kobj_attribute alloc_fail_attr = __ATTR_RO(alloc_fail);
> +
> +static struct attribute *stats_attrs[] = {
> +	&alloc_success_attr.attr,
> +	&alloc_fail_attr.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group stats_attr_group = {
> +	.name = "stats",
> +	.attrs = stats_attrs,
> +};
> +
>  static struct thpsize *thpsize_create(int order, struct kobject *parent)
>  {
>  	unsigned long size = (PAGE_SIZE << order) / SZ_1K;
> @@ -549,6 +595,12 @@ static struct thpsize *thpsize_create(int order, struct kobject *parent)
>  		return ERR_PTR(ret);
>  	}
>
> +	ret = sysfs_create_group(&thpsize->kobj, &stats_attr_group);
> +	if (ret) {
> +		kobject_put(&thpsize->kobj);
> +		return ERR_PTR(ret);
> +	}
> +
>  	thpsize->order = order;
>  	return thpsize;
>  }
> @@ -1050,8 +1102,10 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>  	folio = vma_alloc_folio(gfp, HPAGE_PMD_ORDER, vma, haddr, true);
>  	if (unlikely(!folio)) {
>  		count_vm_event(THP_FAULT_FALLBACK);
> +		count_thp_event(HPAGE_PMD_ORDER, THP_ALLOC_FAIL);
>  		return VM_FAULT_FALLBACK;
>  	}
> +	count_thp_event(HPAGE_PMD_ORDER, THP_ALLOC_SUCCESS);
>  	return __do_huge_pmd_anonymous_page(vmf, &folio->page, gfp);
>  }
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 984b138f85b4..c9c1031c2ecb 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4365,7 +4365,10 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>  		}
>  		folio_throttle_swaprate(folio, gfp);
>  		clear_huge_page(&folio->page, vmf->address, 1 << order);
> +		count_thp_event(order, THP_ALLOC_SUCCESS);
>  		return folio;
> +	} else {

Do we need this else, given that the last statement in the "if" clause
is a return?

> +		count_thp_event(order, THP_ALLOC_FAIL);
>  	}
>  next:
>  	order = next_order(&orders, order);

Thanks,
Ryan