From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <223acdf2-ba25-4c31-94df-a212d31bd6c3@arm.com>
Date: Fri, 19 Jul 2024 09:56:36 +0100
Subject: Re: [PATCH v2 3/3] mm: mTHP stats for pagecache folio allocations
From: Ryan Roberts <ryan.roberts@arm.com>
To: Barry Song
Cc: Andrew Morton, Hugh Dickins, Jonathan Corbet, "Matthew Wilcox (Oracle)",
 David Hildenbrand, Lance Yang, Baolin Wang, Gavin Shan,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20240716135907.4047689-1-ryan.roberts@arm.com>
 <20240716135907.4047689-4-ryan.roberts@arm.com>
Content-Type: text/plain; charset=UTF-8
On 19/07/2024 01:12, Barry Song wrote:
> On Wed, Jul 17, 2024 at 1:59 AM Ryan Roberts wrote:
>>
>> Expose 3 new mTHP stats for file (pagecache) folio allocations:
>>
>>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
>>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
>>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge
>>
>> This will provide some insight into the sizes of large folios being
>> allocated for file-backed memory, and how often allocation fails.
>>
>> All non-order-0 (and most order-0) folio allocations are currently done
>> through filemap_alloc_folio(), and folios are charged in a subsequent
>> call to filemap_add_folio(). So count file_fallback when allocation
>> fails in filemap_alloc_folio() and count file_alloc or
>> file_fallback_charge in filemap_add_folio(), based on whether charging
>> succeeded or not. There are some users of filemap_add_folio() that
>> allocate their own order-0 folio by other means, so we would not count
>> an allocation failure in this case, but we also don't care about
>> order-0 allocations. This approach feels like it should be good enough
>> and doesn't require any (impractically large) refactoring.
>>
>> The existing mTHP stats interface is reused to provide consistency to
>> users. And because we are reusing the same interface, we can reuse the
>> same infrastructure on the kernel side.
>>
>> Signed-off-by: Ryan Roberts
>> ---
>>  Documentation/admin-guide/mm/transhuge.rst | 13 +++++++++++++
>>  include/linux/huge_mm.h                    |  3 +++
>>  include/linux/pagemap.h                    | 16 ++++++++++++++--
>>  mm/filemap.c                               |  6 ++++--
>>  mm/huge_memory.c                           |  7 +++++++
>>  5 files changed, 41 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>> index 058485daf186..d4857e457add 100644
>> --- a/Documentation/admin-guide/mm/transhuge.rst
>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>> @@ -512,6 +512,19 @@ shmem_fallback_charge
>>  	falls back to using small pages even though the allocation was
>>  	successful.
>>
>> +file_alloc
>> +	is incremented every time a file huge page is successfully
>> +	allocated.
>> +
>> +file_fallback
>> +	is incremented if a file huge page is attempted to be allocated
>> +	but fails and instead falls back to using small pages.
>> +
>> +file_fallback_charge
>> +	is incremented if a file huge page cannot be charged and instead
>> +	falls back to using small pages even though the allocation was
>> +	successful.
>> +
>
> I realized that when we talk about fallback, it doesn't necessarily mean
> small pages; it could also refer to smaller huge pages.

Yes, good point. I'll update the documentation to reflect that for all
memory types.

>
> anon_fault_alloc
> 	is incremented every time a huge page is successfully
> 	allocated and charged to handle a page fault.
>
> anon_fault_fallback
> 	is incremented if a page fault fails to allocate or charge
> 	a huge page and instead falls back to using huge pages with
> 	lower orders or small pages.
>
> anon_fault_fallback_charge
> 	is incremented if a page fault fails to charge a huge page and
> 	instead falls back to using huge pages with lower orders or
> 	small pages even though the allocation was successful.
>
> This also applies to files, right?

It does in the place you highlight below, but page_cache_ra_order() just
falls back immediately to order-0. Regardless, I think the user-facing
docs should allow for fallback to a lower large order. That gives the
implementation more flexibility.

>
> 	do {
> 		gfp_t alloc_gfp = gfp;
>
> 		err = -ENOMEM;
> 		if (order > 0)
> 			alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
> 		folio = filemap_alloc_folio(alloc_gfp, order);
> 		if (!folio)
> 			continue;
>
> 		/* Init accessed so avoid atomic mark_page_accessed later */
> 		if (fgp_flags & FGP_ACCESSED)
> 			__folio_set_referenced(folio);
>
> 		err = filemap_add_folio(mapping, folio, index, gfp);
> 		if (!err)
> 			break;
> 		folio_put(folio);
> 		folio = NULL;
> 	} while (order-- > 0);
>
>
>> split
>>  	is incremented every time a huge page is successfully split into
>>  	smaller orders. This can happen for a variety of reasons but a
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index b8c63c3e967f..4f9109fcdded 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -123,6 +123,9 @@ enum mthp_stat_item {
>>  	MTHP_STAT_SHMEM_ALLOC,
>>  	MTHP_STAT_SHMEM_FALLBACK,
>>  	MTHP_STAT_SHMEM_FALLBACK_CHARGE,
>> +	MTHP_STAT_FILE_ALLOC,
>> +	MTHP_STAT_FILE_FALLBACK,
>> +	MTHP_STAT_FILE_FALLBACK_CHARGE,
>>  	MTHP_STAT_SPLIT,
>>  	MTHP_STAT_SPLIT_FAILED,
>>  	MTHP_STAT_SPLIT_DEFERRED,
>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>> index 6e2f72d03176..95a147b5d117 100644
>> --- a/include/linux/pagemap.h
>> +++ b/include/linux/pagemap.h
>> @@ -562,14 +562,26 @@ static inline void *detach_page_private(struct page *page)
>>  }
>>
>>  #ifdef CONFIG_NUMA
>> -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
>> +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
>>  #else
>> -static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>> +static inline struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>>  {
>>  	return folio_alloc_noprof(gfp, order);
>>  }
>>  #endif
>>
>> +static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>> +{
>> +	struct folio *folio;
>> +
>> +	folio = __filemap_alloc_folio_noprof(gfp, order);
>> +
>> +	if (!folio)
>> +		count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
>> +
>> +	return folio;
>> +}
>
> Do we need to add and export __filemap_alloc_folio_noprof()?

It is exported. See the filemap.c change below. Previously
filemap_alloc_folio_noprof() was exported, but that is now an inline and
__filemap_alloc_folio_noprof() is exported in its place.

> In any case, we won't call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK)
> and will only allocate the folio instead?

Sorry, I don't understand what you mean by this?

>
>> +
>>  #define filemap_alloc_folio(...)				\
>>  	alloc_hooks(filemap_alloc_folio_noprof(__VA_ARGS__))
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 53d5d0410b51..131d514fca29 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -963,6 +963,8 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
>>  	int ret;
>>
>>  	ret = mem_cgroup_charge(folio, NULL, gfp);
>> +	count_mthp_stat(folio_order(folio),
>> +		ret ? MTHP_STAT_FILE_FALLBACK_CHARGE : MTHP_STAT_FILE_ALLOC);
>>  	if (ret)
>>  		return ret;
>
> Would the following be better?
>
> 	ret = mem_cgroup_charge(folio, NULL, gfp);
> 	if (ret) {
> 		count_mthp_stat(folio_order(folio),
> 				MTHP_STAT_FILE_FALLBACK_CHARGE);
> 		return ret;
> 	}
> 	count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
>
> Anyway, it's up to you. The code just feels a bit off to me :-)

Yes, I agree your version is better. I also noticed that anon and shmem
increment FALLBACK whenever FALLBACK_CHARGE is incremented, so I should
apply those same semantics here.
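Something like this, perhaps (an untested sketch only, combining your
structure with the anon/shmem semantics; the actual v3 hunk may differ):

	ret = mem_cgroup_charge(folio, NULL, gfp);
	if (ret) {
		/* Charge failed: bump both fallback counters, as anon
		 * and shmem already do. */
		count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
		count_mthp_stat(folio_order(folio),
				MTHP_STAT_FILE_FALLBACK_CHARGE);
		return ret;
	}
	/* Allocation and charge both succeeded. */
	count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);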
Thanks,
Ryan

>
>>
>> @@ -990,7 +992,7 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
>>  EXPORT_SYMBOL_GPL(filemap_add_folio);
>>
>>  #ifdef CONFIG_NUMA
>> -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>> +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>>  {
>>  	int n;
>>  	struct folio *folio;
>> @@ -1007,7 +1009,7 @@ struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>>  	}
>>  	return folio_alloc_noprof(gfp, order);
>>  }
>> -EXPORT_SYMBOL(filemap_alloc_folio_noprof);
>> +EXPORT_SYMBOL(__filemap_alloc_folio_noprof);
>>  #endif
>>
>>  /*
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 578ac212c172..26d558e3e80f 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -608,7 +608,14 @@ static struct attribute_group anon_stats_attr_grp = {
>>  	.attrs = anon_stats_attrs,
>>  };
>>
>> +DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
>> +DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
>> +DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
>> +
>>  static struct attribute *file_stats_attrs[] = {
>> +	&file_alloc_attr.attr,
>> +	&file_fallback_attr.attr,
>> +	&file_fallback_charge_attr.attr,
>>  #ifdef CONFIG_SHMEM
>>  	&shmem_alloc_attr.attr,
>>  	&shmem_fallback_attr.attr,
>> --
>> 2.43.0
>>
>
> Thanks
> Barry