From: Ryan Roberts <ryan.roberts@arm.com>
Date: Wed, 24 Jul 2024 09:12:01 +0100
Subject: Re: [PATCH v2 3/3] mm: mTHP stats for pagecache folio allocations
To: Barry Song
Cc: Andrew Morton, Hugh Dickins, Jonathan Corbet, "Matthew Wilcox (Oracle)",
 David Hildenbrand, Lance Yang, Baolin Wang, Gavin Shan,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20240716135907.4047689-1-ryan.roberts@arm.com>
 <20240716135907.4047689-4-ryan.roberts@arm.com>
 <223acdf2-ba25-4c31-94df-a212d31bd6c3@arm.com>
On 23/07/2024 23:07, Barry Song wrote:
> On Fri, Jul 19, 2024 at 8:56 PM Ryan Roberts wrote:
>>
>> On 19/07/2024 01:12, Barry Song wrote:
>>> On Wed, Jul 17, 2024 at 1:59 AM Ryan Roberts wrote:
>>>>
>>>> Expose 3 new mTHP stats for file (pagecache) folio allocations:
>>>>
>>>>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
>>>>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
>>>>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge
>>>>
>>>> This will provide some insight into the sizes of large folios being
>>>> allocated for file-backed memory, and how often allocation is failing.
>>>>
>>>> All non-order-0 (and most order-0) folio allocations are currently done
>>>> through filemap_alloc_folio(), and folios are charged in a subsequent
>>>> call to filemap_add_folio(). So count file_fallback when allocation
>>>> fails in filemap_alloc_folio() and count file_alloc or
>>>> file_fallback_charge in filemap_add_folio(), based on whether charging
>>>> succeeded or not. There are some users of filemap_add_folio() that
>>>> allocate their own order-0 folio by other means, so we would not count
>>>> an allocation failure in this case, but we also don't care about
>>>> order-0 allocations. This approach feels like it should be good enough
>>>> and doesn't require any (impractically large) refactoring.
>>>>
>>>> The existing mTHP stats interface is reused to provide consistency to
>>>> users. And because we are reusing the same interface, we can reuse the
>>>> same infrastructure on the kernel side.
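(Aside, for anyone wanting to consume these counters from userspace: each
stats file contains a single decimal counter, with one hugepages-<size>kB
directory per mTHP size. A minimal sketch that totals one counter across
all sizes; only the sysfs layout is taken from the patch, everything else
is illustrative:)

#include <glob.h>
#include <stdio.h>

static long read_stat(const char *dir, const char *stat)
{
        char path[512];
        long val = 0;
        FILE *f;

        snprintf(path, sizeof(path), "%s/stats/%s", dir, stat);
        f = fopen(path, "r");
        if (!f)
                return 0;       /* stat missing on this kernel/size */
        if (fscanf(f, "%ld", &val) != 1)
                val = 0;
        fclose(f);
        return val;
}

int main(void)
{
        glob_t g;
        long total = 0;
        size_t i;

        if (glob("/sys/kernel/mm/transparent_hugepage/hugepages-*kB",
                 0, NULL, &g))
                return 1;
        for (i = 0; i < g.gl_pathc; i++)
                total += read_stat(g.gl_pathv[i], "file_alloc");
        globfree(&g);
        printf("file_alloc (all sizes): %ld\n", total);
        return 0;
}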
>>>>
>>>> Signed-off-by: Ryan Roberts
>>>> ---
>>>>  Documentation/admin-guide/mm/transhuge.rst | 13 +++++++++++++
>>>>  include/linux/huge_mm.h                    |  3 +++
>>>>  include/linux/pagemap.h                    | 16 ++++++++++++++--
>>>>  mm/filemap.c                               |  6 ++++--
>>>>  mm/huge_memory.c                           |  7 +++++++
>>>>  5 files changed, 41 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>>>> index 058485daf186..d4857e457add 100644
>>>> --- a/Documentation/admin-guide/mm/transhuge.rst
>>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>>>> @@ -512,6 +512,19 @@ shmem_fallback_charge
>>>>         falls back to using small pages even though the allocation was
>>>>         successful.
>>>>
>>>> +file_alloc
>>>> +       is incremented every time a file huge page is successfully
>>>> +       allocated.
>>>> +
>>>> +file_fallback
>>>> +       is incremented if a file huge page is attempted to be allocated
>>>> +       but fails and instead falls back to using small pages.
>>>> +
>>>> +file_fallback_charge
>>>> +       is incremented if a file huge page cannot be charged and instead
>>>> +       falls back to using small pages even though the allocation was
>>>> +       successful.
>>>> +
>>>
>>> I realized that when we talk about fallback, it doesn't necessarily mean
>>> small pages; it could also refer to smaller huge pages.
>>
>> Yes, good point. I'll update the documentation to reflect that for all
>> memory types.
>>
>>>
>>> anon_fault_alloc
>>>        is incremented every time a huge page is successfully
>>>        allocated and charged to handle a page fault.
>>>
>>> anon_fault_fallback
>>>        is incremented if a page fault fails to allocate or charge
>>>        a huge page and instead falls back to using huge pages with
>>>        lower orders or small pages.
>>>
>>> anon_fault_fallback_charge
>>>        is incremented if a page fault fails to charge a huge page and
>>>        instead falls back to using huge pages with lower orders or
>>>        small pages even though the allocation was successful.
>>>
>>> This also applies to files, right?
>>
>> It does in the place you highlight below, but page_cache_ra_order() just
>> falls back immediately to order-0. Regardless, I think we should write
>> the user-facing docs to allow for fallback to a lower large order. That
>> gives the implementation more flexibility.
>>
>>>
>>>         do {
>>>                 gfp_t alloc_gfp = gfp;
>>>
>>>                 err = -ENOMEM;
>>>                 if (order > 0)
>>>                         alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
>>>                 folio = filemap_alloc_folio(alloc_gfp, order);
>>>                 if (!folio)
>>>                         continue;
>>>
>>>                 /* Init accessed so avoid atomic mark_page_accessed later */
>>>                 if (fgp_flags & FGP_ACCESSED)
>>>                         __folio_set_referenced(folio);
>>>
>>>                 err = filemap_add_folio(mapping, folio, index, gfp);
>>>                 if (!err)
>>>                         break;
>>>                 folio_put(folio);
>>>                 folio = NULL;
>>>         } while (order-- > 0);
>>>
>>>
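To make the counting concrete: with the hooks from this patch applied, the
loop above would count as annotated below (same loop, comments added;
illustrative only, not new code in the patch):

        do {
                /* On failure, filemap_alloc_folio() now bumps
                 * MTHP_STAT_FILE_FALLBACK for this order, then the loop
                 * retries one order lower. */
                folio = filemap_alloc_folio(alloc_gfp, order);
                if (!folio)
                        continue;

                /* filemap_add_folio() bumps MTHP_STAT_FILE_ALLOC on
                 * success, or MTHP_STAT_FILE_FALLBACK_CHARGE if the memcg
                 * charge fails and a lower order is tried. */
                err = filemap_add_folio(mapping, folio, index, gfp);
                if (!err)
                        break;
                folio_put(folio);
                folio = NULL;
        } while (order-- > 0);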
>>>> split
>>>>         is incremented every time a huge page is successfully split into
>>>>         smaller orders. This can happen for a variety of reasons but a
>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>> index b8c63c3e967f..4f9109fcdded 100644
>>>> --- a/include/linux/huge_mm.h
>>>> +++ b/include/linux/huge_mm.h
>>>> @@ -123,6 +123,9 @@ enum mthp_stat_item {
>>>>         MTHP_STAT_SHMEM_ALLOC,
>>>>         MTHP_STAT_SHMEM_FALLBACK,
>>>>         MTHP_STAT_SHMEM_FALLBACK_CHARGE,
>>>> +       MTHP_STAT_FILE_ALLOC,
>>>> +       MTHP_STAT_FILE_FALLBACK,
>>>> +       MTHP_STAT_FILE_FALLBACK_CHARGE,
>>>>         MTHP_STAT_SPLIT,
>>>>         MTHP_STAT_SPLIT_FAILED,
>>>>         MTHP_STAT_SPLIT_DEFERRED,
>>>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>>>> index 6e2f72d03176..95a147b5d117 100644
>>>> --- a/include/linux/pagemap.h
>>>> +++ b/include/linux/pagemap.h
>>>> @@ -562,14 +562,26 @@ static inline void *detach_page_private(struct page *page)
>>>>  }
>>>>
>>>>  #ifdef CONFIG_NUMA
>>>> -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
>>>> +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
>>>>  #else
>>>> -static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>>>> +static inline struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>>>>  {
>>>>         return folio_alloc_noprof(gfp, order);
>>>>  }
>>>>  #endif
>>>>
>>>> +static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>>>> +{
>>>> +       struct folio *folio;
>>>> +
>>>> +       folio = __filemap_alloc_folio_noprof(gfp, order);
>>>> +
>>>> +       if (!folio)
>>>> +               count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
>>>> +
>>>> +       return folio;
>>>> +}
>>>
>>> Do we need to add and export __filemap_alloc_folio_noprof()?
>>
>> It is exported. See the below change in filemap.c. Previously
>> filemap_alloc_folio_noprof() was exported, but that is now an inline and
>> __filemap_alloc_folio_noprof() is exported in its place.
>>
>>> In any case, we won't call count_mthp_stat(order,
>>> MTHP_STAT_FILE_FALLBACK) and will only allocate the folio instead?
>>
>> Sorry, I don't understand what you mean by this?
>
> Ryan, my question is whether exporting __filemap_alloc_folio_noprof()
> might lead to situations where people call __filemap_alloc_folio_noprof()
> and forget to call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK).
> Currently, it seems that counting is always necessary?

OK, to make sure I've understood: I think you are saying that there is a
risk that people will call __filemap_alloc_folio_noprof() directly and
bypass the stat counting? But it's the same problem with all "_noprof"
functions; if those are called directly, they bypass the memory allocation
profiling infrastructure. So this just needs to be a case of "don't do
that" IMHO. filemap_alloc_folio() is the public API.

> Another option is that we still call count_mthp_stat() within
> filemap_alloc_folio_noprof() and make it noinline, if
> __filemap_alloc_folio_noprof() is not inline?

I could certainly make filemap_alloc_folio_noprof() always out-of-line and
then handle the counting privately in the compilation unit. But before my
change, filemap_alloc_folio_noprof() was inline for the !CONFIG_NUMA case
and I was trying not to change that. I think what you're suggesting would
be tidier though. I'll make the change. But I don't think it solves the
wider problem of the possibility that people can call private APIs. That's
what review is for.
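Concretely, something along these lines (a sketch only; whether
__filemap_alloc_folio_noprof() then becomes static or stays exported is the
open question above):

/* mm/filemap.c: sketch of the always-out-of-line variant */
struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
{
        /* With __filemap_alloc_folio_noprof() private to this file,
         * callers cannot reach the allocation without the accounting
         * below. */
        struct folio *folio = __filemap_alloc_folio_noprof(gfp, order);

        if (!folio)
                count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);

        return folio;
}
EXPORT_SYMBOL(filemap_alloc_folio_noprof);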
Thanks,
Ryan

>
>>
>>>
>>>> +
>>>>  #define filemap_alloc_folio(...)                               \
>>>>         alloc_hooks(filemap_alloc_folio_noprof(__VA_ARGS__))
>>>>
>>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>>> index 53d5d0410b51..131d514fca29 100644
>>>> --- a/mm/filemap.c
>>>> +++ b/mm/filemap.c
>>>> @@ -963,6 +963,8 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
>>>>         int ret;
>>>>
>>>>         ret = mem_cgroup_charge(folio, NULL, gfp);
>>>> +       count_mthp_stat(folio_order(folio),
>>>> +               ret ? MTHP_STAT_FILE_FALLBACK_CHARGE : MTHP_STAT_FILE_ALLOC);
>>>>         if (ret)
>>>>                 return ret;
>>>
>>> Would the following be better?
>>>
>>>         ret = mem_cgroup_charge(folio, NULL, gfp);
>>>         if (ret) {
>>>                 count_mthp_stat(folio_order(folio),
>>>                                 MTHP_STAT_FILE_FALLBACK_CHARGE);
>>>                 return ret;
>>>         }
>>>         count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
>>>
>>> Anyway, it's up to you. The code just feels a bit off to me :-)
>>
>> Yes, I agree your version is better. I also noticed that anon and shmem
>> increment FALLBACK whenever FALLBACK_CHARGE is incremented, so I should
>> apply those same semantics here.
>>
>> Thanks,
>> Ryan
>>
>>
>>>
>>>>
>>>> @@ -990,7 +992,7 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
>>>>  EXPORT_SYMBOL_GPL(filemap_add_folio);
>>>>
>>>>  #ifdef CONFIG_NUMA
>>>> -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>>>> +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>>>>  {
>>>>         int n;
>>>>         struct folio *folio;
>>>>
>>>> @@ -1007,7 +1009,7 @@ struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>>>>         }
>>>>         return folio_alloc_noprof(gfp, order);
>>>>  }
>>>> -EXPORT_SYMBOL(filemap_alloc_folio_noprof);
>>>> +EXPORT_SYMBOL(__filemap_alloc_folio_noprof);
>>>>  #endif
>>>>
>>>>  /*
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 578ac212c172..26d558e3e80f 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -608,7 +608,14 @@ static struct attribute_group anon_stats_attr_grp = {
>>>>         .attrs = anon_stats_attrs,
>>>>  };
>>>>
>>>> +DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
>>>> +DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
>>>> +DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
>>>> +
>>>>  static struct attribute *file_stats_attrs[] = {
>>>> +       &file_alloc_attr.attr,
>>>> +       &file_fallback_attr.attr,
>>>> +       &file_fallback_charge_attr.attr,
>>>>  #ifdef CONFIG_SHMEM
>>>>         &shmem_alloc_attr.attr,
>>>>         &shmem_fallback_attr.attr,
>>>> --
>>>> 2.43.0
>>>>
>>>
>
> Thanks
> Barry
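Putting the two agreed changes together, the charge path in
filemap_add_folio() would end up roughly as below (a sketch combining
Barry's restructure with the anon/shmem semantics mentioned above; not the
final patch):

        ret = mem_cgroup_charge(folio, NULL, gfp);
        if (ret) {
                /* Mirror anon/shmem: a charge failure counts as both a
                 * fallback and a fallback_charge event for this order. */
                count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
                count_mthp_stat(folio_order(folio),
                                MTHP_STAT_FILE_FALLBACK_CHARGE);
                return ret;
        }
        count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);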