From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
Lance Yang <ioworker0@gmail.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>, Jonathan Corbet <corbet@lwn.net>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Barry Song <baohua@kernel.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v1 2/2] mm: mTHP stats for pagecache folio allocations
Date: Tue, 16 Jul 2024 12:19:30 +0200 [thread overview]
Message-ID: <5c58d9ea-8490-4ae6-b7bf-be816dab3356@redhat.com> (raw)
In-Reply-To: <a8441245-ae35-443f-9aea-325007492741@arm.com>
On 16.07.24 10:31, Ryan Roberts wrote:
> On 13/07/2024 11:45, Ryan Roberts wrote:
>> On 13/07/2024 02:08, David Hildenbrand wrote:
>>> On 12.07.24 14:22, Lance Yang wrote:
>>>> On Fri, Jul 12, 2024 at 11:00 AM Baolin Wang
>>>> <baolin.wang@linux.alibaba.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2024/7/11 15:29, Ryan Roberts wrote:
>>>>>> Expose 3 new mTHP stats for file (pagecache) folio allocations:
>>>>>>
>>>>>> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
>>>>>> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
>>>>>> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge
>>>>>> This will provide some insight on the sizes of large folios being
>>>>>> allocated for file-backed memory, and how often allocation is failing.
>>>>>>
>>>>>> All non-order-0 (and most order-0) folio allocations are currently done
>>>>>> through filemap_alloc_folio(), and folios are charged in a subsequent
>>>>>> call to filemap_add_folio(). So count file_fallback when allocation
>>>>>> fails in filemap_alloc_folio() and count file_alloc or
>>>>>> file_fallback_charge in filemap_add_folio(), based on whether charging
>>>>>> succeeded or not. There are some users of filemap_add_folio() that
>>>>>> allocate their own order-0 folio by other means, so we would not count
>>>>>> an allocation failure in this case, but we also don't care about order-0
>>>>>> allocations. This approach feels like it should be good enough and
>>>>>> doesn't require any (impractically large) refactoring.
>>>>>>
>>>>>> The existing mTHP stats interface is reused to provide consistency to
>>>>>> users. And because we are reusing the same interface, we can reuse the
>>>>>> same infrastructure on the kernel side. The one small wrinkle is that
>>>>>> the set of folio sizes supported by the pagecache is not identical to
>>>>>> those supported by anon and shmem; pagecache supports order-1, unlike
>>>>>> anon and shmem, and the max pagecache order may be less than PMD-size
>>>>>> (see arm64 with 64K base pages), again unlike anon and shmem. So we now
>>>>>> create a hugepages-*kB directory for the union of the sizes supported by
>>>>>> all 3 memory types and populate it with the relevant stats and controls.
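
[Editorial note: as a hedged illustration of the interface the commit message
describes, a userspace tool could dump the three proposed counters for
whichever sizes the running kernel actually exposes. This is a sketch only,
not part of the patch; `dump_file_stats` is a hypothetical helper name, and
the file_* entries simply will not exist on kernels without this change.]

```shell
# Sketch only: dump the proposed pagecache mTHP counters for every
# hugepages-*kB directory that exposes them. Sizes without the file_*
# entries (e.g. on kernels without this patch) are skipped.
dump_file_stats() {
    base="$1"    # normally /sys/kernel/mm/transparent_hugepage
    for d in "$base"/hugepages-*kB; do
        [ -f "$d/stats/file_alloc" ] || continue
        printf '%s file_alloc=%s file_fallback=%s file_fallback_charge=%s\n' \
            "${d##*/}" \
            "$(cat "$d/stats/file_alloc")" \
            "$(cat "$d/stats/file_fallback")" \
            "$(cat "$d/stats/file_fallback_charge")"
    done
}

dump_file_stats /sys/kernel/mm/transparent_hugepage
```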
>>>>>
>>>>> Personally, I like this idea, as it can help analyze the allocation of
>>>>> large folios for the page cache.
>>>>>
>>>>> However, I have a slight concern about the consistency of the interface.
>>>>>
>>>>> For 64K, the field layout is:
>>>>> ├── hugepages-64kB
>>>>> │ ├── enabled
>>>>> │ ├── shmem_enabled
>>>>> │ └── stats
>>>>> │ ├── anon_fault_alloc
>>>>> │ ├── anon_fault_fallback
>>>>> │ ├── anon_fault_fallback_charge
>>>>> │ ├── file_alloc
>>>>> │ ├── file_fallback
>>>>> │ ├── file_fallback_charge
>>>>> │ ├── shmem_alloc
>>>>> │ ├── shmem_fallback
>>>>> │ ├── shmem_fallback_charge
>>>>> │ ├── split
>>>>> │ ├── split_deferred
>>>>> │ ├── split_failed
>>>>> │ ├── swpout
>>>>> │ └── swpout_fallback
>>>>>
>>>>> But for 8K (for pagecache), you removed some fields (of course, I
>>>>> understand why they are not supported).
>>>>>
>>>>> ├── hugepages-8kB
>>>>> │ └── stats
>>>>> │ ├── file_alloc
>>>>> │ ├── file_fallback
>>>>> │ └── file_fallback_charge
>>>>>
>>>>> This might not be user-friendly for some user-space parsing tools, as
>>>>> certain fields would be missing from otherwise same-pattern interfaces.
>>>>> Of course, this might not be an issue if we have clear documentation
>>>>> describing the differences here :)
>>>>>
>>>>> Another possible approach is to maintain the same field layout to keep
>>>>> consistent, but prohibit writing to the fields that are not supported by
>>>>> the pagecache, and any stats read from them would be 0.
>>>>
>>>> I agree that maintaining a uniform field layout, especially at the stats
>>>> level, might be necessary ;)
>>>>
>>>> Keeping a consistent interface could future-proof the design. It allows
>>>> for the possibility that features not currently supported for 8kB pages
>>>> might be enabled in the future.
>>>
>>> I'll just note that, with shmem/file effectively being disabled for order > 11,
>>> we'll also have entries there that are effectively unused.
>>
>> Indeed, I mentioned that in the commit log :)
Well, I think it's more extreme than what you mentioned.
For example, shmem_enabled on arm64 with 64K base pages is now effectively
non-functional, just like it will be for other orders in the anon-shmem
case when the order exceeds MAX_PAGECACHE_ORDER.
>>
>>>
>>> Good question how we want to deal with that (stats are easy, but what about when
>>> we enable something? Maybe we should document that "enabled" is only effective
>>> when supported).
>>
>> The documentation already says "If enabling multiple hugepage sizes, the kernel
>> will select the most appropriate enabled size for a given allocation." for anon
>> THP (and I've added similar wording for my as-yet-unposted patch to add controls
>> for page cache folio sizes). So I think we could easily add dummy *enabled
>> controls for all sizes, that can be written to and read back consistently, but
>> the kernel just ignores them when deciding what size to use. It would also
>> simplify the code that populates the controls.
>>
>> Personally though, I'm not convinced of the value of trying to make the controls
>> for every size look identical. What's the real value to the user to pretend that
>> they can select a size that they cannot? What happens when we inevitably want to
>> add some new control in future which only applies to select sizes and there is
>> no good way to fake it for the other sizes? Why can't user space just be
>> expected to rely on the existence of the files rather than on the existence of
>> the directories?
>>
>> As always, I'll go with the majority, but just wanted to register my opinion.
>
> Should I assume from the lack of reply on this that everyone else is in favour
> of adding dummy controls so that all sizes have the same set of controls? If I
> don't hear anything further, I'll post v2 with dummy controls today or tomorrow.
Sorry, busy with other stuff.
Indicating only what really exists sounds cleaner. But I wonder how we
would want to handle, in general, orders that are effectively non-existent?
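
[Editorial note: Ryan's suggestion above, that userspace key off the existence
of the per-stat files rather than the per-size directories, can be sketched as
follows. This is a hypothetical helper, not an existing tool or part of the
patch; it works the same whether or not a given size exposes a given counter.]

```shell
# Sketch only: a userspace helper that keys off the existence of the
# stat *file*, not the hugepages-*kB *directory*. read_stat is a
# hypothetical name, not an existing tool.
read_stat() {
    # read_stat <base> <size-dir> <stat-name>; prints the counter,
    # or "n/a" when this size does not expose that stat.
    f="$1/$2/stats/$3"
    if [ -f "$f" ]; then
        cat "$f"
    else
        echo "n/a"
    fi
}

read_stat /sys/kernel/mm/transparent_hugepage hugepages-64kB file_alloc
```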
--
Cheers,
David / dhildenb
Thread overview: 28+ messages
2024-07-11 7:29 [PATCH v1 0/2] mTHP allocation stats for file-backed memory Ryan Roberts
2024-07-11 7:29 ` [PATCH v1 1/2] mm: Cleanup count_mthp_stat() definition Ryan Roberts
2024-07-11 8:20 ` Barry Song
2024-07-12 2:31 ` Baolin Wang
2024-07-12 11:57 ` Lance Yang
2024-07-11 7:29 ` [PATCH v1 2/2] mm: mTHP stats for pagecache folio allocations Ryan Roberts
2024-07-12 3:00 ` Baolin Wang
2024-07-12 12:22 ` Lance Yang
2024-07-13 1:08 ` David Hildenbrand
2024-07-13 10:45 ` Ryan Roberts
2024-07-16 8:31 ` Ryan Roberts
2024-07-16 10:19 ` David Hildenbrand [this message]
2024-07-16 11:14 ` Ryan Roberts
2024-07-17 8:02 ` David Hildenbrand
2024-07-17 8:29 ` Ryan Roberts
2024-07-17 8:44 ` David Hildenbrand
2024-07-17 9:50 ` Ryan Roberts
2024-07-17 10:03 ` David Hildenbrand
2024-07-17 10:18 ` Ryan Roberts
2024-07-17 10:25 ` David Hildenbrand
2024-07-17 10:48 ` Ryan Roberts
2024-07-13 11:00 ` Ryan Roberts
2024-07-13 12:54 ` Baolin Wang
2024-07-14 9:05 ` Ryan Roberts
2024-07-22 3:52 ` Baolin Wang
2024-07-22 7:36 ` Ryan Roberts
2024-07-12 22:44 ` kernel test robot
2024-07-15 13:55 ` Ryan Roberts