From: John Hubbard <jhubbard@nvidia.com>
To: Dev Jain <dev.jain@arm.com>, Ryan Roberts <ryan.roberts@arm.com>,
	<akpm@linux-foundation.org>, <david@redhat.com>,
	<willy@infradead.org>, <kirill.shutemov@linux.intel.com>
Cc: <anshuman.khandual@arm.com>, <catalin.marinas@arm.com>,
	<cl@gentwo.org>, <vbabka@suse.cz>, <mhocko@suse.com>,
	<apopple@nvidia.com>, <dave.hansen@linux.intel.com>,
	<will@kernel.org>, <baohua@kernel.org>, <jack@suse.cz>,
	<srivatsa@csail.mit.edu>, <haowenchao22@gmail.com>,
	<hughd@google.com>, <aneesh.kumar@kernel.org>,
	<yang@os.amperecomputing.com>, <peterx@redhat.com>,
	<ioworker0@gmail.com>, <wangkefeng.wang@huawei.com>,
	<ziy@nvidia.com>, <jglisse@google.com>, <surenb@google.com>,
	<vishal.moola@gmail.com>, <zokeefe@google.com>,
	<zhengqi.arch@bytedance.com>, <21cnbao@gmail.com>,
	<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 10/12] khugepaged: Skip PTE range if a larger mTHP is already mapped
Date: Wed, 18 Dec 2024 19:40:29 -0800
Message-ID: <00d429c9-6ade-42c9-a1f3-a7519375324f@nvidia.com>
In-Reply-To: <b5ca5e9f-e430-4bfb-ae6f-e2fbd3dc9e54@arm.com>

On 12/18/24 1:34 AM, Dev Jain wrote:
> On 18/12/24 1:06 pm, Ryan Roberts wrote:
>> On 16/12/2024 16:51, Dev Jain wrote:
>>> We may hit a situation wherein we have a larger folio mapped. It is incorrect
>>> to go ahead with the collapse since some pages will be unmapped, leading to
>>> the entire folio getting unmapped. Therefore, skip the corresponding range.
...
>> It would be good if you can spell out the desired policy when khugepaged hits
>> partially unmapped large folios and unaligned large folios. I think the simple
>> approach is to always collapse them to fully mapped, aligned folios even if the
>> resulting order is smaller than the original. But I'm not sure that's definitely
>> going to always be the best thing.
>>
>> Regardless, I'm struggling to understand the logic in this patch. Taking the
>> order of a folio based on having hit one of its pages says nothing about
>> whether the whole of that folio is mapped or not, or about its alignment. And
>> it's not clear to me how we would get to a situation where we are scanning for
>> a lower order and find a (fully mapped, aligned) folio of higher order in the
>> first place.
>>
>> Let's assume the desired policy is that khugepaged should always collapse to
>> naturally aligned large folios. If there happens to be an existing aligned
>> order-4 folio that is fully mapped, we will identify that for collapse as part
>> of the scan for order-4. At that point, we should just notice that it is already
>> an aligned order-4 folio and bypass collapse. Of course we may have already
>> chosen to collapse it into a higher order, but we should definitely not get to a
>> lower order before we notice it.
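
To make the "notice it and bypass collapse" check above concrete, here is a
minimal userspace sketch. It is not khugepaged code: every sketch_* name is
hypothetical, 4 KiB base pages are assumed, and "fully mapped" is passed in
as a flag because deciding it needs the actual page-table walk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages. Not taken from the patch series. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical helper: bytes spanned by a folio of the given order. */
static inline uint64_t sketch_folio_size(unsigned int order)
{
	return (uint64_t)1 << (SKETCH_PAGE_SHIFT + order);
}

/* Naturally aligned: the start address is a multiple of the folio size. */
static inline bool sketch_naturally_aligned(uint64_t vaddr, unsigned int order)
{
	return (vaddr & (sketch_folio_size(order) - 1)) == 0;
}

/*
 * Hypothetical policy check: while scanning for scan_order, skip the
 * range if it is already backed by a fully mapped, naturally aligned
 * folio of exactly that order.
 */
static inline bool sketch_bypass_collapse(uint64_t folio_vaddr,
					  unsigned int folio_order,
					  bool fully_mapped,
					  unsigned int scan_order)
{
	return fully_mapped && folio_order == scan_order &&
	       sketch_naturally_aligned(folio_vaddr, folio_order);
}
```

Under these assumptions an order-4 folio spans 64 KiB, so a fully mapped
order-4 folio at 0x10000 is bypassed during the order-4 scan, while one
mapped at 0x11000 is not.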
>>
>> Hmm... I guess if the sysfs thp settings have been changed then things could get
>> spicy... if order-8 was previously enabled and we have an order-8 folio, then it
>> gets disabled and khugepaged is scanning for order-4 (which is still enabled)
>> then hits the order-8; what's the expected policy? rework into 16 order-4 folios
>> or leave it as a single order-8?
> 
> Exactly, sorry, I should have made it clear in the patch description that I am
> handling the following scenario: there is a long running system on which we are
> using order-8 folios, and now we decide to downgrade to order-4. Will it be a
> good idea to take the pain of splitting order-8 to 16 order-4 folios? This should
> be a rare situation in the first place, so I have currently decided to ignore the
> folios set up by the previous sysfs setting and only focus on collapsing fresh memory.
> 
> Thinking again, a sysadmin deciding to downgrade the order of folios would do so
> in the hope of reducing internal fragmentation, increasing swap speed, etc., so it
> makes sense to shatter large folios... maybe we can have a sysfs tunable for this?
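
For reference, the arithmetic behind "order-8 to 16 order-4 folios" can be
sketched as below. The helper names are hypothetical and are not kernel
APIs; the only fact relied on is that an order-n folio covers 2^n base
pages.

```c
#include <stdint.h>

/* Hypothetical helper: number of base pages in a folio of this order. */
static inline uint64_t sketch_pages_per_folio(unsigned int order)
{
	return (uint64_t)1 << order;
}

/*
 * Hypothetical helper: how many to_order folios result from splitting
 * one from_order folio. The caller must ensure from_order >= to_order.
 */
static inline uint64_t sketch_split_count(unsigned int from_order,
					  unsigned int to_order)
{
	return (uint64_t)1 << (from_order - to_order);
}
```

An order-8 folio is 256 pages and an order-4 folio is 16 pages, so
sketch_split_count(8, 4) is 256 / 16 = 16. With 4 KiB base pages (an
assumption, not something stated in the thread) that is one 1 MiB folio
becoming sixteen 64 KiB folios.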

Maybe we should not support it (at runtime) at all. We are trying to build
systems that don't require incredibly detailed sysadmin involvement, and
this level of tweaking qualifies, thoroughly, as "incredibly detailed
sysadmin micromanagement", imho.

Apologies for not having gone through the series in detail yet, but this
point jumped out at me.

thanks,
-- 
John Hubbard
