From: Daniel Gomez <da.gomez@samsung.com>
To: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"hughd@google.com" <hughd@google.com>,
"willy@infradead.org" <willy@infradead.org>,
"wangkefeng.wang@huawei.com" <wangkefeng.wang@huawei.com>,
"ying.huang@intel.com" <ying.huang@intel.com>,
"21cnbao@gmail.com" <21cnbao@gmail.com>,
"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
"shy828301@gmail.com" <shy828301@gmail.com>,
"ziy@nvidia.com" <ziy@nvidia.com>,
"ioworker0@gmail.com" <ioworker0@gmail.com>,
Pankaj Raghav <p.raghav@samsung.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem
Date: Tue, 4 Jun 2024 12:30:09 +0000
Message-ID: <v5acpezkt4ml3j3ufmbgnq5b335envea7xfobvowtaetvbt3an@v3pfkwly5jh2>
In-Reply-To: <8e10beae-b3cd-4f5a-8e50-3f6dfe75ba9f@redhat.com>
On Tue, Jun 04, 2024 at 11:59:51AM +0200, David Hildenbrand wrote:
> [...]
>
> > > > How can we use per-block tracking for reclaiming memory and what changes would
> > > > be needed? Or is per-block really a non-viable option?
> > >
> > > The interesting thing is: with mTHP toggles it is opt-in -- like for
> > > PMD-sized THP with shmem_enabled -- and we don't have to be that concerned
> > > about this problem right now.
> >
> > Without respecting the size when allocating large folios, mTHP toggles would
> > over-allocate. My proposal added earlier in this thread is to combine the two
> > to avoid that case. Otherwise, shouldn't we try to find a solution for the
> > reclaiming path?
>
> I think at some point we'll really have to do a better job at reclaiming
> (either memory-overallocation, PUNCHHOLE that couldn't split, but maybe also
> pages that are now all-zero again and could be reclaimed again).
>
> >
> > >
> > > >
> > > > Clearly, if per-block is a viable option, the shmem_fault() bug would need
> > > > to be fixed first. Any ideas on how to make it reproducible?
> > > >
> > > > The alternatives discussed were sub-page refcounting and zeropage scanning.
> > >
> > > Yeah, I don't think sub-page refcounting is a feasible (and certainly not
> > > desired ;) ) option in the folio world.
> > >
> > > > The first one is not possible (IIUC) because a refactor years back
> > > > simplified the code, and bringing it back would require extra complexity.
> > > > The second option would require additional overhead, as we would involve
> > > > scanning.
> > >
> > > We'll likely need something similar (scanning, tracking?) for anonymous
> > > memory as well. There was a proposal for a THP shrinker some time ago, that
> > > would solve part of the problem.
> >
> > It's good to know we have the same problem in different places. I'm more
> > inclined to keep the information rather than adding extra overhead, unless
> > the complexity is really overwhelming. Considering the concerns here, I'm not
> > sure how much we should try to merge with iomap, as per Ritesh's suggestion.
>
> As raised in the meeting, I do see value in maintaining the information; but
> I also see why Hugh and Kirill think this might be unwarranted complexity in
> shmem.c. Meaning: we might be able to achieve something similar without it,
> and we don't have to solve the problem right now to benefit from large
> folios.
>
> >
> > The THP shrinker, could you please confirm if it is this following thread?
> >
> > https://lore.kernel.org/all/cover.1667454613.git.alexlzhu@fb.com/
>
> Yes, although there was no follow up so far. Possibly, because in the
> current khugepaged approach, there will be a constant back-and-forth between
> khugepaged collapsing memory (and wasting memory in the default setting),
> and the THP shrinker reclaiming memory; doesn't sound quite effective :)
> That needs more thought IMHO.
>
> [...]
>
> > > > > To optimize FALLOC_FL_PUNCH_HOLE for the cases where splitting+freeing
> > > > > is not possible at fallocate() time, detecting zeropages later and
> > > > > retrying to split+free might be an option, without per-block tracking.
> > > >
> > > > >
> > > > > (2) mTHP controls
> > > > >
> > > > > As a default, we should not be using large folios / mTHP for any shmem, just
> > > > > like we did with THP via shmem_enabled. This is what this series currently
> > > > > does, and is part of the whole mTHP user-space interface design.
> > > >
> > > > That was clear to me too. But what is the reason we want to boot in 'safe
> > > > mode'? What are the implications of not respecting that?
> > >
> > > [...]
> > >
> > > >
> > > > As I understood from the call, mTHP with sysfs knobs is preferred over
> > > > optimistic falloc/write allocation? But it is still unclear to me why the
> > > > former is preferred.
> > >
> > > I think Hugh's point was that this should be an opt-in feature, just like
> > > PMD-sized THP started out, and still is, an opt-in feature.
> >
> > I'd be keen to understand the use case for this. Even the current THP controls
> > we have in tmpfs. I guess these are just scenarios with no swap involved?
> > Are these use cases the same for both tmpfs and shmem anon mm?
>
> Systems without swap are one case I think. The other reason for a toggle in
> the past was to be able to disable it to troubleshoot issues (Hugh mentioned
> in the meeting about unlocking new code paths in shmem.c only with a
> toggle).
>
> Staring at my Fedora system:
>
> $ cat /sys/kernel/mm/transparent_hugepage/shmem_enabled
> always within_size advise [never] deny force
>
> Maybe because it uses tmpfs to mount /tmp (interesting article on lwn.net
> about that) and people are concerned about the side-effects (that can
> currently waste memory, or result in more reclaim work being required when
> exceeding file sizes).
>
> For VMs, I know that shmem+THP with memory ballooning is problematic and not
> really recommended.
Agreed. We cannot free memory via PUNCH_HOLE when using THP unless the punched
range covers a whole PMD-sized block.
>
> [...]
>
> > > >
> > > > >
> > > > > Also, we should properly fallback within the configured sizes, and not jump
> > > > > "over" configured sizes. Unless there is a good reason.
> > > > >
> > > > > (3) khugepaged
> > > > >
> > > > > khugepaged needs to handle larger folios properly as well. Until fixed,
> > > > > using smaller THP sizes as fallback might prohibit collapsing a PMD-sized
> > > > > THP later. But really, khugepaged needs to be fixed to handle that.
> > > > >
> > > > > (4) force/disable
> > > > >
> > > > > These settings are rather testing artifacts from the old ages. We should not
> > > > > add them to the per-size toggles. We might "inherit" it from the global one,
> > > > > though.
> > > > >
> > > > > "within_size" might have value, and especially for consistency, we should
> > > > > have them per size.
> > > > >
> > > > >
> > > > >
> > > > > So, this series only tackles anonymous shmem, which is a good starting
> > > > > point. Ideally, we'd get support for other shmem (especially during fault
> > > > > time) soon afterwards, because we won't be adding separate toggles for that
> > > > > from the interface POV, and having inconsistent behavior between kernel
> > > > > versions would be a bit unfortunate.
> > > > >
> > > > >
> > > > > @Baolin, this series likely does not consider (4) yet. And I suggest we have
> > > > > to take a lot of the "anonymous thp" terminology out of this series,
> > > > > especially when it comes to documentation.
> > > > >
> > > > > @Daniel, Pankaj, what are your plans regarding that? It would be great if we
> > > > > could get an understanding on the next steps on !anon shmem.
> > > >
> > > > I realize I've raised so many questions, but it's essential for us to grasp the
> > > > mm concerns and usage scenarios. This understanding will provide clarity on the
> > > > direction regarding folios for !anon shmem.
> > >
> > > If I understood correctly, Hugh had strong feelings against not respecting
> > > mTHP toggles for shmem. Without per-block tracking, I agree with that.
> >
> > My understanding was the same. But I have this follow-up question: should
> > we respect mTHP toggles without considering mapping constraints (size and
> > index)? Or perhaps we should use within_size where we can fit this intermediate
> > approach, as long as mTHP granularity is respected?
>
> Likely we should do exactly the same as we would do with the existing
> PMD-sized THPs.
>
> I recall in the meeting that we discussed that always vs. within_size seems
> to have some value, and that we should respect that setting like we did for
> PMD-sized THP.
>
> Which other mapping constraints could we have?
Patch 12 in my RFC for LSF [1] adds shmem_mapping_size_order() (updated to use
get_order()) to get the max order 'based on the file size which the mapping
currently allows at the given index'. The same check is done in filemap.c [2].
[1] https://lore.kernel.org/all/20240515055719.32577-13-da.gomez@samsung.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/filemap.c?h=v6.10-rc2#n1940
@@ -1726,6 +1726,37 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, int order,
 	return folio;
 }
 
+/**
+ * shmem_mapping_size_order - Get maximum folio order for the given file size.
+ * @mapping: Target address_space.
+ * @index: The page index.
+ * @size: The suggested size of the folio to create.
+ *
+ * This returns a high order for folios (when supported) based on the file size
+ * which the mapping currently allows at the given index. The index is relevant
+ * due to alignment considerations the mapping might have. The returned order
+ * may be less than the size passed.
+ *
+ * Like __filemap_get_folio order calculation.
+ *
+ * Return: The order.
+ */
+static inline unsigned int
+shmem_mapping_size_order(struct address_space *mapping, pgoff_t index,
+			 size_t size)
+{
+	unsigned int order = get_order(max_t(size_t, size, PAGE_SIZE));
+
+	if (!mapping_large_folio_support(mapping))
+		return 0;
+
+	/* If we're not aligned, allocate a smaller folio */
+	if (index & ((1UL << order) - 1))
+		order = __ffs(index);
+
+	return min_t(size_t, order, MAX_PAGECACHE_ORDER);
+}
+
>
> --
> Cheers,
>
> David / dhildenb
>