From: Chris Li <chrisl@kernel.org>
To: Barry Song <21cnbao@gmail.com>
Cc: Kairui Song <ryncsn@gmail.com>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Hugh Dickins <hughd@google.com>, Baoquan He <bhe@redhat.com>,
Nhat Pham <nphamcs@gmail.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Ying Huang <ying.huang@linux.alibaba.com>,
Johannes Weiner <hannes@cmpxchg.org>,
David Hildenbrand <david@redhat.com>,
Yosry Ahmed <yosryahmed@google.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Zi Yan <ziy@nvidia.com>,
linux-kernel@vger.kernel.org, Kairui Song <kasong@tencent.com>
Subject: Re: [PATCH v4 01/15] docs/mm: add document for swap table
Date: Thu, 18 Sep 2025 00:03:20 -0700
Message-ID: <CANeU7QmcC=-CTmJ7i8R77SQ_WArBvjP3VrmpLOy-b7QhCfMRYA@mail.gmail.com>
In-Reply-To: <CANeU7QkZBWFO6SeVHtmm73oLu7r0zavePQEYmQfH8opKPH1QWw@mail.gmail.com>
Hi Barry,
How about this:
A swap table stores one cluster worth of swap cache values, which is
exactly one page table page on most modern 64-bit systems. This is not
coincidental because the cluster size is determined by the huge page size.
The swap table holds an array of pointers, each the same size as a PTE,
so the swap table should be the same size as a page table page.
If that sounds OK, I will send an incremental patch to Andrew.
Chris
On Wed, Sep 17, 2025 at 10:03 PM Chris Li <chrisl@kernel.org> wrote:
>
> On Wed, Sep 17, 2025 at 4:38 PM Barry Song <21cnbao@gmail.com> wrote:
> >
> > > > This approach still seems to work, so the 32-bit system appears to be
> > > > the only exception. However, I’m not entirely sure that your description
> > > > of “the second last level” is correct. I believe it refers to the PTE,
> > > > which corresponds to the last level, not the second-to-last.
> > > > In other words, how do you define the second-to-last level page table?
> > >
> > > The second-to-last level page table page holds the PMD. The last level
> > > page table holds PTE.
> > > Cluster size is HPAGE_PMD_NR = 1<<HPAGE_PMD_ORDER
> > > I was thinking of a PMD entry but the actual page table page it points
> > > to is the last level.
> > > That is a good catch. Let me see how to fix it.
> > >
> > > What I am trying to say is that the swap table size should match the
> > > PTE page table page size, which determines the cluster size. An
> > > alternative way to understand the swap table is as a shadow PTE page
> > > table containing the shadow PTEs for the pages that get swapped out
> > > to the swapfile. It is arranged in swapfile swap offset order. The
> > > intuition is simple once you find the right angle to view it.
> > > However, it might be a mouthful to explain.
> > >
> > > I am fine with removing it. On the other hand, that removes the only
> > > bit of secret sauce, where I try to give the reader a glimpse of my
> > > intuition about the swap table.
> >
> > Perhaps you could describe the swap table as similar to a PTE page table
> > representing the swap cache mapping.
>
> Hard to qualify what is "similar", in what way it is similar.
> Different readers will have different interpretations of what similar
> means to them.
>
> > That is correct for most 32-bit and 64-bit systems,
> > but not for every machine.
>
> I think I will leave it as: for most 64-bit systems, the swap table
> size is exactly one page table page size, and that is not coincidental.
>
> > The only exception is a 32-bit system with a 64-bit physical address
> > (Large Physical Address Extension, LPAE), which uses a 4 KB PTE table
> > but a 2 KB swap table because the pointer is 32 bit while each page
> > table entry is 64 bit.
>
> I feel that is very much a corner case. I will leave it out of the
> document. I want to present a simplified abstracted view. There is
> always more detail to distract from the simple abstracted view. That is
> why we have physics.
>
> > Maybe we can simply say that the number of entries in the swap table
> > is the same as in a PTE page table?
>
> Yes, that is what I want to say, for most modern 64-bit systems.
>
> Chris