linux-mm.kvack.org archive mirror
From: Barry Song <21cnbao@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	ying.huang@intel.com,  baolin.wang@linux.alibaba.com,
	chrisl@kernel.org, david@redhat.com,  hannes@cmpxchg.org,
	hughd@google.com, kaleshsingh@google.com,  kasong@tencent.com,
	linux-kernel@vger.kernel.org, mhocko@suse.com,
	 minchan@kernel.org, nphamcs@gmail.com, ryan.roberts@arm.com,
	 senozhatsky@chromium.org, shakeel.butt@linux.dev,
	shy828301@gmail.com,  surenb@google.com, v-songbaohua@oppo.com,
	xiang@kernel.org,  yosryahmed@google.com,
	Chuanhua Han <hanchuanhua@oppo.com>
Subject: Re: [PATCH v5 3/4] mm: support large folios swapin as a whole for zRAM-like swapfile
Date: Tue, 30 Jul 2024 09:56:03 +1200	[thread overview]
Message-ID: <CAGsJ_4zFVrJ3BVDBBAD5mSQgZybsig5ZoT6PVyohYAbZt9Ndnw@mail.gmail.com> (raw)
In-Reply-To: <CAGsJ_4yqLVvCUFpHjWmNAYvPRMzGK8JJWYMXJLR7d9UhKp+QDA@mail.gmail.com>

On Tue, Jul 30, 2024 at 8:03 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Jul 30, 2024 at 3:13 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, Jul 30, 2024 at 01:11:31AM +1200, Barry Song wrote:
> > > for this zRAM case, it is a new allocated large folio, only
> > > while all conditions are met, we will allocate and map
> > > the whole folio. you can check can_swapin_thp() and
> > > thp_swap_suitable_orders().
> >
> > YOU ARE DOING THIS WRONGLY!
> >
> > All of you anonymous memory people are utterly fixated on TLBs AND THIS
> > IS WRONG.  Yes, TLB performance is important, particularly with crappy
> > ARM designs, which I know a lot of you are paid to work on.  But you
> > seem to think this is the only consideration, and you're making bad
> > design choices as a result.  It's overly complicated, and you're leaving
> > performance on the table.
> >
> > Look back at the results Ryan showed in the early days of working on
> > large anonymous folios.  Half of the performance win on his system came
> > from using larger TLBs.  But the other half came from _reduced software
> > overhead_.  The LRU lock is a huge problem, and using large folios cuts
> > the length of the LRU list, hence LRU lock hold time.
> >
> > Your _own_ data on how hard it is to get hold of a large folio due to
> > fragmentation should be enough to convince you that the more large folios
> > in the system, the better the whole system runs.  We should not decline to
> > allocate large folios just because they can't be mapped with a single TLB!
>
> I am not convinced. for a new allocated large folio, even alloc_anon_folio()
> of do_anonymous_page() does the exactly same thing
>
> alloc_anon_folio()
> {
>         /*
>          * Get a list of all the (large) orders below PMD_ORDER that are enabled
>          * for this vma. Then filter out the orders that can't be allocated over
>          * the faulting address and still be fully contained in the vma.
>          */
>         orders = thp_vma_allowable_orders(vma, vma->vm_flags,
>                         TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER) - 1);
>         orders = thp_vma_suitable_orders(vma, vmf->address, orders);
>
> }
>
> you are not going to allocate a mTHP for an unaligned address for a new
> PF.
> Please point out where it is wrong.

Let's assume we have a folio occupying the virtual address range
0x500000000000 ~ 0x500000000000 + 64KB, and that it is swapped out to
swap offsets 0x10000 ~ 0x10000 + 64KB.

The current code will swap it in as an mTHP if a page fault occurs at
any address within (0x500000000000 ~ 0x500000000000 + 64KB).

In this case, the mTHP enjoys both the reduced TLB pressure and the
reduced software overhead such as shorter LRU lock hold times, so it
seems we lose nothing.

But if the folio is mremap-ed to an unaligned address such as
(0x600000000000 + 16KB ~ 0x600000000000 + 80KB)
while its swap offsets remain (0x10000 ~ 0x10000 + 64KB),
the current code won't swap it in as an mTHP. That does sound like a loss.

If this is the performance problem you are trying to address, my point
is that it is not worth the added complexity at this stage, though it
might be doable. We once tracked hundreds of phones running apps randomly
for a couple of days and never encountered such a case, so this is
very much a corner case.

If your concern goes beyond this, for example, if you want to swap in
large folios even when the swap slots are completely non-contiguous, that
is a different story. I agree this is a potential optimization direction,
but in that case you still need to find an aligned boundary to handle
page faults, just like do_anonymous_page() does; otherwise, you may end
up with all kinds of pointless intersections where PFs cover the address
ranges of other PFs, leaving PTE checks such as pte_range_none()
completely disordered:

static struct folio *alloc_anon_folio(struct vm_fault *vmf)
{
        ....

        /*
         * Find the highest order where the aligned range is completely
         * pte_none(). Note that all remaining orders will be completely
         * pte_none().
         */
        order = highest_order(orders);
        while (orders) {
                addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
                if (pte_range_none(pte + pte_index(addr), 1 << order))
                        break;
                order = next_order(&orders, order);
        }
}

>
> Thanks
> Barry


