[Question] performance regression after VM migration due to anon THP split in CoW

linux-mm.kvack.org archive mirror

From: Jinjiang Tu <tujinjiang@huawei.com>
To: <akpm@linux-foundation.org>, <kirill.shutemov@linux.intel.com>,
	<ziy@nvidia.com>, <william.kucharski@oracle.com>,
	<yang.shi@linux.alibaba.com>
Cc: <aarcange@redhat.com>, <jhubbard@nvidia.com>,
	<mike.kravetz@oracle.com>, <rcampbell@nvidia.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Nanyong Sun <sunnanyong@huawei.com>, <baohua@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	<baolin.wang@linux.alibaba.com>, <linux-mm@kvack.org>
Subject: [Question] performance regression after VM migration due to anon THP split in CoW
Date: Sat, 29 Jun 2024 17:18:33 +0800
Message-ID: <740d7379-3e3d-4c8c-4350-6c496969db1f@huawei.com>

Hi,

We noticed a performance regression in the memtester[1] benchmark after
upgrading the kernel. THP is enabled by default
(/sys/kernel/mm/transparent_hugepage/enabled is set to "always"). The issue
arises after we migrate a virtual machine with 125G of total memory and
124G of free memory to another host and then run `memtester 120G` in the
VM. The benchmark takes about 20 seconds to consume 120G of memory on
v4.18, but about 160 seconds on v5.10. The issue exists in the mainline
kernel too.
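
For anyone reproducing this, the THP mode can be confirmed on both kernels
before running the benchmark. Below is a minimal sketch in C, assuming the
usual sysfs selector format in which the active value is bracketed (for
example "[always] madvise never"). The helper names thp_active_mode() and
thp_read_mode() are made up for illustration and are not part of any
kernel or libc API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the bracketed (active) token of a selector line such as
 * "[always] madvise never" into out. Returns 0 on success, or -1 if no
 * complete "[...]" token is present or it does not fit in out. */
int thp_active_mode(const char *line, char *out, size_t n)
{
    assert(line != NULL && out != NULL && n > 0);

    const char *open = strchr(line, '[');
    if (open == NULL)
        return -1;
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;

    size_t len = (size_t)(close - open - 1);
    if (len + 1 > n)            /* need room for the terminating NUL */
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/* Read the first line of the selector file at path and extract the
 * active mode. Returns 0 on success, -1 on any error. */
int thp_read_mode(const char *path, char *out, size_t n)
{
    char line[128];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok != NULL ? thp_active_mode(line, out, n) : -1;
}
```

Calling thp_read_mode("/sys/kernel/mm/transparent_hugepage/enabled", buf,
sizeof buf) from main() and printing buf is enough to check the setting in
the environment described above.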

We found that commit 3917c80280c9 ("thp: change CoW semantics for anon-THP")
causes the performance regression. Since this commit, a write fault on an
anon THP splits the PMD and allocates a 4K page instead of allocating a
full anon THP. When a VM is migrated (based on qemu[2]), if a page is
marked as a zero page in the source VM, the destination VM calls mmap and
reads the region to allocate memory, so the region ends up mapped by the
zero THP. When we run memtester in the destination VM after migration
finishes, memtester (in the VM) allocates a large amount of free memory and
writes to it. Each write triggers CoW of the anon THP and a THP split,
which causes the performance regression. After reverting this commit, the
regression disappears.
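
The access pattern can be approximated in a single process without a VM:
map anonymous memory, read it first, then write to it and time the writes.
The following is only a sketch under assumptions, not the code path qemu
uses. The helper names map_anon(), touch_read() and touch_write() are
invented for illustration, and whether the read pass actually maps the
huge zero page depends on the THP settings and on the alignment of the
mapping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>

/* Map len bytes of private anonymous memory. Returns NULL on failure. */
void *map_anon(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Read one byte every step bytes, so that each touched page takes a read
 * fault. Returns the sum of the bytes read, which is 0 for untouched
 * anonymous memory. */
unsigned long touch_read(const volatile unsigned char *buf, size_t len,
                         size_t step)
{
    assert(step > 0);
    unsigned long sum = 0;
    for (size_t off = 0; off < len; off += step)
        sum += buf[off];
    return sum;
}

/* Write one byte every step bytes, so that each touched page takes a
 * write fault. Returns the elapsed wall-clock time in seconds. */
double touch_write(volatile unsigned char *buf, size_t len, size_t step)
{
    assert(step > 0);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t off = 0; off < len; off += step)
        buf[off] = 1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Running touch_read() followed by touch_write() over a large region with a
4096-byte step, and comparing the reported time with a run that skips the
read pass, should show whether the split-on-write behaviour dominates on a
given kernel.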

This commit optimises some scenarios, such as Redis, but may cause a
performance regression in others, such as VM migration. How could we solve
this issue? Maybe we could add a new sysctl to let users decide whether to
CoW the full anon THP or not?

Thanks.

[1] https://github.com/jnavila/memtester/tree/master
[2] https://github.com/qemu/qemu/blob/master/migration/ram.c
