Re: [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO

From: David Hildenbrand <david@redhat.com>
To: Liang Li <liliang324@gmail.com>
Cc: David Hildenbrand <david@redhat.com>,
	Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Michal Hocko <mhocko@suse.com>,
	Liang Li <liliangleo@didiglobal.com>,
	linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	virtualization@lists.linux-foundation.org
Subject: Re: [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO
Date: Mon, 4 Jan 2021 21:18:14 +0100	[thread overview]
Message-ID: <96BB0656-F234-4634-853E-E2A747B6ECDB@redhat.com> (raw)
In-Reply-To: <CA+2MQi_C-PTqyrqBprhtGBAiDBnPQBzwu6hvyuk+QiKy0L3sHw@mail.gmail.com>


> On 23.12.2020 at 13:12, Liang Li <liliang324@gmail.com> wrote:
> 
> On Wed, Dec 23, 2020 at 4:41 PM David Hildenbrand <david@redhat.com> wrote:
>> 
>> [...]
>> 
>>>> I was rather saying that for security it's of little use IMHO.
>>>> Application/VM start up time might be improved by using huge pages (and
>>>> pre-zeroing these). Free page reporting might be improved by using
>>>> MADV_FREE instead of MADV_DONTNEED in the hypervisor.
>>>> 
>>>>> this feature, above all of them, which one is likely to become the
>>>>> strongest one?  From the implementation, you will find it is
>>>>> configurable; users who don't want to use it can turn it off.  Isn't
>>>>> this an option?
>>>> 
>>>> Well, we have to maintain the feature and sacrifice a page flag. For
>>>> example, do we expect someone explicitly enabling the feature just to
>>>> speed up startup time of an app that consumes a lot of memory? I highly
>>>> doubt it.
>>> 
>>> In our production environment, three main applications have such a
>>> requirement: one is QEMU [creating a VM with an SR-IOV passthrough device],
>>> the other two are DPDK-related applications, DPDK OVS and SPDK vhost.
>>> For best performance, they populate memory when starting up. For SPDK vhost,
>>> we make use of the VHOST_USER_GET/SET_INFLIGHT_FD feature for
>>> vhost 'live' upgrade, which is done by killing the old process and
>>> starting a new one with the new binary. In this case, we want the new
>>> process started as quickly as possible to shorten the service downtime.
>>> We really do enable this feature to speed up startup time for them  :)

Am I wrong, or does using hugetlbfs/tmpfs, i.e., a file that is not deleted between shutting down the old instance and firing up the new instance, just solve this issue?
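
To make the idea concrete, here is a minimal sketch (Python for brevity) of a process attaching to a file-backed shared region that outlives it. It only illustrates the mechanism and is not taken from QEMU, DPDK or SPDK. The helper name attach_region and the file location are hypothetical; the sketch assumes the path sits on tmpfs or hugetlbfs (e.g., under /dev/shm or /dev/hugepages), where the pages stay allocated as long as the file exists. The demo at the bottom uses a temporary directory only so that it runs anywhere.

```python
import mmap
import os
import tempfile


def attach_region(path: str, size: int) -> mmap.mmap:
    """Map a shared, file-backed region, creating the file only if needed.

    Hypothetical helper: `path` is assumed to live on tmpfs or hugetlbfs.
    On hugetlbfs, `size` would have to be a multiple of the huge page size.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        if os.fstat(fd).st_size != size:
            # First start (or a size change): set the file size.
            # Bytes the file did not contain before read back as zero.
            os.ftruncate(fd, size)
        # MAP_SHARED: stores go to the file's pages, so they are still
        # there after this process exits, as long as the file is kept.
        return mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)  # the mmap object keeps its own duplicate of the fd


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "region")
    demo_size = 4 * mmap.PAGESIZE

    old = attach_region(demo_path, demo_size)   # "old instance"
    old[:5] = b"state"
    old.close()                                 # old instance goes away

    new = attach_region(demo_path, demo_size)   # "new instance"
    print(new[:5])                              # contents are still present
    new.close()
    os.remove(demo_path)
```

The new instance maps the same file instead of allocating fresh anonymous memory, so the kernel does not have to allocate and zero the pages again at startup.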

>> 
>> Thanks for info on the use case!
>> 
>> All of these use cases either already use, or could use, huge pages
>> IMHO. It's not your ordinary proprietary gaming app :) This is where
>> pre-zeroing of huge pages could already help.
> 
> You are welcome.  For some historical reason, some of our services are
> not using hugetlbfs, that is why I didn't start with hugetlbfs.
> 
>> Just wondering, wouldn't it be possible to use tmpfs/hugetlbfs ...
>> creating a file and pre-zeroing it from another process, or am I missing
>> something important? At least for QEMU this should work AFAIK, where you
>> can just pass the file to be used via memory-backend-file.
>> 
> If another process creates the file, we can offload the overhead to
> that process, and there is no need to pre-zero its content; just
> populating the memory is enough.

Right, if non-zero memory can be tolerated (e.g., for VMs it usually has to).

> If we do it that way, how do we determine the size of the file? It depends
> on the RAM size of the VM the customer buys. Maybe we can create a file
> large enough in advance and truncate it to the right size just before the
> VM is created. Then, how many large files should be created on a host?

That's mostly already existing scheduling logic, no? (How many VMs can I eventually put onto a specific machine?)
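
For illustration, the "create a large file in advance, truncate it later" step quoted above could be sketched as follows (again Python for brevity). The function names precreate and fit_to_vm are hypothetical, and the sketch assumes the file lives on tmpfs or hugetlbfs; how many such files to keep per host is left to the management layer and is not covered here.

```python
import os
import tempfile


def precreate(path: str, max_size: int) -> None:
    """Create the backing file ahead of time and allocate its blocks.

    Hypothetical helper. posix_fallocate() extends the file to `max_size`
    if needed and makes sure storage is allocated for the whole range.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        os.posix_fallocate(fd, 0, max_size)
    finally:
        os.close(fd)


def fit_to_vm(path: str, vm_ram_size: int) -> int:
    """Shrink the pre-created file to the RAM size of the VM being started."""
    if vm_ram_size > os.stat(path).st_size:
        raise ValueError("pre-created file is smaller than the requested VM RAM")
    os.truncate(path, vm_ram_size)  # frees only the tail beyond vm_ram_size
    return os.stat(path).st_size


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "vm-ram")
    precreate(demo_path, 64 * 4096)              # done early, off the hot path
    print(fit_to_vm(demo_path, 16 * 4096))       # done right before VM start
    os.remove(demo_path)
```

The expensive part (allocation) happens in precreate, away from the startup path; fit_to_vm only drops the unused tail.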

> You will find there are a lot of things that have to be handled properly.
> I think it's possible to make it work well, but we would transfer the
> management complexity to upper-layer components. It's bad practice to let
> upper-layer components process such low-level details, which should be
> handled in the OS layer.

It's bad practice to squeeze things into the kernel that can just be handled in upper layers ;)
