linux-mm.kvack.org archive mirror
From: Michal Hocko <mhocko@suse.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Liang Li <liliangleo@didiglobal.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org
Subject: Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
Date: Tue, 5 Jan 2021 10:20:37 +0100
Message-ID: <20210105092037.GY13207@dhcp22.suse.cz>
In-Reply-To: <6bfcc500-7c11-f66a-26ea-e8b8bcc79e28@intel.com>

On Mon 04-01-21 15:00:31, Dave Hansen wrote:
> On 1/4/21 12:11 PM, David Hildenbrand wrote:
> >> Yeah, it certainly can't be the default, but it *is* useful for
> >> things where we know that there are no cache benefits to zeroing
> >> close to where the memory is allocated.
> >> 
> >> The trick is opting into it somehow, either in a process or a VMA.
> >> 
> > The patch set is mostly trying to optimize starting a new process. So
> > process/vma doesn't really work.
> 
> Let's say you have a system-wide tunable that says: pre-zero pages and
> keep 10GB of them around.  Then, you opt-in a process to being allowed
> to dip into that pool with a process-wide flag or an madvise() call.
> You could even have the flag be inherited across execve() if you wanted
> to have helper apps be able to set the policy and access the pool like
> how numactl works.

While possible, it sounds quite heavyweight to me. The page allocator
would have to maintain those pre-zeroed pages somehow. The pool would
also quickly become a very scarce resource, because everybody just wants
to run faster. So this would open up many more interesting questions.

To me, a global all-or-nothing knob sounds like a solution that is
easier to use and maintain.
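Dave's opt-in pool and the scarcity concern raised against it can be illustrated with a toy userspace model. This is only a sketch under loud assumptions: the names `zero_pool`, `pool_refill()` and `pool_alloc_zeroed()`, the 4 KiB page size and the capacity of four pages are all hypothetical, and nothing here corresponds to an existing kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model of a pre-zeroed page pool: a bounded stash of
 * already-zeroed buffers that only opted-in callers may draw from.
 * Plain userspace C, not kernel code; every name is illustrative. */
#define PAGE_SZ  4096
#define POOL_CAP 4

struct zero_pool {
    void  *pages[POOL_CAP];
    size_t count;               /* pre-zeroed pages currently stashed */
};

/* "Background" step: zero fresh pages until the pool is full. */
static void pool_refill(struct zero_pool *p)
{
    while (p->count < POOL_CAP) {
        void *pg = malloc(PAGE_SZ);
        if (!pg)
            return;
        memset(pg, 0, PAGE_SZ); /* zeroing cost paid ahead of time */
        p->pages[p->count++] = pg;
    }
}

/* Allocation path: an opted-in caller takes a pre-zeroed page if one is
 * left; everyone else, or an opted-in caller facing an empty pool, pays
 * for zeroing right here. *from_pool reports which path was taken. */
static void *pool_alloc_zeroed(struct zero_pool *p, bool opted_in,
                               bool *from_pool)
{
    if (opted_in && p->count > 0) {
        *from_pool = true;
        return p->pages[--p->count];
    }
    *from_pool = false;
    void *pg = malloc(PAGE_SZ);
    if (pg)
        memset(pg, 0, PAGE_SZ); /* zeroing on the allocation path */
    return pg;
}
```

Once opted-in callers have drained the pool they fall back to zeroing on the allocation path like everyone else, so the benefit lasts only as long as the refill keeps up, which is the "scarce resource" problem in miniature.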

> Dan makes a very good point about using filesystems for this, though.
> It wouldn't be rocket science to set up a special tmpfs mount just for
> VM memory and pre-zero it from userspace.  For qemu, you'd need to teach
> the management layer to hand out zeroed files via mem-path=.

Agreed. That would be an interesting option.
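Dan's suggestion could be approximated from userspace roughly as follows. A minimal sketch, assuming a tmpfs mount such as `/dev/shm`; the mount point, file name and the helper `prezero_file()` are illustrative only. It sizes a file with `ftruncate()` and then writes through a shared mapping, so every page is allocated, and zero-filled, before a VM ever touches it.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create (or reuse) `path`, size it to `size` bytes
 * and write zeros through a MAP_SHARED mapping. Writing every byte forces
 * tmpfs to allocate each page now, so that cost (including zeroing) is
 * paid before the guest starts instead of at fault time.
 * Returns 0 on success, -1 on error with errno set. */
static int prezero_file(const char *path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memset(mem, 0, size);       /* touch and zero every page */
    munmap(mem, size);
    close(fd);
    return 0;
}
```

A management layer could then point QEMU at such a file, for example through the `mem-path=` property of a `memory-backend-file` object. The pages stay charged to the tmpfs mount until the file is truncated or removed, which is the gap Dave's MADV_FREE remark is aimed at.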

> Heck, if
> you taught MADV_FREE how to handle tmpfs, you could even pre-zero *and*
> get the memory back quickly if those files ended up over-sized somehow.

We can probably allow MADV_FREE on shmem, but that would require an
exclusively mapped page. The shared case is really tricky, because
uncoordinated userspace could end up with silent data corruption.
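For context on the remark above: madvise(2) documents MADV_FREE as applying only to private anonymous pages, and lists EINVAL for ranges that include file or MAP_SHARED mappings, so shmem-backed memory (a tmpfs file, or MAP_SHARED | MAP_ANONYMOUS) is refused today. The following sketch, built around a hypothetical helper `try_madv_free()`, checks that behaviour on the running kernel; it assumes Linux 4.5 or later, where MADV_FREE exists.

```c
#define _DEFAULT_SOURCE         /* expose MADV_FREE and MAP_ANONYMOUS */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes with the given mmap flags, dirty
 * the memory, then ask the kernel to MADV_FREE it. Returns 0 if the
 * advice was accepted, the errno from madvise() if it was refused, or
 * -1 if mmap() itself failed. */
static int try_madv_free(int map_flags, size_t len)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, map_flags, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    for (size_t i = 0; i < len; i++)
        p[i] = 1;               /* dirty the pages so there is something to free */
    int rc = madvise(p, len, MADV_FREE) == 0 ? 0 : errno;
    munmap(p, len);             /* errno already captured above */
    return rc;
}
```

On a kernel that behaves as documented, `try_madv_free(MAP_PRIVATE | MAP_ANONYMOUS, len)` returns 0 while `try_madv_free(MAP_SHARED | MAP_ANONYMOUS, len)` returns EINVAL; lifting the latter restriction, at least for exclusively mapped pages, is what is being discussed here.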
-- 
Michal Hocko
SUSE Labs



Thread overview: 12+ messages
2020-12-21 16:30 Liang Li
2021-01-04 19:19 ` Dave Hansen
2021-01-04 19:27   ` Matthew Wilcox
2021-01-04 19:44     ` Dan Williams
2021-01-04 19:51     ` Dave Hansen
2021-01-04 20:11       ` David Hildenbrand
2021-01-04 22:29         ` Dan Williams
2021-01-04 23:00         ` Dave Hansen
2021-01-05  9:20           ` Michal Hocko [this message]
2021-01-05  9:29             ` David Hildenbrand
2021-01-05  9:39               ` Liang Li
2021-01-05  9:56               ` Michal Hocko
