linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@ziepe.ca>
To: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>, Jens Axboe <axboe@kernel.dk>,
	Andrew Dona-Couch <andrew@donacou.ch>,
	Andrew Morton <akpm@linux-foundation.org>,
	Drew DeVault <sir@cmpwn.com>,
	Ammar Faizi <ammarfaizi2@gnuweeb.org>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	io_uring Mailing List <io-uring@vger.kernel.org>,
	Pavel Begunkov <asml.silence@gmail.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH] Increase default MLOCK_LIMIT to 8 MiB
Date: Wed, 24 Nov 2021 19:11:33 -0400	[thread overview]
Message-ID: <20211124231133.GM5112@ziepe.ca> (raw)
In-Reply-To: <cc9d3f3e-2fe1-0df0-06b2-c54e020161da@redhat.com>

On Wed, Nov 24, 2021 at 08:09:42PM +0100, David Hildenbrand wrote:

> That would be giving up on compound pages (hugetlbfs, THP, ...) on any
> current Linux system that does not use ZONE_MOVABLE -- which is not
> something I am not willing to buy into, just like our customers ;)

So we have ZONE_MOVABLE but users won't use it?

Then why is the solution to push the same kinds of restrictions as
ZONE_MOVABLE on to ZONE_NORMAL?
 
> See my other mail, the upstream version of my reproducer essentially
> shows what FOLL_LONGTERM is currently doing wrong with pageblocks. And
> at least to me that's an interesting insight :)

Hmm. To your reproducer it would be nice if we could cgroup control
the # of page blocks a cgroup has pinned. Focusing on # pages pinned
is clearly the wrong metric, I suggested the whole compound earlier,
but your point about the entire page block being ruined makes sense
too.

It means pinned pages will have be migrated to already ruined page
blocks the cgroup owns, which is a more controlled version of the
FOLL_LONGTERM migration you have been thinking about.

This would effectively limit the fragmentation a hostile process group
can create. If we further treated unmovable cgroup charged kernel
allocations as 'pinned' and routed them to the pinned page blocks it
start to look really interesting. Kill the cgroup, get all your THPs
back? Fragmentation cannot extend past the cgroup?

ie there are lots of batch workloads that could be interesting there -
wrap the batch in a cgroup, run it, then kill everything and since the
cgroup gives some lifetime clustering to the allocator you get a lot
less fragmentation when the batch is finished, so the next batch gets
more THPs, etc.

There is also sort of an interesting optimization opportunity - many
FOLL_LONGTERM users would be happy to spend more time pinning to get
nice contiguous memory ranges. Might help convince people that the
extra pin time for migrations is worthwhile.

> > Something like io_ring is registering a bulk amount of memory and then
> > doing some potentially long operations against it.
> 
> The individual operations it performs are comparable to O_DIRECT I think

Yes, and O_DIRECT can take 10s's of seconds in troubled cases with IO
timeouts and things.

Plus io_uring is worse as the buffer is potentially shared by many in
fight ops and you'd have to block new ops of the buffer and flush all
running ops before any mapping change can happen, all while holding up
a mmu notifier.

Not only is it bad for mm subsystem operations, but would
significantly harm io_uring performance if a migration hits.

So, I really don't like abusing mmu notifiers for stuff like this. I
didn't like it in virtio either :)

Jason


  reply	other threads:[~2021-11-24 23:12 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20211028080813.15966-1-sir@cmpwn.com>
     [not found] ` <CAFBCWQ+=2T4U7iNQz_vsBsGVQ72s+QiECndy_3AMFV98bMOLow@mail.gmail.com>
     [not found]   ` <CFII8LNSW5XH.3OTIVFYX8P65Y@taiga>
     [not found]     ` <593aea3b-e4a4-65ce-0eda-cb3885ff81cd@gnuweeb.org>
2021-11-16  4:35       ` Andrew Morton
2021-11-16  6:32         ` Drew DeVault
2021-11-16 19:47           ` Andrew Morton
2021-11-16 19:48             ` Drew DeVault
2021-11-16 21:37               ` Andrew Morton
2021-11-17  8:23                 ` Drew DeVault
2021-11-22 17:11                 ` David Hildenbrand
2021-11-22 17:55                   ` Andrew Dona-Couch
2021-11-22 18:26                     ` David Hildenbrand
2021-11-22 19:53                       ` Jens Axboe
2021-11-22 20:03                         ` Matthew Wilcox
2021-11-22 20:04                           ` Jens Axboe
2021-11-22 20:08                         ` David Hildenbrand
2021-11-22 20:44                           ` Jens Axboe
2021-11-22 21:56                             ` David Hildenbrand
2021-11-23 12:02                               ` David Hildenbrand
2021-11-23 13:25                           ` Jason Gunthorpe
2021-11-23 13:39                             ` David Hildenbrand
2021-11-23 14:07                               ` Jason Gunthorpe
2021-11-23 14:44                                 ` David Hildenbrand
2021-11-23 17:00                                   ` Jason Gunthorpe
2021-11-23 17:04                                     ` David Hildenbrand
2021-11-23 22:04                                     ` Vlastimil Babka
2021-11-23 23:59                                       ` Jason Gunthorpe
2021-11-24  8:57                                         ` David Hildenbrand
2021-11-24 13:23                                           ` Jason Gunthorpe
2021-11-24 13:25                                             ` David Hildenbrand
2021-11-24 13:28                                               ` Jason Gunthorpe
2021-11-24 13:29                                                 ` David Hildenbrand
2021-11-24 13:48                                                   ` Jason Gunthorpe
2021-11-24 14:14                                                     ` David Hildenbrand
2021-11-24 15:34                                                       ` Jason Gunthorpe
2021-11-24 16:43                                                         ` David Hildenbrand
2021-11-24 18:35                                                           ` Jason Gunthorpe
2021-11-24 19:09                                                             ` David Hildenbrand
2021-11-24 23:11                                                               ` Jason Gunthorpe [this message]
2021-11-30 15:52                                                                 ` David Hildenbrand
2021-11-24 18:37                                                           ` David Hildenbrand
2021-11-24 14:37                                           ` Vlastimil Babka
2021-11-24 14:41                                             ` David Hildenbrand
2021-11-16 18:36         ` Matthew Wilcox
2021-11-16 18:44           ` Drew DeVault
2021-11-16 18:55           ` Jens Axboe
2021-11-16 19:21             ` Vito Caputo
2021-11-16 19:25               ` Drew DeVault
2021-11-16 19:46                 ` Vito Caputo
2021-11-16 19:41               ` Jens Axboe
2021-11-17 22:26         ` Johannes Weiner
2021-11-17 23:17           ` Jens Axboe
2021-11-18 21:58             ` Andrew Morton
2021-11-19  7:41               ` Drew DeVault

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211124231133.GM5112@ziepe.ca \
    --to=jgg@ziepe.ca \
    --cc=akpm@linux-foundation.org \
    --cc=ammarfaizi2@gnuweeb.org \
    --cc=andrew@donacou.ch \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=david@redhat.com \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sir@cmpwn.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox