From: David Hildenbrand <david@redhat.com>
To: Jens Axboe <axboe@kernel.dk>,
	Andrew Dona-Couch <andrew@donacou.ch>,
	Andrew Morton <akpm@linux-foundation.org>,
	Drew DeVault <sir@cmpwn.com>
Cc: Ammar Faizi <ammarfaizi2@gnuweeb.org>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	io_uring Mailing List <io-uring@vger.kernel.org>,
	Pavel Begunkov <asml.silence@gmail.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH] Increase default MLOCK_LIMIT to 8 MiB
Date: Mon, 22 Nov 2021 22:56:24 +0100	[thread overview]
Message-ID: <ffa66565-d546-a2cf-1748-38b9992fd5b8@redhat.com> (raw)
In-Reply-To: <3adc55d3-f383-efa9-7319-740fc6ab5d7a@kernel.dk>

On 22.11.21 21:44, Jens Axboe wrote:
> On 11/22/21 1:08 PM, David Hildenbrand wrote:
>> On 22.11.21 20:53, Jens Axboe wrote:
>>> On 11/22/21 11:26 AM, David Hildenbrand wrote:
>>>> On 22.11.21 18:55, Andrew Dona-Couch wrote:
>>>>> Forgive me for jumping in to an already overburdened thread.  But can
>>>>> someone pushing back on this clearly explain the issue with applying
>>>>> this patch?
>>>>
>>>> It will allow unprivileged users to easily and even "accidentally"
>>>> allocate more unmovable memory than they should in some environments. Such
>>>> limits exist for a reason. And there are ways for admins/distros to
>>>> tweak these limits if they know what they are doing.
>>>
>>> But that's entirely the point, the cases where this change is needed are
>>> already screwed by a distro and the user is the administrator. This is
>>> _exactly_ the case where things should just work out of the box. If
>>> you're managing farms of servers, yeah you have competent administration
>>> and you can be expected to tweak settings to get the best experience and
>>> performance, but the kernel should provide a sane default. 64K isn't a
>>> sane default.
>>
>> 0.1% of RAM isn't either.
> 
> No default is perfect, but 0.1% will solve 99% of the problem. And most
> likely solve 100% of the problems for the important case, which is where
> you want things to Just Work on your distro without doing any
> administration.  If you're aiming for perfection, it doesn't exist.

... and my Fedora is already at 16 MiB *sigh*.

And I'm not aiming for perfection, I'm aiming for as few
FOLL_LONGTERM users as possible ;)

> 
>>>> This is not a step into the right direction. This is all just trying to
>>>> hide the fact that we're exposing FOLL_LONGTERM usage to random
>>>> unprivileged users.
>>>>
>>>> Maybe we could instead try getting rid of FOLL_LONGTERM usage and the
>>>> memlock limit in io_uring altogether, for example, by using mmu
>>>> notifiers. But I'm no expert on the io_uring code.
>>>
>>> You can't use mmu notifiers without impacting the fast path. This isn't
>>> just about io_uring, there are other users of memlock right now (like
>>> bpf) which just makes it even worse.
>>
>> 1) Do we have a performance evaluation? Did someone try it and come up
>> with a conclusion on how bad it would be?
> 
> I honestly don't remember the details, I took a look at it about a year
> ago due to some unrelated reasons. These days it just pertains to
> registered buffers, so it's less of an issue than back then when it
> dealt with the rings as well. Hence might be feasible, I'm certainly not
> against anyone looking into it. Easy enough to review and test for
> performance concerns.

That at least sounds promising.

> 
>> 2) Could we provide an mmu variant to ordinary users that's just good
>> enough but maybe not as fast as what we have today? And limit
>> FOLL_LONGTERM to special, privileged users?
> 
> If it's not as fast, then it's most likely not good enough though...

There is always a compromise of course.

See, FOLL_LONGTERM is *the worst* kind of memory allocation thingy you
could possibly do to your MM subsystem. It's absolutely the worst thing
you can do to swap and compaction.

I really don't want random feature X to be next and say "well, io_uring
uses it, so I can just use it for max performance and we'll adjust the
memlock limit, who cares!".

> 
>> 3) Just because there are other memlock users is not an excuse. For
>> example, VFIO/VDPA have to use it for a reason, because there is no way
>> not to use FOLL_LONGTERM.
> 
> It's not an excuse, the statement merely means that the problem is
> _worse_ as there are other memlock users.

Yes, and it will keep getting worse every time we introduce more
FOLL_LONGTERM users that really shouldn't be FOLL_LONGTERM users unless
really required. Again, VFIO/VDPA/RDMA are prime examples, because the
HW forces us to do it. And these are privileged features either way.

> 
>>>
>>> We should just make this 0.1% of RAM (max(0.1% ram, 64KB)) or something
>>> like what was suggested, if that will help move things forward. IMHO the
>>> 32MB machine is mostly a theoretical case, but whatever.
>>
>> 1) I'm deeply concerned about large ZONE_MOVABLE and MIGRATE_CMA ranges
>> where FOLL_LONGTERM cannot be used, as that memory is not available.
>>
>> 2) With 0.1% RAM it's sufficient to start 1000 processes to break any
>> system completely and deeply mess up the MM. Oh my.
> 
> We're talking per-user limits here. But if you want to talk hyperbole,
> then 64K multiplied by some other random number will also allow
> everything to be pinned, potentially.
> 

Right, it's per-user. 0.1% per user FOLL_LONGTERM locked into memory in
the worst case.

-- 
Thanks,

David / dhildenb


