From: David Hildenbrand <david@redhat.com>
To: Jens Axboe <axboe@kernel.dk>,
Andrew Dona-Couch <andrew@donacou.ch>,
Andrew Morton <akpm@linux-foundation.org>,
Drew DeVault <sir@cmpwn.com>
Cc: Ammar Faizi <ammarfaizi2@gnuweeb.org>,
linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
io_uring Mailing List <io-uring@vger.kernel.org>,
Pavel Begunkov <asml.silence@gmail.com>,
linux-mm@kvack.org
Subject: Re: [PATCH] Increase default MLOCK_LIMIT to 8 MiB
Date: Mon, 22 Nov 2021 22:56:24 +0100
Message-ID: <ffa66565-d546-a2cf-1748-38b9992fd5b8@redhat.com>
In-Reply-To: <3adc55d3-f383-efa9-7319-740fc6ab5d7a@kernel.dk>
On 22.11.21 21:44, Jens Axboe wrote:
> On 11/22/21 1:08 PM, David Hildenbrand wrote:
>> On 22.11.21 20:53, Jens Axboe wrote:
>>> On 11/22/21 11:26 AM, David Hildenbrand wrote:
>>>> On 22.11.21 18:55, Andrew Dona-Couch wrote:
>>>>> Forgive me for jumping in to an already overburdened thread. But can
>>>>> someone pushing back on this clearly explain the issue with applying
>>>>> this patch?
>>>>
>>>> It will allow unprivileged users to easily and even "accidentally"
>>>> allocate more unmovable memory than it should in some environments. Such
>>>> limits exist for a reason. And there are ways for admins/distros to
>>>> tweak these limits if they know what they are doing.
>>>
>>> But that's entirely the point, the cases where this change is needed are
>>> already screwed by a distro and the user is the administrator. This is
>>> _exactly_ the case where things should just work out of the box. If
>>> you're managing farms of servers, yeah you have competent administration
>>> and you can be expected to tweak settings to get the best experience and
>>> performance, but the kernel should provide a sane default. 64K isn't a
>>> sane default.
>>
>> 0.1% of RAM isn't either.
>
> No default is perfect, but 0.1% will solve 99% of the problem. And most
> likely solve 100% of the problems for the important case, which is where
> you want things to Just Work on your distro without doing any
> administration. If you're aiming for perfection, it doesn't exist.
... and my Fedora is already at 16 MiB *sigh*.
And I'm not aiming for perfection, I'm aiming for as few
FOLL_LONGTERM users as possible ;)
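For anyone who wants to check what their own distro ships, here is a minimal sketch using getrlimit(2) with RLIMIT_MEMLOCK. The helper name read_memlock_limit is made up for illustration and is not part of any existing API.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: fetch the soft and hard RLIMIT_MEMLOCK values
 * (in bytes) for the calling process. Returns 0 on success, -1 on error.
 * A value of RLIM_INFINITY means "no limit".
 */
int read_memlock_limit(rlim_t *soft, rlim_t *hard)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;

	*soft = rl.rlim_cur;
	*hard = rl.rlim_max;
	return 0;
}
```

In a bash shell, "ulimit -l" reports the same soft limit, in KiB.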
>
>>>> This is not a step in the right direction. This is all just trying to
>>>> hide the fact that we're exposing FOLL_LONGTERM usage to random
>>>> unprivileged users.
>>>>
>>>> Maybe we could instead try getting rid of FOLL_LONGTERM usage and the
>>>> memlock limit in io_uring altogether, for example, by using mmu
>>>> notifiers. But I'm no expert on the io_uring code.
>>>
>>> You can't use mmu notifiers without impacting the fast path. This isn't
>>> just about io_uring, there are other users of memlock right now (like
>>> bpf) which just makes it even worse.
>>
>> 1) Do we have a performance evaluation? Did someone try and come up with
>> a conclusion how bad it would be?
>
> I honestly don't remember the details, I took a look at it about a year
> ago due to some unrelated reasons. These days it just pertains to
> registered buffers, so it's less of an issue than back then when it
> dealt with the rings as well. Hence might be feasible, I'm certainly not
> against anyone looking into it. Easy enough to review and test for
> performance concerns.
That at least sounds promising.
>
>> 2) Could we provide an mmu variant to ordinary users that's just good
>> enough but maybe not as fast as what we have today? And limit
>> FOLL_LONGTERM to special, privileged users?
>
> If it's not as fast, then it's most likely not good enough though...
There is always a compromise of course.
See, FOLL_LONGTERM is *the worst* kind of memory allocation thingy you
could possibly do to your MM subsystem. It's absolutely the worst thing
you can do to swap and compaction.
I really don't want random feature X to be next and say "well, io_uring
uses it, so I can just use it for max performance and we'll adjust the
memlock limit, who cares!".
>
>> 3) Just because there are other memlock users is not an excuse. For
>> example, VFIO/VDPA have to use it for a reason, because there is no way
>> not to use FOLL_LONGTERM.
>
> It's not an excuse, the statement merely means that the problem is
> _worse_ as there are other memlock users.
Yes, and it will keep getting worse every time we introduce more
FOLL_LONGTERM users that really shouldn't be FOLL_LONGTERM users unless
really required. Again, VFIO/VDPA/RDMA are prime examples, because the
HW forces us to do it. And these are privileged features either way.
>
>>>
>>> We should just make this 0.1% of RAM (max(0.1% ram, 64KB)) or something
>>> like what was suggested, if that will help move things forward. IMHO the
>>> 32MB machine is mostly a theoretical case, but whatever.
>>
>> 1) I'm deeply concerned about large ZONE_MOVABLE and MIGRATE_CMA ranges
>> where FOLL_LONGTERM cannot be used, as that memory is not available.
>>
>> 2) With 0.1% RAM it's sufficient to start 1000 processes to break any
>> system completely and deeply mess up the MM. Oh my.
>
> We're talking per-user limits here. But if you want to talk hyperbole,
> then 64K multiplied by some other random number will also allow
> everything to be pinned, potentially.
>
Right, it's per-user. 0.1% per user FOLL_LONGTERM locked into memory in
the worst case.
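
To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes the proposal means "0.1% of RAM, with the current 64 KiB as a floor", and that every user pins their full limit. The function names are hypothetical, not kernel code.

```c
#include <stdint.h>

#define MEMLOCK_FLOOR_BYTES (64ULL * 1024) /* the current 64 KiB default */

/* Assumed rule: 0.1% of RAM, but never below the 64 KiB floor. */
uint64_t memlock_default_bytes(uint64_t ram_bytes)
{
	uint64_t scaled = ram_bytes / 1000;

	return scaled > MEMLOCK_FLOOR_BYTES ? scaled : MEMLOCK_FLOOR_BYTES;
}

/* Worst case: each of nr_users pins its full limit; capped at total RAM. */
uint64_t worst_case_pinned_bytes(uint64_t ram_bytes, uint64_t nr_users)
{
	uint64_t limit = memlock_default_bytes(ram_bytes);

	/* Compare by division so nr_users * limit cannot overflow. */
	if (nr_users > ram_bytes / limit)
		return ram_bytes;
	return nr_users * limit;
}
```

Under these assumptions a 16 GiB machine ends up with roughly 16 MiB per user, while on the 32 MB machine the 64 KiB floor applies.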
--
Thanks,
David / dhildenb