linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Christoph Lameter <cl@gentwo.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	akpm@linux-foundation.org, torvalds@linux-foundation.org,
	roland@kernel.org, tglx@linutronix.de, kosaki.motohiro@gmail.com,
	penberg@kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-rdma@vger.kernel.org
Subject: Re: [PATCH] mm: Revert pinned_vm braindamage
Date: Thu, 20 Jun 2013 13:49:43 +0200	[thread overview]
Message-ID: <20130620114943.GB12125@gmail.com> (raw)
In-Reply-To: <0000013f536c60ee-9a1ca9da-b798-416a-a32e-c896813d3bac-000000@email.amazonses.com>


* Christoph Lameter <cl@gentwo.org> wrote:

> On Mon, 17 Jun 2013, Peter Zijlstra wrote:
> 
> > They did no such thing; being one of those who wrote such code. I
> > expressly used RLIMIT_MEMLOCK for its the one limit userspace has to
> > limit pages that are exempt from paging.
> 
> Dont remember reviewing that. Assumptions were wrong in that patch then.
> 
> > > Pinned pages are exempted by the kernel. A device driver or some other
> > > kernel process (reclaim, page migration, io etc) increase the page count.
> > > There is currently no consistent accounting for pinned pages. The
> > > vm_pinned counter was introduced to allow the largest pinners to track
> > > what they did.
> >
> > No, not the largest, user space controlled pinnners. The thing that
> > makes all the difference is the _USER_ control.
> 
> The pinning *cannot* be done from user space. Here it is the IB subsystem
> that is doing it.

Peter clearly pointed it out that in the perf case it's user-space that 
initiates the pinned memory mapping which is resource-controlled via 
RLIMIT_MEMLOCK - and this was implemented that way before your commit 
broke the code.

You seem to be hell bent on defining 'memory pinning' only as "the thing 
done via the mlock*() system calls", but that is a nonsensical distinction 
that actively and incorrectly ignores other system calls that can and do 
pin memory legitimately.

If some other system call results in mapping pinned memory that is at 
least as restrictively pinned as an mlock()-ed vma (the perf syscall is 
such) then it's entirely proper design to be resource controlled under 
RLIMIT_MEMLOCK as well. In fact this worked so before your commit broke 
it.

> > > mlockall does not require CAP_IPC_LOCK. Never had an issue.
> >
> > MCL_FUTURE does absolutely require CAP_IPC_LOCK, MCL_CURRENT requires 
> > a huge (as opposed to the default 64k) RLIMIT or CAP_IPC_LOCK.
> >
> > There's no argument there, look at the code.
> 
> I am sorry but we have been mlockall() for years now without the issues 
> that you are bringing up. AFAICT mlockall does not require MCL_FUTURE.

You only have to read the mlockall() code to see that Peter's claim is 
correct:

mm/mlock.c:

SYSCALL_DEFINE1(mlockall, int, flags)
{
        unsigned long lock_limit;
        int ret = -EINVAL;

        if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE)))
                goto out;

        ret = -EPERM;
        if (!can_do_mlock())
                goto out;
...


int can_do_mlock(void)
{
        if (capable(CAP_IPC_LOCK))
                return 1;
        if (rlimit(RLIMIT_MEMLOCK) != 0)
                return 1;
        return 0;
}
EXPORT_SYMBOL(can_do_mlock);

Q.E.D.

Thanks,

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-06-20 11:49 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-06 12:43 Peter Zijlstra
2013-06-06 18:46 ` Christoph Lameter
2013-06-07 11:03   ` Peter Zijlstra
2013-06-07 14:52     ` Christoph Lameter
2013-06-17 11:08       ` Peter Zijlstra
2013-06-17 18:36         ` Christoph Lameter
2013-06-20 11:49           ` Ingo Molnar [this message]
2013-06-20 14:48             ` Christoph Lameter
2013-06-21  6:25               ` Roland Dreier
2013-06-21 14:44                 ` Christoph Lameter
2013-06-13 21:06 ` Andrew Morton
2013-06-17  9:45   ` Peter Zijlstra
2013-06-17 12:28   ` Thomas Gleixner
2013-06-20 11:34     ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130620114943.GB12125@gmail.com \
    --to=mingo@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=cl@gentwo.org \
    --cc=kosaki.motohiro@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=penberg@kernel.org \
    --cc=peterz@infradead.org \
    --cc=roland@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox