From: Hugh Dickins <hugh@veritas.com>
To: Izik Eidus <ieidus@redhat.com>
Cc: Rik van Riel <riel@redhat.com>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
aarcange@redhat.com, chrisw@redhat.com, alan@lxorguk.ukuu.org.uk,
device@lanana.org, linux-mm@kvack.org, nickpiggin@yahoo.com.au
Subject: Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
Date: Wed, 6 May 2009 12:16:52 +0100 (BST) [thread overview]
Message-ID: <Pine.LNX.4.64.0905061110470.3519@blonde.anvils> (raw)
In-Reply-To: <4A014C7B.9080702@redhat.com>
On Wed, 6 May 2009, Izik Eidus wrote:
> Rik van Riel wrote:
> >
> > This is different from munmap and madvise, which take both
> > start address and length.
> >
> > Why?
> >
> It work like free, considering the fact that we dont allow memory overlay in
> no way,
> If we have the start of the address it is enough for us to know what memory we
> want to remove.
>
> Isnt interface for userspace that work like malloc / free is enough?
I'm afraid not.
There's an (addr, length) consistency throughout the mm system calls -
mmap munmap mlock munlock mprotect msync madvise, even mincore and mremap
- that we ought not to depart from lightly. And those (addr, length)s
are allowed to break into and span the original mmaps (excepting mremap).
Getting an fd from /dev/ksm, using that in an ioctl to get another fd
(eh?), using that in a further ioctl to specify an addr and number of
pages, may well have been a good interface for getting this working out
of tree, as an adjunct of KVM. But you've done too well at selling KSM
as more generally useful than that: it is good work, I'm liking it, but
if it's going to mainline, then it needs an appropriate user interface.
I'm very much with those who suggested an madvise(), for which Chris
prepared a patch. I know Andrea felt uneasy with an madvise() going
to a possibly-configured-out-or-never-loaded module, but it is just
advice, so I don't have a problem with that myself, so long as it
is documented in the manpage.
Whereas I do worry just what capability should be required for this:
can't a greedy app simply fork itself, touch all its pages, and thus
lock itself into memory in this way? And I do worry about the cpu
cost of all the scanning, if it were to get used more generally -
it would be a pity if we just advised complainers to tune it out.
I'm still working my way through ksm.c, and not gone back to look at
Chris's madvise patch, but doubt it will be sufficient. There's an
interesting difference between what you're doing in ksm.c, and how
madvise usually behaves, regarding unmapped areas: madvice doesn't
usually apply to an unmapped area, and goes away with an area when
it is unmapped; whereas in KSM's case, the advice applies to whatever
happens to get mapped in the area specified, persisting across unmaps.
If KSM is to behave in the usual madvise way, it'll need to be informed
of unmaps. And I suspect it may need to be informed of them, even if we
let it continue to apply to empty address space. Because even with your
more limited unsigned int nrpages interface, the caller can specify an
enormous range on 64-bit, and ksm.c be fully occupied just incrementing
from one absent page to the next. mmap's vma ranges confine the space
to be searched, and instantiated pagetables confine it further: I think
you're either going to need to rely upon those to confine your search
area, or else enhance your own data structures to confine it.
But I do appreciate the separation you've kept so far,
and wouldn't want to tie it all together too closely.
Hugh
p.s. I wish you'd chosen different name than KSM - the kernel
has supported shared memory for many years - and notice ksm.c itself
says "Memory merging driver". "Merge" would indeed have been a less
ambiguous term than "Share", but I think too late to change that now
- except possibly in the MADV_ flag names?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-05-06 13:08 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-05-04 22:25 [PATCH 0/6] ksm changes (v2) Izik Eidus
2009-05-04 22:25 ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Izik Eidus
2009-05-04 22:25 ` [PATCH 2/6] ksm: dont allow overlap memory addresses registrations Izik Eidus
2009-05-04 22:25 ` [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl Izik Eidus
2009-05-04 22:25 ` [PATCH 4/6] ksm: change the prot handling to use the generic helper functions Izik Eidus
2009-05-04 22:25 ` [PATCH 5/6] ksm: build system make it compile for all archs Izik Eidus
2009-05-04 22:25 ` [PATCH 6/6] ksm: use another miscdevice minor number Izik Eidus
2009-05-06 0:55 ` Rik van Riel
2009-05-06 0:54 ` [PATCH 5/6] ksm: build system make it compile for all archs Rik van Riel
2009-05-06 0:54 ` [PATCH 4/6] ksm: change the prot handling to use the generic helper functions Rik van Riel
2009-05-06 0:53 ` [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl Rik van Riel
2009-05-06 8:38 ` Izik Eidus
2009-05-06 11:16 ` Hugh Dickins [this message]
2009-05-06 13:34 ` Andrea Arcangeli
2009-05-06 13:56 ` Izik Eidus
2009-05-06 16:41 ` Hugh Dickins
2009-05-06 16:49 ` Chris Wright
2009-05-06 16:57 ` Hugh Dickins
2009-05-06 17:47 ` Andrea Arcangeli
2009-05-06 16:59 ` Izik Eidus
2009-05-07 11:31 ` Andrea Arcangeli
2009-05-07 13:13 ` Hugh Dickins
2009-05-07 13:23 ` Andrea Arcangeli
2009-05-06 14:25 ` Hugh Dickins
2009-05-06 14:45 ` Andrea Arcangeli
2009-05-06 15:36 ` Chris Wright
2009-05-06 15:27 ` Izik Eidus
2009-05-06 16:14 ` Chris Wright
2009-05-06 16:36 ` Hugh Dickins
2009-05-06 17:09 ` Chris Wright
2009-05-06 17:54 ` Hugh Dickins
2009-05-06 16:26 ` Hugh Dickins
2009-05-06 16:58 ` Izik Eidus
2009-05-06 23:59 ` Chris Wright
2009-05-07 2:41 ` Rik van Riel
2009-05-06 0:43 ` [PATCH 2/6] ksm: dont allow overlap memory addresses registrations Rik van Riel
2009-05-06 9:46 ` Izik Eidus
2009-05-06 12:26 ` Rik van Riel
2009-05-06 12:39 ` Izik Eidus
2009-05-06 13:17 ` Andrea Arcangeli
2009-05-06 13:28 ` Hugh Dickins
2009-05-06 14:02 ` Izik Eidus
2009-05-06 17:11 ` Hugh Dickins
2009-05-06 14:09 ` Andrea Arcangeli
2009-05-06 14:21 ` Alan Cox
2009-05-06 14:46 ` Hugh Dickins
2009-05-06 14:56 ` Andrea Arcangeli
2009-05-06 23:55 ` Minchan Kim
2009-05-07 0:19 ` Chris Wright
2009-05-07 10:46 ` Andrea Arcangeli
2009-05-07 12:01 ` Minchan Kim
2009-05-06 14:57 ` Izik Eidus
2009-05-06 0:40 ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Rik van Riel
-- strict thread matches above, loose matches on Subject: below --
2009-05-02 22:16 [PATCH 0/6] ksm changes Izik Eidus
2009-05-02 22:16 ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Izik Eidus
2009-05-02 22:16 ` [PATCH 2/6] ksm: dont allow overlap memory addresses registrations Izik Eidus
2009-05-02 22:16 ` [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl Izik Eidus
2009-05-04 19:43 ` Hugh Dickins
2009-05-04 20:37 ` Izik Eidus
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0905061110470.3519@blonde.anvils \
--to=hugh@veritas.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=chrisw@redhat.com \
--cc=device@lanana.org \
--cc=ieidus@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nickpiggin@yahoo.com.au \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox