From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: kosaki.motohiro@jp.fujitsu.com, linux-mm <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>
Subject: Re: [RFC PATCH for -mm 3/5] kill unnecessary locked_vm adjustment
Date: Mon, 18 Aug 2008 18:23:57 +0900 [thread overview]
Message-ID: <20080818164449.60CE.KOSAKI.MOTOHIRO@jp.fujitsu.com> (raw)
In-Reply-To: <1218808477.6373.49.camel@lts-notebook>
Hi Lee-san,
Sorry for late responce.
I understand your intention.
I agreed almost part of your explain, but I have few disagree points.
> I'd like to drop this patch and keep the locked_vm adjustment. It
> should still handle the error conditions, as a negative 'adjustment'
> from mlock_vma_pages_range() is handled correctly as an error code.
>
> Now, why keep the adjustment?
>
> First, I agree that locked_vm tracks amount of locked Virtual Memory, as
> the name indicates. IMO, this is not a useful metric. I think what we
> want to track IS the amount of physical memory locked down by the tasks.
hmmm..
Sorry, disagreed.
Various application assume current behavior.
So, I like adding to "locked physical memory" metrics, not replace.
> However, this is difficult. Consider that today, if one has called,
> say, mlockall(MCL_FUTURE) and then you mmap() some large segment into
> your address space more than once, you'll get charged for the locked_vm
> each time you mmap() the same segment. One can easily exceed the
> rlimit, even when the actual amount of locked pages is less than the
> limit. But, as I say, this is difficult to fix efficiently and I don't
> want to try to do that in the context of these patches.
Agreed.
This is mmap()'s bug.
> Now, taking the view that locked_vm counts VM_LOCKED vmas, note that we
> do not set VM_LOCKED for those vmas where it does not make sense to
> mlock the pages with SetPageMlocked(). That is, the "special" vmas,
> that have the following flags: VM_IO, VM_PFNMAP, VM_RESERVED,
> VM_DONTEXPAND, and VM_HUGETLB -- all of the vmas that we filter out in
> mlock_fixup().
>
> Now, for vmas with VM_IO or VM_PFNMAP, we don't even attempt to
> "make_pages_present()" because get_user_pages() will error out
> immediately. For the other types, we do make_pages_present() to avoid
> future minor faults, but we don't set PageMlocked(). Because we don't
> want to have to deal with these vmas during munmap() [including exit
> processing] nor duplicate the vma filtering in that path, we don't set
> the VM_LOCKED flags for these vmas. Since the vma is not VM_LOCKED, we
> don't want to count the memory as locked_vm.
>
> For mlock[all](), we do the locked_vm accounting in mlock_fixup(). We
> just don't count these vmas. See the comments "don't set VM_LOCKED,
> don't count" in the code. However, mmap() does the locked_vm accounting
> before calling mlock_vma_pages_range() [which replaces calls to
> make_pages_present()]. To make mmap(MAP_LOCKED) behavior consistent
> with mlock[all]() behavior, we return a "locked_vm" adjustment from
> mlock_vma_pages_range(). For the filtered, special vmas, this is always
> the size of the vma which mmap() has already added to locked_vm().
Agreed.
That is definitly bug. (and this isn't introduced by us)
but current implementation is slightly bad.
mlock_vma_pages_range()'s prototype is ..
int mlock_vma_pages_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
==> return type is int.
but sys_mlock()'s prototype is ..
asmlinkage long sys_mlock(unsigned long start, size_t len)
==> len argument's type is size_t.
So, this adjustment code can overflow easily.
maybe, mlock_vma_pages_range()'s type should be changed as bellow
int mlock_vma_pages_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end,
unsigned long *locked_pages)
but, I think this should do as another patch
because its bug isn't introduced by us, then it is another change.
So, I like following order changing.
1. remove current adjustment code from split-lru patch series
2. re-introduce it as another patch. (of cource, overflow problem should be fixed)
> So, I'd like to keep the adjustment returned by mlock_vma_pages_range().
> However, as you note, the internal function __mlock_vma_pages_range()
> always returns non-positive values [in earlier versions of this series
> it could return a positive, non-zero value but I decided that was not
> appropriate]. It can, however, return an error code with your changes.
> If we keep the locked_vm adjustment, as I propose here, then we need to
> pass any return value from __mlock_vma_pages_range() via the "nr_pages"
> variable which can contain an error code [< 0] as well as a locked_vm
> adjustment [>= 0].
>
> IMO, we still have some issues in locked_vm accounting that needs to be
> addressed. E.g., when you attached a hugetlb shm segment, it is
> automatically counted as locked_shm [tracked independently of locked_vm
> in the user struct instead of mm -- go figure!]. However, when we
> mmap() a hugetlbfs file, we don't automatically add this to locked_vm.
> Maybe we should, but not as part of this patch series, I think.
Agreed.
> Does this make sense?
>
> Maybe I need to explain the rationale better in the unevictable_lru
> documentation. I do discuss the behavior [locked_vm adjustment], but
> maybe not enough rationale.
Could you tell me your feeling about my concern?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-08-18 9:23 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-08-11 7:01 [RFC PATCH for -mm 0/5] mlock return value rework KOSAKI Motohiro
2008-08-11 7:04 ` [RFC PATCH for -mm 1/5] mlock() fix return values for mainline KOSAKI Motohiro
2008-08-12 20:39 ` Lee Schermerhorn
2008-08-13 8:03 ` KOSAKI Motohiro
2008-08-11 7:05 ` [RFC PATCH for -mm 2/5] related function comment fixes (optional) KOSAKI Motohiro
2008-08-12 19:02 ` Lee Schermerhorn
2008-08-13 8:37 ` KOSAKI Motohiro
2008-08-11 7:06 ` [RFC PATCH for -mm 3/5] kill unnecessary locked_vm adjustment KOSAKI Motohiro
2008-08-12 19:55 ` Lee Schermerhorn
2008-08-13 9:37 ` KOSAKI Motohiro
2008-08-15 13:54 ` Lee Schermerhorn
2008-08-18 9:23 ` KOSAKI Motohiro [this message]
2008-08-18 20:56 ` Lee Schermerhorn
2008-08-11 7:07 ` [RFC PATCH for -mm 4/5] fix mlock return value at munmap race KOSAKI Motohiro
2008-08-12 20:19 ` Lee Schermerhorn
2008-08-11 7:08 ` [RFC PATCH for -mm 5/5] fix mlock return value for mm KOSAKI Motohiro
2008-08-11 7:43 ` KOSAKI Motohiro
2008-08-12 20:30 ` Lee Schermerhorn
2008-08-13 8:36 ` KOSAKI Motohiro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080818164449.60CE.KOSAKI.MOTOHIRO@jp.fujitsu.com \
--to=kosaki.motohiro@jp.fujitsu.com \
--cc=Lee.Schermerhorn@hp.com \
--cc=akpm@linux-foundation.org \
--cc=linux-mm@kvack.org \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox