* Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
2015-05-13 14:38 ` [PATCH 1/2] mmap.2: " Michal Hocko
@ 2015-05-13 14:45 ` Eric B Munson
2015-05-13 14:48 ` Eric B Munson
2015-05-14 8:01 ` Michal Hocko
2015-05-14 13:36 ` Michael Kerrisk (man-pages)
2016-05-11 11:07 ` Peter Zijlstra
2 siblings, 2 replies; 14+ messages in thread
From: Eric B Munson @ 2015-05-13 14:45 UTC (permalink / raw)
To: Michal Hocko
Cc: Michael Kerrisk, Andrew Morton, Linus Torvalds, David Rientjes,
LKML, Linux API, linux-mm, Michal Hocko
On Wed, 13 May 2015, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.cz>
>
> MAP_LOCKED had a subtly different semantic from mmap(2)+mlock(2) since
> it has been introduced.
> mlock(2) fails if the memory range cannot get populated to guarantee
> that no future major faults will happen on the range. mmap(MAP_LOCKED) on
> the other hand silently succeeds even if the range was populated only
> partially.
>
> Fixing this subtle difference in the kernel is rather awkward because
> the memory population happens after mm locks have been dropped and so
> the cleanup before returning failure (munlock) could operate on something
> else than the originally mapped area.
>
> E.g. speculative userspace page fault handler catching SEGV and doing
> mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard portion of a racing
> mmap and lead to lost data. Although it is not clear whether such a
> usage would be valid, mmap page doesn't explicitly describe requirements
> for threaded applications so we cannot exclude this possibility.
>
> This patch makes the semantic of MAP_LOCKED explicit and suggest using
> mmap + mlock as the only way to guarantee no later major page faults.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
Does the problem still happen when MAP_POPULATE | MAP_LOCKED is used?
(AFAICT MAP_POPULATE will cause the mmap to fail if all the pages cannot
be made present.)
Either way this is a good catch.
Acked-by: Eric B Munson <emunson@akamai.com>
* Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
2015-05-13 14:45 ` Eric B Munson
@ 2015-05-13 14:48 ` Eric B Munson
2015-05-14 8:01 ` Michal Hocko
1 sibling, 0 replies; 14+ messages in thread
From: Eric B Munson @ 2015-05-13 14:48 UTC (permalink / raw)
To: Michal Hocko
Cc: Michael Kerrisk, Andrew Morton, Linus Torvalds, David Rientjes,
LKML, Linux API, linux-mm, Michal Hocko
On Wed, 13 May 2015, Eric B Munson wrote:
> On Wed, 13 May 2015, Michal Hocko wrote:
>
> > From: Michal Hocko <mhocko@suse.cz>
> >
> > MAP_LOCKED had a subtly different semantic from mmap(2)+mlock(2) since
> > it has been introduced.
> > mlock(2) fails if the memory range cannot get populated to guarantee
> > that no future major faults will happen on the range. mmap(MAP_LOCKED) on
> > the other hand silently succeeds even if the range was populated only
> > partially.
> >
> > Fixing this subtle difference in the kernel is rather awkward because
> > the memory population happens after mm locks have been dropped and so
> > the cleanup before returning failure (munlock) could operate on something
> > else than the originally mapped area.
> >
> > E.g. speculative userspace page fault handler catching SEGV and doing
> > mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard portion of a racing
> > mmap and lead to lost data. Although it is not clear whether such a
> > usage would be valid, mmap page doesn't explicitly describe requirements
> > for threaded applications so we cannot exclude this possibility.
> >
> > This patch makes the semantic of MAP_LOCKED explicit and suggest using
> > mmap + mlock as the only way to guarantee no later major page faults.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
>
> Does the problem still happen when MAP_POPULATE | MAP_LOCKED is used?
> (AFAICT MAP_POPULATE will cause the mmap to fail if all the pages cannot
> be made present.)
>
> Either way this is a good catch.
>
> Acked-by: Eric B Munson <emunson@akamai.com>
>
Sorry for the noise, this should have been a
Reviewed-by: Eric B Munson <emunson@akamai.com>
* Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
2015-05-13 14:45 ` Eric B Munson
2015-05-13 14:48 ` Eric B Munson
@ 2015-05-14 8:01 ` Michal Hocko
1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2015-05-14 8:01 UTC (permalink / raw)
To: Eric B Munson
Cc: Michael Kerrisk, Andrew Morton, Linus Torvalds, David Rientjes,
LKML, Linux API, linux-mm
On Wed 13-05-15 10:45:06, Eric B Munson wrote:
> On Wed, 13 May 2015, Michal Hocko wrote:
>
> > From: Michal Hocko <mhocko@suse.cz>
> >
> > MAP_LOCKED had a subtly different semantic from mmap(2)+mlock(2) since
> > it has been introduced.
> > mlock(2) fails if the memory range cannot get populated to guarantee
> > that no future major faults will happen on the range. mmap(MAP_LOCKED) on
> > the other hand silently succeeds even if the range was populated only
> > partially.
> >
> > Fixing this subtle difference in the kernel is rather awkward because
> > the memory population happens after mm locks have been dropped and so
> > the cleanup before returning failure (munlock) could operate on something
> > else than the originally mapped area.
> >
> > E.g. speculative userspace page fault handler catching SEGV and doing
> > mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard portion of a racing
> > mmap and lead to lost data. Although it is not clear whether such a
> > usage would be valid, mmap page doesn't explicitly describe requirements
> > for threaded applications so we cannot exclude this possibility.
> >
> > This patch makes the semantic of MAP_LOCKED explicit and suggest using
> > mmap + mlock as the only way to guarantee no later major page faults.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
>
> Does the problem still happen when MAP_POPULATE | MAP_LOCKED is used?
> (AFAICT MAP_POPULATE will cause the mmap to fail if all the pages cannot
> be made present.)
No, there is no difference because MAP_POPULATE is implicit when
MAP_LOCKED is used and, as pointed out in the cover letter, we cannot fail
after the vma is created and the locks are dropped. The second patch tries
to clarify that MAP_POPULATE is just a best effort.
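
Because population is only best effort, a caller that needs to know how much
of a MAP_LOCKED (or MAP_POPULATE) mapping is actually resident has to check
for itself. A minimal sketch, assuming Linux and mincore(2); the helper name
count_resident_pages is hypothetical and not part of the patch:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of [addr, addr + len) are
 * resident in memory according to mincore(2).
 * addr must be page aligned. Returns -1 on error with errno set.
 */
static long count_resident_pages(void *addr, size_t len)
{
    assert(addr != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0) {
        errno = EINVAL;
        return -1;
    }

    /* mincore() fills one byte per page of the range. */
    size_t pages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(pages);
    if (vec == NULL)
        return -1;

    if (mincore(addr, len, vec) != 0) {
        int saved = errno;      /* keep the mincore() error across free() */
        free(vec);
        errno = saved;
        return -1;
    }

    long resident = 0;
    for (size_t i = 0; i < pages; i++)
        resident += vec[i] & 1; /* bit 0 set: page is resident */

    free(vec);
    return resident;
}
```

Calling this right after a successful mmap(..., MAP_LOCKED, ...) and comparing
the result with the number of pages requested is one way to notice a partially
populated range. It is only a snapshot, not a guarantee about later faults.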
> Either way this is a good catch.
>
> Acked-by: Eric B Munson <emunson@akamai.com>
Thanks!
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
2015-05-13 14:38 ` [PATCH 1/2] mmap.2: " Michal Hocko
2015-05-13 14:45 ` Eric B Munson
@ 2015-05-14 13:36 ` Michael Kerrisk (man-pages)
2016-05-11 11:07 ` Peter Zijlstra
2 siblings, 0 replies; 14+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-05-14 13:36 UTC (permalink / raw)
To: Michal Hocko
Cc: mtk.manpages, Andrew Morton, Linus Torvalds, David Rientjes,
LKML, Linux API, linux-mm, Michal Hocko, Eric B Munson
On 05/13/2015 04:38 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.cz>
>
> MAP_LOCKED had a subtly different semantic from mmap(2)+mlock(2) since
> it has been introduced.
> mlock(2) fails if the memory range cannot get populated to guarantee
> that no future major faults will happen on the range. mmap(MAP_LOCKED) on
> the other hand silently succeeds even if the range was populated only
> partially.
>
> Fixing this subtle difference in the kernel is rather awkward because
> the memory population happens after mm locks have been dropped and so
> the cleanup before returning failure (munlock) could operate on something
> else than the originally mapped area.
>
> E.g. speculative userspace page fault handler catching SEGV and doing
> mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard portion of a racing
> mmap and lead to lost data. Although it is not clear whether such a
> usage would be valid, mmap page doesn't explicitly describe requirements
> for threaded applications so we cannot exclude this possibility.
>
> This patch makes the semantic of MAP_LOCKED explicit and suggest using
> mmap + mlock as the only way to guarantee no later major page faults.
Thanks, Michal. Applied, with Reviewed-by: from Eric added.
Cheers,
Michael
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
> man2/mmap.2 | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/man2/mmap.2 b/man2/mmap.2
> index 54d68cf87e9e..1486be2e96b3 100644
> --- a/man2/mmap.2
> +++ b/man2/mmap.2
> @@ -235,8 +235,19 @@ See the Linux kernel source file
> for further information.
> .TP
> .BR MAP_LOCKED " (since Linux 2.5.37)"
> -Lock the pages of the mapped region into memory in the manner of
> +Mark the mmaped region to be locked in the same way as
> .BR mlock (2).
> +This implementation will try to populate (prefault) the whole range but
> +the mmap call doesn't fail with
> +.B ENOMEM
> +if this fails. Therefore major faults might happen later on. So the semantic
> +is not as strong as
> +.BR mlock (2).
> +.BR mmap (2)
> ++
> +.BR mlock (2)
> +should be used when major faults are not acceptable after the initialization
> +of the mapping.
> This flag is ignored in older kernels.
> .\" If set, the mapped pages will not be swapped out.
> .TP
>
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
* Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
2015-05-13 14:38 ` [PATCH 1/2] mmap.2: " Michal Hocko
2015-05-13 14:45 ` Eric B Munson
2015-05-14 13:36 ` Michael Kerrisk (man-pages)
@ 2016-05-11 11:07 ` Peter Zijlstra
2016-05-11 11:18 ` Peter Zijlstra
2016-05-11 11:32 ` Michal Hocko
2 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2016-05-11 11:07 UTC (permalink / raw)
To: Michal Hocko, Michael Kerrisk
Cc: Andrew Morton, Linus Torvalds, David Rientjes, LKML, Linux API,
linux-mm, Michal Hocko
On 05/13/2015 04:38 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.cz>
>
> MAP_LOCKED had a subtly different semantic from mmap(2)+mlock(2) since
> it has been introduced.
> mlock(2) fails if the memory range cannot get populated to guarantee
> that no future major faults will happen on the range. mmap(MAP_LOCKED) on
> the other hand silently succeeds even if the range was populated only
> partially.
>
> Fixing this subtle difference in the kernel is rather awkward because
> the memory population happens after mm locks have been dropped and so
> the cleanup before returning failure (munlock) could operate on something
> else than the originally mapped area.
>
> E.g. speculative userspace page fault handler catching SEGV and doing
> mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard portion of a racing
> mmap and lead to lost data. Although it is not clear whether such a
> usage would be valid, mmap page doesn't explicitly describe requirements
> for threaded applications so we cannot exclude this possibility.
>
> This patch makes the semantic of MAP_LOCKED explicit and suggest using
> mmap + mlock as the only way to guarantee no later major page faults.
>
URGH, this really blows chunks. It basically means MAP_LOCKED is
pointless cruft and we might as well remove it.
Why not fix it proper?
* Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
2016-05-11 11:07 ` Peter Zijlstra
@ 2016-05-11 11:18 ` Peter Zijlstra
2016-05-11 11:32 ` Michal Hocko
1 sibling, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2016-05-11 11:18 UTC (permalink / raw)
To: Michal Hocko, Michael Kerrisk
Cc: Andrew Morton, Linus Torvalds, David Rientjes, LKML, Linux API,
linux-mm, Michal Hocko
On 05/11/2016 01:07 PM, Peter Zijlstra wrote:
> On 05/13/2015 04:38 PM, Michal Hocko wrote:
>>
>> This patch makes the semantic of MAP_LOCKED explicit and suggest using
>> mmap + mlock as the only way to guarantee no later major page faults.
>>
>
> URGH, this really blows chunks. It basically means MAP_LOCKED is
> pointless cruft and we might as well remove it.
>
> Why not fix it proper?
OK; after having been pointed at this discussion, it seems I reacted rather
too hastily, in that I didn't read all the previous threads.
From those it appears that fixing this proper is indeed rather hard, and we
should indeed consider MAP_LOCKED broken. At which point I would've worded
the manpage update more strongly, but alas.
Sorry for the noise.
* Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
2016-05-11 11:07 ` Peter Zijlstra
2016-05-11 11:18 ` Peter Zijlstra
@ 2016-05-11 11:32 ` Michal Hocko
1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2016-05-11 11:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Michael Kerrisk, Andrew Morton, Linus Torvalds, David Rientjes,
LKML, Linux API, linux-mm
On Wed 11-05-16 13:07:33, Peter Zijlstra wrote:
>
>
> On 05/13/2015 04:38 PM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.cz>
> >
> > MAP_LOCKED had a subtly different semantic from mmap(2)+mlock(2) since
> > it has been introduced.
> > mlock(2) fails if the memory range cannot get populated to guarantee
> > that no future major faults will happen on the range. mmap(MAP_LOCKED) on
> > the other hand silently succeeds even if the range was populated only
> > partially.
> >
> > Fixing this subtle difference in the kernel is rather awkward because
> > the memory population happens after mm locks have been dropped and so
> > the cleanup before returning failure (munlock) could operate on something
> > else than the originally mapped area.
> >
> > E.g. speculative userspace page fault handler catching SEGV and doing
> > mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard portion of a racing
> > mmap and lead to lost data. Although it is not clear whether such a
> > usage would be valid, mmap page doesn't explicitly describe requirements
> > for threaded applications so we cannot exclude this possibility.
> >
> > This patch makes the semantic of MAP_LOCKED explicit and suggest using
> > mmap + mlock as the only way to guarantee no later major page faults.
> >
>
> URGH, this really blows chunks. It basically means MAP_LOCKED is pointless
> cruft and we might as well remove it.
Yeah, the usefulness of MAP_LOCKED is somewhat reduced. Everybody who
wants the full semantic really has to use mlock(2).
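
For reference, the mmap(2) + mlock(2) pattern could look roughly like the
following. This is a sketch under assumptions (anonymous private mapping,
Linux); the helper name map_and_lock is hypothetical. Unlike MAP_LOCKED, a
failure to lock the range is reported to the caller:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of anonymous memory and lock them
 * with mlock(2). Returns NULL with errno set if either step fails, so
 * the caller never receives a mapping that is not locked.
 */
static void *map_and_lock(size_t len)
{
    assert(len > 0);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* mlock() fails (e.g. ENOMEM, EAGAIN, EPERM) instead of silently
     * leaving the range partially populated. */
    if (mlock(p, len) != 0) {
        int saved = errno;      /* keep the mlock() error across munmap() */
        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}
```

Whether the call succeeds depends on RLIMIT_MEMLOCK and the privileges of the
caller, so the error path has to be handled.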
> Why not fix it proper?
I have tried, but it turned out to be a problem because we drop
mmap_sem after we have initialized the VMA and, as Linus pointed out, there
are multithreaded applications which do opportunistic memory
management[1]. So we would have to hold the mmap_sem for write during
the whole VMA setup + population, and that doesn't seem to be worth
all the trouble when we are not even sure whether anybody relies on
MAP_LOCKED to have the hard mlock semantic.
---
[1] http://lkml.kernel.org/r/CA+55aFydkG-BgZzry5DrTzueVh9VvEcVJdLV8iOyUphQk=0vpw@mail.gmail.com
--
Michal Hocko
SUSE Labs