* [PATCH] hugetlbfs: Kill applications that use MAP_NORESERVE with SIGBUS instead of OOM-killer
From: Mel Gorman @ 2010-04-20 17:44 UTC
To: Andrew Morton
Cc: Lee Schermerhorn, David Rientjes, Andi Kleen, linux-kernel, linux-mm
Ordinarily, applications using hugetlbfs create mappings with
reserves. For shared mappings, the pages are reserved before mmap()
returns success. For private mappings, the calling process is
guaranteed its pages, and a child process that cannot get the pages
is killed with SIGBUS.
An application that uses MAP_NORESERVE gets no reservations, so mmap()
will always succeed, at the risk that a page will not be available at fault
time. This might be used, for example, on very large sparse mappings where the
developer is confident the necessary huge pages exist to satisfy all faults
even though the whole mapping cannot be backed by huge pages. Unfortunately,
if an allocation does fail, VM_FAULT_OOM is returned to the fault handler
which proceeds to trigger the OOM-killer. This is unhelpful.
This patch alters hugetlbfs so that a process using MAP_NORESERVE is
killed with SIGBUS when huge pages are not available at fault time,
instead of triggering the OOM killer.
This patch, if accepted, should also be considered a -stable candidate.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/hugetlb.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6034dc9..af2d907 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1038,7 +1038,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		page = alloc_buddy_huge_page(h, vma, addr);
 		if (!page) {
 			hugetlb_put_quota(inode->i_mapping, chg);
-			return ERR_PTR(-VM_FAULT_OOM);
+			return ERR_PTR(-VM_FAULT_SIGBUS);
 		}
 	}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
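The one-line change in the patch works because alloc_huge_page() encodes a fault code in an error pointer and the fault path negates it again to get its return value. The following is a minimal userspace sketch of that round trip, not kernel code. The VM_FAULT_* values and MAX_ERRNO are assumed to mirror the kernel's definitions, and model_alloc_huge_page() and model_fault() are hypothetical stand-ins for the real functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the kernel's definitions; illustrative only. */
#define VM_FAULT_OOM    0x0001
#define VM_FAULT_SIGBUS 0x0002
#define MAX_ERRNO       4095

/* Userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	/* Only small negative values can be encoded in a pointer. */
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical model of alloc_huge_page() on the path where neither the
 * pool nor the buddy allocator can supply a huge page.
 */
static void *model_alloc_huge_page(int patched)
{
	return ERR_PTR(patched ? -VM_FAULT_SIGBUS : -VM_FAULT_OOM);
}

/*
 * Hypothetical model of the fault path: negate the encoded error to
 * recover the VM_FAULT_* code handed back to the fault handler.
 */
static int model_fault(int patched)
{
	void *page = model_alloc_huge_page(patched);

	if (IS_ERR(page))
		return (int)-PTR_ERR(page);
	return 0;
}
```

In this model, model_fault(0) yields VM_FAULT_OOM (the behaviour before the patch) and model_fault(1) yields VM_FAULT_SIGBUS, so the faulting task is signalled rather than the OOM killer being invoked.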
* Re: [PATCH] hugetlbfs: Kill applications that use MAP_NORESERVE with SIGBUS instead of OOM-killer
From: Andrew Morton @ 2010-04-20 23:33 UTC
To: Mel Gorman
Cc: Lee Schermerhorn, David Rientjes, Andi Kleen, linux-kernel, linux-mm
On Tue, 20 Apr 2010 18:44:07 +0100
Mel Gorman <mel@csn.ul.ie> wrote:
> Ordinarily, application using hugetlbfs will create mappings with
> reserves. For shared mappings, these pages are reserved before mmap()
> returns success and for private mappings, the caller process is
> guaranteed and a child process that cannot get the pages gets killed
> with sigbus.
>
> An application that uses MAP_NORESERVE gets no reservations and mmap()
> will always succeed at the risk the page will not be available at fault
> time. This might be used for example on very large sparse mappings where the
> developer is confident the necessary huge pages exist to satisfy all faults
> even though the whole mapping cannot be backed by huge pages. Unfortunately,
> if an allocation does fail, VM_FAULT_OOM is returned to the fault handler
> which proceeds to trigger the OOM-killer. This is unhelpful.
>
> This patch alters hugetlbfs to kill a process that uses MAP_NORESERVE
> where huge pages were not available with SIGBUS instead of triggering
> the OOM killer.
>
> This patch if accepted should also be considered a -stable candidate.
Why? The changelog doesn't convey much seriousness?
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
> mm/hugetlb.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6034dc9..af2d907 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1038,7 +1038,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
> page = alloc_buddy_huge_page(h, vma, addr);
> if (!page) {
> hugetlb_put_quota(inode->i_mapping, chg);
> - return ERR_PTR(-VM_FAULT_OOM);
> + return ERR_PTR(-VM_FAULT_SIGBUS);
> }
> }
>
This affects hugetlb_cow() as well?
* Re: [PATCH] hugetlbfs: Kill applications that use MAP_NORESERVE with SIGBUS instead of OOM-killer
From: Mel Gorman @ 2010-04-21 9:27 UTC
To: Andrew Morton
Cc: Lee Schermerhorn, David Rientjes, Andi Kleen, linux-kernel, linux-mm
On Tue, Apr 20, 2010 at 04:33:07PM -0700, Andrew Morton wrote:
> On Tue, 20 Apr 2010 18:44:07 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
>
> > Ordinarily, application using hugetlbfs will create mappings with
> > reserves. For shared mappings, these pages are reserved before mmap()
> > returns success and for private mappings, the caller process is
> > guaranteed and a child process that cannot get the pages gets killed
> > with sigbus.
> >
> > An application that uses MAP_NORESERVE gets no reservations and mmap()
> > will always succeed at the risk the page will not be available at fault
> > time. This might be used for example on very large sparse mappings where the
> > developer is confident the necessary huge pages exist to satisfy all faults
> > even though the whole mapping cannot be backed by huge pages. Unfortunately,
> > if an allocation does fail, VM_FAULT_OOM is returned to the fault handler
> > which proceeds to trigger the OOM-killer. This is unhelpful.
> >
> > This patch alters hugetlbfs to kill a process that uses MAP_NORESERVE
> > where huge pages were not available with SIGBUS instead of triggering
> > the OOM killer.
> >
> > This patch if accepted should also be considered a -stable candidate.
>
> Why? The changelog doesn't convey much seriousness?
>
Because even without hugetlbfs mounted, an unprivileged user calling mmap()
can trivially trigger the OOM-killer, since VM_FAULT_OOM is returned (I can
provide an example program if you like; it's a whopping 24 lines long). It
could be considered a DoS available to an unprivileged user.
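The example program mentioned above is not included in the thread. As a rough sketch of what such a test might look like, the helper below maps anonymous huge pages with MAP_HUGETLB and MAP_NORESERVE (an assumption about how the mapping is made without hugetlbfs mounted) and writes to the first page while catching SIGBUS. The 2 MiB huge page size and the name touch_noreserve_hugepage() are also assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages, the common x86-64 default. */
#define HPAGE_SIZE (2UL * 1024 * 1024)

enum touch_result { TOUCH_MMAP_FAILED, TOUCH_OK, TOUCH_SIGBUS };

static sigjmp_buf sigbus_env;

static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(sigbus_env, 1);	/* unwind out of the faulting store */
}

/*
 * Hypothetical helper: map npages huge pages with no reservation and
 * write to the first one. With the patch applied, a failed huge page
 * allocation at fault time is expected to arrive here as SIGBUS. On a
 * kernel without the patch the same fault may invoke the OOM killer
 * instead, so only try this on a test machine.
 */
static enum touch_result touch_noreserve_hugepage(size_t npages)
{
	struct sigaction sa, old;
	enum touch_result res;
	size_t len = npages * HPAGE_SIZE;
	volatile char *p;
	void *addr;

	assert(npages > 0);

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_NORESERVE,
		    -1, 0);
	if (addr == MAP_FAILED)
		return TOUCH_MMAP_FAILED;	/* e.g. no huge page support */
	p = addr;

	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sigaction(SIGBUS, &sa, &old);

	if (sigsetjmp(sigbus_env, 1) == 0) {
		p[0] = 1;		/* first touch faults the page in */
		res = TOUCH_OK;
	} else {
		res = TOUCH_SIGBUS;	/* no huge page was available */
	}

	sigaction(SIGBUS, &old, NULL);	/* restore the previous handler */
	munmap(addr, len);
	return res;
}
```

Calling touch_noreserve_hugepage(1) on a system whose huge page pool is empty would be expected to return TOUCH_SIGBUS on a patched kernel; with free huge pages in the pool it should return TOUCH_OK.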
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> > mm/hugetlb.c | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 6034dc9..af2d907 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1038,7 +1038,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
> > page = alloc_buddy_huge_page(h, vma, addr);
> > if (!page) {
> > hugetlb_put_quota(inode->i_mapping, chg);
> > - return ERR_PTR(-VM_FAULT_OOM);
> > + return ERR_PTR(-VM_FAULT_SIGBUS);
> > }
> > }
> >
>
> This affects hugetlb_cow() as well?
>
Yes. I feel there is a failure case in there, but I didn't create one.
It would need a fairly specific target in terms of the faulting application
and the hugepage pool size. The hugetlb_no_page() path is much easier to hit,
but both might as well be closed.
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab