From: David Gibson <david@gibson.dropbear.id.au>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@osdl.org>,
Ken Chen <kenneth.w.chen@intel.com>,
Bill Irwin <wli@holomorphy.com>, Adam Litke <agl@us.ibm.com>,
linux-mm@kvack.org
Subject: Re: [PATCH 3/3] hugetlb: fix absurd HugePages_Rsvd
Date: Wed, 25 Oct 2006 16:26:10 +1000
Message-ID: <20061025062610.GB2330@localhost.localdomain>
In-Reply-To: <Pine.LNX.4.64.0610250335530.30678@blonde.wat.veritas.com>
On Wed, Oct 25, 2006 at 03:38:24AM +0100, Hugh Dickins wrote:
> If you truncated an mmap'ed hugetlbfs file, then faulted on the truncated
> area, /proc/meminfo's HugePages_Rsvd wrapped hugely "negative". Reinstate
> my preliminary i_size check before attempting to allocate the page (though
> this only fixes the most obvious case: more work will be needed here).
>
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> ___
>
> This is not a complete solution (what if hugetlb_no_page is actually
> racing with truncate_hugepages?), and there are several other accounting
> anomalies in here (private versus shared pages, hugetlbfs quota handling);
> but those all need more thought. It'll probably make sense to use i_mutex
> instead of hugetlb_instantiation_mutex, so locking out truncation
> and mmap.
Ah, yes. I also encountered this one a few days ago - I found it in
the context of deserializing the hugepage fault path, which makes the
problem worse, and forgot to consider if there was also a problem in
the original case.
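To make the symptom concrete, here is a toy userspace sketch of how an unmatched decrement turns a count of zero into a huge value rather than a negative one. It assumes the reserved count is held in an unsigned long; toy_resv, toy_consume_unchecked() and toy_consume_checked() are hypothetical names for illustration only, not the kernel's code.

```c
/* Hypothetical stand-in for the reserved-page count; assumed unsigned long. */
static unsigned long toy_resv;

/* Buggy shape: consume a reservation without checking one is outstanding. */
static unsigned long toy_consume_unchecked(void)
{
	toy_resv--;		/* 0 - 1 wraps to the largest unsigned long */
	return toy_resv;
}

/* Guarded shape: refuse to drop below zero; returns -1 if nothing is reserved. */
static int toy_consume_checked(void)
{
	if (toy_resv == 0)
		return -1;
	toy_resv--;
	return 0;
}
```

Printed with an unsigned format, the wrapped value is what shows up as a "hugely negative" count. The guarded variant is only there for contrast; the patch in this mail avoids the stray decrement by never reaching the allocation for an index beyond i_size.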
In fact, there's a second problem with the current location of the
i_size check. As well as wrapping the reserved count, if there's a
fault on a truncated area and the hugepage pool is also empty, we can
get an OOM SIGKILL instead of the correct SIGBUS.
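The ordering problem can be sketched with a simplified userspace model. This is an illustration under stated assumptions, not the real fault path: struct toy_file, enum toy_fault and both toy_fault_*() functions are made up for this example, and locking, quota and reservation accounting are all left out.

```c
/* Hypothetical result codes standing in for the VM_FAULT_* values. */
enum toy_fault { TOY_MINOR, TOY_SIGBUS, TOY_OOM };

/* Hypothetical model of a hugetlbfs file plus the hugepage pool. */
struct toy_file {
	unsigned long size_pages;	/* file size in huge pages */
	unsigned long pool_free;	/* free huge pages left in the pool */
};

/* Old shape: allocate first, test the index against the size afterwards. */
static enum toy_fault toy_fault_check_late(struct toy_file *f, unsigned long idx)
{
	if (f->pool_free == 0)
		return TOY_OOM;		/* wrong answer when idx is beyond EOF */
	f->pool_free--;
	if (idx >= f->size_pages) {
		f->pool_free++;		/* back out the allocation */
		return TOY_SIGBUS;
	}
	return TOY_MINOR;
}

/* Revised shape: test the index against the size before touching the pool. */
static enum toy_fault toy_fault_check_early(struct toy_file *f, unsigned long idx)
{
	if (idx >= f->size_pages)
		return TOY_SIGBUS;
	if (f->pool_free == 0)
		return TOY_OOM;
	f->pool_free--;
	return TOY_MINOR;
}
```

With a one-page file and an empty pool, a fault at index 3 yields TOY_OOM from the late check but TOY_SIGBUS from the early check, which is the difference described above.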
I don't think things are quite as bad as you fear, though: I believe the
page lock protects us against racing concurrent truncations (this is
one reason we have find_lock_page() here, rather than the
find_get_page() which appears in the analogous normal page path).
I suggest the slightly revised patch below, which doesn't duplicate
the i_size test, and cleans up the backout path (removing a
no-longer-useful goto label) in the process.
hugepage: Correct i_size test to fix some corner cases
If you truncated an mmap'ed hugetlbfs file, then faulted on the
truncated area, /proc/meminfo's HugePages_Rsvd wrapped hugely
"negative". In addition, faulting on the truncated area when the
hugepage pool was also exhausted could result in a VM OOM SIGKILL
instead of the correct SIGBUS.
Correct these by moving the i_size check to before the allocation of a
new page.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
mm/hugetlb.c | 36 +++++++++++++++++++++++-------------
1 file changed, 23 insertions(+), 13 deletions(-)
Index: working-2.6/mm/hugetlb.c
===================================================================
--- working-2.6.orig/mm/hugetlb.c 2006-10-25 16:04:12.000000000 +1000
+++ working-2.6/mm/hugetlb.c 2006-10-25 16:18:17.000000000 +1000
@@ -477,6 +477,22 @@ int hugetlb_no_page(struct mm_struct *mm
*/
retry:
page = find_lock_page(mapping, idx);
+ /* In the case the page exists, we want to lock it before we
+ * check against i_size to guard against racing truncations.
+ * In the case it doesn't exist, we have to check against
+ * i_size before attempting to allocate a page, or we could
+ * get the wrong error if we're also out of hugepages in the
+ * pool (OOM instead of SIGBUS). So the i_size test has to go
+ * in this slightly odd location. */
+ size = i_size_read(mapping->host) >> HPAGE_SHIFT;
+ if (idx >= size) {
+ hugetlb_put_quota(mapping);
+ if (page) {
+ unlock_page(page);
+ put_page(page);
+ }
+ return VM_FAULT_SIGBUS;
+ }
if (!page) {
if (hugetlb_get_quota(mapping))
goto out;
@@ -504,13 +520,14 @@ retry:
}
spin_lock(&mm->page_table_lock);
- size = i_size_read(mapping->host) >> HPAGE_SHIFT;
- if (idx >= size)
- goto backout;
-
ret = VM_FAULT_MINOR;
- if (!pte_none(*ptep))
- goto backout;
+ if (!pte_none(*ptep)) {
+ spin_unlock(&mm->page_table_lock);
+ hugetlb_put_quota(mapping);
+ unlock_page(page);
+ put_page(page);
+ goto out;
+ }
add_mm_counter(mm, file_rss, HPAGE_SIZE / PAGE_SIZE);
new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
@@ -526,13 +543,6 @@ retry:
unlock_page(page);
out:
return ret;
-
-backout:
- spin_unlock(&mm->page_table_lock);
- hugetlb_put_quota(mapping);
- unlock_page(page);
- put_page(page);
- goto out;
}
int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson