From: David Gibson <david@gibson.dropbear.id.au>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@osdl.org>,
	Ken Chen <kenneth.w.chen@intel.com>,
	Bill Irwin <wli@holomorphy.com>, Adam Litke <agl@us.ibm.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH 3/3] hugetlb: fix absurd HugePages_Rsvd
Date: Wed, 25 Oct 2006 16:26:10 +1000
Message-ID: <20061025062610.GB2330@localhost.localdomain>
In-Reply-To: <Pine.LNX.4.64.0610250335530.30678@blonde.wat.veritas.com>

On Wed, Oct 25, 2006 at 03:38:24AM +0100, Hugh Dickins wrote:
> If you truncated an mmap'ed hugetlbfs file, then faulted on the truncated
> area, /proc/meminfo's HugePages_Rsvd wrapped hugely "negative".  Reinstate
> my preliminary i_size check before attempting to allocate the page (though
> this only fixes the most obvious case: more work will be needed here).
> 
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> ___
> 
> This is not a complete solution (what if hugetlb_no_page is actually
> racing with truncate_hugepages?), and there are several other accounting
> anomalies in here (private versus shared pages, hugetlbfs quota handling);
> but those all need more thought.  It'll probably make sense to use i_mutex
> instead of hugetlb_instantiation_mutex, so locking out truncation
> and mmap.

Ah, yes.  I also encountered this one a few days ago - I found it in
the context of deserializing the hugepage fault path, which makes the
problem worse, and forgot to consider if there was also a problem in
the original case.

In fact, there's a second problem with the current location of the
i_size check.  As well as wrapping the reserved count, if there's a
fault on a truncated area and the hugepage pool is also empty, we can
get an OOM SIGKILL instead of the correct SIGBUS.
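
To illustrate the wrapped reserved count, here is a minimal userspace sketch of an unguarded decrement on an unsigned counter. The struct and function names are hypothetical, and it assumes the reserved count is held in an unsigned long; this is not the kernel code itself.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the reserved-hugepage counter; assumed
 * here to be an unsigned long, as suggested by the huge value seen
 * in HugePages_Rsvd. */
struct pool_model {
	unsigned long resv_huge_pages;
};

/* Consume one reservation without checking that one exists.  This
 * models the decrement that a fault beyond i_size can trigger when
 * no reservation was ever made for that index. */
static unsigned long consume_reservation_unchecked(struct pool_model *p)
{
	assert(p);
	p->resv_huge_pages--;	/* wraps to ULONG_MAX when it was 0 */
	return p->resv_huge_pages;
}
```

Starting from a count of zero, one such call leaves the counter at ULONG_MAX, which is what shows up as a hugely "negative" value.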

I don't think things are quite as bad as you fear, though:  I believe
the page lock protects us against racing concurrent truncations (this
is one reason we have find_lock_page() here, rather than the
find_get_page() which appears in the analogous normal page path).

I suggest the slightly revised patch below, which doesn't duplicate
the i_size test, and cleans up the backout path (removing a
no-longer-useful goto label) in the process.
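
The effect of the check ordering can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the hugetlb_no_page() logic: the enum, both function names, and the free_pages parameter are hypothetical, and locking, quota and reservations are left out entirely.

```c
#include <assert.h>

/* Hypothetical outcomes of a fault in this model. */
enum fault_result { FAULT_OK, FAULT_SIGBUS, FAULT_OOM };

/* Size check after allocation: with an empty pool the allocation
 * fails first, so a fault beyond i_size reports OOM. */
static enum fault_result fault_check_late(unsigned long idx,
					  unsigned long size,
					  unsigned long *free_pages)
{
	assert(free_pages);
	if (*free_pages == 0)
		return FAULT_OOM;
	(*free_pages)--;		/* "allocate" a hugepage */
	if (idx >= size) {
		(*free_pages)++;	/* back out the allocation */
		return FAULT_SIGBUS;
	}
	return FAULT_OK;
}

/* Size check before allocation: a fault beyond i_size reports
 * SIGBUS whatever the state of the pool. */
static enum fault_result fault_check_early(unsigned long idx,
					   unsigned long size,
					   unsigned long *free_pages)
{
	assert(free_pages);
	if (idx >= size)
		return FAULT_SIGBUS;
	if (*free_pages == 0)
		return FAULT_OOM;
	(*free_pages)--;		/* "allocate" a hugepage */
	return FAULT_OK;
}
```

With an empty pool and idx at or beyond size, the late variant returns FAULT_OOM while the early variant returns FAULT_SIGBUS; for an in-range idx the two behave identically.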

hugepage: Correct i_size test to fix some corner cases

If you truncated an mmap'ed hugetlbfs file, then faulted on the
truncated area, /proc/meminfo's HugePages_Rsvd wrapped hugely
"negative".  In addition, faulting on the truncated area when the
hugepage pool was also exhausted could result in a VM OOM SIGKILL
instead of the correct SIGBUS.

Correct these by moving the i_size check to before the allocation of a
new page.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

 mm/hugetlb.c |   36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

Index: working-2.6/mm/hugetlb.c
===================================================================
--- working-2.6.orig/mm/hugetlb.c	2006-10-25 16:04:12.000000000 +1000
+++ working-2.6/mm/hugetlb.c	2006-10-25 16:18:17.000000000 +1000
@@ -477,6 +477,22 @@ int hugetlb_no_page(struct mm_struct *mm
 	 */
 retry:
 	page = find_lock_page(mapping, idx);
+	/* In the case the page exists, we want to lock it before we
+	 * check against i_size to guard against racing truncations.
+	 * In the case it doesn't exist, we have to check against
+	 * i_size before attempting to allocate a page, or we could
+	 * get the wrong error if we're also out of hugepages in the
+	 * pool (OOM instead of SIGBUS).  So the i_size test has to go
+	 * in this slightly odd location. */
+	size = i_size_read(mapping->host) >> HPAGE_SHIFT;
+	if (idx >= size) {
+		hugetlb_put_quota(mapping);
+		if (page) {
+			unlock_page(page);
+			put_page(page);
+		}
+		return VM_FAULT_SIGBUS;
+	}
 	if (!page) {
 		if (hugetlb_get_quota(mapping))
 			goto out;
@@ -504,13 +520,14 @@ retry:
 	}
 
 	spin_lock(&mm->page_table_lock);
-	size = i_size_read(mapping->host) >> HPAGE_SHIFT;
-	if (idx >= size)
-		goto backout;
-
 	ret = VM_FAULT_MINOR;
-	if (!pte_none(*ptep))
-		goto backout;
+	if (!pte_none(*ptep)) {
+		spin_unlock(&mm->page_table_lock);
+		hugetlb_put_quota(mapping);
+		unlock_page(page);
+		put_page(page);
+		goto out;
+	}
 
 	add_mm_counter(mm, file_rss, HPAGE_SIZE / PAGE_SIZE);
 	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
@@ -526,13 +543,6 @@ retry:
 	unlock_page(page);
 out:
 	return ret;
-
-backout:
-	spin_unlock(&mm->page_table_lock);
-	hugetlb_put_quota(mapping);
-	unlock_page(page);
-	put_page(page);
-	goto out;
 }
 
 int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

