From: 'David Gibson' <david@gibson.dropbear.id.au>
To: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: wli@holomorphy.com, 'Andrew Morton' <akpm@osdl.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch] hugetlb strict commit accounting
Date: Thu, 9 Mar 2006 23:54:41 +1100
Message-ID: <20060309125441.GE9479@localhost.localdomain>
In-Reply-To: <200603091231.k29CV9g20079@unix-os.sc.intel.com>

On Thu, Mar 09, 2006 at 04:31:11AM -0800, Chen, Kenneth W wrote:
> David Gibson wrote on Thursday, March 09, 2006 4:07 AM
> > > Well, the reservation is already done at mmap time for shared mapping. Why
> > > does kernel need to do anything at fault time? Doing it at fault time is
> > > an indication of weakness (or brokenness) - you already promised at mmap
> > > time that there will be a page available for faulting. Why check them
> > > again at fault time?
> >
> > You can't know (or bound) at mmap() time how many pages a PRIVATE
> > mapping will take (because of fork()). So unless you have a test at
> > fault time (essentially deciding whether to draw from the "reserved" or
> > "unreserved" hugepage pool) a supposedly reserved SHARED mapping will
> > OOM later if there have been enough COW faults to use up all the
> > hugepages before it's instantiated.
>
> I see. But that is easy to fix. I just need to do exactly the same
> thing as what you did to alloc_huge_page. I will then need to change
> definition of 'reservation' to needs-in-the future (also an easy thing
> to change).

Well... except that then you *do* need to traverse the page cache on
truncate(), just like I do. Note that in my latest revision,
hugetlb_extend_reservation() no longer walks the radix tree, only
hugetlb_truncate_reservation() does (extend *does* still take the
tree_lock, an oversight which I will send a patch for tomorrow).

(Oh, and you'll need to walk the reserved range list in
alloc_huge_page(), rather than one comparison like I have. Although
in practice I imagine there will never be more than one entry on the
list, so I guess that doesn't really matter.)
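
As an illustration of why a "needs in the future" reservation seems to
force a page cache walk on truncate(): if the count covers only pages
that are promised but not yet instantiated, shrinking the file has to
find out which of the cut pages were still uninstantiated. The sketch
below is a user-space toy, not code from either patch. A bool array
stands in for the page cache, and struct toy_file, toy_extend(),
toy_fault() and toy_truncate() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_PAGES 16

/* Toy stand-in for a hugetlbfs file; not a kernel structure. */
struct toy_file {
    bool instantiated[TOY_MAX_PAGES]; /* stands in for the page cache */
    long reserved_to;                 /* pages [0, reserved_to) are promised */
    long reservation;                 /* promised pages not yet instantiated */
};

/* mmap() time: extend the promised range to new_end pages. Faults are only
 * allowed inside the promised range in this toy, so nothing beyond
 * reserved_to can be instantiated and no walk is needed here. */
void toy_extend(struct toy_file *f, long new_end)
{
    assert(new_end >= 0 && new_end <= TOY_MAX_PAGES);
    if (new_end > f->reserved_to) {
        f->reservation += new_end - f->reserved_to;
        f->reserved_to = new_end;
    }
}

/* Fault time: instantiating a promised page consumes one unit. */
void toy_fault(struct toy_file *f, long idx)
{
    assert(idx >= 0 && idx < f->reserved_to);
    if (!f->instantiated[idx]) {
        f->instantiated[idx] = true;
        f->reservation--;
    }
}

/* truncate() time: only uninstantiated pages in the cut range still hold a
 * promise, so each page in that range has to be inspected. Returns the
 * number of promises released. */
long toy_truncate(struct toy_file *f, long new_end)
{
    long released = 0;

    assert(new_end >= 0);
    for (long i = new_end; i < f->reserved_to; i++) {
        if (f->instantiated[i])
            f->instantiated[i] = false; /* the page itself is freed */
        else
            released++;                 /* an unused promise is dropped */
    }
    if (new_end < f->reserved_to)
        f->reserved_to = new_end;
    f->reservation -= released;
    return released;
}
```

In this toy, extending to 8 pages, faulting in pages 1 and 6, and then
truncating to 4 pages releases 3 promises (pages 4, 5 and 7), which
cannot be computed from the sizes alone.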

> The real question or discussion I want to bring up is whether kernel
> should do its own accounting or rely on traversing the page cache.
> My opinion is that kernel should do its own accounting because it is
> simpler: you just need to do that at mmap and ftruncate time.

And as we've seen above, a little bit at fault time. Which would be
exactly the same three places that my patch adds accounting. I'm
quite willing to be convinced your patch is the better approach, but
this isn't an argument for it.
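
The three accounting points (mmap, truncate, fault) can be sketched
with a toy model of the hugepage pool. This is only an illustration of
the reserved/unreserved split discussed above, not code from either
patch. struct toy_pool, toy_reserve(), toy_unreserve() and toy_alloc()
are hypothetical names, and the model ignores locking and per-file
state entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the global hugepage pool; not kernel code. */
struct toy_pool {
    long free_pages; /* hugepages currently free */
    long reserved;   /* free pages already promised to SHARED mappings */
};

/* mmap() time: promise npages to a SHARED mapping, or refuse. */
bool toy_reserve(struct toy_pool *p, long npages)
{
    if (npages < 0 || p->free_pages - p->reserved < npages)
        return false;
    p->reserved += npages;
    return true;
}

/* truncate() time: give back promises that will never be used. */
void toy_unreserve(struct toy_pool *p, long npages)
{
    assert(npages >= 0 && npages <= p->reserved);
    p->reserved -= npages;
}

/* Fault time: a fault covered by a promise draws from the reserved share;
 * any other fault (for example COW in a PRIVATE mapping) may only take
 * pages beyond that share. */
bool toy_alloc(struct toy_pool *p, bool covered_by_reservation)
{
    if (covered_by_reservation) {
        assert(p->reserved > 0 && p->free_pages >= p->reserved);
        p->reserved--;
    } else if (p->free_pages - p->reserved <= 0) {
        return false; /* would eat into a promised page */
    }
    p->free_pages--;
    return true;
}
```

With 4 free pages and 2 of them promised, the toy lets two unreserved
faults succeed and refuses the third, so the two promised faults can
still be satisfied afterwards. Without the fault-time test the third
unreserved fault would have taken a promised page.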

Incidentally, I've just realised that removing the dodgy heuristic and
allowing unconstrained overcommit for PRIVATE mappings (which both our
patches do) is potentially problematic. In particular it means my
hugepage malloc() implementation will always OOM rather than fall back
to normal pages :( (I believe currently it will usually fall back, and
only OOM if you get unlucky with the timing).
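
The fallback concern can be illustrated with a toy model. It rests on
two assumptions that are not stated in this thread: that the heuristic
amounts to a check of free hugepages at mmap() time, and that an
allocator can only fall back to normal pages when the mapping call
itself fails. The names (toy_policy, toy_map_private(),
toy_hugepage_malloc()) are hypothetical and do not describe the actual
hugepage malloc() code.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_policy { TOY_HEURISTIC, TOY_OVERCOMMIT };
enum toy_result { TOY_HUGE, TOY_FALLBACK, TOY_OOM };

/* mmap() time for a PRIVATE mapping. Under the assumed heuristic the call
 * fails when too few hugepages are free right now; under unconstrained
 * overcommit it always succeeds. */
bool toy_map_private(enum toy_policy policy, long free_at_map, long npages)
{
    assert(npages > 0);
    if (policy == TOY_OVERCOMMIT)
        return true;
    return free_at_map >= npages;
}

/* Allocator decision: fall back only if the mapping call fails. Once the
 * mapping exists, a shortage shows up at fault time, where the allocator
 * in this toy has no chance to recover. */
enum toy_result toy_hugepage_malloc(enum toy_policy policy, long free_at_map,
                                    long free_at_fault, long npages)
{
    if (!toy_map_private(policy, free_at_map, npages))
        return TOY_FALLBACK;
    if (free_at_fault < npages)
        return TOY_OOM;
    return TOY_HUGE;
}
```

In this model an empty pool gives TOY_FALLBACK under the heuristic but
TOY_OOM under overcommit. The heuristic still ends in TOY_OOM when
pages are free at mmap() time and gone by fault time, which corresponds
to the unlucky timing case.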

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson