linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [patch] hugepage allocator cleanup
@ 2006-01-25  9:11 Nick Piggin
  2006-01-25 15:05 ` William Lee Irwin III
  2006-01-26  3:09 ` William Lee Irwin III
  0 siblings, 2 replies; 8+ messages in thread
From: Nick Piggin @ 2006-01-25  9:11 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Andrew Morton, Linux Memory Management List

Hi,

This is a slight rework of the mechanism for allocating "fresh" hugepages.
Comments?

Thanks,
Nick

--
Insert "fresh" huge pages into the hugepage allocator by the same
means as they are freed back into it. This reduces code size and
allows enqueue_huge_page to be inlined into the hugepage free
fastpath.

Eliminate occurances of hugepages on the free list with non-zero
refcount. This can allow stricter refcount checks in future. Also
required for lockless pagecache.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/mm/hugetlb.c
===================================================================
--- linux-2.6.orig/mm/hugetlb.c
+++ linux-2.6/mm/hugetlb.c
@@ -64,7 +64,7 @@ static struct page *dequeue_huge_page(st
 	return page;
 }
 
-static struct page *alloc_fresh_huge_page(void)
+static int alloc_fresh_huge_page(void)
 {
 	static int nid = 0;
 	struct page *page;
@@ -72,12 +72,15 @@ static struct page *alloc_fresh_huge_pag
 					HUGETLB_PAGE_ORDER);
 	nid = (nid + 1) % num_online_nodes();
 	if (page) {
+		page[1].mapping = (void *)free_huge_page;
 		spin_lock(&hugetlb_lock);
 		nr_huge_pages++;
 		nr_huge_pages_node[page_to_nid(page)]++;
 		spin_unlock(&hugetlb_lock);
+		put_page(page); /* free it into the hugepage allocator */
+		return 1;
 	}
-	return page;
+	return 0;
 }
 
 void free_huge_page(struct page *page)
@@ -85,7 +88,6 @@ void free_huge_page(struct page *page)
 	BUG_ON(page_count(page));
 
 	INIT_LIST_HEAD(&page->lru);
-	page[1].mapping = NULL;
 
 	spin_lock(&hugetlb_lock);
 	enqueue_huge_page(page);
@@ -105,7 +107,6 @@ struct page *alloc_huge_page(struct vm_a
 	}
 	spin_unlock(&hugetlb_lock);
 	set_page_count(page, 1);
-	page[1].mapping = (void *)free_huge_page;
 	for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i)
 		clear_highpage(&page[i]);
 	return page;
@@ -114,7 +115,6 @@ struct page *alloc_huge_page(struct vm_a
 static int __init hugetlb_init(void)
 {
 	unsigned long i;
-	struct page *page;
 
 	if (HPAGE_SHIFT == 0)
 		return 0;
@@ -123,12 +123,8 @@ static int __init hugetlb_init(void)
 		INIT_LIST_HEAD(&hugepage_freelists[i]);
 
 	for (i = 0; i < max_huge_pages; ++i) {
-		page = alloc_fresh_huge_page();
-		if (!page)
+		if (!alloc_fresh_huge_page())
 			break;
-		spin_lock(&hugetlb_lock);
-		enqueue_huge_page(page);
-		spin_unlock(&hugetlb_lock);
 	}
 	max_huge_pages = free_huge_pages = nr_huge_pages = i;
 	printk("Total HugeTLB memory allocated, %ld\n", free_huge_pages);
@@ -154,8 +150,8 @@ static void update_and_free_page(struct 
 		page[i].flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
 				1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
 				1 << PG_private | 1<< PG_writeback);
-		set_page_count(&page[i], 0);
 	}
+	page[1].mapping = NULL;
 	set_page_count(page, 1);
 	__free_pages(page, HUGETLB_PAGE_ORDER);
 }
@@ -188,12 +184,8 @@ static inline void try_to_free_low(unsig
 static unsigned long set_max_huge_pages(unsigned long count)
 {
 	while (count > nr_huge_pages) {
-		struct page *page = alloc_fresh_huge_page();
-		if (!page)
+		if (!alloc_fresh_huge_page())
 			return nr_huge_pages;
-		spin_lock(&hugetlb_lock);
-		enqueue_huge_page(page);
-		spin_unlock(&hugetlb_lock);
 	}
 	if (count >= nr_huge_pages)
 		return nr_huge_pages;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch] hugepage allocator cleanup
  2006-01-25  9:11 [patch] hugepage allocator cleanup Nick Piggin
@ 2006-01-25 15:05 ` William Lee Irwin III
  2006-01-25 15:18   ` Nick Piggin
  2006-01-26  3:09 ` William Lee Irwin III
  1 sibling, 1 reply; 8+ messages in thread
From: William Lee Irwin III @ 2006-01-25 15:05 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, Linux Memory Management List

On Wed, Jan 25, 2006 at 10:11:03AM +0100, Nick Piggin wrote:
> This is a slight rework of the mechanism for allocating "fresh" hugepages.
> Comments?
> --
> Insert "fresh" huge pages into the hugepage allocator by the same
> means as they are freed back into it. This reduces code size and
> allows enqueue_huge_page to be inlined into the hugepage free
> fastpath.
> Eliminate occurances of hugepages on the free list with non-zero
> refcount. This can allow stricter refcount checks in future. Also
> required for lockless pagecache.

I don't really see any particular benefit to the rearrangement for
hugetlb's own sake. Explaining more about how it it's needed for the
lockless pagecache might help.


-- wli

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch] hugepage allocator cleanup
  2006-01-25 15:05 ` William Lee Irwin III
@ 2006-01-25 15:18   ` Nick Piggin
  2006-01-25 16:32     ` William Lee Irwin III
  0 siblings, 1 reply; 8+ messages in thread
From: Nick Piggin @ 2006-01-25 15:18 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Nick Piggin, Andrew Morton, Linux Memory Management List

On Wed, Jan 25, 2006 at 07:05:13AM -0800, William Lee Irwin III wrote:
> On Wed, Jan 25, 2006 at 10:11:03AM +0100, Nick Piggin wrote:
> > This is a slight rework of the mechanism for allocating "fresh" hugepages.
> > Comments?
> > --
> > Insert "fresh" huge pages into the hugepage allocator by the same
> > means as they are freed back into it. This reduces code size and
> > allows enqueue_huge_page to be inlined into the hugepage free
> > fastpath.
> > Eliminate occurances of hugepages on the free list with non-zero
> > refcount. This can allow stricter refcount checks in future. Also
> > required for lockless pagecache.
> 
> I don't really see any particular benefit to the rearrangement for
> hugetlb's own sake.

I like the fact that freelist pages always have a zero refcount now.
And I thought it was quite neat to set up the page for the new allocator
then free straight into it... but you're right, there is no *particular*
benefit ;)

> Explaining more about how it it's needed for the
> lockless pagecache might help.
> 

OK. Though obviously I don't want to introduce any ugliness or
hackery to core code just for lockless pagecache (which may not
ever go in itself).

Basically with lockless pagecache it becomes possible that any
page taken out of the page allocator may have its refcount raised
by another thread.

That other thread is looking for a pagecache page, if it takes
a reused one, it will quickly drop the refcount again.

So it is important not to muddle the count.  A 1->0 transition
(as currently when allocating fresh pages for the first time)
needs to be done with a dec-and-test rather than a plain set.

Thanks,
Nick

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch] hugepage allocator cleanup
  2006-01-25 15:18   ` Nick Piggin
@ 2006-01-25 16:32     ` William Lee Irwin III
  2006-01-25 16:52       ` Nick Piggin
  0 siblings, 1 reply; 8+ messages in thread
From: William Lee Irwin III @ 2006-01-25 16:32 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, Linux Memory Management List

On Wed, Jan 25, 2006 at 04:18:46PM +0100, Nick Piggin wrote:
> OK. Though obviously I don't want to introduce any ugliness or
> hackery to core code just for lockless pagecache (which may not
> ever go in itself).
> Basically with lockless pagecache it becomes possible that any
> page taken out of the page allocator may have its refcount raised
> by another thread.
> That other thread is looking for a pagecache page, if it takes
> a reused one, it will quickly drop the refcount again.
> So it is important not to muddle the count.  A 1->0 transition
> (as currently when allocating fresh pages for the first time)
> needs to be done with a dec-and-test rather than a plain set.

Preparatory cleanups are fine by me barring where things get to the
point of churn, which isn't a concern here.

It appears the crucial component of this update_and_free_page(). It
shouldn't be necessary as disciplined page->_count references are
redirected to the head of the hugepage, but it's trying to clean up the
page->_counts in tail pages of the hugepage in preparation for freeing.
Arguably 1->0 transition logic shouldn't be triggered, but the locking
protocol envisioned may not allow unconditionally setting page->_count.

Just yanking the page refcount affairs out of update_and_free_page()
should suffice. Could I get things trimmed down to that?


-- wli

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch] hugepage allocator cleanup
  2006-01-25 16:32     ` William Lee Irwin III
@ 2006-01-25 16:52       ` Nick Piggin
  2006-01-26  3:04         ` William Lee Irwin III
  0 siblings, 1 reply; 8+ messages in thread
From: Nick Piggin @ 2006-01-25 16:52 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Nick Piggin, Andrew Morton, Linux Memory Management List

On Wed, Jan 25, 2006 at 08:32:43AM -0800, William Lee Irwin III wrote:
> 
> Preparatory cleanups are fine by me barring where things get to the
> point of churn, which isn't a concern here.
> 

Cool.

> It appears the crucial component of this update_and_free_page(). It
> shouldn't be necessary as disciplined page->_count references are
> redirected to the head of the hugepage, but it's trying to clean up the
> page->_counts in tail pages of the hugepage in preparation for freeing.

Yes, that is the crucial part.

> Arguably 1->0 transition logic shouldn't be triggered, but the locking
> protocol envisioned may not allow unconditionally setting page->_count.
> 

Unfortunately yes I wasn't clear I guess. Any page with a nonzero
refcount must not be unconditionally set.

> Just yanking the page refcount affairs out of update_and_free_page()
> should suffice. Could I get things trimmed down to that?
> 

I could remove the first set_page_count, and make the second conditional
on the page having a zero refcount... for a 3-liner. But that's kind of
ugly (if less intrusive), and it is adds seemingly nonsense code if one
doesn't have the context of my out-of-tree patches.

Hmm... it's obviously not 2.6.16 material so there is no rush to think
it over. It is even simple enough that I don't mind carrying with my
patchset indefinitely.

Thanks,
Nick

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch] hugepage allocator cleanup
  2006-01-25 16:52       ` Nick Piggin
@ 2006-01-26  3:04         ` William Lee Irwin III
  2006-01-26 14:19           ` Nick Piggin
  0 siblings, 1 reply; 8+ messages in thread
From: William Lee Irwin III @ 2006-01-26  3:04 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, Linux Memory Management List

On Wed, Jan 25, 2006 at 08:32:43AM -0800, William Lee Irwin III wrote:
>> Just yanking the page refcount affairs out of update_and_free_page()
>> should suffice. Could I get things trimmed down to that?
>> 

On Wed, Jan 25, 2006 at 05:52:08PM +0100, Nick Piggin wrote:
> I could remove the first set_page_count, and make the second conditional
> on the page having a zero refcount... for a 3-liner. But that's kind of
> ugly (if less intrusive), and it is adds seemingly nonsense code if one
> doesn't have the context of my out-of-tree patches.
> Hmm... it's obviously not 2.6.16 material so there is no rush to think
> it over. It is even simple enough that I don't mind carrying with my
> patchset indefinitely.

After I thought about it, alloc_fresh_huge_page() does enqueue pages
with refcounts of 1, where free_huge_page() (called from the freeing
hook in page[1].mapping) enqueues pages with refcounts of 0, so it
would actually make sense (and possibly prevent leaks) to take the
whole patch as-is.


-- wli

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch] hugepage allocator cleanup
  2006-01-25  9:11 [patch] hugepage allocator cleanup Nick Piggin
  2006-01-25 15:05 ` William Lee Irwin III
@ 2006-01-26  3:09 ` William Lee Irwin III
  1 sibling, 0 replies; 8+ messages in thread
From: William Lee Irwin III @ 2006-01-26  3:09 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, Linux Memory Management List

On Wed, Jan 25, 2006 at 10:11:03AM +0100, Nick Piggin wrote:
> This is a slight rework of the mechanism for allocating "fresh" hugepages.
> Comments?
> --
> Insert "fresh" huge pages into the hugepage allocator by the same
> means as they are freed back into it. This reduces code size and
> allows enqueue_huge_page to be inlined into the hugepage free
> fastpath.
> Eliminate occurances of hugepages on the free list with non-zero
> refcount. This can allow stricter refcount checks in future. Also
> required for lockless pagecache.
> Signed-off-by: Nick Piggin <npiggin@suse.de>

This patch also eliminates a leak "cleaned up" by re-clobbering the
refcount on every allocation from the hugepage freelists. With respect to
the lockless pagecache, the crucial aspect is to eliminate unconditional
set_page_count() to 0 on pages with potentially nonzero refcounts, though
closer inspection suggests the assignments removed are entirely spurious.

Acked-by: William Irwin <wli@holomorphy.com>


-- wli

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch] hugepage allocator cleanup
  2006-01-26  3:04         ` William Lee Irwin III
@ 2006-01-26 14:19           ` Nick Piggin
  0 siblings, 0 replies; 8+ messages in thread
From: Nick Piggin @ 2006-01-26 14:19 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Nick Piggin, Andrew Morton, Linux Memory Management List

On Wed, Jan 25, 2006 at 07:04:24PM -0800, William Lee Irwin III wrote:
> On Wed, Jan 25, 2006 at 08:32:43AM -0800, William Lee Irwin III wrote:
> >> Just yanking the page refcount affairs out of update_and_free_page()
> >> should suffice. Could I get things trimmed down to that?
> >> 
> 
> On Wed, Jan 25, 2006 at 05:52:08PM +0100, Nick Piggin wrote:
> > I could remove the first set_page_count, and make the second conditional
> > on the page having a zero refcount... for a 3-liner. But that's kind of
> > ugly (if less intrusive), and it is adds seemingly nonsense code if one
> > doesn't have the context of my out-of-tree patches.
> > Hmm... it's obviously not 2.6.16 material so there is no rush to think
> > it over. It is even simple enough that I don't mind carrying with my
> > patchset indefinitely.
> 
> After I thought about it, alloc_fresh_huge_page() does enqueue pages
> with refcounts of 1, where free_huge_page() (called from the freeing
> hook in page[1].mapping) enqueues pages with refcounts of 0, so it

Yep.

> would actually make sense (and possibly prevent leaks) to take the
> whole patch as-is.
> 
> 

Yeah, you could add a bad_page-like check to verify the refcount
hasn't been mucked with... and of course the regular allocator can
now verify refcounts are alright when update_and_free_page returns
them.

Nick

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2006-01-26 14:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-25  9:11 [patch] hugepage allocator cleanup Nick Piggin
2006-01-25 15:05 ` William Lee Irwin III
2006-01-25 15:18   ` Nick Piggin
2006-01-25 16:32     ` William Lee Irwin III
2006-01-25 16:52       ` Nick Piggin
2006-01-26  3:04         ` William Lee Irwin III
2006-01-26 14:19           ` Nick Piggin
2006-01-26  3:09 ` William Lee Irwin III

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox