* [patch] mm: pagecache insertion fewer atomics
@ 2008-08-18 12:24 Nick Piggin
2008-08-18 12:25 ` [patch] mm: unlockless reclaim Nick Piggin
` (4 more replies)
0 siblings, 5 replies; 27+ messages in thread
From: Nick Piggin @ 2008-08-18 12:24 UTC (permalink / raw)
To: Andrew Morton, Linux Memory Management List
Setting and clearing the page lock when inserting a page into swapcache /
pagecache, while it has no other references, can use non-atomic page flags
operations because no other CPU can be operating on it at that time.
This saves one atomic operation when inserting a page into pagecache.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
include/linux/pagemap.h | 37 +++++++++++++++++++++++++++++++++----
mm/swap_state.c | 4 ++--
2 files changed, 35 insertions(+), 6 deletions(-)
Index: linux-2.6/mm/swap_state.c
===================================================================
--- linux-2.6.orig/mm/swap_state.c
+++ linux-2.6/mm/swap_state.c
@@ -302,7 +302,7 @@ struct page *read_swap_cache_async(swp_e
* re-using the just freed swap entry for an existing page.
* May fail (-ENOMEM) if radix-tree node allocation failed.
*/
- set_page_locked(new_page);
+ __set_page_locked(new_page);
err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
if (likely(!err)) {
/*
@@ -312,7 +312,7 @@ struct page *read_swap_cache_async(swp_e
swap_readpage(NULL, new_page);
return new_page;
}
- clear_page_locked(new_page);
+ __clear_page_locked(new_page);
swap_free(entry);
} while (err != -ENOMEM);
Index: linux-2.6/include/linux/pagemap.h
===================================================================
--- linux-2.6.orig/include/linux/pagemap.h
+++ linux-2.6/include/linux/pagemap.h
@@ -271,14 +271,14 @@ extern int __lock_page_killable(struct p
extern void __lock_page_nosync(struct page *page);
extern void unlock_page(struct page *page);
-static inline void set_page_locked(struct page *page)
+static inline void __set_page_locked(struct page *page)
{
- set_bit(PG_locked, &page->flags);
+ __set_bit(PG_locked, &page->flags);
}
-static inline void clear_page_locked(struct page *page)
+static inline void __clear_page_locked(struct page *page)
{
- clear_bit(PG_locked, &page->flags);
+ __clear_bit(PG_locked, &page->flags);
}
static inline int trylock_page(struct page *page)
@@ -410,17 +410,17 @@ extern void __remove_from_page_cache(str
/*
* Like add_to_page_cache_locked, but used to add newly allocated pages:
- * the page is new, so we can just run set_page_locked() against it.
+ * the page is new, so we can just run __set_page_locked() against it.
*/
static inline int add_to_page_cache(struct page *page,
struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask)
{
int error;
- set_page_locked(page);
+ __set_page_locked(page);
error = add_to_page_cache_locked(page, mapping, offset, gfp_mask);
if (unlikely(error))
- clear_page_locked(page);
+ __clear_page_locked(page);
return error;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>
* [patch] mm: unlockless reclaim
2008-08-18 12:24 [patch] mm: pagecache insertion fewer atomics Nick Piggin
@ 2008-08-18 12:25 ` Nick Piggin
2008-08-19 5:09 ` KOSAKI Motohiro
2008-08-18 12:28 ` [patch] mm: page lock use lock bitops Nick Piggin
` (3 subsequent siblings)
4 siblings, 1 reply; 27+ messages in thread
From: Nick Piggin @ 2008-08-18 12:25 UTC (permalink / raw)
To: Andrew Morton, Linux Memory Management List
unlock_page is fairly expensive. It can be avoided in the page reclaim
success path. By definition, if we had any other references to the page at
this point it would be a bug anyway.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
mm/vmscan.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -637,7 +637,14 @@ static unsigned long shrink_page_list(st
if (!mapping || !__remove_mapping(mapping, page))
goto keep_locked;
- unlock_page(page);
+ /*
+ * At this point, we have no other references and there is
+ * no way to pick any more up (removed from LRU, removed
+ * from pagecache). Can use non-atomic bitops now (and
+ * we obviously don't have to worry about waking up a process
+ * waiting on the page lock, because there are no references.
+ */
+ __clear_page_locked(page);
free_it:
nr_reclaimed++;
if (!pagevec_add(&freed_pvec, page)) {
--
* [patch] mm: page lock use lock bitops
2008-08-18 12:24 [patch] mm: pagecache insertion fewer atomics Nick Piggin
2008-08-18 12:25 ` [patch] mm: unlockless reclaim Nick Piggin
@ 2008-08-18 12:28 ` Nick Piggin
2008-08-18 12:29 ` [patch] fs: buffer " Nick Piggin
` (2 subsequent siblings)
4 siblings, 0 replies; 27+ messages in thread
From: Nick Piggin @ 2008-08-18 12:28 UTC (permalink / raw)
To: Andrew Morton, Linux Memory Management List
mm: use lock bitops for page lock
trylock_page, unlock_page open and close a critical section. Hence,
we can use the lock bitops to get the desired memory ordering.
Also, mark trylock as likely to succeed (and remove the annotation from
callers).
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
include/linux/pagemap.h | 2 +-
mm/filemap.c | 13 +++++--------
mm/swapfile.c | 2 +-
3 files changed, 7 insertions(+), 10 deletions(-)
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -557,17 +557,14 @@ EXPORT_SYMBOL(wait_on_page_bit);
* mechananism between PageLocked pages and PageWriteback pages is shared.
* But that's OK - sleepers in wait_on_page_writeback() just go back to sleep.
*
- * The first mb is necessary to safely close the critical section opened by the
- * test_and_set_bit() to lock the page; the second mb is necessary to enforce
- * ordering between the clear_bit and the read of the waitqueue (to avoid SMP
- * races with a parallel wait_on_page_locked()).
+ * The mb is necessary to enforce ordering between the clear_bit and the read
+ * of the waitqueue (to avoid SMP races with a parallel wait_on_page_locked()).
*/
void unlock_page(struct page *page)
{
- smp_mb__before_clear_bit();
- if (!test_and_clear_bit(PG_locked, &page->flags))
- BUG();
- smp_mb__after_clear_bit();
+ VM_BUG_ON(!PageLocked(page));
+ clear_bit_unlock(PG_locked, &page->flags);
+ smp_mb__after_clear_bit();
wake_up_page(page, PG_locked);
}
EXPORT_SYMBOL(unlock_page);
Index: linux-2.6/include/linux/pagemap.h
===================================================================
--- linux-2.6.orig/include/linux/pagemap.h
+++ linux-2.6/include/linux/pagemap.h
@@ -283,7 +283,7 @@ static inline void __clear_page_locked(s
static inline int trylock_page(struct page *page)
{
- return !test_and_set_bit(PG_locked, &page->flags);
+ return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
}
/*
Index: linux-2.6/mm/swapfile.c
===================================================================
--- linux-2.6.orig/mm/swapfile.c
+++ linux-2.6/mm/swapfile.c
@@ -403,7 +403,7 @@ void free_swap_and_cache(swp_entry_t ent
if (p) {
if (swap_entry_free(p, swp_offset(entry)) == 1) {
page = find_get_page(&swapper_space, entry.val);
- if (page && unlikely(!trylock_page(page))) {
+ if (page && !trylock_page(page)) {
page_cache_release(page);
page = NULL;
}
--
* [patch] fs: buffer lock use lock bitops
2008-08-18 12:24 [patch] mm: pagecache insertion fewer atomics Nick Piggin
2008-08-18 12:25 ` [patch] mm: unlockless reclaim Nick Piggin
2008-08-18 12:28 ` [patch] mm: page lock use lock bitops Nick Piggin
@ 2008-08-18 12:29 ` Nick Piggin
2008-08-18 12:29 ` [patch] mm: page allocator minor speedup Nick Piggin
2008-08-19 5:41 ` [patch] mm: pagecache insertion fewer atomics KOSAKI Motohiro
4 siblings, 0 replies; 27+ messages in thread
From: Nick Piggin @ 2008-08-18 12:29 UTC (permalink / raw)
To: Andrew Morton, Linux Memory Management List
fs: use lock bitops for the buffer lock
trylock_buffer and unlock_buffer open and close a critical section. Hence,
we can use the lock bitops to get the desired memory ordering.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
fs/buffer.c | 3 +--
include/linux/buffer_head.h | 2 +-
2 files changed, 2 insertions(+), 3 deletions(-)
Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -76,8 +76,7 @@ EXPORT_SYMBOL(__lock_buffer);
void unlock_buffer(struct buffer_head *bh)
{
- smp_mb__before_clear_bit();
- clear_buffer_locked(bh);
+ clear_bit_unlock(BH_Lock, &bh->b_state);
smp_mb__after_clear_bit();
wake_up_bit(&bh->b_state, BH_Lock);
}
Index: linux-2.6/include/linux/buffer_head.h
===================================================================
--- linux-2.6.orig/include/linux/buffer_head.h
+++ linux-2.6/include/linux/buffer_head.h
@@ -322,7 +322,7 @@ static inline void wait_on_buffer(struct
static inline int trylock_buffer(struct buffer_head *bh)
{
- return likely(!test_and_set_bit(BH_Lock, &bh->b_state));
+ return likely(!test_and_set_bit_lock(BH_Lock, &bh->b_state));
}
static inline void lock_buffer(struct buffer_head *bh)
--
* [patch] mm: page allocator minor speedup
2008-08-18 12:24 [patch] mm: pagecache insertion fewer atomics Nick Piggin
` (2 preceding siblings ...)
2008-08-18 12:29 ` [patch] fs: buffer " Nick Piggin
@ 2008-08-18 12:29 ` Nick Piggin
2008-08-18 13:57 ` Pekka Enberg
` (2 more replies)
2008-08-19 5:41 ` [patch] mm: pagecache insertion fewer atomics KOSAKI Motohiro
4 siblings, 3 replies; 27+ messages in thread
From: Nick Piggin @ 2008-08-18 12:29 UTC (permalink / raw)
To: Andrew Morton, Linux Memory Management List; +Cc: Hugh Dickins
Now that we don't put a ZERO_PAGE in the pagetables any more, and the
"remove PageReserved from core mm" patch has had a long time to mature,
let's remove the page reserved logic from the allocator.
This saves several branches and about 100 bytes in some important paths.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -449,7 +449,7 @@ static inline void __free_one_page(struc
zone->free_area[order].nr_free++;
}
-static inline int free_pages_check(struct page *page)
+static inline void free_pages_check(struct page *page)
{
if (unlikely(page_mapcount(page) |
(page->mapping != NULL) |
@@ -459,12 +459,6 @@ static inline int free_pages_check(struc
bad_page(page);
if (PageDirty(page))
__ClearPageDirty(page);
- /*
- * For now, we report if PG_reserved was found set, but do not
- * clear it, and do not free the page. But we shall soon need
- * to do more, for when the ZERO_PAGE count wraps negative.
- */
- return PageReserved(page);
}
/*
@@ -509,12 +503,9 @@ static void __free_pages_ok(struct page
{
unsigned long flags;
int i;
- int reserved = 0;
for (i = 0 ; i < (1 << order) ; ++i)
- reserved += free_pages_check(page + i);
- if (reserved)
- return;
+ free_pages_check(page + i);
if (!PageHighMem(page)) {
debug_check_no_locks_freed(page_address(page),PAGE_SIZE<<order);
@@ -593,7 +584,7 @@ static inline void expand(struct zone *z
/*
* This page is about to be returned from the page allocator
*/
-static int prep_new_page(struct page *page, int order, gfp_t gfp_flags)
+static void prep_new_page(struct page *page, int order, gfp_t gfp_flags)
{
if (unlikely(page_mapcount(page) |
(page->mapping != NULL) |
@@ -602,13 +593,6 @@ static int prep_new_page(struct page *pa
(page->flags & PAGE_FLAGS_CHECK_AT_PREP)))
bad_page(page);
- /*
- * For now, we report if PG_reserved was found set, but do not
- * clear it, and do not allocate the page: as a safety net.
- */
- if (PageReserved(page))
- return 1;
-
page->flags &= ~(1 << PG_uptodate | 1 << PG_error | 1 << PG_reclaim |
1 << PG_referenced | 1 << PG_arch_1 |
1 << PG_owner_priv_1 | 1 << PG_mappedtodisk);
@@ -623,8 +607,6 @@ static int prep_new_page(struct page *pa
if (order && (gfp_flags & __GFP_COMP))
prep_compound_page(page, order);
-
- return 0;
}
/*
@@ -970,8 +952,7 @@ static void free_hot_cold_page(struct pa
if (PageAnon(page))
page->mapping = NULL;
- if (free_pages_check(page))
- return;
+ free_pages_check(page);
if (!PageHighMem(page)) {
debug_check_no_locks_freed(page_address(page), PAGE_SIZE);
@@ -1039,7 +1020,6 @@ static struct page *buffered_rmqueue(str
int cpu;
int migratetype = allocflags_to_migratetype(gfp_flags);
-again:
cpu = get_cpu();
if (likely(order == 0)) {
struct per_cpu_pages *pcp;
@@ -1087,8 +1067,7 @@ again:
put_cpu();
VM_BUG_ON(bad_range(zone, page));
- if (prep_new_page(page, order, gfp_flags))
- goto again;
+ prep_new_page(page, order, gfp_flags);
return page;
failed:
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h
+++ linux-2.6/include/linux/page-flags.h
@@ -330,7 +330,8 @@ static inline void __ClearPageTail(struc
#define PAGE_FLAGS (1 << PG_lru | 1 << PG_private | 1 << PG_locked | \
1 << PG_waiters | 1 << PG_buddy | 1 << PG_writeback | \
- 1 << PG_slab | 1 << PG_swapcache | 1 << PG_active)
+ 1 << PG_slab | 1 << PG_swapcache | 1 << PG_active | \
+ 1 << PG_reserved)
/*
* Flags checked in bad_page(). Pages on the free list should not have
@@ -342,14 +343,14 @@ static inline void __ClearPageTail(struc
* Flags checked when a page is freed. Pages being freed should not have
* these flags set. It they are, there is a problem.
*/
-#define PAGE_FLAGS_CHECK_AT_FREE (PAGE_FLAGS | 1 << PG_reserved)
+#define PAGE_FLAGS_CHECK_AT_FREE (PAGE_FLAGS)
/*
* Flags checked when a page is prepped for return by the page allocator.
* Pages being prepped should not have these flags set. It they are, there
* is a problem.
*/
-#define PAGE_FLAGS_CHECK_AT_PREP (PAGE_FLAGS | 1 << PG_reserved | 1 << PG_dirty)
+#define PAGE_FLAGS_CHECK_AT_PREP (PAGE_FLAGS | 1 << PG_dirty)
#endif /* !__GENERATING_BOUNDS_H */
#endif /* PAGE_FLAGS_H */
--
* Re: [patch] mm: page allocator minor speedup
2008-08-18 12:29 ` [patch] mm: page allocator minor speedup Nick Piggin
@ 2008-08-18 13:57 ` Pekka Enberg
2008-08-19 7:49 ` Nick Piggin
2008-08-18 23:29 ` Andrew Morton
2008-10-14 15:52 ` Hugh Dickins
2 siblings, 1 reply; 27+ messages in thread
From: Pekka Enberg @ 2008-08-18 13:57 UTC (permalink / raw)
To: Nick Piggin
Cc: Andrew Morton, Linux Memory Management List, Hugh Dickins,
Christoph Lameter
Hi Nick,
On Mon, Aug 18, 2008 at 3:29 PM, Nick Piggin <npiggin@suse.de> wrote:
> Now that we don't put a ZERO_PAGE in the pagetables any more, and the
> "remove PageReserved from core mm" patch has had a long time to mature,
> let's remove the page reserved logic from the allocator.
>
> This saves several branches and about 100 bytes in some important paths.
Cool. Any numbers for this?
Pekka
--
* Re: [patch] mm: page allocator minor speedup
2008-08-18 12:29 ` [patch] mm: page allocator minor speedup Nick Piggin
2008-08-18 13:57 ` Pekka Enberg
@ 2008-08-18 23:29 ` Andrew Morton
2008-08-19 7:51 ` Nick Piggin
2008-10-14 15:52 ` Hugh Dickins
2 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2008-08-18 23:29 UTC (permalink / raw)
To: Nick Piggin; +Cc: linux-mm, hugh
On Mon, 18 Aug 2008 14:29:57 +0200
Nick Piggin <npiggin@suse.de> wrote:
> Now that we don't put a ZERO_PAGE in the pagetables any more, and the
> "remove PageReserved from core mm" patch has had a long time to mature,
> let's remove the page reserved logic from the allocator.
>
> This saves several branches and about 100 bytes in some important paths.
This of course made a big mess against the page reclaim rewrite. I
think I fixed it up.
I could have merged it ahead, but then that would have made a big mess
of the page reclaim rewrite.
Testing and reviewing the page reclaim rewrite would be a useful
place to spend time..
--
* Re: [patch] mm: unlockless reclaim
2008-08-18 12:25 ` [patch] mm: unlockless reclaim Nick Piggin
@ 2008-08-19 5:09 ` KOSAKI Motohiro
2008-08-19 10:05 ` Nick Piggin
0 siblings, 1 reply; 27+ messages in thread
From: KOSAKI Motohiro @ 2008-08-19 5:09 UTC (permalink / raw)
To: Nick Piggin; +Cc: kosaki.motohiro, Andrew Morton, Linux Memory Management List
> - unlock_page(page);
> + /*
> + * At this point, we have no other references and there is
> + * no way to pick any more up (removed from LRU, removed
> + * from pagecache). Can use non-atomic bitops now (and
> + * we obviously don't have to worry about waking up a process
> + * waiting on the page lock, because there are no references.
> + */
> + __clear_page_locked(page);
> free_it:
> nr_reclaimed++;
> if (!pagevec_add(&freed_pvec, page)) {
>
Is it better to insert VM_BUG_ON(page_count(page) != 1)?
Otherwise, it looks good to me.
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
--
* Re: [patch] mm: pagecache insertion fewer atomics
2008-08-18 12:24 [patch] mm: pagecache insertion fewer atomics Nick Piggin
` (3 preceding siblings ...)
2008-08-18 12:29 ` [patch] mm: page allocator minor speedup Nick Piggin
@ 2008-08-19 5:41 ` KOSAKI Motohiro
2008-08-19 5:50 ` Andrew Morton
2008-08-19 10:07 ` Nick Piggin
4 siblings, 2 replies; 27+ messages in thread
From: KOSAKI Motohiro @ 2008-08-19 5:41 UTC (permalink / raw)
To: Nick Piggin; +Cc: kosaki.motohiro, Andrew Morton, Linux Memory Management List
Hi Nick,
> Setting and clearing the page locked when inserting it into swapcache /
> pagecache when it has no other references can use non-atomic page flags
> operations because no other CPU may be operating on it at this time.
>
> This saves one atomic operation when inserting a page into pagecache.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> include/linux/pagemap.h | 37 +++++++++++++++++++++++++++++++++----
> mm/swap_state.c | 4 ++--
> 2 files changed, 35 insertions(+), 6 deletions(-)
>
> Index: linux-2.6/mm/swap_state.c
> ===================================================================
> --- linux-2.6.orig/mm/swap_state.c
> +++ linux-2.6/mm/swap_state.c
> @@ -302,7 +302,7 @@ struct page *read_swap_cache_async(swp_e
> * re-using the just freed swap entry for an existing page.
> * May fail (-ENOMEM) if radix-tree node allocation failed.
> */
> - set_page_locked(new_page);
> + __set_page_locked(new_page);
> err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
> if (likely(!err)) {
> /*
What version are you working on?
2.6.27-rc1-mm1 does not contain set_page_locked().
mmotm?
--
* Re: [patch] mm: pagecache insertion fewer atomics
2008-08-19 5:41 ` [patch] mm: pagecache insertion fewer atomics KOSAKI Motohiro
@ 2008-08-19 5:50 ` Andrew Morton
2008-08-19 10:07 ` Nick Piggin
1 sibling, 0 replies; 27+ messages in thread
From: Andrew Morton @ 2008-08-19 5:50 UTC (permalink / raw)
To: KOSAKI Motohiro; +Cc: Nick Piggin, Linux Memory Management List
On Tue, 19 Aug 2008 14:41:50 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> Hi Nick,
>
> > Setting and clearing the page locked when inserting it into swapcache /
> > pagecache when it has no other references can use non-atomic page flags
> > operations because no other CPU may be operating on it at this time.
> >
> > This saves one atomic operation when inserting a page into pagecache.
> >
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
> > ---
> > include/linux/pagemap.h | 37 +++++++++++++++++++++++++++++++++----
> > mm/swap_state.c | 4 ++--
> > 2 files changed, 35 insertions(+), 6 deletions(-)
> >
> > Index: linux-2.6/mm/swap_state.c
> > ===================================================================
> > --- linux-2.6.orig/mm/swap_state.c
> > +++ linux-2.6/mm/swap_state.c
> > @@ -302,7 +302,7 @@ struct page *read_swap_cache_async(swp_e
> > * re-using the just freed swap entry for an existing page.
> > * May fail (-ENOMEM) if radix-tree node allocation failed.
> > */
> > - set_page_locked(new_page);
> > + __set_page_locked(new_page);
> > err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
> > if (likely(!err)) {
> > /*
>
> What version are you working on?
Who knows... I had to fix a few rejects.
> 2.6.27-rc1-mm1 does not contain set_page_locked().
> mmotm?
Current mmotm has these patches integrated.
--
* Re: [patch] mm: page allocator minor speedup
2008-08-18 13:57 ` Pekka Enberg
@ 2008-08-19 7:49 ` Nick Piggin
2008-08-19 7:51 ` Pekka Enberg
0 siblings, 1 reply; 27+ messages in thread
From: Nick Piggin @ 2008-08-19 7:49 UTC (permalink / raw)
To: Pekka Enberg
Cc: Andrew Morton, Linux Memory Management List, Hugh Dickins,
Christoph Lameter
On Mon, Aug 18, 2008 at 04:57:00PM +0300, Pekka Enberg wrote:
> Hi Nick,
>
> On Mon, Aug 18, 2008 at 3:29 PM, Nick Piggin <npiggin@suse.de> wrote:
> > Now that we don't put a ZERO_PAGE in the pagetables any more, and the
> > "remove PageReserved from core mm" patch has had a long time to mature,
> > let's remove the page reserved logic from the allocator.
> >
> > This saves several branches and about 100 bytes in some important paths.
>
> Cool. Any numbers for this?
No, no numbers. I expect it would be very difficult to measure because
it probably only starts saving cycles when the workload exceeds L1I and/or
the branch caches.
--
* Re: [patch] mm: page allocator minor speedup
2008-08-18 23:29 ` Andrew Morton
@ 2008-08-19 7:51 ` Nick Piggin
0 siblings, 0 replies; 27+ messages in thread
From: Nick Piggin @ 2008-08-19 7:51 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, hugh
On Mon, Aug 18, 2008 at 04:29:34PM -0700, Andrew Morton wrote:
> On Mon, 18 Aug 2008 14:29:57 +0200
> Nick Piggin <npiggin@suse.de> wrote:
>
> > Now that we don't put a ZERO_PAGE in the pagetables any more, and the
> > "remove PageReserved from core mm" patch has had a long time to mature,
> > let's remove the page reserved logic from the allocator.
> >
> > This saves several branches and about 100 bytes in some important paths.
>
> This of course made a big mess against the page reclaim rewrite. I
> think I fixed it up.
>
> I could have merged it ahead, but then that would have made a big mess
> of the page reclaim rewrite.
Sure, thanks. It's not very important so it can wait on the page reclaim
rewrite.
> Testing and reviewing the page reclaim rewrite would be a useful
> place to spend time..
Yes it seems to have settled down a bit now. Fair few other things pretty
high on the list to look at too, but I will have to make time for this.
--
* Re: [patch] mm: page allocator minor speedup
2008-08-19 7:49 ` Nick Piggin
@ 2008-08-19 7:51 ` Pekka Enberg
2008-08-19 10:25 ` Nick Piggin
0 siblings, 1 reply; 27+ messages in thread
From: Pekka Enberg @ 2008-08-19 7:51 UTC (permalink / raw)
To: Nick Piggin
Cc: Andrew Morton, Linux Memory Management List, Hugh Dickins,
Christoph Lameter
Hi Nick,
On Mon, Aug 18, 2008 at 3:29 PM, Nick Piggin <npiggin@suse.de> wrote:
> > > Now that we don't put a ZERO_PAGE in the pagetables any more, and the
> > > "remove PageReserved from core mm" patch has had a long time to mature,
> > > let's remove the page reserved logic from the allocator.
> > >
> > > This saves several branches and about 100 bytes in some important paths.
On Mon, Aug 18, 2008 at 04:57:00PM +0300, Pekka Enberg wrote:
> > Cool. Any numbers for this?
On Tue, 2008-08-19 at 09:49 +0200, Nick Piggin wrote:
> No, no numbers. I expect it would be very difficult to measure because
> it probably only starts saving cycles when the workload exceeds L1I and/or
> the branch caches.
OK, I am asking this because any improvements in the page allocator fast
paths are going to be a gain for SLUB intensive workloads as well.
Pekka
--
* Re: [patch] mm: unlockless reclaim
2008-08-19 5:09 ` KOSAKI Motohiro
@ 2008-08-19 10:05 ` Nick Piggin
2008-08-19 10:20 ` KOSAKI Motohiro
0 siblings, 1 reply; 27+ messages in thread
From: Nick Piggin @ 2008-08-19 10:05 UTC (permalink / raw)
To: KOSAKI Motohiro; +Cc: Andrew Morton, Linux Memory Management List
On Tue, Aug 19, 2008 at 02:09:07PM +0900, KOSAKI Motohiro wrote:
> > - unlock_page(page);
> > + /*
> > + * At this point, we have no other references and there is
> > + * no way to pick any more up (removed from LRU, removed
> > + * from pagecache). Can use non-atomic bitops now (and
> > + * we obviously don't have to worry about waking up a process
> > + * waiting on the page lock, because there are no references.
> > + */
> > + __clear_page_locked(page);
> > free_it:
> > nr_reclaimed++;
> > if (!pagevec_add(&freed_pvec, page)) {
> >
>
> Is it better to insert VM_BUG_ON(page_count(page) != 1)?
> Otherwise, it looks good to me.
That is a very good idea, except that now with lockless pagecache, we're
not so well placed to do this type of check. Actually at this point,
the refcount will be 0 because of page_freeze_refs, and then if anybody
else tries to do a get_page, they should hit the VM_BUG_ON in get_page.
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Thanks for looking at it.
--
* Re: [patch] mm: pagecache insertion fewer atomics
2008-08-19 5:41 ` [patch] mm: pagecache insertion fewer atomics KOSAKI Motohiro
2008-08-19 5:50 ` Andrew Morton
@ 2008-08-19 10:07 ` Nick Piggin
1 sibling, 0 replies; 27+ messages in thread
From: Nick Piggin @ 2008-08-19 10:07 UTC (permalink / raw)
To: KOSAKI Motohiro; +Cc: Andrew Morton, Linux Memory Management List
On Tue, Aug 19, 2008 at 02:41:50PM +0900, KOSAKI Motohiro wrote:
> Hi Nick,
>
> > Setting and clearing the page locked when inserting it into swapcache /
> > pagecache when it has no other references can use non-atomic page flags
> > operations because no other CPU may be operating on it at this time.
> >
> > This saves one atomic operation when inserting a page into pagecache.
> >
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
> > ---
> > include/linux/pagemap.h | 37 +++++++++++++++++++++++++++++++++----
> > mm/swap_state.c | 4 ++--
> > 2 files changed, 35 insertions(+), 6 deletions(-)
> >
> > Index: linux-2.6/mm/swap_state.c
> > ===================================================================
> > --- linux-2.6.orig/mm/swap_state.c
> > +++ linux-2.6/mm/swap_state.c
> > @@ -302,7 +302,7 @@ struct page *read_swap_cache_async(swp_e
> > * re-using the just freed swap entry for an existing page.
> > * May fail (-ENOMEM) if radix-tree node allocation failed.
> > */
> > - set_page_locked(new_page);
> > + __set_page_locked(new_page);
> > err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
> > if (likely(!err)) {
> > /*
>
> What version are you working on?
>
> 2.6.27-rc1-mm1 does not contain set_page_locked().
> mmotm?
Upstream actually, because -mm was so far behind :) I forgot to check
mmotm, sorry.
--
* Re: [patch] mm: unlockless reclaim
2008-08-19 10:05 ` Nick Piggin
@ 2008-08-19 10:20 ` KOSAKI Motohiro
0 siblings, 0 replies; 27+ messages in thread
From: KOSAKI Motohiro @ 2008-08-19 10:20 UTC (permalink / raw)
To: Nick Piggin; +Cc: kosaki.motohiro, Andrew Morton, Linux Memory Management List
> On Tue, Aug 19, 2008 at 02:09:07PM +0900, KOSAKI Motohiro wrote:
> > > - unlock_page(page);
> > > + /*
> > > + * At this point, we have no other references and there is
> > > + * no way to pick any more up (removed from LRU, removed
> > > + * from pagecache). Can use non-atomic bitops now (and
> > > + * we obviously don't have to worry about waking up a process
> > > + * waiting on the page lock, because there are no references.
> > > + */
> > > + __clear_page_locked(page);
> > > free_it:
> > > nr_reclaimed++;
> > > if (!pagevec_add(&freed_pvec, page)) {
> > >
> >
> > Is it better to insert VM_BUG_ON(page_count(page) != 1)?
> > Otherwise, it looks good to me.
>
> That is a very good idea, except that now with lockless pagecache, we're
> not so well placed to do this type of check. Actually at this point,
> the refcount will be 0 because of page_freeze_refs, and then if anybody
> else tries to do a get_page, they should hit the VM_BUG_ON in get_page.
Ah, you are right.
Sorry, I often forget it ;)
* Re: [patch] mm: page allocator minor speedup
2008-08-19 7:51 ` Pekka Enberg
@ 2008-08-19 10:25 ` Nick Piggin
0 siblings, 0 replies; 27+ messages in thread
From: Nick Piggin @ 2008-08-19 10:25 UTC (permalink / raw)
To: Pekka Enberg
Cc: Andrew Morton, Linux Memory Management List, Hugh Dickins,
Christoph Lameter
On Tue, Aug 19, 2008 at 10:51:41AM +0300, Pekka Enberg wrote:
> Hi Nick,
>
> On Mon, Aug 18, 2008 at 3:29 PM, Nick Piggin <npiggin@suse.de> wrote:
> > > > Now that we don't put a ZERO_PAGE in the pagetables any more, and the
> > > > "remove PageReserved from core mm" patch has had a long time to mature,
> > > > let's remove the page reserved logic from the allocator.
> > > >
> > > > This saves several branches and about 100 bytes in some important paths.
>
> On Mon, Aug 18, 2008 at 04:57:00PM +0300, Pekka Enberg wrote:
> > > Cool. Any numbers for this?
>
> On Tue, 2008-08-19 at 09:49 +0200, Nick Piggin wrote:
> > No, no numbers. I expect it would be very difficult to measure because
> > it probably only starts saving cycles when the workload exceeds L1I and/or
> > the branch caches.
>
> OK, I am asking this because any improvements in the page allocator fast
> paths are going to be a gain for SLUB intensive workloads as well.
Right. "OLTP" is *very* cache and branch sensitive... but I doubt this
would stand out from the noise.
BTW. I have a patch somewhere that adds a new interface to the page
allocator which avoids setting the page refcount. This way if you're
careful you can free the page after use without the expensive
atomic_dec_and_test & branch.
I didn't quite get it into a form that I like (would have required some
more extensive page allocator rework). But if you're interested in
numbers, then what I had should be enough to get an idea...
Remember to give SLAB the same advantage too ;)!
* Re: [patch] mm: page allocator minor speedup
2008-08-18 12:29 ` [patch] mm: page allocator minor speedup Nick Piggin
2008-08-18 13:57 ` Pekka Enberg
2008-08-18 23:29 ` Andrew Morton
@ 2008-10-14 15:52 ` Hugh Dickins
2008-10-15 4:37 ` Nick Piggin
2 siblings, 1 reply; 27+ messages in thread
From: Hugh Dickins @ 2008-10-14 15:52 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, Dave Jones, linux-mm
On Mon, 18 Aug 2008, Nick Piggin wrote:
> Now that we don't put a ZERO_PAGE in the pagetables any more, and the
> "remove PageReserved from core mm" patch has had a long time to mature,
> let's remove the page reserved logic from the allocator.
>
> This saves several branches and about 100 bytes in some important paths.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
As usual, I'm ever so slightly on the slow side... sorry.
I'm afraid I disagree with mm-page-allocator-minor-speedup.patch.
I'm perfectly happy with bringing PG_reserved into "PAGE_FLAGS"
and not special-casing it there. My problem with your patch is
that we ought to be retaining the several branches and 100 bytes
of code, extending them to _every_ case of a "bad state" page.
So that any such suspect page is taken out of circulation (needs
count forced to 1, whatever it was before?), so the system can
then proceed a little more safely.
That would go hand-in-hand with removing the page_remove_rmap()
BUG() and reworking the info shown there. I think it's fair to
say that none of the "Eeek!" messaging added in the last couple
of years has actually shed any light; but it's still worth having
a special message there, because the "bad page state" ones are
liable to follow too late, when most of the info has been lost.
As in one of the old debug patches I had, I'd like to print out
the actual pte and _its_ physical address, info not currently to
hand within page_remove_rmap() - they might sometimes correspond
to that "BIOS corrupting low 64kB" issue, for example. Shown in
such a way that kerneloops.org is sure to report them.
As you can see, I've not quite got around to doing that yet...
but mm-page-allocator-minor-speedup.patch takes us in the wrong
direction.
I expect we're going to have our usual "Hugh wants to spot page
table corruption" versus "Nick wants to cut overhead" fight!
As we had over the pfn_valid in vm_normal_page - I think I lost
that one, the HAVE_PTE_SPECIAL VM_BUG_ON neuters its usefulness.
Hugh
> ---
> Index: linux-2.6/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -449,7 +449,7 @@ static inline void __free_one_page(struc
> zone->free_area[order].nr_free++;
> }
>
> -static inline int free_pages_check(struct page *page)
> +static inline void free_pages_check(struct page *page)
> {
> if (unlikely(page_mapcount(page) |
> (page->mapping != NULL) |
> @@ -459,12 +459,6 @@ static inline int free_pages_check(struc
> bad_page(page);
> if (PageDirty(page))
> __ClearPageDirty(page);
> - /*
> - * For now, we report if PG_reserved was found set, but do not
> - * clear it, and do not free the page. But we shall soon need
> - * to do more, for when the ZERO_PAGE count wraps negative.
> - */
> - return PageReserved(page);
> }
>
> /*
> @@ -509,12 +503,9 @@ static void __free_pages_ok(struct page
> {
> unsigned long flags;
> int i;
> - int reserved = 0;
>
> for (i = 0 ; i < (1 << order) ; ++i)
> - reserved += free_pages_check(page + i);
> - if (reserved)
> - return;
> + free_pages_check(page + i);
>
> if (!PageHighMem(page)) {
> debug_check_no_locks_freed(page_address(page),PAGE_SIZE<<order);
> @@ -593,7 +584,7 @@ static inline void expand(struct zone *z
> /*
> * This page is about to be returned from the page allocator
> */
> -static int prep_new_page(struct page *page, int order, gfp_t gfp_flags)
> +static void prep_new_page(struct page *page, int order, gfp_t gfp_flags)
> {
> if (unlikely(page_mapcount(page) |
> (page->mapping != NULL) |
> @@ -602,13 +593,6 @@ static int prep_new_page(struct page *pa
> (page->flags & PAGE_FLAGS_CHECK_AT_PREP)))
> bad_page(page);
>
> - /*
> - * For now, we report if PG_reserved was found set, but do not
> - * clear it, and do not allocate the page: as a safety net.
> - */
> - if (PageReserved(page))
> - return 1;
> -
> page->flags &= ~(1 << PG_uptodate | 1 << PG_error | 1 << PG_reclaim |
> 1 << PG_referenced | 1 << PG_arch_1 |
> 1 << PG_owner_priv_1 | 1 << PG_mappedtodisk);
> @@ -623,8 +607,6 @@ static int prep_new_page(struct page *pa
>
> if (order && (gfp_flags & __GFP_COMP))
> prep_compound_page(page, order);
> -
> - return 0;
> }
>
> /*
> @@ -970,8 +952,7 @@ static void free_hot_cold_page(struct pa
>
> if (PageAnon(page))
> page->mapping = NULL;
> - if (free_pages_check(page))
> - return;
> + free_pages_check(page);
>
> if (!PageHighMem(page)) {
> debug_check_no_locks_freed(page_address(page), PAGE_SIZE);
> @@ -1039,7 +1020,6 @@ static struct page *buffered_rmqueue(str
> int cpu;
> int migratetype = allocflags_to_migratetype(gfp_flags);
>
> -again:
> cpu = get_cpu();
> if (likely(order == 0)) {
> struct per_cpu_pages *pcp;
> @@ -1087,8 +1067,7 @@ again:
> put_cpu();
>
> VM_BUG_ON(bad_range(zone, page));
> - if (prep_new_page(page, order, gfp_flags))
> - goto again;
> + prep_new_page(page, order, gfp_flags);
> return page;
>
> failed:
> Index: linux-2.6/include/linux/page-flags.h
> ===================================================================
> --- linux-2.6.orig/include/linux/page-flags.h
> +++ linux-2.6/include/linux/page-flags.h
> @@ -330,7 +330,8 @@ static inline void __ClearPageTail(struc
>
> #define PAGE_FLAGS (1 << PG_lru | 1 << PG_private | 1 << PG_locked | \
> 1 << PG_waiters | 1 << PG_buddy | 1 << PG_writeback | \
> - 1 << PG_slab | 1 << PG_swapcache | 1 << PG_active)
> + 1 << PG_slab | 1 << PG_swapcache | 1 << PG_active | \
> + 1 << PG_reserved)
>
> /*
> * Flags checked in bad_page(). Pages on the free list should not have
> @@ -342,14 +343,14 @@ static inline void __ClearPageTail(struc
> * Flags checked when a page is freed. Pages being freed should not have
> * these flags set. It they are, there is a problem.
> */
> -#define PAGE_FLAGS_CHECK_AT_FREE (PAGE_FLAGS | 1 << PG_reserved)
> +#define PAGE_FLAGS_CHECK_AT_FREE (PAGE_FLAGS)
>
> /*
> * Flags checked when a page is prepped for return by the page allocator.
> * Pages being prepped should not have these flags set. It they are, there
> * is a problem.
> */
> -#define PAGE_FLAGS_CHECK_AT_PREP (PAGE_FLAGS | 1 << PG_reserved | 1 << PG_dirty)
> +#define PAGE_FLAGS_CHECK_AT_PREP (PAGE_FLAGS | 1 << PG_dirty)
>
> #endif /* !__GENERATING_BOUNDS_H */
> #endif /* PAGE_FLAGS_H */
* Re: [patch] mm: page allocator minor speedup
2008-10-14 15:52 ` Hugh Dickins
@ 2008-10-15 4:37 ` Nick Piggin
0 siblings, 0 replies; 27+ messages in thread
From: Nick Piggin @ 2008-10-15 4:37 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Andrew Morton, Dave Jones, linux-mm
On Tue, Oct 14, 2008 at 04:52:03PM +0100, Hugh Dickins wrote:
> On Mon, 18 Aug 2008, Nick Piggin wrote:
>
> > Now that we don't put a ZERO_PAGE in the pagetables any more, and the
> > "remove PageReserved from core mm" patch has had a long time to mature,
> > let's remove the page reserved logic from the allocator.
> >
> > This saves several branches and about 100 bytes in some important paths.
> >
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
>
> As usual, I'm ever so slightly on the slow side... sorry.
> I'm afraid I disagree with mm-page-allocator-minor-speedup.patch.
That's OK. I never really cared to rush things into merge. And
a thoughtful review is never too late IMO.
> I'm perfectly happy with bringing PG_reserved into "PAGE_FLAGS"
> and not special-casing it there. My problem with your patch is
> that we ought to be retaining the several branches and 100 bytes
> of code, extending them to _every_ case of a "bad state" page.
> So that any such suspect page is taken out of circulation (needs
> count forced to 1, whatever it was before?), so the system can
> then proceed a little more safely.
Hmm. I don't entirely disagree. Although would that change logically
come after this one? Or is it annoying to have to re-add some of the
logic to drop pages?
One thing that I am slightly sad about is that mm/ fastpaths are
largely lumped with detecting this stuff. These days I'd say most
or all bugs seen in these checks are due to bad hardware or bugs
or memory scribbles probably from other parts of the kernel.
And the thing is, we only check maybe a couple of % of all memory
if we're just checking some page flags and ptes... why not take
a crc of the struct dentry on each modification, and recheck it
before subsequently using or modifying it? ;) How about radix tree
nodes? or any other data structure.
> That would go hand-in-hand with removing the page_remove_rmap()
> BUG() and reworking the info shown there. I think it's fair to
> say that none of the "Eeek!" messaging added in the last couple
> of years has actually shed any light; but it's still worth having
> a special message there, because the "bad page state" ones are
> liable to follow too late, when most of the info has been lost.
No, only really helpful when developing mm or driver code I think
(which would suggest it should be DEBUG_VM, however I agree it could
be a bit more of a special case and enabled on production kernels
especially if the messages can be made more useful).
> As in one of the old debug patches I had, I'd like to print out
> the actual pte and _its_ physical address, info not currently to
> hand within page_remove_rmap() - they might sometimes correspond
> to that "BIOS corrupting low 64kB" issue, for example. Shown in
> such a way that kerneloops.org is sure to report them.
>
> As you can see, I've not quite got around to doing that yet...
> but mm-page-allocator-minor-speedup.patch takes us in the wrong
> direction.
>
> I expect we're going to have our usual "Hugh wants to spot page
> table corruption" versus "Nick wants to cut overhead" fight!
> As we had over the pfn_valid in vm_normal_page - I think I lost
> that one, the HAVE_PTE_SPECIAL VM_BUG_ON neuters its usefulness.
Hmm... I don't mind so much, especially if you're planning other
improvements to the code.
* [patch] mm: unlockless reclaim
2007-11-10 11:51 ` Peter Zijlstra
@ 2007-11-11 8:40 ` Nick Piggin
0 siblings, 0 replies; 27+ messages in thread
From: Nick Piggin @ 2007-11-11 8:40 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux Memory Management List, Linux Kernel Mailing List,
Linus Torvalds, Peter Zijlstra
Combined with the previous and subsequent patches, throughput of pages through
the pagecache on an Opteron system here goes up by anywhere from 50% to 500%,
depending on the number of files and threads involved.
--
unlock_page is fairly expensive. It can be avoided in page reclaim.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -589,7 +589,14 @@ static unsigned long shrink_page_list(st
goto keep_locked;
free_it:
- unlock_page(page);
+ /*
+ * At this point, we have no other references and there is
+ * no way to pick any more up (removed from LRU, removed
+ * from pagecache). Can use non-atomic bitops now (and
+ * we obviously don't have to worry about waking up a process
+ * waiting on the page lock, because there are no references).
+ */
+ __clear_page_locked(page);
nr_reclaimed++;
if (!pagevec_add(&freed_pvec, page))
__pagevec_release_nonlru(&freed_pvec);
* Re: [patch] mm: unlockless reclaim
2007-07-12 19:20 ` Andrew Morton
@ 2007-07-14 8:35 ` Nick Piggin
0 siblings, 0 replies; 27+ messages in thread
From: Nick Piggin @ 2007-07-14 8:35 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux Memory Management List
On Thu, Jul 12, 2007 at 12:20:39PM -0700, Andrew Morton wrote:
> On Thu, 12 Jul 2007 06:11:15 +0200
> Nick Piggin <npiggin@suse.de> wrote:
>
> > unlock_page is pretty expensive. Even after my patches to optimise the
> > memory order and away the waitqueue hit for uncontended pages, it is
> > still a locked operation, which may be anywhere up to hundreds of cycles
> > on some CPUs.
> >
> > When we reclaim a page, we don't need to "unlock" it as such, because
> > we know there will be no contention (if there was, it would be a bug
> > because the page is just about to get freed).
> >
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
> >
> > Index: linux-2.6/include/linux/page-flags.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/page-flags.h
> > +++ linux-2.6/include/linux/page-flags.h
> > @@ -115,6 +115,8 @@
> > test_and_set_bit(PG_locked, &(page)->flags)
> > #define ClearPageLocked(page) \
> > clear_bit(PG_locked, &(page)->flags)
> > +#define __ClearPageLocked(page) \
> > + __clear_bit(PG_locked, &(page)->flags)
> > #define TestClearPageLocked(page) \
> > test_and_clear_bit(PG_locked, &(page)->flags)
> >
> > Index: linux-2.6/mm/vmscan.c
> > ===================================================================
> > --- linux-2.6.orig/mm/vmscan.c
> > +++ linux-2.6/mm/vmscan.c
> > @@ -576,7 +576,7 @@ static unsigned long shrink_page_list(st
> > goto keep_locked;
> >
> > free_it:
> > - unlock_page(page);
> > + __ClearPageLocked(page);
> > nr_reclaimed++;
> > if (!pagevec_add(&freed_pvec, page))
> > __pagevec_release_nonlru(&freed_pvec);
>
> I really hate this patch :( For the usual reasons.
>
> I'd have thought that such a terrifying point-cannon-at-someone-else's-foot
> hack would at least merit a comment explaining (fully) to the reader why it
> is a safe thing to do at this site.
OK, comments are fair enough, although we already do similar things
to clear other flags here and in the page allocator path itself, so
I thought it was already deemed kosher. I'll add some in the next
iteration.
> And explaining to them why __pagevec_release_nonlru() immediately
> contradicts the assumption which this code is making.
Again fair enough. I'll send it sometime when you've got less mm
stuff in your tree I guess.
* Re: [patch] mm: unlockless reclaim
2007-07-12 4:11 Nick Piggin
2007-07-12 7:43 ` Andrew Morton
@ 2007-07-12 19:20 ` Andrew Morton
2007-07-14 8:35 ` Nick Piggin
1 sibling, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2007-07-12 19:20 UTC (permalink / raw)
To: Nick Piggin; +Cc: Linux Memory Management List
On Thu, 12 Jul 2007 06:11:15 +0200
Nick Piggin <npiggin@suse.de> wrote:
> unlock_page is pretty expensive. Even after my patches to optimise the
> memory order and away the waitqueue hit for uncontended pages, it is
> still a locked operation, which may be anywhere up to hundreds of cycles
> on some CPUs.
>
> When we reclaim a page, we don't need to "unlock" it as such, because
> we know there will be no contention (if there was, it would be a bug
> because the page is just about to get freed).
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
>
> Index: linux-2.6/include/linux/page-flags.h
> ===================================================================
> --- linux-2.6.orig/include/linux/page-flags.h
> +++ linux-2.6/include/linux/page-flags.h
> @@ -115,6 +115,8 @@
> test_and_set_bit(PG_locked, &(page)->flags)
> #define ClearPageLocked(page) \
> clear_bit(PG_locked, &(page)->flags)
> +#define __ClearPageLocked(page) \
> + __clear_bit(PG_locked, &(page)->flags)
> #define TestClearPageLocked(page) \
> test_and_clear_bit(PG_locked, &(page)->flags)
>
> Index: linux-2.6/mm/vmscan.c
> ===================================================================
> --- linux-2.6.orig/mm/vmscan.c
> +++ linux-2.6/mm/vmscan.c
> @@ -576,7 +576,7 @@ static unsigned long shrink_page_list(st
> goto keep_locked;
>
> free_it:
> - unlock_page(page);
> + __ClearPageLocked(page);
> nr_reclaimed++;
> if (!pagevec_add(&freed_pvec, page))
> __pagevec_release_nonlru(&freed_pvec);
I really hate this patch :( For the usual reasons.
I'd have thought that such a terrifying point-cannon-at-someone-else's-foot
hack would at least merit a comment explaining (fully) to the reader why it
is a safe thing to do at this site.
And explaining to them why __pagevec_release_nonlru() immediately
contradicts the assumption which this code is making.
* Re: [patch] mm: unlockless reclaim
2007-07-12 8:00 ` Andrew Morton
@ 2007-07-12 8:18 ` Nick Piggin
0 siblings, 0 replies; 27+ messages in thread
From: Nick Piggin @ 2007-07-12 8:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux Memory Management List
On Thu, Jul 12, 2007 at 01:00:07AM -0700, Andrew Morton wrote:
> On Thu, 12 Jul 2007 09:55:32 +0200 Nick Piggin <npiggin@suse.de> wrote:
>
> > >
> > > mutter.
> > >
> > > So why does __pagevec_release_nonlru() check the page refcount?
> >
> > It doesn't
>
> yes it does
That was in answer to your question: I meant that it doesn't need to.
> > although it will have to return the count to zero of course.
> >
> > I don't want to submit that because the lockless pagecache always needs
> > the refcount to be checked :) And which I am actually going to submit to
> > you after you chuck out a few patches.
> >
> > But unlock_page is really murderous on my powerpc (with all the
> > unlock-speedup patches, dd if=/dev/zero of=/dev/null of a huge sparse file
> > goes up by 10% throughput on the G5!!).
>
> well this change won't help that much.
Oh, well the dd includes reclaim and so it ends up doing 2 locks for
each page (1 to read, 1 to reclaim). So this alone supposedly should
help by 5% :)
* Re: [patch] mm: unlockless reclaim
2007-07-12 7:55 ` Nick Piggin
@ 2007-07-12 8:00 ` Andrew Morton
2007-07-12 8:18 ` Nick Piggin
0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2007-07-12 8:00 UTC (permalink / raw)
To: Nick Piggin; +Cc: Linux Memory Management List
On Thu, 12 Jul 2007 09:55:32 +0200 Nick Piggin <npiggin@suse.de> wrote:
> On Thu, Jul 12, 2007 at 12:43:39AM -0700, Andrew Morton wrote:
> > On Thu, 12 Jul 2007 06:11:15 +0200 Nick Piggin <npiggin@suse.de> wrote:
> >
> > > unlock_page is pretty expensive. Even after my patches to optimise the
> > > memory order and away the waitqueue hit for uncontended pages, it is
> > > still a locked operation, which may be anywhere up to hundreds of cycles
> > > on some CPUs.
> > >
> > > When we reclaim a page, we don't need to "unlock" it as such, because
> > > we know there will be no contention (if there was, it would be a bug
> > > because the page is just about to get freed).
> > >
> > > Signed-off-by: Nick Piggin <npiggin@suse.de>
> > >
> > > Index: linux-2.6/include/linux/page-flags.h
> > > ===================================================================
> > > --- linux-2.6.orig/include/linux/page-flags.h
> > > +++ linux-2.6/include/linux/page-flags.h
> > > @@ -115,6 +115,8 @@
> > > test_and_set_bit(PG_locked, &(page)->flags)
> > > #define ClearPageLocked(page) \
> > > clear_bit(PG_locked, &(page)->flags)
> > > +#define __ClearPageLocked(page) \
> > > + __clear_bit(PG_locked, &(page)->flags)
> > > #define TestClearPageLocked(page) \
> > > test_and_clear_bit(PG_locked, &(page)->flags)
> > >
> > > Index: linux-2.6/mm/vmscan.c
> > > ===================================================================
> > > --- linux-2.6.orig/mm/vmscan.c
> > > +++ linux-2.6/mm/vmscan.c
> > > @@ -576,7 +576,7 @@ static unsigned long shrink_page_list(st
> > > goto keep_locked;
> > >
> > > free_it:
> > > - unlock_page(page);
> > > + __ClearPageLocked(page);
> > > nr_reclaimed++;
> > > if (!pagevec_add(&freed_pvec, page))
> > > __pagevec_release_nonlru(&freed_pvec);
> >
> > mutter.
> >
> > So why does __pagevec_release_nonlru() check the page refcount?
>
> It doesn't
yes it does
> although it will have to return the count to zero of course.
>
> I don't want to submit that because the lockless pagecache always needs
> the refcount to be checked :) And which I am actually going to submit to
> you after you chuck out a few patches.
>
> But unlock_page is really murderous on my powerpc (with all the
> unlock-speedup patches, dd if=/dev/zero of=/dev/null of a huge sparse file
> goes up by 10% throughput on the G5!!).
well this change won't help that much.
mutter.
* Re: [patch] mm: unlockless reclaim
2007-07-12 7:43 ` Andrew Morton
@ 2007-07-12 7:55 ` Nick Piggin
2007-07-12 8:00 ` Andrew Morton
0 siblings, 1 reply; 27+ messages in thread
From: Nick Piggin @ 2007-07-12 7:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux Memory Management List
On Thu, Jul 12, 2007 at 12:43:39AM -0700, Andrew Morton wrote:
> On Thu, 12 Jul 2007 06:11:15 +0200 Nick Piggin <npiggin@suse.de> wrote:
>
> > unlock_page is pretty expensive. Even after my patches to optimise the
> > memory order and away the waitqueue hit for uncontended pages, it is
> > still a locked operation, which may be anywhere up to hundreds of cycles
> > on some CPUs.
> >
> > When we reclaim a page, we don't need to "unlock" it as such, because
> > we know there will be no contention (if there was, it would be a bug
> > because the page is just about to get freed).
> >
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
> >
> > Index: linux-2.6/include/linux/page-flags.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/page-flags.h
> > +++ linux-2.6/include/linux/page-flags.h
> > @@ -115,6 +115,8 @@
> > test_and_set_bit(PG_locked, &(page)->flags)
> > #define ClearPageLocked(page) \
> > clear_bit(PG_locked, &(page)->flags)
> > +#define __ClearPageLocked(page) \
> > + __clear_bit(PG_locked, &(page)->flags)
> > #define TestClearPageLocked(page) \
> > test_and_clear_bit(PG_locked, &(page)->flags)
> >
> > Index: linux-2.6/mm/vmscan.c
> > ===================================================================
> > --- linux-2.6.orig/mm/vmscan.c
> > +++ linux-2.6/mm/vmscan.c
> > @@ -576,7 +576,7 @@ static unsigned long shrink_page_list(st
> > goto keep_locked;
> >
> > free_it:
> > - unlock_page(page);
> > + __ClearPageLocked(page);
> > nr_reclaimed++;
> > if (!pagevec_add(&freed_pvec, page))
> > __pagevec_release_nonlru(&freed_pvec);
>
> mutter.
>
> So why does __pagevec_release_nonlru() check the page refcount?
It doesn't, although it will have to return the count to zero of course.
I don't want to submit that because the lockless pagecache always needs
the refcount to be checked :) And which I am actually going to submit to
you after you chuck out a few patches.
But unlock_page is really murderous on my powerpc (with all the
unlock-speedup patches, dd if=/dev/zero of=/dev/null of a huge sparse file
goes up by 10% throughput on the G5!!).
* Re: [patch] mm: unlockless reclaim
2007-07-12 4:11 Nick Piggin
@ 2007-07-12 7:43 ` Andrew Morton
2007-07-12 7:55 ` Nick Piggin
2007-07-12 19:20 ` Andrew Morton
1 sibling, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2007-07-12 7:43 UTC (permalink / raw)
To: Nick Piggin; +Cc: Linux Memory Management List
On Thu, 12 Jul 2007 06:11:15 +0200 Nick Piggin <npiggin@suse.de> wrote:
> unlock_page is pretty expensive. Even after my patches to optimise the
> memory order and away the waitqueue hit for uncontended pages, it is
> still a locked operation, which may be anywhere up to hundreds of cycles
> on some CPUs.
>
> When we reclaim a page, we don't need to "unlock" it as such, because
> we know there will be no contention (if there was, it would be a bug
> because the page is just about to get freed).
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
>
> Index: linux-2.6/include/linux/page-flags.h
> ===================================================================
> --- linux-2.6.orig/include/linux/page-flags.h
> +++ linux-2.6/include/linux/page-flags.h
> @@ -115,6 +115,8 @@
> test_and_set_bit(PG_locked, &(page)->flags)
> #define ClearPageLocked(page) \
> clear_bit(PG_locked, &(page)->flags)
> +#define __ClearPageLocked(page) \
> + __clear_bit(PG_locked, &(page)->flags)
> #define TestClearPageLocked(page) \
> test_and_clear_bit(PG_locked, &(page)->flags)
>
> Index: linux-2.6/mm/vmscan.c
> ===================================================================
> --- linux-2.6.orig/mm/vmscan.c
> +++ linux-2.6/mm/vmscan.c
> @@ -576,7 +576,7 @@ static unsigned long shrink_page_list(st
> goto keep_locked;
>
> free_it:
> - unlock_page(page);
> + __ClearPageLocked(page);
> nr_reclaimed++;
> if (!pagevec_add(&freed_pvec, page))
> __pagevec_release_nonlru(&freed_pvec);
mutter.
So why does __pagevec_release_nonlru() check the page refcount?
* [patch] mm: unlockless reclaim
@ 2007-07-12 4:11 Nick Piggin
2007-07-12 7:43 ` Andrew Morton
2007-07-12 19:20 ` Andrew Morton
0 siblings, 2 replies; 27+ messages in thread
From: Nick Piggin @ 2007-07-12 4:11 UTC (permalink / raw)
To: Andrew Morton, Linux Memory Management List
unlock_page is pretty expensive. Even after my patches to optimise the
memory order and away the waitqueue hit for uncontended pages, it is
still a locked operation, which may be anywhere up to hundreds of cycles
on some CPUs.
When we reclaim a page, we don't need to "unlock" it as such, because
we know there will be no contention (if there was, it would be a bug
because the page is just about to get freed).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h
+++ linux-2.6/include/linux/page-flags.h
@@ -115,6 +115,8 @@
test_and_set_bit(PG_locked, &(page)->flags)
#define ClearPageLocked(page) \
clear_bit(PG_locked, &(page)->flags)
+#define __ClearPageLocked(page) \
+ __clear_bit(PG_locked, &(page)->flags)
#define TestClearPageLocked(page) \
test_and_clear_bit(PG_locked, &(page)->flags)
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -576,7 +576,7 @@ static unsigned long shrink_page_list(st
goto keep_locked;
free_it:
- unlock_page(page);
+ __ClearPageLocked(page);
nr_reclaimed++;
if (!pagevec_add(&freed_pvec, page))
__pagevec_release_nonlru(&freed_pvec);
end of thread, other threads:[~2008-10-15 4:37 UTC | newest]
Thread overview: 27+ messages
-- links below jump to the message on this page --
2008-08-18 12:24 [patch] mm: pagecache insertion fewer atomics Nick Piggin
2008-08-18 12:25 ` [patch] mm: unlockless reclaim Nick Piggin
2008-08-19 5:09 ` KOSAKI Motohiro
2008-08-19 10:05 ` Nick Piggin
2008-08-19 10:20 ` KOSAKI Motohiro
2008-08-18 12:28 ` [patch] mm: page lock use lock bitops Nick Piggin
2008-08-18 12:29 ` [patch] fs: buffer " Nick Piggin
2008-08-18 12:29 ` [patch] mm: page allocator minor speedup Nick Piggin
2008-08-18 13:57 ` Pekka Enberg
2008-08-19 7:49 ` Nick Piggin
2008-08-19 7:51 ` Pekka Enberg
2008-08-19 10:25 ` Nick Piggin
2008-08-18 23:29 ` Andrew Morton
2008-08-19 7:51 ` Nick Piggin
2008-10-14 15:52 ` Hugh Dickins
2008-10-15 4:37 ` Nick Piggin
2008-08-19 5:41 ` [patch] mm: pagecache insertion fewer atomics KOSAKI Motohiro
2008-08-19 5:50 ` Andrew Morton
2008-08-19 10:07 ` Nick Piggin
-- strict thread matches above, loose matches on Subject: below --
2007-11-10 5:12 [patch 1/2] mm: page trylock rename Nick Piggin
2007-11-10 5:43 ` Nick Piggin
2007-11-10 11:51 ` Peter Zijlstra
2007-11-11 8:40 ` [patch] mm: unlockless reclaim Nick Piggin
2007-07-12 4:11 Nick Piggin
2007-07-12 7:43 ` Andrew Morton
2007-07-12 7:55 ` Nick Piggin
2007-07-12 8:00 ` Andrew Morton
2007-07-12 8:18 ` Nick Piggin
2007-07-12 19:20 ` Andrew Morton
2007-07-14 8:35 ` Nick Piggin