* [RFC][PATCH 0/2] mm: some patches about add_to_swap_cache() @ 2009-08-10 2:23 Daisuke Nishimura 2009-08-10 2:26 ` [BUGFIX][1/2] mm: add_to_swap_cache() must not sleep Daisuke Nishimura 2009-08-10 2:27 ` [cleanup][2/2] mm: add_to_swap_cache() does not return -EEXIST Daisuke Nishimura 0 siblings, 2 replies; 8+ messages in thread From: Daisuke Nishimura @ 2009-08-10 2:23 UTC (permalink / raw) To: LKML, linux-mm Cc: Andrew Morton, Balbir Singh, Hugh Dickins, KAMEZAWA Hiroyuki, Johannes Weiner, Daisuke Nishimura These are patches about add_to_swap_cache(), related to the commit 355cfa73(mm: modify swap_map and add SWAP_HAS_CACHE flag). [BUGFIX][1/2] mm: add_to_swap_cache() must not sleep [cleanup][2/2] mm: add_to_swap_cache() does not return -EEXIST These are based on 2.6.31-rc5, but can be applied onto mmotm. Any comments or suggestions would be welcome. Thanks, Daisuke Nishimura. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* [BUGFIX][1/2] mm: add_to_swap_cache() must not sleep 2009-08-10 2:23 [RFC][PATCH 0/2] mm: some patches about add_to_swap_cache() Daisuke Nishimura @ 2009-08-10 2:26 ` Daisuke Nishimura 2009-08-10 3:16 ` KAMEZAWA Hiroyuki 2009-08-10 2:27 ` [cleanup][2/2] mm: add_to_swap_cache() does not return -EEXIST Daisuke Nishimura 1 sibling, 1 reply; 8+ messages in thread From: Daisuke Nishimura @ 2009-08-10 2:26 UTC (permalink / raw) To: LKML, linux-mm Cc: Andrew Morton, Balbir Singh, Hugh Dickins, KAMEZAWA Hiroyuki, Johannes Weiner, Daisuke Nishimura After commit 355cfa73(mm: modify swap_map and add SWAP_HAS_CACHE flag), read_swap_cache_async() will busy-wait while a entry doesn't on swap cache but it has SWAP_HAS_CACHE flag. Such entries can exist on add/delete path of swap cache. On add path, add_to_swap_cache() is called soon after SWAP_HAS_CACHE flag is set, and on delete path, swapcache_free() will be called (SWAP_HAS_CACHE flag is cleared) soon after __delete_from_swap_cache() is called. So, the busy-wait works well in most cases. But this mechanism can cause soft lockup if add_to_swap_cache() sleeps and read_swap_cache_async() tries to swap-in the same entry on the same cpu. add_to_swap() and shmem_writepage() call add_to_swap_cache() w/o __GFP_WAIT, but read_swap_cache_async() can call it w/ __GFP_WAIT, so it can cause soft lockup. This patch changes the gfp_mask of add_to_swap_cache() in read_swap_cache_async(). Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> --- mm/swap_state.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/mm/swap_state.c b/mm/swap_state.c index 42cd38e..3e6dd72 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -76,6 +76,7 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask) VM_BUG_ON(!PageLocked(page)); VM_BUG_ON(PageSwapCache(page)); VM_BUG_ON(!PageSwapBacked(page)); + VM_BUG_ON(gfp_mask & __GFP_WAIT); error = radix_tree_preload(gfp_mask); if (!error) { @@ -307,7 +308,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, */ __set_page_locked(new_page); SetPageSwapBacked(new_page); - err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL); + err = add_to_swap_cache(new_page, entry, GFP_ATOMIC); if (likely(!err)) { /* * Initiate read into locked page and return. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUGFIX][1/2] mm: add_to_swap_cache() must not sleep 2009-08-10 2:26 ` [BUGFIX][1/2] mm: add_to_swap_cache() must not sleep Daisuke Nishimura @ 2009-08-10 3:16 ` KAMEZAWA Hiroyuki 2009-08-10 5:49 ` Daisuke Nishimura 0 siblings, 1 reply; 8+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-08-10 3:16 UTC (permalink / raw) To: Daisuke Nishimura Cc: LKML, linux-mm, Andrew Morton, Balbir Singh, Hugh Dickins, Johannes Weiner On Mon, 10 Aug 2009 11:26:41 +0900 Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote: > After commit 355cfa73(mm: modify swap_map and add SWAP_HAS_CACHE flag), > read_swap_cache_async() will busy-wait while a entry doesn't on swap cache > but it has SWAP_HAS_CACHE flag. > > Such entries can exist on add/delete path of swap cache. > On add path, add_to_swap_cache() is called soon after SWAP_HAS_CACHE flag > is set, and on delete path, swapcache_free() will be called (SWAP_HAS_CACHE > flag is cleared) soon after __delete_from_swap_cache() is called. > So, the busy-wait works well in most cases. > yes. > But this mechanism can cause soft lockup if add_to_swap_cache() sleeps > and read_swap_cache_async() tries to swap-in the same entry on the same cpu. > Hmm.. > add_to_swap() and shmem_writepage() call add_to_swap_cache() w/o __GFP_WAIT, > but read_swap_cache_async() can call it w/ __GFP_WAIT, so it can cause > soft lockup. > > This patch changes the gfp_mask of add_to_swap_cache() in read_swap_cache_async(). > > Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Thank you for catching. Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> But Hm...I wonder whether this is the best fix. If I was you, I may do following. 1. remove radix_tree_preload() and gfp_mask from add_to_swapcache(). Then, rename it fo __add_to_swapcache(). Or, move swap_duplicate() into add_to_swapcache() with a new flag. 2. do things in following order. radix_tree_peload(); swap_duplicate(); # this never sleeps. add_to_swapcache() radix_tree_peload_end(); Good point of this approach is - we can use __GFP_WAIT in gfp_mask. - -ENOMEM means OOM, then, we should be aggressive to get a page. How do you think ? Thanks, -Kame > --- > mm/swap_state.c | 3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/mm/swap_state.c b/mm/swap_state.c > index 42cd38e..3e6dd72 100644 > --- a/mm/swap_state.c > +++ b/mm/swap_state.c > @@ -76,6 +76,7 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask) > VM_BUG_ON(!PageLocked(page)); > VM_BUG_ON(PageSwapCache(page)); > VM_BUG_ON(!PageSwapBacked(page)); > + VM_BUG_ON(gfp_mask & __GFP_WAIT); > > error = radix_tree_preload(gfp_mask); > if (!error) { > @@ -307,7 +308,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, > */ > __set_page_locked(new_page); > SetPageSwapBacked(new_page); > - err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL); > + err = add_to_swap_cache(new_page, entry, GFP_ATOMIC); > if (likely(!err)) { > /* > * Initiate read into locked page and return. > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUGFIX][1/2] mm: add_to_swap_cache() must not sleep 2009-08-10 3:16 ` KAMEZAWA Hiroyuki @ 2009-08-10 5:49 ` Daisuke Nishimura 2009-08-10 5:58 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 8+ messages in thread From: Daisuke Nishimura @ 2009-08-10 5:49 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: LKML, linux-mm, Andrew Morton, Balbir Singh, Hugh Dickins, Johannes Weiner, Daisuke Nishimura On Mon, 10 Aug 2009 12:16:44 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Mon, 10 Aug 2009 11:26:41 +0900 > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote: > > > After commit 355cfa73(mm: modify swap_map and add SWAP_HAS_CACHE flag), > > read_swap_cache_async() will busy-wait while a entry doesn't on swap cache > > but it has SWAP_HAS_CACHE flag. > > > > Such entries can exist on add/delete path of swap cache. > > On add path, add_to_swap_cache() is called soon after SWAP_HAS_CACHE flag > > is set, and on delete path, swapcache_free() will be called (SWAP_HAS_CACHE > > flag is cleared) soon after __delete_from_swap_cache() is called. > > So, the busy-wait works well in most cases. > > > yes. > > > But this mechanism can cause soft lockup if add_to_swap_cache() sleeps > > and read_swap_cache_async() tries to swap-in the same entry on the same cpu. > > > Hmm.. > > > add_to_swap() and shmem_writepage() call add_to_swap_cache() w/o __GFP_WAIT, > > but read_swap_cache_async() can call it w/ __GFP_WAIT, so it can cause > > soft lockup. > > > > This patch changes the gfp_mask of add_to_swap_cache() in read_swap_cache_async(). > > > > Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> > > Thank you for catching. > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > But Hm...I wonder whether this is the best fix. > > If I was you, I may do following. > > 1. remove radix_tree_preload() and gfp_mask from add_to_swapcache(). > Then, rename it fo __add_to_swapcache(). > Or, move swap_duplicate() into add_to_swapcache() with a new flag. > > 2. do things in following order. > > radix_tree_peload(); > swap_duplicate(); # this never sleeps. > add_to_swapcache() > radix_tree_peload_end(); > > Good point of this approach is > - we can use __GFP_WAIT in gfp_mask. > - -ENOMEM means OOM, then, we should be aggressive to get a page. > > How do you think ? > Thank you for your suggestion. It's a good idea. How about this one (Passed build test only) ? If it's good for you, I'll resend it removing RFC after a long term test, unless someone else tell me not to. === From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> After commit 355cfa73(mm: modify swap_map and add SWAP_HAS_CACHE flag), read_swap_cache_async() will busy-wait while a entry doesn't on swap cache but it has SWAP_HAS_CACHE flag. Such entries can exist on add/delete path of swap cache. On add path, add_to_swap_cache() is called soon after SWAP_HAS_CACHE flag is set, and on delete path, swapcache_free() will be called (SWAP_HAS_CACHE flag is cleared) soon after __delete_from_swap_cache() is called. So, the busy-wait works well in most cases. But this mechanism can cause soft lockup if add_to_swap_cache() sleeps and read_swap_cache_async() tries to swap-in the same entry on the same cpu. This patch calls radix_tree_preload() before swapcache_prepare() and divide add_to_swap_cache() into two part: radix_tree_preload() part and radix_tree_insert() part(define it as __add_to_swap_cache()). Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> --- mm/swap_state.c | 70 ++++++++++++++++++++++++++++++++++++------------------- 1 files changed, 46 insertions(+), 24 deletions(-) diff --git a/mm/swap_state.c b/mm/swap_state.c index 42cd38e..0313a13 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -66,10 +66,10 @@ void show_swap_cache_info(void) } /* - * add_to_swap_cache resembles add_to_page_cache_locked on swapper_space, + * __add_to_swap_cache resembles add_to_page_cache_locked on swapper_space, * but sets SwapCache flag and private instead of mapping and index. */ -int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask) +static int __add_to_swap_cache(struct page *page, swp_entry_t entry) { int error; @@ -77,28 +77,37 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask) VM_BUG_ON(PageSwapCache(page)); VM_BUG_ON(!PageSwapBacked(page)); + page_cache_get(page); + SetPageSwapCache(page); + set_page_private(page, entry.val); + + spin_lock_irq(&swapper_space.tree_lock); + error = radix_tree_insert(&swapper_space.page_tree, entry.val, page); + if (likely(!error)) { + total_swapcache_pages++; + __inc_zone_page_state(page, NR_FILE_PAGES); + INC_CACHE_INFO(add_total); + } + spin_unlock_irq(&swapper_space.tree_lock); + + if (unlikely(error)) { + set_page_private(page, 0UL); + ClearPageSwapCache(page); + page_cache_release(page); + } + + return error; +} + + +int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask) +{ + int error; + error = radix_tree_preload(gfp_mask); if (!error) { - page_cache_get(page); - SetPageSwapCache(page); - set_page_private(page, entry.val); - - spin_lock_irq(&swapper_space.tree_lock); - error = radix_tree_insert(&swapper_space.page_tree, - entry.val, page); - if (likely(!error)) { - total_swapcache_pages++; - __inc_zone_page_state(page, NR_FILE_PAGES); - INC_CACHE_INFO(add_total); - } - spin_unlock_irq(&swapper_space.tree_lock); + error = __add_to_swap_cache(page, entry); radix_tree_preload_end(); - - if (unlikely(error)) { - set_page_private(page, 0UL); - ClearPageSwapCache(page); - page_cache_release(page); - } } return error; } @@ -289,13 +298,24 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, } /* + * call radix_tree_preload() while we can wait. + */ + err = radix_tree_preload(gfp_mask & GFP_KERNEL); + if (err) + break; + + /* * Swap entry may have been freed since our caller observed it. */ err = swapcache_prepare(entry); - if (err == -EEXIST) /* seems racy */ + if (err == -EEXIST) { /* seems racy */ + radix_tree_preload_end(); continue; - if (err) /* swp entry is obsolete ? */ + } + if (err) { /* swp entry is obsolete ? */ + radix_tree_preload_end(); break; + } /* * Associate the page with swap entry in the swap cache. @@ -307,8 +327,9 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, */ __set_page_locked(new_page); SetPageSwapBacked(new_page); - err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL); + err = __add_to_swap_cache(new_page, entry); if (likely(!err)) { + radix_tree_preload_end(); /* * Initiate read into locked page and return. */ @@ -316,6 +337,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, swap_readpage(new_page); return new_page; } + radix_tree_preload_end(); ClearPageSwapBacked(new_page); __clear_page_locked(new_page); swapcache_free(entry, NULL); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUGFIX][1/2] mm: add_to_swap_cache() must not sleep 2009-08-10 5:49 ` Daisuke Nishimura @ 2009-08-10 5:58 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 8+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-08-10 5:58 UTC (permalink / raw) To: Daisuke Nishimura Cc: LKML, linux-mm, Andrew Morton, Balbir Singh, Hugh Dickins, Johannes Weiner On Mon, 10 Aug 2009 14:49:41 +0900 Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote: > Thank you for your suggestion. It's a good idea. > > How about this one (Passed build test only) ? > > If it's good for you, I'll resend it removing RFC after a long term test, > unless someone else tell me not to. > Seems very nice to me. I hope answers from swap-people is good.. Thanks, -Kame > === > From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> > > After commit 355cfa73(mm: modify swap_map and add SWAP_HAS_CACHE flag), > read_swap_cache_async() will busy-wait while a entry doesn't on swap cache > but it has SWAP_HAS_CACHE flag. > > Such entries can exist on add/delete path of swap cache. > On add path, add_to_swap_cache() is called soon after SWAP_HAS_CACHE flag > is set, and on delete path, swapcache_free() will be called (SWAP_HAS_CACHE > flag is cleared) soon after __delete_from_swap_cache() is called. > So, the busy-wait works well in most cases. > > But this mechanism can cause soft lockup if add_to_swap_cache() sleeps > and read_swap_cache_async() tries to swap-in the same entry on the same cpu. > > This patch calls radix_tree_preload() before swapcache_prepare() and divide > add_to_swap_cache() into two part: radix_tree_preload() part and > radix_tree_insert() part(define it as __add_to_swap_cache()). > > Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> > --- > mm/swap_state.c | 70 ++++++++++++++++++++++++++++++++++++------------------- > 1 files changed, 46 insertions(+), 24 deletions(-) > > diff --git a/mm/swap_state.c b/mm/swap_state.c > index 42cd38e..0313a13 100644 > --- a/mm/swap_state.c > +++ b/mm/swap_state.c > @@ -66,10 +66,10 @@ void show_swap_cache_info(void) > } > > /* > - * add_to_swap_cache resembles add_to_page_cache_locked on swapper_space, > + * __add_to_swap_cache resembles add_to_page_cache_locked on swapper_space, > * but sets SwapCache flag and private instead of mapping and index. > */ > -int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask) > +static int __add_to_swap_cache(struct page *page, swp_entry_t entry) > { > int error; > > @@ -77,28 +77,37 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask) > VM_BUG_ON(PageSwapCache(page)); > VM_BUG_ON(!PageSwapBacked(page)); > > + page_cache_get(page); > + SetPageSwapCache(page); > + set_page_private(page, entry.val); > + > + spin_lock_irq(&swapper_space.tree_lock); > + error = radix_tree_insert(&swapper_space.page_tree, entry.val, page); > + if (likely(!error)) { > + total_swapcache_pages++; > + __inc_zone_page_state(page, NR_FILE_PAGES); > + INC_CACHE_INFO(add_total); > + } > + spin_unlock_irq(&swapper_space.tree_lock); > + > + if (unlikely(error)) { > + set_page_private(page, 0UL); > + ClearPageSwapCache(page); > + page_cache_release(page); > + } > + > + return error; > +} > + > + > +int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask) > +{ > + int error; > + > error = radix_tree_preload(gfp_mask); > if (!error) { > - page_cache_get(page); > - SetPageSwapCache(page); > - set_page_private(page, entry.val); > - > - spin_lock_irq(&swapper_space.tree_lock); > - error = radix_tree_insert(&swapper_space.page_tree, > - entry.val, page); > - if (likely(!error)) { > - total_swapcache_pages++; > - __inc_zone_page_state(page, NR_FILE_PAGES); > - INC_CACHE_INFO(add_total); > - } > - spin_unlock_irq(&swapper_space.tree_lock); > + error = __add_to_swap_cache(page, entry); > radix_tree_preload_end(); > - > - if (unlikely(error)) { > - set_page_private(page, 0UL); > - ClearPageSwapCache(page); > - page_cache_release(page); > - } > } > return error; > } > @@ -289,13 +298,24 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, > } > > /* > + * call radix_tree_preload() while we can wait. > + */ > + err = radix_tree_preload(gfp_mask & GFP_KERNEL); > + if (err) > + break; > + > + /* > * Swap entry may have been freed since our caller observed it. > */ > err = swapcache_prepare(entry); > - if (err == -EEXIST) /* seems racy */ > + if (err == -EEXIST) { /* seems racy */ > + radix_tree_preload_end(); > continue; > - if (err) /* swp entry is obsolete ? */ > + } > + if (err) { /* swp entry is obsolete ? */ > + radix_tree_preload_end(); > break; > + } > > /* > * Associate the page with swap entry in the swap cache. > @@ -307,8 +327,9 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, > */ > __set_page_locked(new_page); > SetPageSwapBacked(new_page); > - err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL); > + err = __add_to_swap_cache(new_page, entry); > if (likely(!err)) { > + radix_tree_preload_end(); > /* > * Initiate read into locked page and return. > */ > @@ -316,6 +337,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, > swap_readpage(new_page); > return new_page; > } > + radix_tree_preload_end(); > ClearPageSwapBacked(new_page); > __clear_page_locked(new_page); > swapcache_free(entry, NULL); > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* [cleanup][2/2] mm: add_to_swap_cache() does not return -EEXIST 2009-08-10 2:23 [RFC][PATCH 0/2] mm: some patches about add_to_swap_cache() Daisuke Nishimura 2009-08-10 2:26 ` [BUGFIX][1/2] mm: add_to_swap_cache() must not sleep Daisuke Nishimura @ 2009-08-10 2:27 ` Daisuke Nishimura 2009-08-10 3:19 ` KAMEZAWA Hiroyuki 2009-08-14 6:16 ` Daisuke Nishimura 1 sibling, 2 replies; 8+ messages in thread From: Daisuke Nishimura @ 2009-08-10 2:27 UTC (permalink / raw) To: LKML, linux-mm Cc: Andrew Morton, Balbir Singh, Hugh Dickins, KAMEZAWA Hiroyuki, Johannes Weiner, Daisuke Nishimura After commit 355cfa73(mm: modify swap_map and add SWAP_HAS_CACHE flag), only the context which have set SWAP_HAS_CACHE flag by swapcache_prepare() or get_swap_page() would call add_to_swap_cache(). So add_to_swap_cache() doesn't return -EEXIST any more. Even though it doesn't return -EEXIST, it's not a good behavior conceptually to call swapcache_prepare() in -EEXIST case, because it means clearing SWAP_HAS_CACHE flag while the entry is on swap cache. This patch removes redundant codes and comments from callers of it, and adds VM_BUG_ON() in error path of add_to_swap_cache() and some comments. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> --- mm/shmem.c | 4 +++ mm/swap_state.c | 75 +++++++++++++++++++++++++++---------------------------- 2 files changed, 41 insertions(+), 38 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index d713239..c71ac6c 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1097,6 +1097,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc) shmem_swp_unmap(entry); unlock: spin_unlock(&info->lock); + /* + * add_to_swap_cache() doesn't return -EEXIST, so we can safely + * clear SWAP_HAS_CACHE flag. + */ swapcache_free(swap, NULL); redirty: set_page_dirty(page); diff --git a/mm/swap_state.c b/mm/swap_state.c index 3e6dd72..e891208 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -96,6 +96,12 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask) radix_tree_preload_end(); if (unlikely(error)) { + /* + * Only the context which have set SWAP_HAS_CACHE flag + * would call add_to_swap_cache(). + * So add_to_swap_cache() doesn't returns -EEXIST. + */ + VM_BUG_ON(error == -EEXIST); set_page_private(page, 0UL); ClearPageSwapCache(page); page_cache_release(page); @@ -137,38 +143,34 @@ int add_to_swap(struct page *page) VM_BUG_ON(!PageLocked(page)); VM_BUG_ON(!PageUptodate(page)); - for (;;) { - entry = get_swap_page(); - if (!entry.val) - return 0; + entry = get_swap_page(); + if (!entry.val) + return 0; + /* + * Radix-tree node allocations from PF_MEMALLOC contexts could + * completely exhaust the page allocator. __GFP_NOMEMALLOC + * stops emergency reserves from being allocated. + * + * TODO: this could cause a theoretical memory reclaim + * deadlock in the swap out path. + */ + /* + * Add it to the swap cache and mark it dirty + */ + err = add_to_swap_cache(page, entry, + __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN); + + if (!err) { /* Success */ + SetPageDirty(page); + return 1; + } else { /* -ENOMEM radix-tree allocation failure */ /* - * Radix-tree node allocations from PF_MEMALLOC contexts could - * completely exhaust the page allocator. __GFP_NOMEMALLOC - * stops emergency reserves from being allocated. - * - * TODO: this could cause a theoretical memory reclaim - * deadlock in the swap out path. - */ - /* - * Add it to the swap cache and mark it dirty + * add_to_swap_cache() doesn't return -EEXIST, so we can safely + * clear SWAP_HAS_CACHE flag. */ - err = add_to_swap_cache(page, entry, - __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN); - - switch (err) { - case 0: /* Success */ - SetPageDirty(page); - return 1; - case -EEXIST: - /* Raced with "speculative" read_swap_cache_async */ - swapcache_free(entry, NULL); - continue; - default: - /* -ENOMEM radix-tree allocation failure */ - swapcache_free(entry, NULL); - return 0; - } + swapcache_free(entry, NULL); + return 0; } } @@ -298,14 +300,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, if (err) /* swp entry is obsolete ? */ break; - /* - * Associate the page with swap entry in the swap cache. - * May fail (-EEXIST) if there is already a page associated - * with this entry in the swap cache: added by a racing - * read_swap_cache_async, or add_to_swap or shmem_writepage - * re-using the just freed swap entry for an existing page. - * May fail (-ENOMEM) if radix-tree node allocation failed. - */ + /* May fail (-ENOMEM) if radix-tree node allocation failed. */ __set_page_locked(new_page); SetPageSwapBacked(new_page); err = add_to_swap_cache(new_page, entry, GFP_ATOMIC); @@ -319,6 +314,10 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, } ClearPageSwapBacked(new_page); __clear_page_locked(new_page); + /* + * add_to_swap_cache() doesn't return -EEXIST, so we can safely + * clear SWAP_HAS_CACHE flag. + */ swapcache_free(entry, NULL); } while (err != -ENOMEM); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [cleanup][2/2] mm: add_to_swap_cache() does not return -EEXIST 2009-08-10 2:27 ` [cleanup][2/2] mm: add_to_swap_cache() does not return -EEXIST Daisuke Nishimura @ 2009-08-10 3:19 ` KAMEZAWA Hiroyuki 2009-08-14 6:16 ` Daisuke Nishimura 1 sibling, 0 replies; 8+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-08-10 3:19 UTC (permalink / raw) To: Daisuke Nishimura Cc: LKML, linux-mm, Andrew Morton, Balbir Singh, Hugh Dickins, Johannes Weiner On Mon, 10 Aug 2009 11:27:16 +0900 Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote: > After commit 355cfa73(mm: modify swap_map and add SWAP_HAS_CACHE flag), > only the context which have set SWAP_HAS_CACHE flag by swapcache_prepare() > or get_swap_page() would call add_to_swap_cache(). > So add_to_swap_cache() doesn't return -EEXIST any more. > > Even though it doesn't return -EEXIST, it's not a good behavior conceptually > to call swapcache_prepare() in -EEXIST case, because it means clearing > SWAP_HAS_CACHE flag while the entry is on swap cache. > > This patch removes redundant codes and comments from callers of it, and > adds VM_BUG_ON() in error path of add_to_swap_cache() and some comments. > > Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Nice! I've postponed this ;( Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > --- > mm/shmem.c | 4 +++ > mm/swap_state.c | 75 +++++++++++++++++++++++++++---------------------------- > 2 files changed, 41 insertions(+), 38 deletions(-) > > diff --git a/mm/shmem.c b/mm/shmem.c > index d713239..c71ac6c 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -1097,6 +1097,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc) > shmem_swp_unmap(entry); > unlock: > spin_unlock(&info->lock); > + /* > + * add_to_swap_cache() doesn't return -EEXIST, so we can safely > + * clear SWAP_HAS_CACHE flag. > + */ > swapcache_free(swap, NULL); > redirty: > set_page_dirty(page); > diff --git a/mm/swap_state.c b/mm/swap_state.c > index 3e6dd72..e891208 100644 > --- a/mm/swap_state.c > +++ b/mm/swap_state.c > @@ -96,6 +96,12 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask) > radix_tree_preload_end(); > > if (unlikely(error)) { > + /* > + * Only the context which have set SWAP_HAS_CACHE flag > + * would call add_to_swap_cache(). > + * So add_to_swap_cache() doesn't returns -EEXIST. > + */ > + VM_BUG_ON(error == -EEXIST); > set_page_private(page, 0UL); > ClearPageSwapCache(page); > page_cache_release(page); > @@ -137,38 +143,34 @@ int add_to_swap(struct page *page) > VM_BUG_ON(!PageLocked(page)); > VM_BUG_ON(!PageUptodate(page)); > > - for (;;) { > - entry = get_swap_page(); > - if (!entry.val) > - return 0; > + entry = get_swap_page(); > + if (!entry.val) > + return 0; > > + /* > + * Radix-tree node allocations from PF_MEMALLOC contexts could > + * completely exhaust the page allocator. __GFP_NOMEMALLOC > + * stops emergency reserves from being allocated. > + * > + * TODO: this could cause a theoretical memory reclaim > + * deadlock in the swap out path. > + */ > + /* > + * Add it to the swap cache and mark it dirty > + */ > + err = add_to_swap_cache(page, entry, > + __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN); > + > + if (!err) { /* Success */ > + SetPageDirty(page); > + return 1; > + } else { /* -ENOMEM radix-tree allocation failure */ > /* > - * Radix-tree node allocations from PF_MEMALLOC contexts could > - * completely exhaust the page allocator. __GFP_NOMEMALLOC > - * stops emergency reserves from being allocated. > - * > - * TODO: this could cause a theoretical memory reclaim > - * deadlock in the swap out path. > - */ > - /* > - * Add it to the swap cache and mark it dirty > + * add_to_swap_cache() doesn't return -EEXIST, so we can safely > + * clear SWAP_HAS_CACHE flag. > */ > - err = add_to_swap_cache(page, entry, > - __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN); > - > - switch (err) { > - case 0: /* Success */ > - SetPageDirty(page); > - return 1; > - case -EEXIST: > - /* Raced with "speculative" read_swap_cache_async */ > - swapcache_free(entry, NULL); > - continue; > - default: > - /* -ENOMEM radix-tree allocation failure */ > - swapcache_free(entry, NULL); > - return 0; > - } > + swapcache_free(entry, NULL); > + return 0; > } > } > > @@ -298,14 +300,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, > if (err) /* swp entry is obsolete ? */ > break; > > - /* > - * Associate the page with swap entry in the swap cache. > - * May fail (-EEXIST) if there is already a page associated > - * with this entry in the swap cache: added by a racing > - * read_swap_cache_async, or add_to_swap or shmem_writepage > - * re-using the just freed swap entry for an existing page. > - * May fail (-ENOMEM) if radix-tree node allocation failed. > - */ > + /* May fail (-ENOMEM) if radix-tree node allocation failed. */ > __set_page_locked(new_page); > SetPageSwapBacked(new_page); > err = add_to_swap_cache(new_page, entry, GFP_ATOMIC); > @@ -319,6 +314,10 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, > } > ClearPageSwapBacked(new_page); > __clear_page_locked(new_page); > + /* > + * add_to_swap_cache() doesn't return -EEXIST, so we can safely > + * clear SWAP_HAS_CACHE flag. > + */ > swapcache_free(entry, NULL); > } while (err != -ENOMEM); > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [cleanup][2/2] mm: add_to_swap_cache() does not return -EEXIST 2009-08-10 2:27 ` [cleanup][2/2] mm: add_to_swap_cache() does not return -EEXIST Daisuke Nishimura 2009-08-10 3:19 ` KAMEZAWA Hiroyuki @ 2009-08-14 6:16 ` Daisuke Nishimura 1 sibling, 0 replies; 8+ messages in thread From: Daisuke Nishimura @ 2009-08-14 6:16 UTC (permalink / raw) To: Andrew Morton Cc: Balbir Singh, Hugh Dickins, KAMEZAWA Hiroyuki, Johannes Weiner, Daisuke Nishimura, LKML, linux-mm Andrew, this is a minor fix for mm-add_to_swap_cache-does-not-return-eexist.patch. It didn't catch up with the change in [1/2](mm-add_to_swap_cache-must-not-sleep.patch). === From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> fix indent size. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> --- mm/swap_state.c | 12 ++++++------ 1 files changed, 6 insertions(+), 6 deletions(-) diff --git a/mm/swap_state.c b/mm/swap_state.c index ff5bd8c..6d1daeb 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -92,12 +92,12 @@ static int __add_to_swap_cache(struct page *page, swp_entry_t entry) spin_unlock_irq(&swapper_space.tree_lock); if (unlikely(error)) { - /* - * Only the context which have set SWAP_HAS_CACHE flag - * would call add_to_swap_cache(). - * So add_to_swap_cache() doesn't returns -EEXIST. - */ - VM_BUG_ON(error == -EEXIST); + /* + * Only the context which have set SWAP_HAS_CACHE flag + * would call add_to_swap_cache(). + * So add_to_swap_cache() doesn't returns -EEXIST. + */ + VM_BUG_ON(error == -EEXIST); set_page_private(page, 0UL); ClearPageSwapCache(page); page_cache_release(page); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2009-08-14 6:20 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-08-10 2:23 [RFC][PATCH 0/2] mm: some patches about add_to_swap_cache() Daisuke Nishimura 2009-08-10 2:26 ` [BUGFIX][1/2] mm: add_to_swap_cache() must not sleep Daisuke Nishimura 2009-08-10 3:16 ` KAMEZAWA Hiroyuki 2009-08-10 5:49 ` Daisuke Nishimura 2009-08-10 5:58 ` KAMEZAWA Hiroyuki 2009-08-10 2:27 ` [cleanup][2/2] mm: add_to_swap_cache() does not return -EEXIST Daisuke Nishimura 2009-08-10 3:19 ` KAMEZAWA Hiroyuki 2009-08-14 6:16 ` Daisuke Nishimura
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox