linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: kanoj@google.engr.sgi.com (Kanoj Sarcar)
To: Andrea Arcangeli <andrea@suse.de>
Cc: Ben LaHaise <bcrl@redhat.com>,
	riel@nl.linux.org, Linus Torvalds <torvalds@transmeta.com>,
	linux-mm@kvack.org
Subject: Re: [patch] take 2 Re: PG_swap_entry bug in recent kernels
Date: Sat, 8 Apr 2000 16:18:53 -0700 (PDT)	[thread overview]
Message-ID: <200004082318.QAA60782@google.engr.sgi.com> (raw)
In-Reply-To: <Pine.LNX.4.21.0004090005020.18345-100000@alpha.random> from "Andrea Arcangeli" at Apr 09, 2000 01:02:05 AM

> 
> On Sat, 8 Apr 2000, Kanoj Sarcar wrote:
> 
> >shrink_mmap
> >--------------						__find_get_page
> >get pagemap_lru_lock					----------------
> >LockPage					
> >drop pagemap_lru_lock
> >Fail if page_count(page) > 1
> >get pagecache_lock
> >get_page
> >Fail if page_count(page) != 2
> >if PageSwapCache, drop pagecache_lock
> >							get pagecache_lock
> >							Finds page in swapcache,
> >								does get_page
> >							drop pagecache_lock
> >	and __delete_from_swap_cache,
> >	which releases PageLock.
> >							LockPage succeeds,
> >							erronesouly believes he
> >							has swapcache page.
> >
> >Did I miss some interlocking step that would prevent this from happening?
> 
> Oh, very good point indeed, I don't think you are missing anything. Thanks
> for showing me that!
> 
> It seems to me the only reason we was dropping the lock earlier for the
> swap cache was to be able to use the remove_inode_page and so avoding
> having to export a secondary remove_inode_page that doesn't grab the
> page_cache_lock. It looks the only reason was an implementation issue.

Agreed.

> 
> So IMHVO it would be nicer to change the locking in shrink_mmap() instead
> of putting the page-cache check in the swap cache lookup fast path. Swap
> cache and page cache are sharing the same locking rules w.r.t. the
> hashtable. That was the only exception as far I can tell and removing it
> would give us a cleaner design IMHO.

Yes, its easy enough to do, I am sure your attached patch does it properly.

Notwithstanding, I will repeat this again. Just because you change how
shrink_mmap deletes swapcache pages, does not make it a good idea to 
change lookup_swap_cache(). Lets first fix the swap cache deletion races,
then you can change lookup_swap_cache() not to lock the page ... which is
trivial anyway, once we think we have all the holes nailed.

I am off to polishing my previous swapoff patch. Andrea, it might make it 
easier to understand your previous patch if you were to send out a list of
the specific races it is fixing. (Something like my above race example).

Kanoj

> 
> What do you think about something like this?
> 
> diff -urN swap-entry-2/include/linux/mm.h swap-entry-3/include/linux/mm.h
> --- swap-entry-2/include/linux/mm.h	Sat Apr  8 19:16:28 2000
> +++ swap-entry-3/include/linux/mm.h	Sun Apr  9 00:18:43 2000
> @@ -449,6 +449,7 @@
>  struct zone_t;
>  /* filemap.c */
>  extern void remove_inode_page(struct page *);
> +extern void __remove_inode_page(struct page *);
>  extern unsigned long page_unuse(struct page *);
>  extern int shrink_mmap(int, int, zone_t *);
>  extern void truncate_inode_pages(struct address_space *, loff_t);
> diff -urN swap-entry-2/include/linux/swap.h swap-entry-3/include/linux/swap.h
> --- swap-entry-2/include/linux/swap.h	Sat Apr  8 18:08:37 2000
> +++ swap-entry-3/include/linux/swap.h	Sun Apr  9 00:47:42 2000
> @@ -105,6 +105,7 @@
>  /*
>   * Make these inline later once they are working properly.
>   */
> +extern void shrink_swap_cache(struct page *);
>  extern void unlink_from_swap_cache(struct page *);
>  extern void __delete_from_swap_cache(struct page *page);
>  extern void delete_from_swap_cache(struct page *page);
> diff -urN swap-entry-2/mm/filemap.c swap-entry-3/mm/filemap.c
> --- swap-entry-2/mm/filemap.c	Sat Apr  8 04:46:04 2000
> +++ swap-entry-3/mm/filemap.c	Sun Apr  9 00:39:23 2000
> @@ -77,6 +77,13 @@
>  	atomic_dec(&page_cache_size);
>  }
>  
> +inline void __remove_inode_page(struct page *page)
> +{
> +	remove_page_from_inode_queue(page);
> +	remove_page_from_hash_queue(page);
> +	page->mapping = NULL;
> +}
> +
>  /*
>   * Remove a page from the page cache and free it. Caller has to make
>   * sure the page is locked and that nobody else uses it - or that usage
> @@ -88,9 +95,7 @@
>  		PAGE_BUG(page);
>  
>  	spin_lock(&pagecache_lock);
> -	remove_page_from_inode_queue(page);
> -	remove_page_from_hash_queue(page);
> -	page->mapping = NULL;
> +	__remove_inode_page(page);
>  	spin_unlock(&pagecache_lock);
>  }
>  
> @@ -298,8 +303,8 @@
>  		 * were to be marked referenced..
>  		 */
>  		if (PageSwapCache(page)) {
> +			shrink_swap_cache(page);
>  			spin_unlock(&pagecache_lock);
> -			__delete_from_swap_cache(page);
>  			/* the page is local to us now */
>  			page->flags &= ~(1UL << PG_swap_entry);
>  			goto made_inode_progress;
> diff -urN swap-entry-2/mm/swap_state.c swap-entry-3/mm/swap_state.c
> --- swap-entry-2/mm/swap_state.c	Sat Apr  8 17:29:46 2000
> +++ swap-entry-3/mm/swap_state.c	Sun Apr  9 00:39:17 2000
> @@ -55,6 +55,19 @@
>  	return ret;
>  }
>  
> +static inline void __remove_from_swap_cache(struct page *page)
> +{
> +	struct address_space *mapping = page->mapping;
> +
> +	if (mapping != &swapper_space)
> +		BUG();
> +	if (!PageSwapCache(page) || !PageLocked(page))
> +		PAGE_BUG(page);
> +
> +	PageClearSwapCache(page);
> +	__remove_inode_page(page);
> +}
> +
>  static inline void remove_from_swap_cache(struct page *page)
>  {
>  	struct address_space *mapping = page->mapping;
> @@ -76,6 +89,20 @@
>  	lru_cache_del(page);
>  	remove_from_swap_cache(page);
>  	__free_page(page);
> +}
> +
> +/* called by shrink_mmap() with the page_cache_lock held */
> +void shrink_swap_cache(struct page *page)
> +{
> +	swp_entry_t entry;
> +
> +	entry.val = page->index;
> +
> +#ifdef SWAP_CACHE_INFO
> +	swap_cache_del_total++;
> +#endif
> +	__remove_from_swap_cache(page);
> +	swap_free(entry);
>  }
>  
>  /*
> 
> 
> The other option is to keep the checks in the lookup swap cache fast path.
> 
> Comments?
> 
> Andrea
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

  reply	other threads:[~2000-04-08 23:18 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2000-04-03 22:22 Ben LaHaise
2000-04-04 15:06 ` Andrea Arcangeli
2000-04-04 15:46   ` Rik van Riel
2000-04-04 16:50     ` Andrea Arcangeli
2000-04-04 17:06       ` Ben LaHaise
2000-04-04 18:03         ` Andrea Arcangeli
2000-04-06 22:11           ` [patch] take 2 " Ben LaHaise
2000-04-07 10:45             ` Andrea Arcangeli
2000-04-07 11:29               ` Rik van Riel
2000-04-07 12:00                 ` Andrea Arcangeli
2000-04-07 12:54                   ` Rik van Riel
2000-04-07 13:14                     ` Andrea Arcangeli
2000-04-07 20:12               ` Kanoj Sarcar
2000-04-07 23:26                 ` Andrea Arcangeli
2000-04-08  0:11                   ` Kanoj Sarcar
2000-04-08  0:37                     ` Kanoj Sarcar
2000-04-08 13:20                       ` Andrea Arcangeli
2000-04-08 21:39                         ` Kanoj Sarcar
2000-04-08 23:02                           ` Andrea Arcangeli
2000-04-08 23:18                             ` Kanoj Sarcar [this message]
2000-04-08 23:58                               ` Andrea Arcangeli
2000-04-08 13:30                     ` Andrea Arcangeli
2000-04-08 17:39                       ` Andrea Arcangeli
2000-04-07 23:54                 ` Andrea Arcangeli
2000-04-08  0:15                   ` Kanoj Sarcar
2000-04-08 13:14                     ` Andrea Arcangeli
2000-04-08 21:47                       ` Kanoj Sarcar
2000-04-08 23:10                         ` Andrea Arcangeli
2000-04-08 23:21                           ` Kanoj Sarcar
2000-04-08 23:39                             ` Andrea Arcangeli
2000-04-09  0:40                               ` Kanoj Sarcar
2000-04-10  8:55                                 ` andrea
2000-04-11  2:45                                   ` Kanoj Sarcar
2000-04-11 16:22                                     ` Andrea Arcangeli
2000-04-11 17:40                                       ` Rik van Riel
2000-04-11 18:20                                         ` Kanoj Sarcar
2000-04-21 18:23                                         ` Andrea Arcangeli
2000-04-21 21:00                                           ` Rik van Riel
2000-04-22  1:12                                             ` Andrea Arcangeli
2000-04-22  1:51                                               ` Linus Torvalds
2000-04-22 18:29                                                 ` Rik van Riel
2000-04-22 19:58                                                   ` Linus Torvalds
2000-04-11 18:26                                       ` Kanoj Sarcar
2000-04-10 19:10                         ` Stephen C. Tweedie
2000-04-08  0:04                 ` Andrea Arcangeli
     [not found] <yttem7xstk2.fsf@vexeta.dc.fi.udc.es>
2000-04-23  0:52 ` Andrea Arcangeli
     [not found] <yttk8ho26s8.fsf@vexeta.dc.fi.udc.es>
2000-04-23 16:07 ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200004082318.QAA60782@google.engr.sgi.com \
    --to=kanoj@google.engr.sgi.com \
    --cc=andrea@suse.de \
    --cc=bcrl@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=riel@nl.linux.org \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox