linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Michael Leun <lkml20101129@newton.leun.net>,
	hughd@google.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: kernel BUG at mm/truncate.c:475!
Date: Mon, 13 Dec 2010 14:20:59 -0800	[thread overview]
Message-ID: <20101213142059.643f8080.akpm@linux-foundation.org> (raw)
In-Reply-To: <E1PRQDn-0007jZ-5S@pomaz-ex.szeredi.hu>

On Sat, 11 Dec 2010 15:14:47 +0100
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Mon, 6 Dec 2010, Michael Leun wrote:
> > At the moment I'm trying to create an easy to reproduce scenario.
> > 
> 
> I've managed to reproduce the BUG.  First I thought it has to do with
> fork() racing with invalidate_inode_pages2_range() but it turns out,
> just two parallel invocation of invalidate_inode_pages2_range() with
> some page faults going on can trigger it.
> 
> The problem is: unmap_mapping_range() is not prepared for more than
> one concurrent invocation per inode.  For example:
> 
>   thread1: going through a big range, stops in the middle of a vma and
>      stores the restart address in vm_truncate_count.
> 
>   thread2: comes in with a small (e.g. single page) unmap request on
>      the same vma, somewhere before restart_address, finds that the

"restart_addr", please.

>      vma was already unmapped up to the restart address and happily
>      returns without doing anything.
> 
> Another scenario would be two big unmap requests, both having to
> restart the unmapping and each one setting vm_truncate_count to its
> own value.  This could go on forever without any of them being able to
> finish.
> 
> Truncate and hole punching already serialize with i_mutex.  Other
> callers of unmap_mapping_range() do not, however, and I see difficulty
> with doing it in the callers.  I think the proper solution is to add
> serialization to unmap_mapping_range() itself.
> 
> Attached patch attempts to do this without adding more fields to
> struct address_space.  It fixes the bug in my testing.
> 

That's a pretty old bug, isn't it?  5+ years.

> 
> 
> ---
>  include/linux/pagemap.h |    1 +
>  mm/memory.c             |   14 ++++++++++++++
>  2 files changed, 15 insertions(+)
> 
> Index: linux.git/include/linux/pagemap.h
> ===================================================================
> --- linux.git.orig/include/linux/pagemap.h	2010-11-26 10:52:17.000000000 +0100
> +++ linux.git/include/linux/pagemap.h	2010-12-11 13:39:32.000000000 +0100
> @@ -24,6 +24,7 @@ enum mapping_flags {
>  	AS_ENOSPC	= __GFP_BITS_SHIFT + 1,	/* ENOSPC on async write */
>  	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
>  	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
> +	AS_UNMAPPING	= __GFP_BITS_SHIFT + 4, /* for unmap_mapping_range() */
>  };
>  
>  static inline void mapping_set_error(struct address_space *mapping, int error)
> Index: linux.git/mm/memory.c
> ===================================================================
> --- linux.git.orig/mm/memory.c	2010-12-11 13:07:28.000000000 +0100
> +++ linux.git/mm/memory.c	2010-12-11 14:09:42.000000000 +0100
> @@ -2535,6 +2535,12 @@ static inline void unmap_mapping_range_l
>  	}
>  }
>  
> +static int mapping_sleep(void *x)
> +{
> +	schedule();
> +	return 0;
> +}
> +
>  /**
>   * unmap_mapping_range - unmap the portion of all mmaps in the specified address_space corresponding to the specified page range in the underlying file.
>   * @mapping: the address space containing mmaps to be unmapped.
> @@ -2572,6 +2578,9 @@ void unmap_mapping_range(struct address_
>  		details.last_index = ULONG_MAX;
>  	details.i_mmap_lock = &mapping->i_mmap_lock;
>  
> +	wait_on_bit_lock(&mapping->flags, AS_UNMAPPING, mapping_sleep,
> +			 TASK_UNINTERRUPTIBLE);
> +
>  	spin_lock(&mapping->i_mmap_lock);
>  
>  	/* Protect against endless unmapping loops */
> @@ -2588,6 +2597,11 @@ void unmap_mapping_range(struct address_
>  	if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
>  		unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details);
>  	spin_unlock(&mapping->i_mmap_lock);
> +
> +	clear_bit_unlock(AS_UNMAPPING, &mapping->flags);
> +	smp_mb__after_clear_bit();
> +	wake_up_bit(&mapping->flags, AS_UNMAPPING);
> +

I do think this was premature optimisation.  The open-coded lock is
hidden from lockdep so we won't find out if this introduces potential
deadlocks.  It would be better to add a new mutex at least temporarily,
then look at replacing it with a MiklosLock later on, when the code is
bedded in.

At which time, replacing mutexes with MiklosLocks becomes part of a
general "shrink the address_space" exercise in which there's no reason
to exclusively concentrate on that new mutex!


How hard is it to avoid adding a new lock and using an existing one,
presumablt i_mutex?  Because if we can get i_mutex coverage over
unmap_mapping_range() then I suspect all the
vm_truncate_count/restart_addr stuff can go away?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2010-12-13 22:21 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20101130194945.58962c44@xenia.leun.net>
2010-11-30 23:00 ` Hugh Dickins
2010-12-01 10:25   ` Miklos Szeredi
2010-12-01 11:45     ` Michael Leun
2010-12-01 17:22       ` Miklos Szeredi
2010-12-02  7:41         ` Michael Leun
2010-12-02  8:15           ` Michael Leun
2010-12-02  9:42             ` Miklos Szeredi
2010-12-02 10:57               ` Michael Leun
2010-12-03  7:53                 ` Michael Leun
2010-12-06 12:36                   ` Miklos Szeredi
2010-12-06 19:43                     ` Michael Leun
2010-12-11 14:14                       ` Miklos Szeredi
2010-12-11 19:50                         ` Michael Leun
2010-12-13 22:20                         ` Andrew Morton [this message]
2010-12-14  7:31                           ` Hugh Dickins
2010-12-14  9:10                             ` Peter Zijlstra
2010-12-14  9:11                             ` Peter Zijlstra
2010-12-14 10:45                           ` Miklos Szeredi
2010-12-15  4:32                             ` Hugh Dickins
2010-12-15 11:22                               ` Miklos Szeredi
2010-12-16  0:26                                 ` Hugh Dickins
2010-12-16 14:50                               ` Robert Święcki
2010-12-16 19:05                                 ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101213142059.643f8080.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkml20101129@newton.leun.net \
    --cc=miklos@szeredi.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox