linux-mm.kvack.org archive mirror
* Re: dio_get_page() lockdep complaints
       [not found] <20070419073828.GB20928@kernel.dk>
@ 2007-04-19  8:01 ` Andrew Morton
  2007-04-19  8:01   ` Jens Axboe
  2007-04-19 14:36   ` Chris Mason
       [not found] ` <1194627742.6289.175.camel@twins>
  1 sibling, 2 replies; 17+ messages in thread
From: Andrew Morton @ 2007-04-19  8:01 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-kernel, linux-aio, reiserfs-dev, Vladimir V. Saveliev, linux-mm

On Thu, 19 Apr 2007 09:38:30 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:

> Hi,
> 
> Doing some testing on CFQ, I ran into this 100% reproducible report:
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.21-rc7 #5
> -------------------------------------------------------
> fio/9741 is trying to acquire lock:
>  (&mm->mmap_sem){----}, at: [<b018cb34>] dio_get_page+0x54/0x161
> 
> but task is already holding lock:
>  (&inode->i_mutex){--..}, at: [<b038c6e5>] mutex_lock+0x1c/0x1f
> 
> which lock already depends on the new lock.
> 

This is the correct ranking: i_mutex outside mmap_sem.

> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (&inode->i_mutex){--..}:
>        [<b013e3fb>] __lock_acquire+0xdee/0xf9c
>        [<b013e600>] lock_acquire+0x57/0x70
>        [<b038c4a5>] __mutex_lock_slowpath+0x73/0x297
>        [<b038c6e5>] mutex_lock+0x1c/0x1f
>        [<b01b17e9>] reiserfs_file_release+0x54/0x447
>        [<b016afe7>] __fput+0x53/0x101
>        [<b016b0ee>] fput+0x19/0x1c
>        [<b015bcd5>] remove_vma+0x3b/0x4d
>        [<b015c659>] do_munmap+0x17f/0x1cf
>        [<b015c6db>] sys_munmap+0x32/0x42
>        [<b0103f04>] sysenter_past_esp+0x5d/0x99
>        [<ffffffff>] 0xffffffff
> 
> -> #0 (&mm->mmap_sem){----}:
>        [<b013e259>] __lock_acquire+0xc4c/0xf9c
>        [<b013e600>] lock_acquire+0x57/0x70
>        [<b0137b92>] down_read+0x3a/0x4c
>        [<b018cb34>] dio_get_page+0x54/0x161
>        [<b018d7a9>] __blockdev_direct_IO+0x514/0xe2a
>        [<b01cf449>] ext3_direct_IO+0x98/0x1e5
>        [<b014e8df>] generic_file_direct_IO+0x63/0x133
>        [<b01500e9>] generic_file_aio_read+0x16b/0x222
>        [<b017f8b6>] aio_rw_vect_retry+0x5a/0x116
>        [<b0180147>] aio_run_iocb+0x69/0x129
>        [<b0180a78>] io_submit_one+0x194/0x2eb
>        [<b0181331>] sys_io_submit+0x92/0xe7
>        [<b0103f90>] syscall_call+0x7/0xb
>        [<ffffffff>] 0xffffffff

But here reiserfs is taking i_mutex in its file_operations.release(), which
can be called under mmap_sem.

Vladimir's recent de14569f94513279e3d44d9571a421e9da1759ae.  "resierfs:
avoid tail packing if an inode was ever mmapped" comes real close to this
code, but afaict it did not cause this bug.

I can't think of anything which we've done in the 2.6.21 cycle which would have
caused this to start happening.  Odd.


> The test run was fio, the job file used is:
> 
> # fio job file snip below
> [global]
> bs=4k
> buffered=0
> ioengine=libaio
> iodepth=4
> thread
> 
> [readers]
> numjobs=8
> size=128m
> rw=read
> # fio job file snip above
> 
> Filesystem was ext3, default mkfs and mount options. Kernel was
> 2.6.21-rc7 as of this morning, with some CFQ patches applied.
> 

It's interesting that lockdep learned the (wrong) ranking from a reiserfs
operation then later detected it being violated by ext3.
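For readers following along, here is a minimal userspace sketch of the idea: an order checker that remembers which lock was taken while another was held, and complains when the opposite order shows up later. It is an illustration only, not how lockdep is implemented. The names lock_class, order_reset() and order_record() are made up for this note, and the checker only catches a direct two-lock inversion, not longer chains.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes for illustration; not kernel identifiers. */
enum lock_class { I_MUTEX, MMAP_SEM, NR_CLASSES };

/* seen_before[a][b] is true once some path took b while holding a. */
static bool seen_before[NR_CLASSES][NR_CLASSES];

/* Forget every ordering recorded so far. */
static void order_reset(void)
{
    memset(seen_before, 0, sizeof(seen_before));
}

/*
 * Record "held first, then next". Returns false if the opposite order
 * was recorded earlier, meaning the two paths could deadlock against
 * each other; in that case the new ordering is not recorded.
 */
static bool order_record(enum lock_class held, enum lock_class next)
{
    assert(held < NR_CLASSES && next < NR_CLASSES);
    if (seen_before[next][held])
        return false;
    seen_before[held][next] = true;
    return true;
}
```

In terms of this report: the munmap() path on reiserfs would be order_record(MMAP_SEM, I_MUTEX), which is accepted because nothing was known yet, and the later ext3 direct-io read would be order_record(I_MUTEX, MMAP_SEM), which is rejected. Whichever path runs first defines the ranking the checker believes in, even if it is the wrong one.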

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: dio_get_page() lockdep complaints
  2007-04-19  8:01 ` dio_get_page() lockdep complaints Andrew Morton
@ 2007-04-19  8:01   ` Jens Axboe
  2007-04-19  8:25     ` Andrew Morton
  2007-04-19 14:36   ` Chris Mason
  1 sibling, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2007-04-19  8:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-aio, reiserfs-dev, Vladimir V. Saveliev, linux-mm

On Thu, Apr 19 2007, Andrew Morton wrote:
> On Thu, 19 Apr 2007 09:38:30 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > Hi,
> > 
> > Doing some testing on CFQ, I ran into this 100% reproducible report:
> > 
> > =======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 2.6.21-rc7 #5
> > -------------------------------------------------------
> > fio/9741 is trying to acquire lock:
> >  (&mm->mmap_sem){----}, at: [<b018cb34>] dio_get_page+0x54/0x161
> > 
> > but task is already holding lock:
> >  (&inode->i_mutex){--..}, at: [<b038c6e5>] mutex_lock+0x1c/0x1f
> > 
> > which lock already depends on the new lock.
> > 
> 
> This is the correct ranking: i_mutex outside mmap_sem.
> 
> > 
> > the existing dependency chain (in reverse order) is:
> > 
> > -> #1 (&inode->i_mutex){--..}:
> >        [<b013e3fb>] __lock_acquire+0xdee/0xf9c
> >        [<b013e600>] lock_acquire+0x57/0x70
> >        [<b038c4a5>] __mutex_lock_slowpath+0x73/0x297
> >        [<b038c6e5>] mutex_lock+0x1c/0x1f
> >        [<b01b17e9>] reiserfs_file_release+0x54/0x447
> >        [<b016afe7>] __fput+0x53/0x101
> >        [<b016b0ee>] fput+0x19/0x1c
> >        [<b015bcd5>] remove_vma+0x3b/0x4d
> >        [<b015c659>] do_munmap+0x17f/0x1cf
> >        [<b015c6db>] sys_munmap+0x32/0x42
> >        [<b0103f04>] sysenter_past_esp+0x5d/0x99
> >        [<ffffffff>] 0xffffffff
> > 
> > -> #0 (&mm->mmap_sem){----}:
> >        [<b013e259>] __lock_acquire+0xc4c/0xf9c
> >        [<b013e600>] lock_acquire+0x57/0x70
> >        [<b0137b92>] down_read+0x3a/0x4c
> >        [<b018cb34>] dio_get_page+0x54/0x161
> >        [<b018d7a9>] __blockdev_direct_IO+0x514/0xe2a
> >        [<b01cf449>] ext3_direct_IO+0x98/0x1e5
> >        [<b014e8df>] generic_file_direct_IO+0x63/0x133
> >        [<b01500e9>] generic_file_aio_read+0x16b/0x222
> >        [<b017f8b6>] aio_rw_vect_retry+0x5a/0x116
> >        [<b0180147>] aio_run_iocb+0x69/0x129
> >        [<b0180a78>] io_submit_one+0x194/0x2eb
> >        [<b0181331>] sys_io_submit+0x92/0xe7
> >        [<b0103f90>] syscall_call+0x7/0xb
> >        [<ffffffff>] 0xffffffff
> 
> But here reiserfs is taking i_mutex in its file_operations.release(),
> which can be called under mmap_sem.
> 
> Vladimir's recent de14569f94513279e3d44d9571a421e9da1759ae.
> "resierfs: avoid tail packing if an inode was ever mmapped" comes real
> close to this code, but afaict it did not cause this bug.
> 
> I can't think of anything which we've done in the 2.6.21 cycle which
> would have caused this to start happening.  Odd.

The bug may be older; let me know if you want me to check 2.6.20 or
earlier.

> > The test run was fio, the job file used is:
> > 
> > # fio job file snip below
> > [global]
> > bs=4k
> > buffered=0
> > ioengine=libaio
> > iodepth=4
> > thread
> > 
> > [readers]
> > numjobs=8
> > size=128m
> > rw=read
> > # fio job file snip above
> > 
> > Filesystem was ext3, default mkfs and mount options. Kernel was
> > 2.6.21-rc7 as of this morning, with some CFQ patches applied.
> > 
> 
> It's interesting that lockdep learned the (wrong) ranking from a reiserfs
> operation then later detected it being violated by ext3.

It's a scratch test box, which for some reason has reiserfs as the
rootfs. So reiser gets to run first :-)

-- 
Jens Axboe


* Re: dio_get_page() lockdep complaints
  2007-04-19  8:01   ` Jens Axboe
@ 2007-04-19  8:25     ` Andrew Morton
  2007-04-19  8:34       ` Jens Axboe
  2007-04-19 14:57       ` Vladimir V. Saveliev
  0 siblings, 2 replies; 17+ messages in thread
From: Andrew Morton @ 2007-04-19  8:25 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-kernel, linux-aio, reiserfs-dev, Vladimir V. Saveliev, linux-mm

On Thu, 19 Apr 2007 10:01:57 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:

> On Thu, Apr 19 2007, Andrew Morton wrote:
> > On Thu, 19 Apr 2007 09:38:30 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > 
> > > Hi,
> > > 
> > > Doing some testing on CFQ, I ran into this 100% reproducible report:
> > > 
> > > =======================================================
> > > [ INFO: possible circular locking dependency detected ]
> > > 2.6.21-rc7 #5
> > > -------------------------------------------------------
> > > fio/9741 is trying to acquire lock:
> > >  (&mm->mmap_sem){----}, at: [<b018cb34>] dio_get_page+0x54/0x161
> > > 
> > > but task is already holding lock:
> > >  (&inode->i_mutex){--..}, at: [<b038c6e5>] mutex_lock+0x1c/0x1f
> > > 
> > > which lock already depends on the new lock.
> > > 
> > 
> > This is the correct ranking: i_mutex outside mmap_sem.
> > 
> > > 
> > > the existing dependency chain (in reverse order) is:
> > > 
> > > -> #1 (&inode->i_mutex){--..}:
> > >        [<b013e3fb>] __lock_acquire+0xdee/0xf9c
> > >        [<b013e600>] lock_acquire+0x57/0x70
> > >        [<b038c4a5>] __mutex_lock_slowpath+0x73/0x297
> > >        [<b038c6e5>] mutex_lock+0x1c/0x1f
> > >        [<b01b17e9>] reiserfs_file_release+0x54/0x447
> > >        [<b016afe7>] __fput+0x53/0x101
> > >        [<b016b0ee>] fput+0x19/0x1c
> > >        [<b015bcd5>] remove_vma+0x3b/0x4d
> > >        [<b015c659>] do_munmap+0x17f/0x1cf
> > >        [<b015c6db>] sys_munmap+0x32/0x42
> > >        [<b0103f04>] sysenter_past_esp+0x5d/0x99
> > >        [<ffffffff>] 0xffffffff
> > > 
> > > -> #0 (&mm->mmap_sem){----}:
> > >        [<b013e259>] __lock_acquire+0xc4c/0xf9c
> > >        [<b013e600>] lock_acquire+0x57/0x70
> > >        [<b0137b92>] down_read+0x3a/0x4c
> > >        [<b018cb34>] dio_get_page+0x54/0x161
> > >        [<b018d7a9>] __blockdev_direct_IO+0x514/0xe2a
> > >        [<b01cf449>] ext3_direct_IO+0x98/0x1e5
> > >        [<b014e8df>] generic_file_direct_IO+0x63/0x133
> > >        [<b01500e9>] generic_file_aio_read+0x16b/0x222
> > >        [<b017f8b6>] aio_rw_vect_retry+0x5a/0x116
> > >        [<b0180147>] aio_run_iocb+0x69/0x129
> > >        [<b0180a78>] io_submit_one+0x194/0x2eb
> > >        [<b0181331>] sys_io_submit+0x92/0xe7
> > >        [<b0103f90>] syscall_call+0x7/0xb
> > >        [<ffffffff>] 0xffffffff
> > 
> > But here reiserfs is taking i_mutex in its file_operations.release(),
> > which can be called under mmap_sem.
> > 
> > Vladimir's recent de14569f94513279e3d44d9571a421e9da1759ae.
> > "resierfs: avoid tail packing if an inode was ever mmapped" comes real
> > close to this code, but afaict it did not cause this bug.
> > 
> > I can't think of anything which we've done in the 2.6.21 cycle which
> > would have caused this to start happening.  Odd.
> 
> The bug may be older; let me know if you want me to check 2.6.20 or
> earlier.

Would be great if you could test 2.6.20.  I have a feeling that I missed
something, but what?  We didn't change the refcounting or lifetime of
vma.vm_file...


> > > The test run was fio, the job file used is:
> > > 
> > > # fio job file snip below
> > > [global]
> > > bs=4k
> > > buffered=0
> > > ioengine=libaio
> > > iodepth=4
> > > thread
> > > 
> > > [readers]
> > > numjobs=8
> > > size=128m
> > > rw=read
> > > # fio job file snip above
> > > 
> > > Filesystem was ext3, default mkfs and mount options. Kernel was
> > > 2.6.21-rc7 as of this morning, with some CFQ patches applied.
> > > 
> > 
> > It's interesting that lockdep learned the (wrong) ranking from a reiserfs
> > operation then later detected it being violated by ext3.
> 
> It's a scratch test box, which for some reason has reiserfs as the
> rootfs. So reiser gets to run first :-)

direct-io reads against reiserfs also will take i_mutex outside mmap_sem. 
As will pagefaults inside generic_file_write() (which is where this ranking
is primarily defined).

So an all-reiserfs system should be getting the same reports.  Obviously,
that isn't happening.

It's a bit odd that reiserfs is playing with file contents within
file_operations.release(): there could be other files open against that
inode.  One would expect this sort of thing to be happening in an
inode_operation.  But it's been like that for a long time.

Is it possible that fio was changed?  That it was changed to close() the fd
before doing the munmapping whereas it used to hold the file open?
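To make the close()-before-munmap() case concrete, here is a small userspace sketch under stated assumptions. The function name close_then_munmap() and the scratch path are hypothetical, and this is not taken from fio. It shows that a mapping keeps its own reference to the file after the descriptor is closed, so the last reference is only dropped at munmap() time. On the kernels in this thread that final drop is the fput() seen under remove_vma() in the trace.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one page of a scratch file, close the descriptor, then unmap.
 * Returns 0 if the mapping stayed usable after close(), -1 on error.
 */
static int close_then_munmap(void)
{
    char path[] = "/tmp/vmfile-XXXXXX";   /* hypothetical scratch path */
    long page = sysconf(_SC_PAGESIZE);
    int fd, ok;
    char *p;

    assert(page > 0);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                          /* file lives on via fd/mapping */
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    close(fd);          /* the mapping still holds a reference to the file */
    p[0] = 'x';         /* so the page is still readable and writable */
    ok = (p[0] == 'x');
    /* Dropping the mapping releases the last reference to the file. */
    if (munmap(p, (size_t)page) != 0)
        return -1;
    return ok ? 0 : -1;
}
```

If the descriptor were kept open until after munmap(), the last reference would instead be dropped by close(), outside the unmap path.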



* Re: dio_get_page() lockdep complaints
  2007-04-19  8:25     ` Andrew Morton
@ 2007-04-19  8:34       ` Jens Axboe
  2007-04-19 12:43         ` Vladimir V. Saveliev
  2007-04-19 14:15         ` Jens Axboe
  2007-04-19 14:57       ` Vladimir V. Saveliev
  1 sibling, 2 replies; 17+ messages in thread
From: Jens Axboe @ 2007-04-19  8:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-aio, reiserfs-dev, Vladimir V. Saveliev, linux-mm

On Thu, Apr 19 2007, Andrew Morton wrote:
> On Thu, 19 Apr 2007 10:01:57 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Thu, Apr 19 2007, Andrew Morton wrote:
> > > On Thu, 19 Apr 2007 09:38:30 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > > 
> > > > Hi,
> > > > 
> > > > Doing some testing on CFQ, I ran into this 100% reproducible report:
> > > > 
> > > > =======================================================
> > > > [ INFO: possible circular locking dependency detected ]
> > > > 2.6.21-rc7 #5
> > > > -------------------------------------------------------
> > > > fio/9741 is trying to acquire lock:
> > > >  (&mm->mmap_sem){----}, at: [<b018cb34>] dio_get_page+0x54/0x161
> > > > 
> > > > but task is already holding lock:
> > > >  (&inode->i_mutex){--..}, at: [<b038c6e5>] mutex_lock+0x1c/0x1f
> > > > 
> > > > which lock already depends on the new lock.
> > > > 
> > > 
> > > This is the correct ranking: i_mutex outside mmap_sem.
> > > 
> > > > 
> > > > the existing dependency chain (in reverse order) is:
> > > > 
> > > > -> #1 (&inode->i_mutex){--..}:
> > > >        [<b013e3fb>] __lock_acquire+0xdee/0xf9c
> > > >        [<b013e600>] lock_acquire+0x57/0x70
> > > >        [<b038c4a5>] __mutex_lock_slowpath+0x73/0x297
> > > >        [<b038c6e5>] mutex_lock+0x1c/0x1f
> > > >        [<b01b17e9>] reiserfs_file_release+0x54/0x447
> > > >        [<b016afe7>] __fput+0x53/0x101
> > > >        [<b016b0ee>] fput+0x19/0x1c
> > > >        [<b015bcd5>] remove_vma+0x3b/0x4d
> > > >        [<b015c659>] do_munmap+0x17f/0x1cf
> > > >        [<b015c6db>] sys_munmap+0x32/0x42
> > > >        [<b0103f04>] sysenter_past_esp+0x5d/0x99
> > > >        [<ffffffff>] 0xffffffff
> > > > 
> > > > -> #0 (&mm->mmap_sem){----}:
> > > >        [<b013e259>] __lock_acquire+0xc4c/0xf9c
> > > >        [<b013e600>] lock_acquire+0x57/0x70
> > > >        [<b0137b92>] down_read+0x3a/0x4c
> > > >        [<b018cb34>] dio_get_page+0x54/0x161
> > > >        [<b018d7a9>] __blockdev_direct_IO+0x514/0xe2a
> > > >        [<b01cf449>] ext3_direct_IO+0x98/0x1e5
> > > >        [<b014e8df>] generic_file_direct_IO+0x63/0x133
> > > >        [<b01500e9>] generic_file_aio_read+0x16b/0x222
> > > >        [<b017f8b6>] aio_rw_vect_retry+0x5a/0x116
> > > >        [<b0180147>] aio_run_iocb+0x69/0x129
> > > >        [<b0180a78>] io_submit_one+0x194/0x2eb
> > > >        [<b0181331>] sys_io_submit+0x92/0xe7
> > > >        [<b0103f90>] syscall_call+0x7/0xb
> > > >        [<ffffffff>] 0xffffffff
> > > 
> > > But here reiserfs is taking i_mutex in its file_operations.release(),
> > > which can be called under mmap_sem.
> > > 
> > > Vladimir's recent de14569f94513279e3d44d9571a421e9da1759ae.
> > > "resierfs: avoid tail packing if an inode was ever mmapped" comes real
> > > close to this code, but afaict it did not cause this bug.
> > > 
> > > I can't think of anything which we've done in the 2.6.21 cycle which
> > > would have caused this to start happening.  Odd.
> > 
> > The bug may be older; let me know if you want me to check 2.6.20 or
> > earlier.
> 
> Would be great if you could test 2.6.20.  I have a feeling that I missed
> something, but what?  We didn't change the refcounting or lifetime of
> vma.vm_file...

2.6.20.7 tested, same lockdep triggers. Attached for reference.

> > > > The test run was fio, the job file used is:
> > > > 
> > > > # fio job file snip below
> > > > [global]
> > > > bs=4k
> > > > buffered=0
> > > > ioengine=libaio
> > > > iodepth=4
> > > > thread
> > > > 
> > > > [readers]
> > > > numjobs=8
> > > > size=128m
> > > > rw=read
> > > > # fio job file snip above
> > > > 
> > > > Filesystem was ext3, default mkfs and mount options. Kernel was
> > > > 2.6.21-rc7 as of this morning, with some CFQ patches applied.
> > > > 
> > > 
> > > It's interesting that lockdep learned the (wrong) ranking from a reiserfs
> > > operation then later detected it being violated by ext3.
> > 
> > It's a scratch test box, which for some reason has reiserfs as the
> > rootfs. So reiser gets to run first :-)
> 
> direct-io reads against reiserfs also will take i_mutex outside mmap_sem. 
> As will pagefaults inside generic_file_write() (which is where this ranking
> is primarily defined).
> 
> So an all-reiserfs system should be getting the same reports.  Obviously,
> that isn't happening.
> 
> It's a bit odd that reiserfs is playing with file contents within
> file_operations.release(): there could be other files open against that
> inode.  One would expect this sort of thing to be happening in an
> inode_operation.  But it's been like that for a long time.
> 
> Is it possible that fio was changed?  That it was changed to close() the fd
> before doing the munmapping whereas it used to hold the file open?

It's been a while since I tested on this box, so I don't really recall.
But fio does close() the fd before doing munmap(). This particular test
case doesn't use mmap(), though.


=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.20.7 #1
-------------------------------------------------------
fio/6651 is trying to acquire lock:
 (&mm->mmap_sem){----}, at: [<b01899c4>] dio_get_page+0x54/0x161

but task is already holding lock:
 (&inode->i_mutex){--..}, at: [<b0385e85>] mutex_lock+0x1c/0x1f

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&inode->i_mutex){--..}:
       [<b013ba73>] __lock_acquire+0xc86/0xd64
       [<b013bba8>] lock_acquire+0x57/0x70
       [<b0385c45>] __mutex_lock_slowpath+0x73/0x297
       [<b0385e85>] mutex_lock+0x1c/0x1f
       [<b01ae3b5>] reiserfs_file_release+0x54/0x44b
       [<b0167b27>] __fput+0x53/0x101
       [<b0167c2e>] fput+0x19/0x1c
       [<b015884c>] remove_vma+0x37/0x49
       [<b01591d0>] do_munmap+0x17f/0x1d0
       [<b0159253>] sys_munmap+0x32/0x42
       [<b0102f04>] sysenter_past_esp+0x5d/0x99
       [<ffffffff>] 0xffffffff

-> #0 (&mm->mmap_sem){----}:
       [<b013b8f5>] __lock_acquire+0xb08/0xd64
       [<b013bba8>] lock_acquire+0x57/0x70
       [<b013701e>] down_read+0x3a/0x4c
       [<b01899c4>] dio_get_page+0x54/0x161
       [<b018a639>] __blockdev_direct_IO+0x514/0xe2a
       [<b01cc009>] ext3_direct_IO+0x98/0x1e5
       [<b014b72b>] generic_file_direct_IO+0x63/0x133
       [<b014cf79>] generic_file_aio_read+0x16b/0x222
       [<b017c466>] aio_rw_vect_retry+0x5a/0x116
       [<b017ccf7>] aio_run_iocb+0x69/0x129
       [<b017d6ed>] io_submit_one+0x194/0x2ec
       [<b017dffb>] sys_io_submit+0x92/0xe6
       [<b0102f90>] syscall_call+0x7/0xb
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by fio/6651:
 #0:  (&inode->i_mutex){--..}, at: [<b0385e85>] mutex_lock+0x1c/0x1f

stack backtrace:
 [<b0103f54>] show_trace_log_lvl+0x1a/0x30
 [<b01045f6>] show_trace+0x12/0x14
 [<b010467d>] dump_stack+0x16/0x18
 [<b0139d29>] print_circular_bug_tail+0x68/0x71
 [<b013b8f5>] __lock_acquire+0xb08/0xd64
 [<b013bba8>] lock_acquire+0x57/0x70
 [<b013701e>] down_read+0x3a/0x4c
 [<b01899c4>] dio_get_page+0x54/0x161
 [<b018a639>] __blockdev_direct_IO+0x514/0xe2a
 [<b01cc009>] ext3_direct_IO+0x98/0x1e5
 [<b014b72b>] generic_file_direct_IO+0x63/0x133
 [<b014cf79>] generic_file_aio_read+0x16b/0x222
 [<b017c466>] aio_rw_vect_retry+0x5a/0x116
 [<b017ccf7>] aio_run_iocb+0x69/0x129
 [<b017d6ed>] io_submit_one+0x194/0x2ec
 [<b017dffb>] sys_io_submit+0x92/0xe6
 [<b0102f90>] syscall_call+0x7/0xb
 =======================

-- 
Jens Axboe


* Re: dio_get_page() lockdep complaints
  2007-04-19  8:34       ` Jens Axboe
@ 2007-04-19 12:43         ` Vladimir V. Saveliev
  2007-04-19 12:49           ` Jens Axboe
  2007-04-19 14:15         ` Jens Axboe
  1 sibling, 1 reply; 17+ messages in thread
From: Vladimir V. Saveliev @ 2007-04-19 12:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-kernel, linux-aio, reiserfs-dev, linux-mm

Hello

On Thursday 19 April 2007 12:34, Jens Axboe wrote:
> On Thu, Apr 19 2007, Andrew Morton wrote:
> > On Thu, 19 Apr 2007 10:01:57 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > 
> > > On Thu, Apr 19 2007, Andrew Morton wrote:
> > > > On Thu, 19 Apr 2007 09:38:30 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > Doing some testing on CFQ, I ran into this 100% reproducible report:
> > > > > 
> > > > > =======================================================
> > > > > [ INFO: possible circular locking dependency detected ]
> > > > > 2.6.21-rc7 #5
> > > > > -------------------------------------------------------
> > > > > fio/9741 is trying to acquire lock:
> > > > >  (&mm->mmap_sem){----}, at: [<b018cb34>] dio_get_page+0x54/0x161
> > > > > 
> > > > > but task is already holding lock:
> > > > >  (&inode->i_mutex){--..}, at: [<b038c6e5>] mutex_lock+0x1c/0x1f
> > > > > 
> > > > > which lock already depends on the new lock.
> > > > > 
> > > > 
> > > > This is the correct ranking: i_mutex outside mmap_sem.
> > > > 
> > > > > 
> > > > > the existing dependency chain (in reverse order) is:
> > > > > 
> > > > > -> #1 (&inode->i_mutex){--..}:
> > > > >        [<b013e3fb>] __lock_acquire+0xdee/0xf9c
> > > > >        [<b013e600>] lock_acquire+0x57/0x70
> > > > >        [<b038c4a5>] __mutex_lock_slowpath+0x73/0x297
> > > > >        [<b038c6e5>] mutex_lock+0x1c/0x1f
> > > > >        [<b01b17e9>] reiserfs_file_release+0x54/0x447
> > > > >        [<b016afe7>] __fput+0x53/0x101
> > > > >        [<b016b0ee>] fput+0x19/0x1c
> > > > >        [<b015bcd5>] remove_vma+0x3b/0x4d
> > > > >        [<b015c659>] do_munmap+0x17f/0x1cf
> > > > >        [<b015c6db>] sys_munmap+0x32/0x42
> > > > >        [<b0103f04>] sysenter_past_esp+0x5d/0x99
> > > > >        [<ffffffff>] 0xffffffff
> > > > > 
> > > > > -> #0 (&mm->mmap_sem){----}:
> > > > >        [<b013e259>] __lock_acquire+0xc4c/0xf9c
> > > > >        [<b013e600>] lock_acquire+0x57/0x70
> > > > >        [<b0137b92>] down_read+0x3a/0x4c
> > > > >        [<b018cb34>] dio_get_page+0x54/0x161
> > > > >        [<b018d7a9>] __blockdev_direct_IO+0x514/0xe2a
> > > > >        [<b01cf449>] ext3_direct_IO+0x98/0x1e5
> > > > >        [<b014e8df>] generic_file_direct_IO+0x63/0x133
> > > > >        [<b01500e9>] generic_file_aio_read+0x16b/0x222
> > > > >        [<b017f8b6>] aio_rw_vect_retry+0x5a/0x116
> > > > >        [<b0180147>] aio_run_iocb+0x69/0x129
> > > > >        [<b0180a78>] io_submit_one+0x194/0x2eb
> > > > >        [<b0181331>] sys_io_submit+0x92/0xe7
> > > > >        [<b0103f90>] syscall_call+0x7/0xb
> > > > >        [<ffffffff>] 0xffffffff
> > > > 
> > > > But here reiserfs is taking i_mutex in its file_operations.release(),
> > > > which can be called under mmap_sem.
> > > > 
> > > > Vladimir's recent de14569f94513279e3d44d9571a421e9da1759ae.
> > > > "resierfs: avoid tail packing if an inode was ever mmapped" comes real
> > > > close to this code, but afaict it did not cause this bug.
> > > > 
> > > > I can't think of anything which we've done in the 2.6.21 cycle which
> > > > would have caused this to start happening.  Odd.
> > > 
> > > The bug may be older; let me know if you want me to check 2.6.20 or
> > > earlier.
> > 
> > Would be great if you could test 2.6.20.  I have a feeling that I missed
> > something, but what?  We didn't change the refcounting or lifetime of
> > vma.vm_file...
> 
> 2.6.20.7 tested, same lockdep triggers. Attached for reference.
> 

Did you have the CFQ patches mentioned below applied?
Would you please send your .config?
I tried fio (1.15) with this job file and did not get the "possible
circular locking dependency detected" report.

> > > > > The test run was fio, the job file used is:
> > > > > 
> > > > > # fio job file snip below
> > > > > [global]
> > > > > bs=4k
> > > > > buffered=0
> > > > > ioengine=libaio
> > > > > iodepth=4
> > > > > thread
> > > > > 
> > > > > [readers]
> > > > > numjobs=8
> > > > > size=128m
> > > > > rw=read
> > > > > # fio job file snip above
> > > > > 
> > > > > Filesystem was ext3, default mkfs and mount options. Kernel was
> > > > > 2.6.21-rc7 as of this morning, with some CFQ patches applied.
> > > > > 
> > > > 
> > > > It's interesting that lockdep learned the (wrong) ranking from a reiserfs
> > > > operation then later detected it being violated by ext3.
> > > 
> > > It's a scratch test box, which for some reason has reiserfs as the
> > > rootfs. So reiser gets to run first :-)
> > 
> > direct-io reads against reiserfs also will take i_mutex outside mmap_sem. 
> > As will pagefaults inside generic_file_write() (which is where this ranking
> > is primarily defined).
> > 
> > So an all-reiserfs system should be getting the same reports.  Obviously,
> > that isn't happening.
> > 
> > It's a bit odd that reiserfs is playing with file contents within
> > file_operations.release(): there could be other files open against that
> > inode.  One would expect this sort of thing to be happening in an
> > inode_operation.  But it's been like that for a long time.
> > 
> > Is it possible that fio was changed?  That it was changed to close() the fd
> > before doing the munmapping whereas it used to hold the file open?
> 
> It's been a while since I tested on this box, so I don't really recall.
> But fio does close() the fd before doing munmap(). This particular test
> case doesn't use mmap(), though.
> 
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.20.7 #1
> -------------------------------------------------------
> fio/6651 is trying to acquire lock:
>  (&mm->mmap_sem){----}, at: [<b01899c4>] dio_get_page+0x54/0x161
> 
> but task is already holding lock:
>  (&inode->i_mutex){--..}, at: [<b0385e85>] mutex_lock+0x1c/0x1f
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (&inode->i_mutex){--..}:
>        [<b013ba73>] __lock_acquire+0xc86/0xd64
>        [<b013bba8>] lock_acquire+0x57/0x70
>        [<b0385c45>] __mutex_lock_slowpath+0x73/0x297
>        [<b0385e85>] mutex_lock+0x1c/0x1f
>        [<b01ae3b5>] reiserfs_file_release+0x54/0x44b
>        [<b0167b27>] __fput+0x53/0x101
>        [<b0167c2e>] fput+0x19/0x1c
>        [<b015884c>] remove_vma+0x37/0x49
>        [<b01591d0>] do_munmap+0x17f/0x1d0
>        [<b0159253>] sys_munmap+0x32/0x42
>        [<b0102f04>] sysenter_past_esp+0x5d/0x99
>        [<ffffffff>] 0xffffffff
> 
> -> #0 (&mm->mmap_sem){----}:
>        [<b013b8f5>] __lock_acquire+0xb08/0xd64
>        [<b013bba8>] lock_acquire+0x57/0x70
>        [<b013701e>] down_read+0x3a/0x4c
>        [<b01899c4>] dio_get_page+0x54/0x161
>        [<b018a639>] __blockdev_direct_IO+0x514/0xe2a
>        [<b01cc009>] ext3_direct_IO+0x98/0x1e5
>        [<b014b72b>] generic_file_direct_IO+0x63/0x133
>        [<b014cf79>] generic_file_aio_read+0x16b/0x222
>        [<b017c466>] aio_rw_vect_retry+0x5a/0x116
>        [<b017ccf7>] aio_run_iocb+0x69/0x129
>        [<b017d6ed>] io_submit_one+0x194/0x2ec
>        [<b017dffb>] sys_io_submit+0x92/0xe6
>        [<b0102f90>] syscall_call+0x7/0xb
>        [<ffffffff>] 0xffffffff
> 
> other info that might help us debug this:
> 
> 1 lock held by fio/6651:
>  #0:  (&inode->i_mutex){--..}, at: [<b0385e85>] mutex_lock+0x1c/0x1f
> 
> stack backtrace:
>  [<b0103f54>] show_trace_log_lvl+0x1a/0x30
>  [<b01045f6>] show_trace+0x12/0x14
>  [<b010467d>] dump_stack+0x16/0x18
>  [<b0139d29>] print_circular_bug_tail+0x68/0x71
>  [<b013b8f5>] __lock_acquire+0xb08/0xd64
>  [<b013bba8>] lock_acquire+0x57/0x70
>  [<b013701e>] down_read+0x3a/0x4c
>  [<b01899c4>] dio_get_page+0x54/0x161
>  [<b018a639>] __blockdev_direct_IO+0x514/0xe2a
>  [<b01cc009>] ext3_direct_IO+0x98/0x1e5
>  [<b014b72b>] generic_file_direct_IO+0x63/0x133
>  [<b014cf79>] generic_file_aio_read+0x16b/0x222
>  [<b017c466>] aio_rw_vect_retry+0x5a/0x116
>  [<b017ccf7>] aio_run_iocb+0x69/0x129
>  [<b017d6ed>] io_submit_one+0x194/0x2ec
>  [<b017dffb>] sys_io_submit+0x92/0xe6
>  [<b0102f90>] syscall_call+0x7/0xb
>  =======================
> 


* Re: dio_get_page() lockdep complaints
  2007-04-19 12:43         ` Vladimir V. Saveliev
@ 2007-04-19 12:49           ` Jens Axboe
  2007-04-19 12:52             ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2007-04-19 12:49 UTC (permalink / raw)
  To: Vladimir V. Saveliev
  Cc: Andrew Morton, linux-kernel, linux-aio, reiserfs-dev, linux-mm

[-- Attachment #1: Type: text/plain, Size: 4261 bytes --]

On Thu, Apr 19 2007, Vladimir V. Saveliev wrote:
> Hello
> 
> On Thursday 19 April 2007 12:34, Jens Axboe wrote:
> > On Thu, Apr 19 2007, Andrew Morton wrote:
> > > On Thu, 19 Apr 2007 10:01:57 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > > 
> > > > On Thu, Apr 19 2007, Andrew Morton wrote:
> > > > > On Thu, 19 Apr 2007 09:38:30 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > Doing some testing on CFQ, I ran into this 100% reproducible report:
> > > > > > 
> > > > > > =======================================================
> > > > > > [ INFO: possible circular locking dependency detected ]
> > > > > > 2.6.21-rc7 #5
> > > > > > -------------------------------------------------------
> > > > > > fio/9741 is trying to acquire lock:
> > > > > >  (&mm->mmap_sem){----}, at: [<b018cb34>] dio_get_page+0x54/0x161
> > > > > > 
> > > > > > but task is already holding lock:
> > > > > >  (&inode->i_mutex){--..}, at: [<b038c6e5>] mutex_lock+0x1c/0x1f
> > > > > > 
> > > > > > which lock already depends on the new lock.
> > > > > > 
> > > > > 
> > > > > This is the correct ranking: i_mutex outside mmap_sem.
> > > > > 
> > > > > > 
> > > > > > the existing dependency chain (in reverse order) is:
> > > > > > 
> > > > > > -> #1 (&inode->i_mutex){--..}:
> > > > > >        [<b013e3fb>] __lock_acquire+0xdee/0xf9c
> > > > > >        [<b013e600>] lock_acquire+0x57/0x70
> > > > > >        [<b038c4a5>] __mutex_lock_slowpath+0x73/0x297
> > > > > >        [<b038c6e5>] mutex_lock+0x1c/0x1f
> > > > > >        [<b01b17e9>] reiserfs_file_release+0x54/0x447
> > > > > >        [<b016afe7>] __fput+0x53/0x101
> > > > > >        [<b016b0ee>] fput+0x19/0x1c
> > > > > >        [<b015bcd5>] remove_vma+0x3b/0x4d
> > > > > >        [<b015c659>] do_munmap+0x17f/0x1cf
> > > > > >        [<b015c6db>] sys_munmap+0x32/0x42
> > > > > >        [<b0103f04>] sysenter_past_esp+0x5d/0x99
> > > > > >        [<ffffffff>] 0xffffffff
> > > > > > 
> > > > > > -> #0 (&mm->mmap_sem){----}:
> > > > > >        [<b013e259>] __lock_acquire+0xc4c/0xf9c
> > > > > >        [<b013e600>] lock_acquire+0x57/0x70
> > > > > >        [<b0137b92>] down_read+0x3a/0x4c
> > > > > >        [<b018cb34>] dio_get_page+0x54/0x161
> > > > > >        [<b018d7a9>] __blockdev_direct_IO+0x514/0xe2a
> > > > > >        [<b01cf449>] ext3_direct_IO+0x98/0x1e5
> > > > > >        [<b014e8df>] generic_file_direct_IO+0x63/0x133
> > > > > >        [<b01500e9>] generic_file_aio_read+0x16b/0x222
> > > > > >        [<b017f8b6>] aio_rw_vect_retry+0x5a/0x116
> > > > > >        [<b0180147>] aio_run_iocb+0x69/0x129
> > > > > >        [<b0180a78>] io_submit_one+0x194/0x2eb
> > > > > >        [<b0181331>] sys_io_submit+0x92/0xe7
> > > > > >        [<b0103f90>] syscall_call+0x7/0xb
> > > > > >        [<ffffffff>] 0xffffffff
> > > > > 
> > > > > But here reiserfs is taking i_mutex in its file_operations.release(),
> > > > > which can be called under mmap_sem.
> > > > > 
> > > > > Vladimir's recent de14569f94513279e3d44d9571a421e9da1759ae.
> > > > > "resierfs: avoid tail packing if an inode was ever mmapped" comes real
> > > > > close to this code, but afaict it did not cause this bug.
> > > > > 
> > > > > I can't think of anything which we've done in the 2.6.21 cycle which
> > > > > would have caused this to start happening.  Odd.
> > > > 
> > > > The bug may be older, let me know if you want me to check 2.6.20 or
> > > > earlier.
> > > 
> > > Would be great if you could test 2.6.20.  I have a feeling that I missed
> > > something, but what?  We didn't change the refcounting or lifetime of
> > > vma.vm_file...
> > 
> > 2.6.20.7 tested, same lockdep triggers. Attached for reference.
> > 
> 
> Did you have CFQ patches mentioned below applied?

Nope, stock 2.6.20.7. The CFQ patches should not make a difference,
unless I royally screwed something up :-)

> Would you please send your .config?

Attached. It's the 2.6.21-rc7 config; for 2.6.20.7 I just ran make
oldconfig, and the options that showed up should not impact anything.

> I tried fio (1.15) with this job file and did not get the possible
> circular locking dependency detected

Perhaps some of the preempt settings? The box is an emc centera, it's a
lowly p4/ht.

-- 
Jens Axboe


[-- Attachment #2: centera-2.6.21-rc7 --]
[-- Type: text/plain, Size: 27012 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.21-rc7
# Thu Apr 19 10:09:15 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_CPUSETS is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_RELAY=y
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y

#
# Block layer
#
CONFIG_BLOCK=y
CONFIG_LBD=y
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_LSF is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_SMP=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MCORE2 is not set
CONFIG_MPENTIUM4=y
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
# CONFIG_HPET_TIMER is not set
CONFIG_NR_CPUS=4
# CONFIG_SCHED_SMT is not set
# CONFIG_SCHED_MC is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
CONFIG_X86_MCE_P4THERMAL=y
CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
# CONFIG_X86_REBOOTFIXUPS is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set

#
# Firmware Drivers
#
# CONFIG_EDD is not set
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
# CONFIG_VMSPLIT_3G is not set
CONFIG_VMSPLIT_3G_OPT=y
# CONFIG_VMSPLIT_2G is not set
# CONFIG_VMSPLIT_1G is not set
CONFIG_PAGE_OFFSET=0xB0000000
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_SPARSEMEM_STATIC=y
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_RESOURCES_64BIT is not set
CONFIG_ZONE_DMA_FLAG=1
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
# CONFIG_EFI is not set
CONFIG_IRQBALANCE=y
# CONFIG_SECCOMP is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
# CONFIG_KEXEC is not set
CONFIG_PHYSICAL_START=0x100000
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_ALIGN=0x100000
CONFIG_HOTPLUG_CPU=y
CONFIG_COMPAT_VDSO=y

#
# Power management options (ACPI, APM)
#
CONFIG_PM=y
CONFIG_PM_LEGACY=y
# CONFIG_PM_DEBUG is not set
# CONFIG_PM_SYSFS_DEPRECATED is not set
CONFIG_SOFTWARE_SUSPEND=y
CONFIG_PM_STD_PARTITION="/dev/sda1"
CONFIG_SUSPEND_SMP=y

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_SLEEP_PROC_FS=y
# CONFIG_ACPI_SLEEP_PROC_SLEEP is not set
CONFIG_ACPI_PROCFS=y
CONFIG_ACPI_AC=m
CONFIG_ACPI_BATTERY=m
CONFIG_ACPI_BUTTON=m
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_FAN=m
# CONFIG_ACPI_DOCK is not set
CONFIG_ACPI_PROCESSOR=m
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_THERMAL=m
CONFIG_ACPI_ASUS=m
CONFIG_ACPI_IBM=m
# CONFIG_ACPI_IBM_DOCK is not set
# CONFIG_ACPI_IBM_BAY is not set
CONFIG_ACPI_TOSHIBA=m
CONFIG_ACPI_BLACKLIST_YEAR=2000
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=m

#
# APM (Advanced Power Management) BIOS Support
#
# CONFIG_APM is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# Bus options (PCI, PCMCIA, EISA, MCA, ISA)
#
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GOMMCONFIG is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCIEPORTBUS=y
# CONFIG_HOTPLUG_PCI_PCIE is not set
# CONFIG_PCIEAER is not set
# CONFIG_PCI_MSI is not set
# CONFIG_PCI_DEBUG is not set
# CONFIG_HT_IRQ is not set
CONFIG_ISA_DMA_API=y
CONFIG_ISA=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_SCx200 is not set

#
# PCCARD (PCMCIA/CardBus) support
#
# CONFIG_PCCARD is not set

#
# PCI Hotplug Support
#
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_FAKE is not set
# CONFIG_HOTPLUG_PCI_COMPAQ is not set
# CONFIG_HOTPLUG_PCI_IBM is not set
# CONFIG_HOTPLUG_PCI_ACPI is not set
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_MISC=y

#
# Networking
#
CONFIG_NET=y

#
# Networking options
#
# CONFIG_NETDEBUG is not set
CONFIG_PACKET=y
# CONFIG_PACKET_MMAP is not set
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
# CONFIG_INET_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
# CONFIG_IPV6 is not set
# CONFIG_INET6_XFRM_TUNNEL is not set
# CONFIG_INET6_TUNNEL is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETFILTER is not set

#
# DCCP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_DCCP is not set

#
# SCTP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_SCTP is not set

#
# TIPC Configuration (EXPERIMENTAL)
#
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set

#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_HAMRADIO is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_IEEE80211 is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
# CONFIG_FW_LOADER is not set
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set

#
# Connector - unified userspace <-> kernelspace linker
#
# CONFIG_CONNECTOR is not set

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
# CONFIG_PARPORT is not set

#
# Plug and Play support
#
CONFIG_PNP=y
# CONFIG_PNP_DEBUG is not set

#
# Protocols
#
# CONFIG_ISAPNP is not set
# CONFIG_PNPBIOS is not set
CONFIG_PNPACPI=y

#
# Block devices
#
# CONFIG_BLK_DEV_FD is not set
# CONFIG_BLK_DEV_XD is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=m
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_RAM is not set
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set

#
# Misc devices
#
# CONFIG_IBM_ASM is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_MSI_LAPTOP is not set
# CONFIG_SONY_LAPTOP is not set

#
# ATA/ATAPI/MFM/RLL support
#
# CONFIG_IDE is not set

#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=y
# CONFIG_SCSI_TGT is not set
# CONFIG_SCSI_NETLINK is not set
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
# CONFIG_BLK_DEV_SR is not set
CONFIG_CHR_DEV_SG=y
# CONFIG_CHR_DEV_SCH is not set

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
# CONFIG_SCSI_MULTI_LUN is not set
# CONFIG_SCSI_CONSTANTS is not set
# CONFIG_SCSI_LOGGING is not set
# CONFIG_SCSI_SCAN_ASYNC is not set

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
# CONFIG_SCSI_FC_ATTRS is not set
# CONFIG_SCSI_ISCSI_ATTRS is not set
# CONFIG_SCSI_SAS_ATTRS is not set
# CONFIG_SCSI_SAS_LIBSAS is not set

#
# SCSI low-level drivers
#
# CONFIG_ISCSI_TCP is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_7000FASST is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AHA152X is not set
# CONFIG_SCSI_AHA1542 is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_IN2000 is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_DTC3280 is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_GENERIC_NCR5380 is not set
# CONFIG_SCSI_GENERIC_NCR5380_MMIO is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_NCR53C406A is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_PAS16 is not set
# CONFIG_SCSI_PSI240I is not set
# CONFIG_SCSI_QLOGIC_FAS is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_SEAGATE is not set
# CONFIG_SCSI_SYM53C416 is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_T128 is not set
# CONFIG_SCSI_U14_34F is not set
# CONFIG_SCSI_ULTRASTOR is not set
# CONFIG_SCSI_NSP32 is not set
CONFIG_SCSI_DEBUG=m
# CONFIG_SCSI_SRP is not set

#
# Serial ATA (prod) and Parallel ATA (experimental) drivers
#
CONFIG_ATA=y
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_SATA_AHCI=y
# CONFIG_SATA_SVW is not set
# CONFIG_ATA_PIIX is not set
# CONFIG_SATA_MV is not set
# CONFIG_SATA_NV is not set
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SX4 is not set
# CONFIG_SATA_SIL is not set
# CONFIG_SATA_SIL24 is not set
# CONFIG_SATA_SIS is not set
# CONFIG_SATA_ULI is not set
# CONFIG_SATA_VIA is not set
# CONFIG_SATA_VITESSE is not set
# CONFIG_SATA_INIC162X is not set
CONFIG_SATA_ACPI=y
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CS5520 is not set
# CONFIG_PATA_CS5530 is not set
# CONFIG_PATA_CS5535 is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_ATA_GENERIC is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_LEGACY is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_QDI is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RZ1000 is not set
# CONFIG_PATA_SC1200 is not set
# CONFIG_PATA_SERVERWORKS is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_SIL680 is not set
# CONFIG_PATA_SIS is not set
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set
# CONFIG_PATA_WINBOND_VLB is not set
# CONFIG_PATA_PLATFORM is not set

#
# Old CD-ROM drivers (not SCSI, not IDE)
#
# CONFIG_CD_NO_IDESCSI is not set

#
# Multi-device support (RAID and LVM)
#
CONFIG_MD=y
CONFIG_BLK_DEV_MD=m
# CONFIG_MD_LINEAR is not set
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
# CONFIG_MD_RAID10 is not set
# CONFIG_MD_RAID456 is not set
# CONFIG_MD_MULTIPATH is not set
# CONFIG_MD_FAULTY is not set
# CONFIG_BLK_DEV_DM is not set

#
# Fusion MPT device support
#
# CONFIG_FUSION is not set
# CONFIG_FUSION_SPI is not set
# CONFIG_FUSION_FC is not set
# CONFIG_FUSION_SAS is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_IEEE1394 is not set

#
# I2O device support
#
# CONFIG_I2O is not set

#
# Macintosh device drivers
#
# CONFIG_MAC_EMUMOUSEBTN is not set

#
# Network device support
#
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set
# CONFIG_NET_SB1000 is not set

#
# ARCnet devices
#
# CONFIG_ARCNET is not set

#
# PHY device support
#

#
# Ethernet (10 or 100Mbit)
#
# CONFIG_NET_ETHERNET is not set

#
# Ethernet (1000 Mbit)
#
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
CONFIG_E1000=y
CONFIG_E1000_NAPI=y
CONFIG_E1000_DISABLE_PACKET_SPLIT=y
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_R8169 is not set
# CONFIG_SIS190 is not set
# CONFIG_SKGE is not set
# CONFIG_SKY2 is not set
# CONFIG_SK98LIN is not set
# CONFIG_TIGON3 is not set
# CONFIG_BNX2 is not set
# CONFIG_QLA3XXX is not set
# CONFIG_ATL1 is not set

#
# Ethernet (10000 Mbit)
#
# CONFIG_CHELSIO_T1 is not set
# CONFIG_CHELSIO_T3 is not set
# CONFIG_IXGB is not set
# CONFIG_S2IO is not set
# CONFIG_MYRI10GE is not set
# CONFIG_NETXEN_NIC is not set

#
# Token Ring devices
#
# CONFIG_TR is not set

#
# Wireless LAN (non-hamradio)
#
# CONFIG_NET_RADIO is not set

#
# Wan interfaces
#
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
# CONFIG_SHAPER is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set

#
# ISDN subsystem
#
# CONFIG_ISDN is not set

#
# Telephony Support
#
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_TSDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_INPORT is not set
# CONFIG_MOUSE_LOGIBM is not set
# CONFIG_MOUSE_PC110PAD is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_VT_HW_CONSOLE_BINDING is not set
# CONFIG_SERIAL_NONSTANDARD is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256

#
# IPMI
#
# CONFIG_IPMI_HANDLER is not set

#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
# CONFIG_HW_RANDOM is not set
# CONFIG_NVRAM is not set
# CONFIG_RTC is not set
# CONFIG_GEN_RTC is not set
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_SONYPI is not set
# CONFIG_AGP is not set
# CONFIG_DRM is not set
# CONFIG_MWAVE is not set
# CONFIG_PC8736x_GPIO is not set
# CONFIG_NSC_GPIO is not set
# CONFIG_CS5535_GPIO is not set
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=256
# CONFIG_HPET is not set
# CONFIG_HANGCHECK_TIMER is not set

#
# TPM devices
#
# CONFIG_TCG_TPM is not set
# CONFIG_TELCLOCK is not set

#
# I2C support
#
# CONFIG_I2C is not set

#
# SPI support
#
# CONFIG_SPI is not set
# CONFIG_SPI_MASTER is not set

#
# Dallas's 1-wire bus
#
# CONFIG_W1 is not set

#
# Hardware Monitoring support
#
# CONFIG_HWMON is not set
# CONFIG_HWMON_VID is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_SM501 is not set

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set

#
# Digital Video Broadcasting Devices
#
# CONFIG_DVB is not set

#
# Graphics support
#
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=m
# CONFIG_BACKLIGHT_PROGEAR is not set
# CONFIG_FB is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
# CONFIG_VGACON_SOFT_SCROLLBACK is not set
CONFIG_VIDEO_SELECT=y
# CONFIG_MDA_CONSOLE is not set
CONFIG_DUMMY_CONSOLE=y

#
# Sound
#
# CONFIG_SOUND is not set

#
# HID Devices
#
CONFIG_HID=y
# CONFIG_HID_DEBUG is not set

#
# USB support
#
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
# CONFIG_USB is not set

#
# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support'
#

#
# USB Gadget Support
#
# CONFIG_USB_GADGET is not set

#
# MMC/SD Card support
#
# CONFIG_MMC is not set

#
# LED devices
#
# CONFIG_NEW_LEDS is not set

#
# LED drivers
#

#
# LED Triggers
#

#
# InfiniBand support
#
# CONFIG_INFINIBAND is not set

#
# EDAC - error detection and reporting (RAS) (EXPERIMENTAL)
#
# CONFIG_EDAC is not set

#
# Real Time Clock
#
# CONFIG_RTC_CLASS is not set

#
# DMA Engine support
#
# CONFIG_DMA_ENGINE is not set

#
# DMA Clients
#

#
# DMA Devices
#

#
# Auxiliary Display support
#

#
# Virtualization
#
# CONFIG_KVM is not set

#
# File systems
#
CONFIG_EXT2_FS=y
# CONFIG_EXT2_FS_XATTR is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
# CONFIG_EXT3_FS_POSIX_ACL is not set
# CONFIG_EXT3_FS_SECURITY is not set
# CONFIG_EXT4DEV_FS is not set
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=y
# CONFIG_REISERFS_CHECK is not set
# CONFIG_REISERFS_PROC_INFO is not set
# CONFIG_REISERFS_FS_XATTR is not set
# CONFIG_JFS_FS is not set
# CONFIG_FS_POSIX_ACL is not set
CONFIG_XFS_FS=y
# CONFIG_XFS_QUOTA is not set
# CONFIG_XFS_SECURITY is not set
# CONFIG_XFS_POSIX_ACL is not set
# CONFIG_XFS_RT is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_INOTIFY is not set
# CONFIG_QUOTA is not set
CONFIG_DNOTIFY=y
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_FUSE_FS is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
# CONFIG_ZISOFS is not set
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
CONFIG_RAMFS=y
# CONFIG_CONFIGFS_FS is not set

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set

#
# Network File Systems
#
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
# CONFIG_NFS_V4 is not set
# CONFIG_NFS_DIRECTIO is not set
CONFIG_NFSD=y
# CONFIG_NFSD_V3 is not set
# CONFIG_NFSD_TCP is not set
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
# CONFIG_RPCSEC_GSS_KRB5 is not set
# CONFIG_RPCSEC_GSS_SPKM3 is not set
# CONFIG_SMB_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
# CONFIG_9P_FS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y

#
# Native Language Support
#
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_UTF8 is not set

#
# Distributed Lock Manager
#
# CONFIG_DLM is not set

#
# Instrumentation Support
#
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
# CONFIG_KPROBES is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
# CONFIG_ENABLE_MUST_CHECK is not set
CONFIG_MAGIC_SYSRQ=y
CONFIG_UNUSED_SYMBOLS=y
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SHIRQ is not set
CONFIG_LOG_BUF_SHIFT=15
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_SCHEDSTATS is not set
# CONFIG_TIMER_STATS is not set
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
# CONFIG_DEBUG_LOCKDEP is not set
CONFIG_TRACE_IRQFLAGS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_LIST is not set
CONFIG_FRAME_POINTER=y
# CONFIG_FORCED_INLINING is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_FAULT_INJECTION is not set
CONFIG_EARLY_PRINTK=y
# CONFIG_DEBUG_STACKOVERFLOW is not set
# CONFIG_DEBUG_STACK_USAGE is not set

#
# Page alloc debug is incompatible with Software Suspend on i386
#
# CONFIG_DEBUG_RODATA is not set
# CONFIG_4KSTACKS is not set
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
CONFIG_DOUBLEFAULT=y

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set

#
# Cryptographic options
#
# CONFIG_CRYPTO is not set

#
# Library routines
#
CONFIG_BITREVERSE=y
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
CONFIG_CRC32=y
# CONFIG_LIBCRC32C is not set
CONFIG_PLIST=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_X86_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_KTIME_SCALAR=y

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dio_get_page() lockdep complaints
  2007-04-19 12:49           ` Jens Axboe
@ 2007-04-19 12:52             ` Jens Axboe
  2007-04-19 13:53               ` Roland Dreier
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2007-04-19 12:52 UTC (permalink / raw)
  To: Vladimir V. Saveliev
  Cc: Andrew Morton, linux-kernel, linux-aio, reiserfs-dev, linux-mm

On Thu, Apr 19 2007, Jens Axboe wrote:
> > I tried fio (1.15) with this job file and did not get the possible
> > circular locking dependency detected
> 
> Perhaps some of the preempt settings? The box is an emc centera, it's a
> lowly p4/ht.

As I mentioned, the rootfs is on reiser. So something in the boot-up
scripts may get reiser to run through that path with the wrong locking
order. After the box is done booting, the dmesg is clean. I then mount
the ext3 fs and run the fio test, and the lockdep trace shows up
immediately.

The distro is SLES9.

-- 
Jens Axboe

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dio_get_page() lockdep complaints
  2007-04-19 12:52             ` Jens Axboe
@ 2007-04-19 13:53               ` Roland Dreier
  2007-04-19 14:20                 ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Roland Dreier @ 2007-04-19 13:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Vladimir V. Saveliev, Andrew Morton, linux-kernel, linux-aio,
	reiserfs-dev, linux-mm

Maybe you could add some hack really early on (say at the beginning of
the reiserfs mount code) that took instances of the locks in the
correct order, so you would get a lockdep trace of where the ordering
is violated when it first happens?

 - R.
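
For illustration only, here is a minimal userspace sketch of the idea
above, assuming a toy validator rather than lockdep itself: the intended
order (i_mutex outside mmap_sem) is recorded first, so the first path
that takes the two locks the other way round is reported at the point
where it happens. All names below (toy_acquire, toy_release,
toy_prime_correct_order, toy_munmap_path) are hypothetical and are not
kernel APIs, and the toy only tracks direct lock pairs, not full
dependency chains.

```c
#include <stdio.h>

/* Toy lock-order validator. This is NOT lockdep, only the core idea. */
enum { LOCK_I_MUTEX, LOCK_MMAP_SEM, NR_LOCKS };

static const char *lock_name[NR_LOCKS] = { "i_mutex", "mmap_sem" };

/* seen_before[a][b] != 0 means "b was acquired while a was held". */
static int seen_before[NR_LOCKS][NR_LOCKS];
static int held[NR_LOCKS];

/* Returns 0 if fine, -1 if taking 'lock' inverts a recorded order. */
int toy_acquire(int lock)
{
	int h, ret = 0;

	for (h = 0; h < NR_LOCKS; h++) {
		if (h == lock || !held[h])
			continue;
		if (seen_before[lock][h]) {
			/* The opposite order was recorded earlier. */
			printf("inversion: %s acquired while holding %s, "
			       "but %s -> %s was recorded earlier\n",
			       lock_name[lock], lock_name[h],
			       lock_name[lock], lock_name[h]);
			ret = -1;
		} else {
			/* First time this pair is seen: record h -> lock. */
			seen_before[h][lock] = 1;
		}
	}
	held[lock] = 1;
	return ret;
}

void toy_release(int lock)
{
	held[lock] = 0;
}

/* The suggested hack: teach the validator the intended order up front. */
void toy_prime_correct_order(void)
{
	toy_acquire(LOCK_I_MUTEX);
	toy_acquire(LOCK_MMAP_SEM);
	toy_release(LOCK_MMAP_SEM);
	toy_release(LOCK_I_MUTEX);
}

/* munmap()-like path: a ->release() that takes i_mutex under mmap_sem. */
int toy_munmap_path(void)
{
	int ret;

	toy_acquire(LOCK_MMAP_SEM);
	ret = toy_acquire(LOCK_I_MUTEX);
	toy_release(LOCK_I_MUTEX);
	toy_release(LOCK_MMAP_SEM);
	return ret;
}
```

Calling toy_prime_correct_order() and then toy_munmap_path() from a
main() makes the second call return -1 and print the inversion. Without
priming, whichever order is seen first is the one that gets recorded,
which would be consistent with the report in this thread showing up at
the later direct-IO path instead of at the reiserfs ->release() call.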


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dio_get_page() lockdep complaints
  2007-04-19  8:34       ` Jens Axboe
  2007-04-19 12:43         ` Vladimir V. Saveliev
@ 2007-04-19 14:15         ` Jens Axboe
  2007-04-19 14:55           ` Vladimir V. Saveliev
  1 sibling, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2007-04-19 14:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-aio, reiserfs-dev, Vladimir V. Saveliev, linux-mm

On Thu, Apr 19 2007, Jens Axboe wrote:
> > Is it possible that fio was changed?  That it was changed to close() the fd
> > before doing the munmapping whereas it used to hold the file open?
> 
> It's been a while since I tested on this box, so I don't really recall.
> But fio does close() the fd before doing munmap(). This particular test
> case doesn't use mmap(), though.

Ah wait, but it does use mmap! Fio sets up a semaphore by mmap'ing a
file in /tmp (which is reiserfs). Here's a test case that triggers it
100% reliably; adjust /tmp to some other location that is on reiserfs.
The lockdep output from that run follows the test case.

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(int argc, char *argv[])
{
	char fname[] = "/tmp/some_file";	/* /tmp on reiserfs */
	void *p;
	int fd;

	fd = open(fname, O_RDWR|O_CREAT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (ftruncate(fd, 64) < 0) {
		perror("ftruncate");
		return 1;
	}

	p = mmap(NULL, 64, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

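	unlink(fname);
	/*
	 * Close the fd first so the mapping holds the last reference to the
	 * file; munmap() then does the final fput() with mmap_sem held.
	 */
	close(fd);
	munmap(p, 64);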
	return 0;
}


=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.21-rc7 #18
-------------------------------------------------------
reiser-mmap/9643 is trying to acquire lock:
 (&inode->i_mutex){--..}, at: [<b038c625>] mutex_lock+0x1c/0x1f

but task is already holding lock:
 (&mm->mmap_sem){----}, at: [<b015c6cf>] sys_munmap+0x26/0x42

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&mm->mmap_sem){----}:
       [<b013e3fb>] __lock_acquire+0xdee/0xf9c
       [<b013e600>] lock_acquire+0x57/0x70
       [<b0137b92>] down_read+0x3a/0x4c
       [<b01b6b88>] reiserfs_remount+0x176/0x42a
       [<b016ba21>] do_remount_sb+0xb9/0x10f
       [<b017ebe7>] do_mount+0x1b6/0x616
       [<b017f0b6>] sys_mount+0x6f/0xa9
       [<b0103f04>] sysenter_past_esp+0x5d/0x99
       [<ffffffff>] 0xffffffff

-> #0 (&inode->i_mutex){--..}:
       [<b013e259>] __lock_acquire+0xc4c/0xf9c
       [<b013e600>] lock_acquire+0x57/0x70
       [<b038c3e5>] __mutex_lock_slowpath+0x73/0x297
       [<b038c625>] mutex_lock+0x1c/0x1f
       [<b01b17e9>] reiserfs_file_release+0x54/0x447
       [<b016afe7>] __fput+0x53/0x101
       [<b016b0ee>] fput+0x19/0x1c
       [<b015bcd5>] remove_vma+0x3b/0x4d
       [<b015c659>] do_munmap+0x17f/0x1cf
       [<b015c6db>] sys_munmap+0x32/0x42
       [<b0103f04>] sysenter_past_esp+0x5d/0x99
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by reiser-mmap/9643:
 #0:  (&mm->mmap_sem){----}, at: [<b015c6cf>] sys_munmap+0x26/0x42

stack backtrace:
 [<b0104f54>] show_trace_log_lvl+0x1a/0x30
 [<b0105626>] show_trace+0x12/0x14
 [<b01056ad>] dump_stack+0x16/0x18
 [<b013c48d>] print_circular_bug_tail+0x68/0x71
 [<b013e259>] __lock_acquire+0xc4c/0xf9c
 [<b013e600>] lock_acquire+0x57/0x70
 [<b038c3e5>] __mutex_lock_slowpath+0x73/0x297
 [<b038c625>] mutex_lock+0x1c/0x1f
 [<b01b17e9>] reiserfs_file_release+0x54/0x447
 [<b016afe7>] __fput+0x53/0x101
 [<b016b0ee>] fput+0x19/0x1c
 [<b015bcd5>] remove_vma+0x3b/0x4d
 [<b015c659>] do_munmap+0x17f/0x1cf
 [<b015c6db>] sys_munmap+0x32/0x42
 [<b0103f04>] sysenter_past_esp+0x5d/0x99
 =======================

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dio_get_page() lockdep complaints
  2007-04-19 13:53               ` Roland Dreier
@ 2007-04-19 14:20                 ` Jens Axboe
  0 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-04-19 14:20 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Vladimir V. Saveliev, Andrew Morton, linux-kernel, linux-aio,
	reiserfs-dev, linux-mm

On Thu, Apr 19 2007, Roland Dreier wrote:
> Maybe you could add some hack really early on (say at the beginning of
> the reiserfs mount code) that took instances of the locks in the
> correct order, so you would get a lockdep trace of where the ordering
> is violated when it first happens?

See the test case mail I sent out.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dio_get_page() lockdep complaints
  2007-04-19  8:01 ` dio_get_page() lockdep complaints Andrew Morton
  2007-04-19  8:01   ` Jens Axboe
@ 2007-04-19 14:36   ` Chris Mason
  1 sibling, 0 replies; 17+ messages in thread
From: Chris Mason @ 2007-04-19 14:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jens Axboe, linux-kernel, linux-aio, reiserfs-dev,
	Vladimir V. Saveliev, linux-mm

On Thu, Apr 19, 2007 at 01:01:42AM -0700, Andrew Morton wrote:
> On Thu, 19 Apr 2007 09:38:30 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > Hi,
> > 
> > Doing some testing on CFQ, I ran into this 100% reproducible report:
> > 
> > =======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 2.6.21-rc7 #5
> > -------------------------------------------------------
> > fio/9741 is trying to acquire lock:
> >  (&mm->mmap_sem){----}, at: [<b018cb34>] dio_get_page+0x54/0x161
> > 
> > but task is already holding lock:
> >  (&inode->i_mutex){--..}, at: [<b038c6e5>] mutex_lock+0x1c/0x1f
> > 
> > which lock already depends on the new lock.
> > 
> 
> This is the correct ranking: i_mutex outside mmap_sem.

[ ... ]

> But here reiserfs is taking i_mutex in its file_operations.release(), which
> can be called under mmap_sem.
> 
> Vladimir's recent de14569f94513279e3d44d9571a421e9da1759ae.  "resierfs:
> avoid tail packing if an inode was ever mmapped" comes real close to this
> code, but afaict it did not cause this bug.
> 
> I can't think of anything which we've done in the 2.6.21 cycle which would have
> caused this to start happening.  Odd.

In this case, reiserfs is taking i_mutex to safely discard the
preallocation blocks.  The best solution would probably be to just put
in a preallocation mutex other than i_mutex (even i_mmap would probably
work).
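
As a rough userspace illustration of that idea (pthread mutexes standing in
for kernel locks, and every name below hypothetical rather than taken from
reiserfs), a release path that only needs to protect the preallocation
state can use its own mutex and never wait on i_mutex:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy stand-in for an inode; the field names only mirror the discussion. */
struct toy_inode {
	pthread_mutex_t i_mutex;        /* ranks outside mmap_sem */
	pthread_mutex_t prealloc_mutex; /* hypothetical: guards prealloc only */
	int prealloc_blocks;
};

#define TOY_INODE_INIT(n) \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, (n) }

/*
 * Model of a ->release() that discards preallocated blocks.  It takes
 * only prealloc_mutex, so it never has to wait for i_mutex.
 * Returns the number of blocks discarded.
 */
int toy_release(struct toy_inode *inode)
{
	int discarded;

	assert(inode != NULL);

	pthread_mutex_lock(&inode->prealloc_mutex);
	discarded = inode->prealloc_blocks;
	inode->prealloc_blocks = 0;
	pthread_mutex_unlock(&inode->prealloc_mutex);
	return discarded;
}
```

In this model toy_release() completes even while the caller holds i_mutex.
In the kernel, every path that allocates from or discards the preallocation
would have to take the same new mutex, and that mutex would need a
consistent rank relative to mmap_sem.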

This shouldn't be a new regression; the file_release prealloc stuff
hasn't changed in ages.

-chris


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dio_get_page() lockdep complaints
  2007-04-19 14:15         ` Jens Axboe
@ 2007-04-19 14:55           ` Vladimir V. Saveliev
  0 siblings, 0 replies; 17+ messages in thread
From: Vladimir V. Saveliev @ 2007-04-19 14:55 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-kernel, linux-aio, reiserfs-dev, linux-mm

Hello

On Thursday 19 April 2007 18:15, Jens Axboe wrote:
> On Thu, Apr 19 2007, Jens Axboe wrote:
> > > Is it possible that fio was changed?  That it was changed to close() the fd
> > > before doing the munmapping whereas it used to hold the file open?
> > 
> > It's been a while since I tested on this box, so I don't really recall.
> > But fio does close() the fd before doing munmap(). This particular test
> > case doesn't use mmap(), though.
> 
> Ah wait, but it does use mmap! Fio sets up a semaphore by mmap'ing a
> file in /tmp (which is reiserfs). Here's a test case that triggers it
> 100% reliably, adjust /tmp to some other location that is reiserfs.
> lockdep from that run attached.
> 
> #include <stdlib.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/mman.h>
> 
> int main(int argc, char *argv[])
> {
> 	char fname[] = "/tmp/some_file";	/* /tmp on reiserfs */
> 	void *p;
> 	int fd;
> 
> 	fd = open(fname, O_RDWR|O_CREAT, 0644);
> 	if (fd < 0) {
> 		perror("open");
> 		return 1;
> 	}
> 
> 	if (ftruncate(fd, 64) < 0) {
> 		perror("ftruncate");
> 		return 1;
> 	}
> 
> 	p = mmap(NULL, 64, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> 	if (p == MAP_FAILED) {
> 		perror("mmap");
> 		return 1;
> 	}
> 
> 	unlink(fname);
> 	close(fd);
> 	munmap(p, 64);
> 	return 0;
> }
> 
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.21-rc7 #18
> -------------------------------------------------------
> reiser-mmap/9643 is trying to acquire lock:
>  (&inode->i_mutex){--..}, at: [<b038c625>] mutex_lock+0x1c/0x1f
> 
> but task is already holding lock:
>  (&mm->mmap_sem){----}, at: [<b015c6cf>] sys_munmap+0x26/0x42
> 

So, it looks like the problem is that reiserfs_file_release() takes the inode's i_mutex while the mm's mmap_sem is already held?


> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (&mm->mmap_sem){----}:
>        [<b013e3fb>] __lock_acquire+0xdee/0xf9c
>        [<b013e600>] lock_acquire+0x57/0x70
>        [<b0137b92>] down_read+0x3a/0x4c
>        [<b01b6b88>] reiserfs_remount+0x176/0x42a
>        [<b016ba21>] do_remount_sb+0xb9/0x10f
>        [<b017ebe7>] do_mount+0x1b6/0x616
>        [<b017f0b6>] sys_mount+0x6f/0xa9
>        [<b0103f04>] sysenter_past_esp+0x5d/0x99
>        [<ffffffff>] 0xffffffff
> 

> -> #0 (&inode->i_mutex){--..}:
>        [<b013e259>] __lock_acquire+0xc4c/0xf9c
>        [<b013e600>] lock_acquire+0x57/0x70
>        [<b038c3e5>] __mutex_lock_slowpath+0x73/0x297
>        [<b038c625>] mutex_lock+0x1c/0x1f
>        [<b01b17e9>] reiserfs_file_release+0x54/0x447
>        [<b016afe7>] __fput+0x53/0x101
>        [<b016b0ee>] fput+0x19/0x1c
>        [<b015bcd5>] remove_vma+0x3b/0x4d
>        [<b015c659>] do_munmap+0x17f/0x1cf
>        [<b015c6db>] sys_munmap+0x32/0x42
>        [<b0103f04>] sysenter_past_esp+0x5d/0x99
>        [<ffffffff>] 0xffffffff
> 
> other info that might help us debug this:
> 
> 1 lock held by reiser-mmap/9643:
>  #0:  (&mm->mmap_sem){----}, at: [<b015c6cf>] sys_munmap+0x26/0x42
> 
> stack backtrace:
>  [<b0104f54>] show_trace_log_lvl+0x1a/0x30
>  [<b0105626>] show_trace+0x12/0x14
>  [<b01056ad>] dump_stack+0x16/0x18
>  [<b013c48d>] print_circular_bug_tail+0x68/0x71
>  [<b013e259>] __lock_acquire+0xc4c/0xf9c
>  [<b013e600>] lock_acquire+0x57/0x70
>  [<b038c3e5>] __mutex_lock_slowpath+0x73/0x297
>  [<b038c625>] mutex_lock+0x1c/0x1f
>  [<b01b17e9>] reiserfs_file_release+0x54/0x447
>  [<b016afe7>] __fput+0x53/0x101
>  [<b016b0ee>] fput+0x19/0x1c
>  [<b015bcd5>] remove_vma+0x3b/0x4d
>  [<b015c659>] do_munmap+0x17f/0x1cf
>  [<b015c6db>] sys_munmap+0x32/0x42
>  [<b0103f04>] sysenter_past_esp+0x5d/0x99
>  =======================
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dio_get_page() lockdep complaints
  2007-04-19  8:25     ` Andrew Morton
  2007-04-19  8:34       ` Jens Axboe
@ 2007-04-19 14:57       ` Vladimir V. Saveliev
  2007-04-19 16:42         ` Andrew Morton
  1 sibling, 1 reply; 17+ messages in thread
From: Vladimir V. Saveliev @ 2007-04-19 14:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jens Axboe, linux-kernel, linux-aio, reiserfs-dev, linux-mm

Hello

On Thursday 19 April 2007 12:25, Andrew Morton wrote:
> On Thu, 19 Apr 2007 10:01:57 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Thu, Apr 19 2007, Andrew Morton wrote:
> > > On Thu, 19 Apr 2007 09:38:30 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > > 
> > > > Hi,
> > > > 
> > > > Doing some testing on CFQ, I ran into this 100% reproducible report:
> > > > 
> > > > =======================================================
> > > > [ INFO: possible circular locking dependency detected ]
> > > > 2.6.21-rc7 #5
> > > > -------------------------------------------------------
> > > > fio/9741 is trying to acquire lock:
> > > >  (&mm->mmap_sem){----}, at: [<b018cb34>] dio_get_page+0x54/0x161
> > > > 
> > > > but task is already holding lock:
> > > >  (&inode->i_mutex){--..}, at: [<b038c6e5>] mutex_lock+0x1c/0x1f
> > > > 
> > > > which lock already depends on the new lock.
> > > > 
> > > 
> > > This is the correct ranking: i_mutex outside mmap_sem.
> > > 
> > > > 
> > > > the existing dependency chain (in reverse order) is:
> > > > 
> > > > -> #1 (&inode->i_mutex){--..}:
> > > >        [<b013e3fb>] __lock_acquire+0xdee/0xf9c
> > > >        [<b013e600>] lock_acquire+0x57/0x70
> > > >        [<b038c4a5>] __mutex_lock_slowpath+0x73/0x297
> > > >        [<b038c6e5>] mutex_lock+0x1c/0x1f
> > > >        [<b01b17e9>] reiserfs_file_release+0x54/0x447
> > > >        [<b016afe7>] __fput+0x53/0x101
> > > >        [<b016b0ee>] fput+0x19/0x1c
> > > >        [<b015bcd5>] remove_vma+0x3b/0x4d
> > > >        [<b015c659>] do_munmap+0x17f/0x1cf
> > > >        [<b015c6db>] sys_munmap+0x32/0x42
> > > >        [<b0103f04>] sysenter_past_esp+0x5d/0x99
> > > >        [<ffffffff>] 0xffffffff
> > > > 
> > > > -> #0 (&mm->mmap_sem){----}:
> > > >        [<b013e259>] __lock_acquire+0xc4c/0xf9c
> > > >        [<b013e600>] lock_acquire+0x57/0x70
> > > >        [<b0137b92>] down_read+0x3a/0x4c
> > > >        [<b018cb34>] dio_get_page+0x54/0x161
> > > >        [<b018d7a9>] __blockdev_direct_IO+0x514/0xe2a
> > > >        [<b01cf449>] ext3_direct_IO+0x98/0x1e5
> > > >        [<b014e8df>] generic_file_direct_IO+0x63/0x133
> > > >        [<b01500e9>] generic_file_aio_read+0x16b/0x222
> > > >        [<b017f8b6>] aio_rw_vect_retry+0x5a/0x116
> > > >        [<b0180147>] aio_run_iocb+0x69/0x129
> > > >        [<b0180a78>] io_submit_one+0x194/0x2eb
> > > >        [<b0181331>] sys_io_submit+0x92/0xe7
> > > >        [<b0103f90>] syscall_call+0x7/0xb
> > > >        [<ffffffff>] 0xffffffff
> > > 
> > > But here reiserfs is taking i_mutex in its file_operations.release(),
> > > which can be called under mmap_sem.
> > > 
> > > Vladimir's recent de14569f94513279e3d44d9571a421e9da1759ae.
> > > "resierfs: avoid tail packing if an inode was ever mmapped" comes real
> > > close to this code, but afaict it did not cause this bug.
> > > 
> > > I can't think of anything which we've done in the 2.6.21 cycle which
> > > would have caused this to start happening.  Odd.
> > 
> > > The bug may be older, let me know if you want me to check 2.6.20 or
> > earlier.
> 
> Would be great if you could test 2.6.20.  I have a feeling that I missed
> something, but what?  We didn't change the refcounting or lifetime of
> vma.vm_file...
> 
> 
> > > > The test run was fio, the job file used is:
> > > > 
> > > > # fio job file snip below
> > > > [global]
> > > > bs=4k
> > > > buffered=0
> > > > ioengine=libaio
> > > > iodepth=4
> > > > thread
> > > > 
> > > > [readers]
> > > > numjobs=8
> > > > size=128m
> > > > rw=read
> > > > # fio job file snip above
> > > > 
> > > > Filesystem was ext3, default mkfs and mount options. Kernel was
> > > > 2.6.21-rc7 as of this morning, with some CFQ patches applied.
> > > > 
> > > 
> > > It's interesting that lockdep learned the (wrong) ranking from a reiserfs
> > > operation then later detected it being violated by ext3.
> > 
> > It's a scratch test box, which for some reason has reiserfs as the
> > rootfs. So reiser gets to run first :-)
> 
> direct-io reads against reiserfs also will take i_mutex outside mmap_sem. 
> As will pagefaults inside generic_file_write() (which is where this ranking
> is primarily defined).
> 
> So an all-reiserfs system should be getting the same reports.  Obviously,
> that isn't happening.
> 
> It's a bit odd that reiserfs is playing with file contents within
> file_operations.release(): there could be other files open against that
> inode.  One would expect this sort of thing to be happening in an
> inode_operation.  But it's been like that for a long time.
> 

reiserfs needs to "pack" the file tail when the last process that has the file open closes it.
Can you see a more suitable place where that could be performed?

> Is it possible that fio was changed?  That it was changed to close() the fd
> before doing the munmapping whereas it used to hold the file open?
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dio_get_page() lockdep complaints
  2007-04-19 14:57       ` Vladimir V. Saveliev
@ 2007-04-19 16:42         ` Andrew Morton
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2007-04-19 16:42 UTC (permalink / raw)
  To: Vladimir V. Saveliev
  Cc: Jens Axboe, linux-kernel, linux-aio, reiserfs-dev, linux-mm

On Thu, 19 Apr 2007 18:57:41 +0400 "Vladimir V. Saveliev" <vs@namesys.com> wrote:

> > It's a bit odd that reiserfs is playing with file contents within
> > file_operations.release(): there could be other files open against that
> > inode.  One would expect this sort of thing to be happening in an
> > inode_operation.  But it's been like that for a long time.
> > 
> 
> reiserfs needs to "pack" the file tail when the last process that has the file open closes it.
> Can you see a more suitable place where that could be performed?

No, you're right - I got my ->release() and ->flush() mixed up.

Possibly one could perform this operation on the final iput(), but I suspect the
locking situation there would be even more complex.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dio_get_page() lockdep complaints
       [not found]     ` <1194630300.7459.65.camel@heimdal.trondhjem.org>
@ 2007-11-11 19:49       ` Peter Zijlstra
  2007-11-12  8:45         ` Martin Schwidefsky
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2007-11-11 19:49 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Zach Brown, Jens Axboe, linux-kernel, akpm, linux-aio,
	Chris Mason, linux-mm, Hugh Dickins, Linus Torvalds,
	Martin Schwidefsky

On Fri, 2007-11-09 at 12:45 -0500, Trond Myklebust wrote:
> On Fri, 2007-11-09 at 09:30 -0800, Zach Brown wrote:
> > So, reiserfs and NFS are nesting i_mutex inside the mmap_sem.
> > 
> > >>        [<b038c6e5>] mutex_lock+0x1c/0x1f
> > >>        [<b01b17e9>] reiserfs_file_release+0x54/0x447
> > >>        [<b016afe7>] __fput+0x53/0x101
> > >>        [<b016b0ee>] fput+0x19/0x1c
> > >>        [<b015bcd5>] remove_vma+0x3b/0x4d
> > >>        [<b015c659>] do_munmap+0x17f/0x1cf
> > 
> > >        [<ffffffff802686a1>] _mutex_lock+0x28/0x34
> > >        [<ffffffff883e71d0>] nfs_revalidate_mapping+0x6d/0xac [nfs]
> > >        [<ffffffff883e4b51>] nfs_file_mmap+0x5c/0x74 [nfs]
> > >        [<ffffffff8020df7e>] do_mmap_pgoff+0x51a/0x817
> > >        [<ffffffff80225d19>] sys_mmap+0x90/0x119
> > 
> > I think i_mutex is fundamentally nested outside of the mmap_sem because
> > of faulting in the buffered write path.  I think these warnings could be
> > reproduced with a careful test app which tries buffered writes from an
> > address which will fault.
> > 
> > DIO just tripped it up because it *always* performs get_user_pages() on
> > the memory.
> > 
> > So reiser and NFS need to be fixed.  No?
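
A test along the lines Zach describes might look like the sketch below. It
is only an illustration: the function name and its parameters are made up,
and whether the fault really occurs with i_mutex held depends on the kernel
version and the filesystem's write path. The idea is to map a file, leave
the mapping untouched so no page is faulted in, and then hand that mapping
to write() as the source buffer of a buffered write.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Do a buffered write() to dst_path whose source buffer is a fresh,
 * never-touched mapping of src_path, so the kernel's copy from user
 * memory has to fault the page in.  Both files are created here and
 * removed again before returning.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t write_from_unfaulted_mapping(const char *src_path,
				     const char *dst_path, size_t len)
{
	ssize_t ret = -1;
	void *p;
	int src, dst;

	assert(len > 0);

	src = open(src_path, O_RDWR | O_CREAT, 0644);
	if (src < 0)
		return -1;
	dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (dst < 0)
		goto out_src;
	if (ftruncate(src, (off_t)len) < 0)
		goto out_dst;

	/* Map the source but do not touch it: nothing is faulted in yet. */
	p = mmap(NULL, len, PROT_READ, MAP_SHARED, src, 0);
	if (p == MAP_FAILED)
		goto out_dst;

	/* Buffered write that reads from the untouched mapping. */
	ret = write(dst, p, len);

	munmap(p, len);
out_dst:
	close(dst);
	unlink(dst_path);
out_src:
	close(src);
	unlink(src_path);
	return ret;
}
```

To exercise a particular filesystem, dst_path would point at a file on it,
and the kernel would need lock debugging enabled for any report to appear.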
> 
> Actually, it is rather mmap() that needs to be fixed. It is cold calling the
> filesystem while holding all sorts of nasty locks. It needs to be
> migrated to the same sort of syscall layout as read() and write().
> 
> You _first_ call the filesystem so that it can make whatever
> preparations it needs outside the lock. The filesystem then calls the
> VM, which can then call the filesystem back if needed.

Right, which gets us into all kinds of trouble because some sites need
mmap_sem to resolve some races, notably s390 31-bit and shm.

Quick prototype that moves mmap_sem into do_mmap{,_pgoff} and provides
_locked functions for those few icky sites.

The !_locked functions also call f_op->mmap_prepare() before taking the
mmap_sem, which makes for some ugly asymmetry :-/

Anyway, I'm not coming up with anything nicer atm; hopefully a nice
idea will present itself soon.

(compile tested only - mostly for illustrative purposes)

Not-signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/alpha/kernel/osf_sys.c         |    2 -
 arch/arm/kernel/sys_arm.c           |    2 -
 arch/avr32/kernel/sys_avr32.c       |    2 -
 arch/blackfin/kernel/sys_bfin.c     |    2 -
 arch/cris/kernel/sys_cris.c         |    2 -
 arch/frv/kernel/sys_frv.c           |    4 --
 arch/h8300/kernel/sys_h8300.c       |    4 --
 arch/ia64/ia32/sys_ia32.c           |   30 +++++-----------
 arch/ia64/kernel/sys_ia64.c         |    2 -
 arch/m32r/kernel/sys_m32r.c         |    2 -
 arch/m68k/kernel/sys_m68k.c         |    4 --
 arch/m68knommu/kernel/sys_m68k.c    |    2 -
 arch/mips/kernel/irixelf.c          |   10 -----
 arch/mips/kernel/linux32.c          |    2 -
 arch/mips/kernel/syscall.c          |    2 -
 arch/mips/kernel/sysirix.c          |    5 --
 arch/parisc/kernel/sys_parisc.c     |    2 -
 arch/powerpc/kernel/syscalls.c      |    2 -
 arch/s390/kernel/compat_linux.c     |    5 +-
 arch/s390/kernel/sys_s390.c         |    2 -
 arch/sh/kernel/sys_sh.c             |    2 -
 arch/sh64/kernel/sys_sh64.c         |    2 -
 arch/sparc/kernel/sys_sparc.c       |    2 -
 arch/sparc/kernel/sys_sunos.c       |    2 -
 arch/sparc64/kernel/binfmt_aout32.c |    6 ---
 arch/sparc64/kernel/sys_sparc.c     |    2 -
 arch/sparc64/kernel/sys_sunos32.c   |    2 -
 arch/sparc64/solaris/misc.c         |    2 -
 arch/um/kernel/syscall.c            |    2 -
 arch/v850/kernel/syscalls.c         |    2 -
 arch/x86/ia32/ia32_aout.c           |    6 ---
 arch/x86/ia32/sys_ia32.c            |    5 --
 arch/x86/kernel/sys_i386_32.c       |    2 -
 arch/x86/kernel/sys_x86_64.c        |    2 -
 arch/xtensa/kernel/syscall.c        |    2 -
 drivers/char/drm/drm_bufs.c         |    4 --
 drivers/char/drm/i810_dma.c         |    2 -
 fs/aio.c                            |    4 +-
 fs/binfmt_aout.c                    |    6 ---
 fs/binfmt_elf.c                     |    6 ---
 fs/binfmt_elf_fdpic.c               |    9 ----
 fs/binfmt_flat.c                    |    6 +--
 fs/binfmt_som.c                     |    6 ---
 fs/nfs/file.c                       |   25 +++++++++----
 include/linux/fs.h                  |    1 
 include/linux/mm.h                  |   13 ++++++-
 ipc/shm.c                           |    2 -
 mm/mmap.c                           |   65 +++++++++++++++++++++++++++++++++---
 48 files changed, 109 insertions(+), 169 deletions(-)

Index: linux-2.6-2/arch/alpha/kernel/osf_sys.c
===================================================================
--- linux-2.6-2.orig/arch/alpha/kernel/osf_sys.c
+++ linux-2.6-2/arch/alpha/kernel/osf_sys.c
@@ -193,9 +193,7 @@ osf_mmap(unsigned long addr, unsigned lo
 			goto out;
 	}
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
-	down_write(&current->mm->mmap_sem);
 	ret = do_mmap(file, addr, len, prot, flags, off);
-	up_write(&current->mm->mmap_sem);
 	if (file)
 		fput(file);
  out:
Index: linux-2.6-2/arch/arm/kernel/sys_arm.c
===================================================================
--- linux-2.6-2.orig/arch/arm/kernel/sys_arm.c
+++ linux-2.6-2/arch/arm/kernel/sys_arm.c
@@ -72,9 +72,7 @@ inline long do_mmap2(
 			goto out;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/avr32/kernel/sys_avr32.c
===================================================================
--- linux-2.6-2.orig/arch/avr32/kernel/sys_avr32.c
+++ linux-2.6-2/arch/avr32/kernel/sys_avr32.c
@@ -41,9 +41,7 @@ asmlinkage long sys_mmap2(unsigned long 
 			return error;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, offset);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/blackfin/kernel/sys_bfin.c
===================================================================
--- linux-2.6-2.orig/arch/blackfin/kernel/sys_bfin.c
+++ linux-2.6-2/arch/blackfin/kernel/sys_bfin.c
@@ -78,9 +78,7 @@ do_mmap2(unsigned long addr, unsigned lo
 			goto out;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/cris/kernel/sys_cris.c
===================================================================
--- linux-2.6-2.orig/arch/cris/kernel/sys_cris.c
+++ linux-2.6-2/arch/cris/kernel/sys_cris.c
@@ -60,9 +60,7 @@ do_mmap2(unsigned long addr, unsigned lo
                         goto out;
         }
 
-        down_write(&current->mm->mmap_sem);
         error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-        up_write(&current->mm->mmap_sem);
 
         if (file)
                 fput(file);
Index: linux-2.6-2/arch/frv/kernel/sys_frv.c
===================================================================
--- linux-2.6-2.orig/arch/frv/kernel/sys_frv.c
+++ linux-2.6-2/arch/frv/kernel/sys_frv.c
@@ -69,9 +69,7 @@ asmlinkage long sys_mmap2(unsigned long 
 
 	pgoff >>= (PAGE_SHIFT - 12);
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
@@ -114,9 +112,7 @@ asmlinkage long sys_mmap64(struct mmap_a
 	}
 	a.flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, a.addr, a.len, a.prot, a.flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 	if (file)
 		fput(file);
 out:
Index: linux-2.6-2/arch/h8300/kernel/sys_h8300.c
===================================================================
--- linux-2.6-2.orig/arch/h8300/kernel/sys_h8300.c
+++ linux-2.6-2/arch/h8300/kernel/sys_h8300.c
@@ -60,9 +60,7 @@ static inline long do_mmap2(
 			goto out;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
@@ -147,9 +145,7 @@ asmlinkage long sys_mmap64(struct mmap_a
 	}
 	a.flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, a.addr, a.len, a.prot, a.flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 	if (file)
 		fput(file);
 out:
Index: linux-2.6-2/arch/ia64/ia32/sys_ia32.c
===================================================================
--- linux-2.6-2.orig/arch/ia64/ia32/sys_ia32.c
+++ linux-2.6-2/arch/ia64/ia32/sys_ia32.c
@@ -212,12 +212,8 @@ mmap_subpage (struct file *file, unsigne
 	if (old_prot)
 		copy_from_user(page, (void __user *) PAGE_START(start), PAGE_SIZE);
 
-	down_write(&current->mm->mmap_sem);
-	{
-		ret = do_mmap(NULL, PAGE_START(start), PAGE_SIZE, prot | PROT_WRITE,
-			      flags | MAP_FIXED | MAP_ANONYMOUS, 0);
-	}
-	up_write(&current->mm->mmap_sem);
+	ret = do_mmap(NULL, PAGE_START(start), PAGE_SIZE, prot | PROT_WRITE,
+			flags | MAP_FIXED | MAP_ANONYMOUS, 0);
 
 	if (IS_ERR((void *) ret))
 		goto out;
@@ -821,16 +817,14 @@ emulate_mmap (struct file *file, unsigne
 	DBG("mmap_body: mapping [0x%lx-0x%lx) %s with poff 0x%llx\n", pstart, pend,
 	    is_congruent ? "congruent" : "not congruent", poff);
 
-	down_write(&current->mm->mmap_sem);
-	{
-		if (!(flags & MAP_ANONYMOUS) && is_congruent)
-			ret = do_mmap(file, pstart, pend - pstart, prot, flags | MAP_FIXED, poff);
-		else
-			ret = do_mmap(NULL, pstart, pend - pstart,
-				      prot | ((flags & MAP_ANONYMOUS) ? 0 : PROT_WRITE),
-				      flags | MAP_FIXED | MAP_ANONYMOUS, 0);
+	if (!(flags & MAP_ANONYMOUS) && is_congruent) {
+		ret = do_mmap(file, pstart, pend - pstart, prot,
+			       	flags | MAP_FIXED, poff);
+	} else {
+		ret = do_mmap(NULL, pstart, pend - pstart,
+			prot | ((flags & MAP_ANONYMOUS) ? 0 : PROT_WRITE),
+			flags | MAP_FIXED | MAP_ANONYMOUS, 0);
 	}
-	up_write(&current->mm->mmap_sem);
 
 	if (IS_ERR((void *) ret))
 		return ret;
@@ -904,11 +898,7 @@ ia32_do_mmap (struct file *file, unsigne
 	}
 	mutex_unlock(&ia32_mmap_mutex);
 #else
-	down_write(&current->mm->mmap_sem);
-	{
-		addr = do_mmap(file, addr, len, prot, flags, offset);
-	}
-	up_write(&current->mm->mmap_sem);
+	addr = do_mmap(file, addr, len, prot, flags, offset);
 #endif
 	DBG("ia32_do_mmap: returning 0x%lx\n", addr);
 	return addr;
Index: linux-2.6-2/arch/ia64/kernel/sys_ia64.c
===================================================================
--- linux-2.6-2.orig/arch/ia64/kernel/sys_ia64.c
+++ linux-2.6-2/arch/ia64/kernel/sys_ia64.c
@@ -209,9 +209,7 @@ do_mmap2 (unsigned long addr, unsigned l
 		goto out;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	addr = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 out:	if (file)
 		fput(file);
Index: linux-2.6-2/arch/m32r/kernel/sys_m32r.c
===================================================================
--- linux-2.6-2.orig/arch/m32r/kernel/sys_m32r.c
+++ linux-2.6-2/arch/m32r/kernel/sys_m32r.c
@@ -110,9 +110,7 @@ asmlinkage long sys_mmap2(unsigned long 
 			goto out;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/m68k/kernel/sys_m68k.c
===================================================================
--- linux-2.6-2.orig/arch/m68k/kernel/sys_m68k.c
+++ linux-2.6-2/arch/m68k/kernel/sys_m68k.c
@@ -63,9 +63,7 @@ static inline long do_mmap2(
 			goto out;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
@@ -150,9 +148,7 @@ asmlinkage long sys_mmap64(struct mmap_a
 	}
 	a.flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, a.addr, a.len, a.prot, a.flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 	if (file)
 		fput(file);
 out:
Index: linux-2.6-2/arch/m68knommu/kernel/sys_m68k.c
===================================================================
--- linux-2.6-2.orig/arch/m68knommu/kernel/sys_m68k.c
+++ linux-2.6-2/arch/m68knommu/kernel/sys_m68k.c
@@ -61,9 +61,7 @@ static inline long do_mmap2(
 			goto out;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/mips/kernel/irixelf.c
===================================================================
--- linux-2.6-2.orig/arch/mips/kernel/irixelf.c
+++ linux-2.6-2/arch/mips/kernel/irixelf.c
@@ -342,12 +342,10 @@ static unsigned int load_irix_interp(str
 			         (unsigned long)
 			         (eppnt->p_offset & 0xfffff000));
 
-			down_write(&current->mm->mmap_sem);
 			error = do_mmap(interpreter, vaddr,
 			eppnt->p_filesz + (eppnt->p_vaddr & 0xfff),
 			elf_prot, elf_type,
 			eppnt->p_offset & 0xfffff000);
-			up_write(&current->mm->mmap_sem);
 
 			if (error < 0 && error > -1024) {
 				printk("Aieee IRIX interp mmap error=%d\n",
@@ -514,12 +512,10 @@ static inline void map_executable(struct
 		prot  = (epp->p_flags & PF_R) ? PROT_READ : 0;
 		prot |= (epp->p_flags & PF_W) ? PROT_WRITE : 0;
 		prot |= (epp->p_flags & PF_X) ? PROT_EXEC : 0;
-	        down_write(&current->mm->mmap_sem);
 		(void) do_mmap(fp, (epp->p_vaddr & 0xfffff000),
 			       (epp->p_filesz + (epp->p_vaddr & 0xfff)),
 			       prot, EXEC_MAP_FLAGS,
 			       (epp->p_offset & 0xfffff000));
-	        up_write(&current->mm->mmap_sem);
 
 		/* Fixup location tracking vars. */
 		if ((epp->p_vaddr & 0xfffff000) < *estack)
@@ -798,10 +794,8 @@ static int load_irix_binary(struct linux
 	 * Since we do not have the power to recompile these, we
 	 * emulate the SVr4 behavior.  Sigh.
 	 */
-	down_write(&current->mm->mmap_sem);
 	(void) do_mmap(NULL, 0, 4096, PROT_READ | PROT_EXEC,
 		       MAP_FIXED | MAP_PRIVATE, 0);
-	up_write(&current->mm->mmap_sem);
 #endif
 
 	start_thread(regs, elf_entry, bprm->p);
@@ -871,14 +865,12 @@ static int load_irix_library(struct file
 	while (elf_phdata->p_type != PT_LOAD) elf_phdata++;
 
 	/* Now use mmap to map the library into memory. */
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap(file,
 			elf_phdata->p_vaddr & 0xfffff000,
 			elf_phdata->p_filesz + (elf_phdata->p_vaddr & 0xfff),
 			PROT_READ | PROT_WRITE | PROT_EXEC,
 			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
 			elf_phdata->p_offset & 0xfffff000);
-	up_write(&current->mm->mmap_sem);
 
 	k = elf_phdata->p_vaddr + elf_phdata->p_filesz;
 	if (k > elf_bss) elf_bss = k;
@@ -959,12 +951,10 @@ unsigned long irix_mapelf(int fd, struct
 		prot |= (flags & PF_W) ? PROT_WRITE : 0;
 		prot |= (flags & PF_X) ? PROT_EXEC : 0;
 
-		down_write(&current->mm->mmap_sem);
 		retval = do_mmap(filp, (vaddr & 0xfffff000),
 				 (filesz + (vaddr & 0xfff)),
 				 prot, (MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE),
 				 (offset & 0xfffff000));
-		up_write(&current->mm->mmap_sem);
 
 		if (retval != (vaddr & 0xfffff000)) {
 			printk("irix_mapelf: do_mmap fails with %d!\n", retval);
Index: linux-2.6-2/arch/mips/kernel/linux32.c
===================================================================
--- linux-2.6-2.orig/arch/mips/kernel/linux32.c
+++ linux-2.6-2/arch/mips/kernel/linux32.c
@@ -119,9 +119,7 @@ sys32_mmap2(unsigned long addr, unsigned
 	}
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 	if (file)
 		fput(file);
 
Index: linux-2.6-2/arch/mips/kernel/syscall.c
===================================================================
--- linux-2.6-2.orig/arch/mips/kernel/syscall.c
+++ linux-2.6-2/arch/mips/kernel/syscall.c
@@ -136,9 +136,7 @@ do_mmap2(unsigned long addr, unsigned lo
 			goto out;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/mips/kernel/sysirix.c
===================================================================
--- linux-2.6-2.orig/arch/mips/kernel/sysirix.c
+++ linux-2.6-2/arch/mips/kernel/sysirix.c
@@ -1051,9 +1051,7 @@ asmlinkage unsigned long irix_mmap32(uns
 
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 
-	down_write(&current->mm->mmap_sem);
 	retval = do_mmap(file, addr, len, prot, flags, offset);
-	up_write(&current->mm->mmap_sem);
 	if (file)
 		fput(file);
 
@@ -1536,10 +1534,7 @@ asmlinkage int irix_mmap64(struct pt_reg
 
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
-
 	if (file)
 		fput(file);
 
Index: linux-2.6-2/arch/parisc/kernel/sys_parisc.c
===================================================================
--- linux-2.6-2.orig/arch/parisc/kernel/sys_parisc.c
+++ linux-2.6-2/arch/parisc/kernel/sys_parisc.c
@@ -137,9 +137,7 @@ static unsigned long do_mmap2(unsigned l
 
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file != NULL)
 		fput(file);
Index: linux-2.6-2/arch/powerpc/kernel/syscalls.c
===================================================================
--- linux-2.6-2.orig/arch/powerpc/kernel/syscalls.c
+++ linux-2.6-2/arch/powerpc/kernel/syscalls.c
@@ -175,9 +175,7 @@ static inline unsigned long do_mmap2(uns
 
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 
-	down_write(&current->mm->mmap_sem);
 	ret = do_mmap_pgoff(file, addr, len, prot, flags, off);
-	up_write(&current->mm->mmap_sem);
 	if (file)
 		fput(file);
 out:
Index: linux-2.6-2/arch/s390/kernel/compat_linux.c
===================================================================
--- linux-2.6-2.orig/arch/s390/kernel/compat_linux.c
+++ linux-2.6-2/arch/s390/kernel/compat_linux.c
@@ -860,14 +860,15 @@ static inline long do_mmap2(
 			goto out;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
 	if (!IS_ERR((void *) error) && error + len >= 0x80000000ULL) {
 		/* Result is out of bounds.  */
+		/* XXX fix race - APZ */
+		down_write(&current->mm->mmap_sem);
 		do_munmap(current->mm, addr, len);
+		up_write(&current->mm->mmap_sem);
 		error = -ENOMEM;
 	}
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/s390/kernel/sys_s390.c
===================================================================
--- linux-2.6-2.orig/arch/s390/kernel/sys_s390.c
+++ linux-2.6-2/arch/s390/kernel/sys_s390.c
@@ -65,9 +65,7 @@ static inline long do_mmap2(
 			goto out;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/sh/kernel/sys_sh.c
===================================================================
--- linux-2.6-2.orig/arch/sh/kernel/sys_sh.c
+++ linux-2.6-2/arch/sh/kernel/sys_sh.c
@@ -153,9 +153,7 @@ do_mmap2(unsigned long addr, unsigned lo
 			goto out;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/sh64/kernel/sys_sh64.c
===================================================================
--- linux-2.6-2.orig/arch/sh64/kernel/sys_sh64.c
+++ linux-2.6-2/arch/sh64/kernel/sys_sh64.c
@@ -148,9 +148,7 @@ static inline long do_mmap2(
 			goto out;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/sparc/kernel/sys_sparc.c
===================================================================
--- linux-2.6-2.orig/arch/sparc/kernel/sys_sparc.c
+++ linux-2.6-2/arch/sparc/kernel/sys_sparc.c
@@ -252,9 +252,7 @@ static unsigned long do_mmap2(unsigned l
 	len = PAGE_ALIGN(len);
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 
-	down_write(&current->mm->mmap_sem);
 	retval = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/sparc/kernel/sys_sunos.c
===================================================================
--- linux-2.6-2.orig/arch/sparc/kernel/sys_sunos.c
+++ linux-2.6-2/arch/sparc/kernel/sys_sunos.c
@@ -120,9 +120,7 @@ asmlinkage unsigned long sunos_mmap(unsi
 	}
 
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
-	down_write(&current->mm->mmap_sem);
 	retval = do_mmap(file, addr, len, prot, flags, off);
-	up_write(&current->mm->mmap_sem);
 	if (!ret_type)
 		retval = ((retval < PAGE_OFFSET) ? 0 : retval);
 
Index: linux-2.6-2/arch/sparc64/kernel/binfmt_aout32.c
===================================================================
--- linux-2.6-2.orig/arch/sparc64/kernel/binfmt_aout32.c
+++ linux-2.6-2/arch/sparc64/kernel/binfmt_aout32.c
@@ -290,24 +290,20 @@ static int load_aout32_binary(struct lin
 			goto beyond_if;
 		}
 
-	        down_write(&current->mm->mmap_sem);
 		error = do_mmap(bprm->file, N_TXTADDR(ex), ex.a_text,
 			PROT_READ | PROT_EXEC,
 			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
 			fd_offset);
-	        up_write(&current->mm->mmap_sem);
 
 		if (error != N_TXTADDR(ex)) {
 			send_sig(SIGKILL, current, 0);
 			return error;
 		}
 
-	        down_write(&current->mm->mmap_sem);
  		error = do_mmap(bprm->file, N_DATADDR(ex), ex.a_data,
 				PROT_READ | PROT_WRITE | PROT_EXEC,
 				MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
 				fd_offset + ex.a_text);
-	        up_write(&current->mm->mmap_sem);
 		if (error != N_DATADDR(ex)) {
 			send_sig(SIGKILL, current, 0);
 			return error;
@@ -379,12 +375,10 @@ static int load_aout32_library(struct fi
 	start_addr =  ex.a_entry & 0xfffff000;
 
 	/* Now use mmap to map the library into memory. */
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap(file, start_addr, ex.a_text + ex.a_data,
 			PROT_READ | PROT_WRITE | PROT_EXEC,
 			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
 			N_TXTOFF(ex));
-	up_write(&current->mm->mmap_sem);
 	retval = error;
 	if (error != start_addr)
 		goto out;
Index: linux-2.6-2/arch/sparc64/kernel/sys_sparc.c
===================================================================
--- linux-2.6-2.orig/arch/sparc64/kernel/sys_sparc.c
+++ linux-2.6-2/arch/sparc64/kernel/sys_sparc.c
@@ -576,9 +576,7 @@ asmlinkage unsigned long sys_mmap(unsign
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 	len = PAGE_ALIGN(len);
 
-	down_write(&current->mm->mmap_sem);
 	retval = do_mmap(file, addr, len, prot, flags, off);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/sparc64/kernel/sys_sunos32.c
===================================================================
--- linux-2.6-2.orig/arch/sparc64/kernel/sys_sunos32.c
+++ linux-2.6-2/arch/sparc64/kernel/sys_sunos32.c
@@ -99,12 +99,10 @@ asmlinkage u32 sunos_mmap(u32 addr, u32 
 	flags &= ~_MAP_NEW;
 
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
-	down_write(&current->mm->mmap_sem);
 	retval = do_mmap(file,
 			 (unsigned long) addr, (unsigned long) len,
 			 (unsigned long) prot, (unsigned long) flags,
 			 (unsigned long) off);
-	up_write(&current->mm->mmap_sem);
 	if (!ret_type)
 		retval = ((retval < 0xf0000000) ? 0 : retval);
 out_putf:
Index: linux-2.6-2/arch/sparc64/solaris/misc.c
===================================================================
--- linux-2.6-2.orig/arch/sparc64/solaris/misc.c
+++ linux-2.6-2/arch/sparc64/solaris/misc.c
@@ -95,12 +95,10 @@ static u32 do_solaris_mmap(u32 addr, u32
 	ret_type = flags & _MAP_NEW;
 	flags &= ~_MAP_NEW;
 
-	down_write(&current->mm->mmap_sem);
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 	retval = do_mmap(file,
 			 (unsigned long) addr, (unsigned long) len,
 			 (unsigned long) prot, (unsigned long) flags, off);
-	up_write(&current->mm->mmap_sem);
 	if(!ret_type)
 		retval = ((retval < STACK_TOP32) ? 0 : retval);
 	                        
Index: linux-2.6-2/arch/um/kernel/syscall.c
===================================================================
--- linux-2.6-2.orig/arch/um/kernel/syscall.c
+++ linux-2.6-2/arch/um/kernel/syscall.c
@@ -54,9 +54,7 @@ long sys_mmap2(unsigned long addr, unsig
 			goto out;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/v850/kernel/syscalls.c
===================================================================
--- linux-2.6-2.orig/arch/v850/kernel/syscalls.c
+++ linux-2.6-2/arch/v850/kernel/syscalls.c
@@ -164,9 +164,7 @@ do_mmap2 (unsigned long addr, size_t len
 			goto out;
 	}
 	
-	down_write (&current->mm->mmap_sem);
 	ret = do_mmap_pgoff (file, addr, len, prot, flags, pgoff);
-	up_write (&current->mm->mmap_sem);
 	if (file)
 		fput (file);
 out:
Index: linux-2.6-2/arch/x86/ia32/ia32_aout.c
===================================================================
--- linux-2.6-2.orig/arch/x86/ia32/ia32_aout.c
+++ linux-2.6-2/arch/x86/ia32/ia32_aout.c
@@ -374,24 +374,20 @@ static int load_aout_binary(struct linux
 			goto beyond_if;
 		}
 
-		down_write(&current->mm->mmap_sem);
 		error = do_mmap(bprm->file, N_TXTADDR(ex), ex.a_text,
 			PROT_READ | PROT_EXEC,
 			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE | MAP_32BIT,
 			fd_offset);
-		up_write(&current->mm->mmap_sem);
 
 		if (error != N_TXTADDR(ex)) {
 			send_sig(SIGKILL, current, 0);
 			return error;
 		}
 
-		down_write(&current->mm->mmap_sem);
  		error = do_mmap(bprm->file, N_DATADDR(ex), ex.a_data,
 				PROT_READ | PROT_WRITE | PROT_EXEC,
 				MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE | MAP_32BIT,
 				fd_offset + ex.a_text);
-		up_write(&current->mm->mmap_sem);
 		if (error != N_DATADDR(ex)) {
 			send_sig(SIGKILL, current, 0);
 			return error;
@@ -488,12 +484,10 @@ static int load_aout_library(struct file
 		goto out;
 	}
 	/* Now use mmap to map the library into memory. */
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap(file, start_addr, ex.a_text + ex.a_data,
 			PROT_READ | PROT_WRITE | PROT_EXEC,
 			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_32BIT,
 			N_TXTOFF(ex));
-	up_write(&current->mm->mmap_sem);
 	retval = error;
 	if (error != start_addr)
 		goto out;
Index: linux-2.6-2/arch/x86/ia32/sys_ia32.c
===================================================================
--- linux-2.6-2.orig/arch/x86/ia32/sys_ia32.c
+++ linux-2.6-2/arch/x86/ia32/sys_ia32.c
@@ -242,13 +242,10 @@ sys32_mmap(struct mmap_arg_struct __user
 	}
 	
 	mm = current->mm; 
-	down_write(&mm->mmap_sem); 
 	retval = do_mmap_pgoff(file, a.addr, a.len, a.prot, a.flags, a.offset>>PAGE_SHIFT);
 	if (file)
 		fput(file);
 
-	up_write(&mm->mmap_sem); 
-
 	return retval;
 }
 
@@ -708,9 +705,7 @@ asmlinkage long sys32_mmap2(unsigned lon
 			return -EBADF;
 	}
 
-	down_write(&mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/x86/kernel/sys_i386_32.c
===================================================================
--- linux-2.6-2.orig/arch/x86/kernel/sys_i386_32.c
+++ linux-2.6-2/arch/x86/kernel/sys_i386_32.c
@@ -54,9 +54,7 @@ asmlinkage long sys_mmap2(unsigned long 
 			goto out;
 	}
 
-	down_write(&mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/x86/kernel/sys_x86_64.c
===================================================================
--- linux-2.6-2.orig/arch/x86/kernel/sys_x86_64.c
+++ linux-2.6-2/arch/x86/kernel/sys_x86_64.c
@@ -51,9 +51,7 @@ asmlinkage long sys_mmap(unsigned long a
 		if (!file)
 			goto out;
 	}
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, off >> PAGE_SHIFT);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/arch/xtensa/kernel/syscall.c
===================================================================
--- linux-2.6-2.orig/arch/xtensa/kernel/syscall.c
+++ linux-2.6-2/arch/xtensa/kernel/syscall.c
@@ -72,9 +72,7 @@ asmlinkage long xtensa_mmap2(unsigned lo
 			goto out;
 	}
 
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
-	up_write(&current->mm->mmap_sem);
 
 	if (file)
 		fput(file);
Index: linux-2.6-2/drivers/char/drm/drm_bufs.c
===================================================================
--- linux-2.6-2.orig/drivers/char/drm/drm_bufs.c
+++ linux-2.6-2/drivers/char/drm/drm_bufs.c
@@ -1517,18 +1517,14 @@ int drm_mapbufs(struct drm_device *dev, 
 				retcode = -EINVAL;
 				goto done;
 			}
-			down_write(&current->mm->mmap_sem);
 			virtual = do_mmap(file_priv->filp, 0, map->size,
 					  PROT_READ | PROT_WRITE,
 					  MAP_SHARED,
 					  token);
-			up_write(&current->mm->mmap_sem);
 		} else {
-			down_write(&current->mm->mmap_sem);
 			virtual = do_mmap(file_priv->filp, 0, dma->byte_count,
 					  PROT_READ | PROT_WRITE,
 					  MAP_SHARED, 0);
-			up_write(&current->mm->mmap_sem);
 		}
 		if (virtual > -1024UL) {
 			/* Real error */
Index: linux-2.6-2/drivers/char/drm/i810_dma.c
===================================================================
--- linux-2.6-2.orig/drivers/char/drm/i810_dma.c
+++ linux-2.6-2/drivers/char/drm/i810_dma.c
@@ -131,7 +131,6 @@ static int i810_map_buffer(struct drm_bu
 	if (buf_priv->currently_mapped == I810_BUF_MAPPED)
 		return -EINVAL;
 
-	down_write(&current->mm->mmap_sem);
 	old_fops = file_priv->filp->f_op;
 	file_priv->filp->f_op = &i810_buffer_fops;
 	dev_priv->mmap_buffer = buf;
@@ -146,7 +145,6 @@ static int i810_map_buffer(struct drm_bu
 		retcode = PTR_ERR(buf_priv->virtual);
 		buf_priv->virtual = NULL;
 	}
-	up_write(&current->mm->mmap_sem);
 
 	return retcode;
 }
Index: linux-2.6-2/fs/aio.c
===================================================================
--- linux-2.6-2.orig/fs/aio.c
+++ linux-2.6-2/fs/aio.c
@@ -129,18 +129,18 @@ static int aio_setup_ring(struct kioctx 
 
 	info->mmap_size = nr_pages * PAGE_SIZE;
 	dprintk("attempting mmap of %lu bytes\n", info->mmap_size);
-	down_write(&ctx->mm->mmap_sem);
+	WARN_ON(ctx->mm != current->mm);
 	info->mmap_base = do_mmap(NULL, 0, info->mmap_size, 
 				  PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE,
 				  0);
 	if (IS_ERR((void *)info->mmap_base)) {
-		up_write(&ctx->mm->mmap_sem);
 		info->mmap_size = 0;
 		aio_free_ring(ctx);
 		return -EAGAIN;
 	}
 
 	dprintk("mmap address: 0x%08lx\n", info->mmap_base);
+	down_write(&ctx->mm->mmap_sem);
 	info->nr_pages = get_user_pages(current, ctx->mm,
 					info->mmap_base, nr_pages, 
 					1, 0, info->ring_pages, NULL);
Index: linux-2.6-2/fs/binfmt_aout.c
===================================================================
--- linux-2.6-2.orig/fs/binfmt_aout.c
+++ linux-2.6-2/fs/binfmt_aout.c
@@ -403,24 +403,20 @@ static int load_aout_binary(struct linux
 			goto beyond_if;
 		}
 
-		down_write(&current->mm->mmap_sem);
 		error = do_mmap(bprm->file, N_TXTADDR(ex), ex.a_text,
 			PROT_READ | PROT_EXEC,
 			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
 			fd_offset);
-		up_write(&current->mm->mmap_sem);
 
 		if (error != N_TXTADDR(ex)) {
 			send_sig(SIGKILL, current, 0);
 			return error;
 		}
 
-		down_write(&current->mm->mmap_sem);
  		error = do_mmap(bprm->file, N_DATADDR(ex), ex.a_data,
 				PROT_READ | PROT_WRITE | PROT_EXEC,
 				MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
 				fd_offset + ex.a_text);
-		up_write(&current->mm->mmap_sem);
 		if (error != N_DATADDR(ex)) {
 			send_sig(SIGKILL, current, 0);
 			return error;
@@ -518,12 +514,10 @@ static int load_aout_library(struct file
 		goto out;
 	}
 	/* Now use mmap to map the library into memory. */
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap(file, start_addr, ex.a_text + ex.a_data,
 			PROT_READ | PROT_WRITE | PROT_EXEC,
 			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
 			N_TXTOFF(ex));
-	up_write(&current->mm->mmap_sem);
 	retval = error;
 	if (error != start_addr)
 		goto out;
Index: linux-2.6-2/fs/binfmt_elf.c
===================================================================
--- linux-2.6-2.orig/fs/binfmt_elf.c
+++ linux-2.6-2/fs/binfmt_elf.c
@@ -303,7 +303,6 @@ static unsigned long elf_map(struct file
 	unsigned long map_addr;
 	unsigned long pageoffset = ELF_PAGEOFFSET(eppnt->p_vaddr);
 
-	down_write(&current->mm->mmap_sem);
 	/* mmap() will return -EINVAL if given a zero size, but a
 	 * segment with zero filesize is perfectly valid */
 	if (eppnt->p_filesz + pageoffset)
@@ -312,7 +311,6 @@ static unsigned long elf_map(struct file
 				   eppnt->p_offset - pageoffset);
 	else
 		map_addr = ELF_PAGESTART(addr);
-	up_write(&current->mm->mmap_sem);
 	return(map_addr);
 }
 
@@ -1026,10 +1024,8 @@ static int load_elf_binary(struct linux_
 		   and some applications "depend" upon this behavior.
 		   Since we do not have the power to recompile these, we
 		   emulate the SVr4 behavior. Sigh. */
-		down_write(&current->mm->mmap_sem);
 		error = do_mmap(NULL, 0, PAGE_SIZE, PROT_READ | PROT_EXEC,
 				MAP_FIXED | MAP_PRIVATE, 0);
-		up_write(&current->mm->mmap_sem);
 	}
 
 #ifdef ELF_PLAT_INIT
@@ -1125,7 +1121,6 @@ static int load_elf_library(struct file 
 		eppnt++;
 
 	/* Now use mmap to map the library into memory. */
-	down_write(&current->mm->mmap_sem);
 	error = do_mmap(file,
 			ELF_PAGESTART(eppnt->p_vaddr),
 			(eppnt->p_filesz +
@@ -1134,7 +1129,6 @@ static int load_elf_library(struct file 
 			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
 			(eppnt->p_offset -
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)));
-	up_write(&current->mm->mmap_sem);
 	if (error != ELF_PAGESTART(eppnt->p_vaddr))
 		goto out_free_ph;
 
Index: linux-2.6-2/fs/binfmt_elf_fdpic.c
===================================================================
--- linux-2.6-2.orig/fs/binfmt_elf_fdpic.c
+++ linux-2.6-2/fs/binfmt_elf_fdpic.c
@@ -370,14 +370,12 @@ static int load_elf_fdpic_binary(struct 
 	if (stack_size < PAGE_SIZE * 2)
 		stack_size = PAGE_SIZE * 2;
 
-	down_write(&current->mm->mmap_sem);
 	current->mm->start_brk = do_mmap(NULL, 0, stack_size,
 					 PROT_READ | PROT_WRITE | PROT_EXEC,
 					 MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN,
 					 0);
 
 	if (IS_ERR_VALUE(current->mm->start_brk)) {
-		up_write(&current->mm->mmap_sem);
 		retval = current->mm->start_brk;
 		current->mm->start_brk = 0;
 		goto error_kill;
@@ -385,6 +383,7 @@ static int load_elf_fdpic_binary(struct 
 
 	/* expand the stack mapping to use up the entire allocation granule */
 	fullsize = ksize((char *) current->mm->start_brk);
+	down_write(&current->mm->mmap_sem);
 	if (!IS_ERR_VALUE(do_mremap(current->mm->start_brk, stack_size,
 				    fullsize, 0, 0)))
 		stack_size = fullsize;
@@ -904,10 +903,8 @@ static int elf_fdpic_map_file_constdisp_
 	if (params->flags & ELF_FDPIC_FLAG_EXECUTABLE)
 		mflags |= MAP_EXECUTABLE;
 
-	down_write(&mm->mmap_sem);
 	maddr = do_mmap(NULL, load_addr, top - base,
 			PROT_READ | PROT_WRITE | PROT_EXEC, mflags, 0);
-	up_write(&mm->mmap_sem);
 	if (IS_ERR_VALUE(maddr))
 		return (int) maddr;
 
@@ -1050,10 +1047,8 @@ static int elf_fdpic_map_file_by_direct_
 
 		/* create the mapping */
 		disp = phdr->p_vaddr & ~PAGE_MASK;
-		down_write(&mm->mmap_sem);
 		maddr = do_mmap(file, maddr, phdr->p_memsz + disp, prot, flags,
 				phdr->p_offset - disp);
-		up_write(&mm->mmap_sem);
 
 		kdebug("mmap[%d] <file> sz=%lx pr=%x fl=%x of=%lx --> %08lx",
 		       loop, phdr->p_memsz + disp, prot, flags,
@@ -1096,10 +1091,8 @@ static int elf_fdpic_map_file_by_direct_
 			unsigned long xmaddr;
 
 			flags |= MAP_FIXED | MAP_ANONYMOUS;
-			down_write(&mm->mmap_sem);
 			xmaddr = do_mmap(NULL, xaddr, excess - excess1,
 					 prot, flags, 0);
-			up_write(&mm->mmap_sem);
 
 			kdebug("mmap[%d] <anon>"
 			       " ad=%lx sz=%lx pr=%x fl=%x of=0 --> %08lx",
Index: linux-2.6-2/fs/binfmt_flat.c
===================================================================
--- linux-2.6-2.orig/fs/binfmt_flat.c
+++ linux-2.6-2/fs/binfmt_flat.c
@@ -531,9 +531,7 @@ static int load_flat_file(struct linux_b
 		 */
 		DBG_FLT("BINFMT_FLAT: ROM mapping of file (we hope)\n");
 
-		down_write(&current->mm->mmap_sem);
 		textpos = do_mmap(bprm->file, 0, text_len, PROT_READ|PROT_EXEC, MAP_PRIVATE, 0);
-		up_write(&current->mm->mmap_sem);
 		if (!textpos  || textpos >= (unsigned long) -4096) {
 			if (!textpos)
 				textpos = (unsigned long) -ENOMEM;
@@ -544,7 +542,7 @@ static int load_flat_file(struct linux_b
 
 		len = data_len + extra + MAX_SHARED_LIBS * sizeof(unsigned long);
 		down_write(&current->mm->mmap_sem);
-		realdatastart = do_mmap(0, 0, len,
+		realdatastart = do_mmap_locked(0, 0, len,
 			PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 0);
 		/* Remap to use all availabe slack region space */
 		if (realdatastart && (realdatastart < (unsigned long)-4096)) {
@@ -596,7 +594,7 @@ static int load_flat_file(struct linux_b
 
 		len = text_len + data_len + extra + MAX_SHARED_LIBS * sizeof(unsigned long);
 		down_write(&current->mm->mmap_sem);
-		textpos = do_mmap(0, 0, len,
+		textpos = do_mmap_locked(0, 0, len,
 			PROT_READ | PROT_EXEC | PROT_WRITE, MAP_PRIVATE, 0);
 		/* Remap to use all availabe slack region space */
 		if (textpos && (textpos < (unsigned long) -4096)) {
Index: linux-2.6-2/fs/binfmt_som.c
===================================================================
--- linux-2.6-2.orig/fs/binfmt_som.c
+++ linux-2.6-2/fs/binfmt_som.c
@@ -148,10 +148,8 @@ static int map_som_binary(struct file *f
 	code_size = SOM_PAGEALIGN(hpuxhdr->exec_tsize);
 	current->mm->start_code = code_start;
 	current->mm->end_code = code_start + code_size;
-	down_write(&current->mm->mmap_sem);
 	retval = do_mmap(file, code_start, code_size, prot,
 			flags, SOM_PAGESTART(hpuxhdr->exec_tfile));
-	up_write(&current->mm->mmap_sem);
 	if (retval < 0 && retval > -1024)
 		goto out;
 
@@ -159,20 +157,16 @@ static int map_som_binary(struct file *f
 	data_size = SOM_PAGEALIGN(hpuxhdr->exec_dsize);
 	current->mm->start_data = data_start;
 	current->mm->end_data = bss_start = data_start + data_size;
-	down_write(&current->mm->mmap_sem);
 	retval = do_mmap(file, data_start, data_size,
 			prot | PROT_WRITE, flags,
 			SOM_PAGESTART(hpuxhdr->exec_dfile));
-	up_write(&current->mm->mmap_sem);
 	if (retval < 0 && retval > -1024)
 		goto out;
 
 	som_brk = bss_start + SOM_PAGEALIGN(hpuxhdr->exec_bsize);
 	current->mm->start_brk = current->mm->brk = som_brk;
-	down_write(&current->mm->mmap_sem);
 	retval = do_mmap(NULL, bss_start, som_brk - bss_start,
 			prot | PROT_WRITE, MAP_FIXED | MAP_PRIVATE, 0);
-	up_write(&current->mm->mmap_sem);
 	if (retval > 0 || retval < -1024)
 		retval = 0;
 out:
Index: linux-2.6-2/fs/nfs/file.c
===================================================================
--- linux-2.6-2.orig/fs/nfs/file.c
+++ linux-2.6-2/fs/nfs/file.c
@@ -41,6 +41,9 @@
 static int nfs_file_open(struct inode *, struct file *);
 static int nfs_file_release(struct inode *, struct file *);
 static loff_t nfs_file_llseek(struct file *file, loff_t offset, int origin);
+static int
+nfs_file_mmap_prepare(struct file * file, unsigned long addr, unsigned long len,
+		unsigned long prot, unsigned long flags, unsigned long pgoff);
 static int  nfs_file_mmap(struct file *, struct vm_area_struct *);
 static ssize_t nfs_file_splice_read(struct file *filp, loff_t *ppos,
 					struct pipe_inode_info *pipe,
@@ -64,6 +67,7 @@ const struct file_operations nfs_file_op
 	.write		= do_sync_write,
 	.aio_read	= nfs_file_read,
 	.aio_write	= nfs_file_write,
+	.mmap_prepare	= nfs_file_mmap_prepare,
 	.mmap		= nfs_file_mmap,
 	.open		= nfs_file_open,
 	.flush		= nfs_file_flush,
@@ -270,7 +274,8 @@ nfs_file_splice_read(struct file *filp, 
 }
 
 static int
-nfs_file_mmap(struct file * file, struct vm_area_struct * vma)
+nfs_file_mmap_prepare(struct file * file, unsigned long addr, unsigned long len,
+		unsigned long prot, unsigned long flags, unsigned long pgoff)
 {
 	struct dentry *dentry = file->f_path.dentry;
 	struct inode *inode = dentry->d_inode;
@@ -279,13 +284,17 @@ nfs_file_mmap(struct file * file, struct
 	dfprintk(VFS, "nfs: mmap(%s/%s)\n",
 		dentry->d_parent->d_name.name, dentry->d_name.name);
 
-	status = nfs_revalidate_mapping(inode, file->f_mapping);
-	if (!status) {
-		vma->vm_ops = &nfs_file_vm_ops;
-		vma->vm_flags |= VM_CAN_NONLINEAR;
-		file_accessed(file);
-	}
-	return status;
+	return nfs_revalidate_mapping(inode, file->f_mapping);
+}
+
+static int
+nfs_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	vma->vm_ops = &nfs_file_vm_ops;
+	vma->vm_flags |= VM_CAN_NONLINEAR;
+	file_accessed(file);
+
+	return 0;
 }
 
 /*
Index: linux-2.6-2/include/linux/fs.h
===================================================================
--- linux-2.6-2.orig/include/linux/fs.h
+++ linux-2.6-2/include/linux/fs.h
@@ -1172,6 +1172,7 @@ struct file_operations {
 	int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
+	int (*mmap_prepare) (struct file *, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long pgoff);
 	int (*mmap) (struct file *, struct vm_area_struct *);
 	int (*open) (struct inode *, struct file *);
 	int (*flush) (struct file *, fl_owner_t id);
Index: linux-2.6-2/include/linux/mm.h
===================================================================
--- linux-2.6-2.orig/include/linux/mm.h
+++ linux-2.6-2/include/linux/mm.h
@@ -980,6 +980,10 @@ extern int install_special_mapping(struc
 
 extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
 
+extern unsigned long do_mmap_pgoff_locked(
+	struct file *file, unsigned long addr,
+	unsigned long len, unsigned long prot,
+	unsigned long flag, unsigned long pgoff);
 extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot,
 	unsigned long flag, unsigned long pgoff);
@@ -988,7 +992,7 @@ extern unsigned long mmap_region(struct 
 	unsigned int vm_flags, unsigned long pgoff,
 	int accountable);
 
-static inline unsigned long do_mmap(struct file *file, unsigned long addr,
+static inline unsigned long do_mmap_locked(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot,
 	unsigned long flag, unsigned long offset)
 {
@@ -996,11 +1000,16 @@ static inline unsigned long do_mmap(stru
 	if ((offset + PAGE_ALIGN(len)) < offset)
 		goto out;
 	if (!(offset & ~PAGE_MASK))
-		ret = do_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT);
+		ret = do_mmap_pgoff_locked(file, addr, len, prot,
+				flag, offset >> PAGE_SHIFT);
 out:
 	return ret;
 }
 
+extern unsigned long do_mmap(struct file *file, unsigned long addr,
+	unsigned long len, unsigned long prot,
+	unsigned long flag, unsigned long offset);
+
 extern int do_munmap(struct mm_struct *, unsigned long, size_t);
 
 extern unsigned long do_brk(unsigned long, unsigned long);
Index: linux-2.6-2/ipc/shm.c
===================================================================
--- linux-2.6-2.orig/ipc/shm.c
+++ linux-2.6-2/ipc/shm.c
@@ -1012,7 +1012,7 @@ long do_shmat(int shmid, char __user *sh
 			goto invalid;
 	}
 		
-	user_addr = do_mmap (file, addr, size, prot, flags, 0);
+	user_addr = do_mmap_locked (file, addr, size, prot, flags, 0);
 	*raddr = user_addr;
 	err = 0;
 	if (IS_ERR_VALUE(user_addr))
Index: linux-2.6-2/mm/mmap.c
===================================================================
--- linux-2.6-2.orig/mm/mmap.c
+++ linux-2.6-2/mm/mmap.c
@@ -888,9 +888,10 @@ void vm_stat_account(struct mm_struct *m
  * The caller must hold down_write(current->mm->mmap_sem).
  */
 
-unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
-			unsigned long len, unsigned long prot,
-			unsigned long flags, unsigned long pgoff)
+static unsigned long __do_mmap_pgoff_locked(
+		struct file * file, unsigned long addr,
+		unsigned long len, unsigned long prot,
+		unsigned long flags, unsigned long pgoff)
 {
 	struct mm_struct * mm = current->mm;
 	struct inode *inode;
@@ -1026,10 +1027,66 @@ unsigned long do_mmap_pgoff(struct file 
 	return mmap_region(file, addr, len, flags, vm_flags, pgoff,
 			   accountable);
 }
+
+unsigned long do_mmap_pgoff_locked(struct file * file, unsigned long addr,
+		unsigned long len, unsigned long prot,
+		unsigned long flags, unsigned long pgoff)
+{
+	WARN_ON(file && file->f_op->mmap_prepare);
+	return __do_mmap_pgoff_locked(file, addr, len, prot, flags, pgoff);
+}
+EXPORT_SYMBOL(do_mmap_pgoff_locked);
+
+unsigned long do_mmap_pgoff(struct file * file, unsigned long addr,
+			unsigned long len, unsigned long prot,
+			unsigned long flags, unsigned long pgoff)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned long ret;
+
+	if (file && file->f_op->mmap_prepare) {
+		ret = file->f_op->mmap_prepare(file, addr,
+				len, prot, flags, pgoff);
+		if (ret)
+			return ret;
+	}
+
+	down_write(&mm->mmap_sem);
+	ret = __do_mmap_pgoff_locked(file, addr, len, prot, flags, pgoff);
+	up_write(&mm->mmap_sem);
+
+	return ret;
+}
 EXPORT_SYMBOL(do_mmap_pgoff);
 
+unsigned long do_mmap(struct file *file, unsigned long addr,
+		unsigned long len, unsigned long prot,
+		unsigned long flags, unsigned long offset)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned long ret = -EINVAL;
+	unsigned long pgoff = offset >> PAGE_SHIFT;
+
+	if ((offset + PAGE_ALIGN(len)) < offset || (offset & ~PAGE_MASK))
+		return ret;
+
+	if (file && file->f_op->mmap_prepare) {
+		ret = file->f_op->mmap_prepare(file, addr,
+				len, prot, flags, pgoff);
+		if (ret)
+			return ret;
+	}
+
+	down_write(&mm->mmap_sem);
+	ret = __do_mmap_pgoff_locked(file, addr, len, prot, flags, pgoff);
+	up_write(&mm->mmap_sem);
+
+	return ret;
+}
+EXPORT_SYMBOL(do_mmap);
+
 /*
- * Some shared mappigns will want the pages marked read-only
+ * Some shared mappings will want the pages marked read-only
  * to track write events. If so, we'll downgrade vm_page_prot
  * to the private version (using protection_map[] without the
  * VM_SHARED bit).
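
For readers following the locking change: the new do_mmap_pgoff() in
the hunk above calls the optional ->mmap_prepare hook first and only
then takes mmap_sem for writing, so the hook always runs with mmap_sem
not held. Below is a minimal userspace sketch of that ordering. It is
not kernel code: the model_* names and record_prepare are hypothetical,
and a plain flag stands in for the real rwsem.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model: a flag stands in for mm->mmap_sem. */
struct model_mm {
	int mmap_sem_held;	/* 1 while "mmap_sem" is held for write */
};

struct model_file {
	/* Hypothetical stand-in for f_op->mmap_prepare; may be NULL. */
	unsigned long (*mmap_prepare)(struct model_file *file,
				      const struct model_mm *mm);
	int prepare_calls;
	int prepared_under_mmap_sem;
};

/* Stand-in for __do_mmap_pgoff_locked(): caller must hold mmap_sem. */
static unsigned long model_mmap_locked(struct model_mm *mm,
				       unsigned long addr)
{
	assert(mm->mmap_sem_held);
	return addr;
}

/* Same shape as the new do_mmap_pgoff(): hook first, lock second. */
static unsigned long model_do_mmap(struct model_mm *mm,
				   struct model_file *file,
				   unsigned long addr)
{
	unsigned long ret;

	if (file && file->mmap_prepare) {
		ret = file->mmap_prepare(file, mm);
		if (ret)
			return ret;	/* hook failed: mmap_sem never taken */
	}

	mm->mmap_sem_held = 1;		/* down_write(&mm->mmap_sem) */
	ret = model_mmap_locked(mm, addr);
	mm->mmap_sem_held = 0;		/* up_write(&mm->mmap_sem) */

	return ret;
}

/* Example hook: records whether it ever ran with mmap_sem held. */
static unsigned long record_prepare(struct model_file *file,
				    const struct model_mm *mm)
{
	file->prepare_calls++;
	if (mm->mmap_sem_held)
		file->prepared_under_mmap_sem = 1;
	return 0;
}
```

Calling model_do_mmap() with a file whose hook is record_prepare leaves
prepared_under_mmap_sem at 0, which is the property the patch relies
on. The _locked variant in the patch instead warns when a hook exists,
because its caller already holds mmap_sem.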


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

* Re: dio_get_page() lockdep complaints
  2007-11-11 19:49       ` Peter Zijlstra
@ 2007-11-12  8:45         ` Martin Schwidefsky
  2007-11-12  9:27           ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Schwidefsky @ 2007-11-12  8:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Trond Myklebust, Zach Brown, Jens Axboe, linux-kernel, akpm,
	linux-aio, Chris Mason, linux-mm, Hugh Dickins, Linus Torvalds

On Sun, 2007-11-11 at 20:49 +0100, Peter Zijlstra wrote:
> Right, which gets us into all kinds of trouble because some sites need
> mmap_sem to resolve some races, notably s390 31-bit and shm.

You are referring to the mmap_sem use in compat_linux.c:do_mmap2, aren't
you? That check for addresses > 2GB after the call to do_mmap_pgoff can
be removed since arch_get_unmapped_area already checks against
TASK_SIZE. The result of the do_mmap_pgoff call will never be out of
range. This check is a leftover from the early days of the s390 compat
code.
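
A rough userspace sketch of that argument, not taken from the s390
sources: if the address-space search already refuses any range that
would end beyond TASK_SIZE, a second range check on the returned
address can never fire. The model_* names and the 2GB MODEL_TASK_SIZE
value are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Assumed 31-bit limit: 2GB. */
#define MODEL_TASK_SIZE 0x80000000UL

/* Stand-in for arch_get_unmapped_area(): refuses ranges past the limit. */
static unsigned long model_get_unmapped_area(unsigned long addr,
					     unsigned long len)
{
	if (len > MODEL_TASK_SIZE || addr > MODEL_TASK_SIZE - len)
		return (unsigned long)-ENOMEM;
	return addr;
}

/* Error values occupy the top 4095 values of the address range. */
static int model_is_err(unsigned long value)
{
	return value >= (unsigned long)-4095;
}

/* Stand-in for do_mmap_pgoff(): the result comes from the search above. */
static unsigned long model_mmap(unsigned long addr, unsigned long len)
{
	return model_get_unmapped_area(addr, len);
}

/* The kind of after-the-fact range check described as redundant. */
static int model_post_check_fires(unsigned long addr, unsigned long len)
{
	unsigned long ret = model_mmap(addr, len);

	return !model_is_err(ret) && ret + len > MODEL_TASK_SIZE;
}
```

In this model a request that would cross the limit comes back as an
error value, and a request that fits ends at or below the limit, so
model_post_check_fires() returns 0 for every input.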

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.



* Re: dio_get_page() lockdep complaints
  2007-11-12  8:45         ` Martin Schwidefsky
@ 2007-11-12  9:27           ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2007-11-12  9:27 UTC (permalink / raw)
  To: schwidefsky
  Cc: Trond Myklebust, Zach Brown, Jens Axboe, linux-kernel, akpm,
	linux-aio, Chris Mason, linux-mm, Hugh Dickins, Linus Torvalds

On Mon, 2007-11-12 at 09:45 +0100, Martin Schwidefsky wrote:
> On Sun, 2007-11-11 at 20:49 +0100, Peter Zijlstra wrote:
> > Right, which gets us into all kinds of trouble because some sites need
> > mmap_sem to resolve some races, notably s390 31-bit and shm.
> 
> You are referring to the mmap_sem use in compat_linux.c:do_mmap2, aren't
> you? That check for addresses > 2GB after the call to do_mmap_pgoff can
> be removed since arch_get_unmapped_area already checks against
> TASK_SIZE. The result of the do_mmap_pgoff call will never be out of
> range. This check is a leftover from the early days of the s390 compat
> code.

Correct, that is the one I was referring to. Thanks for the explanation,
I'll clean it up when I take this patch forward.




end of thread, newest message: 2007-11-12  9:27 UTC

Thread overview: 17+ messages
     [not found] <20070419073828.GB20928@kernel.dk>
2007-04-19  8:01 ` dio_get_page() lockdep complaints Andrew Morton
2007-04-19  8:01   ` Jens Axboe
2007-04-19  8:25     ` Andrew Morton
2007-04-19  8:34       ` Jens Axboe
2007-04-19 12:43         ` Vladimir V. Saveliev
2007-04-19 12:49           ` Jens Axboe
2007-04-19 12:52             ` Jens Axboe
2007-04-19 13:53               ` Roland Dreier
2007-04-19 14:20                 ` Jens Axboe
2007-04-19 14:15         ` Jens Axboe
2007-04-19 14:55           ` Vladimir V. Saveliev
2007-04-19 14:57       ` Vladimir V. Saveliev
2007-04-19 16:42         ` Andrew Morton
2007-04-19 14:36   ` Chris Mason
     [not found] ` <1194627742.6289.175.camel@twins>
     [not found]   ` <4734992C.7000408@oracle.com>
     [not found]     ` <1194630300.7459.65.camel@heimdal.trondhjem.org>
2007-11-11 19:49       ` Peter Zijlstra
2007-11-12  8:45         ` Martin Schwidefsky
2007-11-12  9:27           ` Peter Zijlstra
