Re: More 2.2.17pre9 VM issues

* Re: More 2.2.17pre9 VM issues
       [not found] ` <Pine.LNX.4.21.0007031314190.12740-100000@inspiron.random>
@ 2000-07-03 13:56   ` Stephen C. Tweedie
  2000-07-03 14:56     ` Andrea Arcangeli
  0 siblings, 1 reply; 3+ messages in thread
From: Stephen C. Tweedie @ 2000-07-03 13:56 UTC
  To: Andrea Arcangeli
  Cc: Stephen C. Tweedie, Rik van Riel, Marcelo Tosatti, Jens Axboe,
	Alan Cox, Derek Martin, Linux Kernel, linux-mm

Hi,

On Mon, Jul 03, 2000 at 01:28:48PM +0200, Andrea Arcangeli wrote:

> Stephen, are we really sure we still need kpiod?

Yes.

> Isn't GFP_IO meant to be
> clear if anybody is holding any filesystem lock (like superblock lock)?

That has never been a requirement, and I'd think it would be
dangerous to make such a new rule so close to 2.4.  

Certainly, in 2.2 we would do all sorts of stuff while holding
filesystem locks (in particular the inode and superblock locks).  Any
file write, including mm page writes, would take the inode lock.  

Right now, in 2.4, the mm locks less but write(2) still takes the
inode lock.  That means that we _must_ be able to allocate with GFP_IO
while holding the inode lock, since (a) write()s go through the page
cache, and (b) touching the user's buffer during the write() can cause
pages to be swapped in, invoking parts of the VM which assume they are
able to use GFP_IO safely.

Sure, you could audit every single path through every filesystem to
make sure that there are no possible deadlocks here, but the whole
point of kpiod is to separate out the pageout from the process doing
the write() in such situations to make deadlock impossible.
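
A minimal user-space sketch of the hand-off described above, under stated
assumptions: the inode lock is modeled as a non-recursive flag, and every
toy_* name is hypothetical rather than taken from the kernel source. It
only illustrates why queueing the pageout for a separate context avoids
recursing on a lock the writer already holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a non-recursive lock such as the inode lock. */
struct toy_lock {
    bool held;
};

/* Returns false instead of sleeping, so the model can report a would-be deadlock. */
static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

#define TOY_QUEUE_MAX 16

/* Hypothetical request queue; not the real kpiod data structure. */
struct toy_pio_queue {
    int pages[TOY_QUEUE_MAX];
    size_t count;
};

/* Pageout done by the caller itself: it needs the inode lock. */
static bool toy_pageout_direct(struct toy_lock *inode_lock)
{
    if (!toy_trylock(inode_lock))
        return false;   /* a real sleeping lock would block here forever */
    toy_unlock(inode_lock);
    return true;
}

/* Pageout handed off: only records the request, takes no filesystem lock. */
static bool toy_pageout_deferred(struct toy_pio_queue *q, int page)
{
    if (q->count == TOY_QUEUE_MAX)
        return false;
    q->pages[q->count++] = page;
    return true;
}

/* The daemon's side: it runs in its own context and takes the lock per page. */
static size_t toy_daemon_drain(struct toy_pio_queue *q, struct toy_lock *inode_lock)
{
    size_t written = 0;

    while (q->count > 0 && toy_trylock(inode_lock)) {
        q->count--;     /* pretend the page was written */
        written++;
        toy_unlock(inode_lock);
    }
    return written;
}
```

In this model, a caller that already holds the lock gets false from
toy_pageout_direct() but can still call toy_pageout_deferred(); the queue
is drained only once the lock has been released.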

Cheers,
 Stephen
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/


* Re: More 2.2.17pre9 VM issues
  2000-07-03 13:56   ` More 2.2.17pre9 VM issues Stephen C. Tweedie
@ 2000-07-03 14:56     ` Andrea Arcangeli
  2000-07-03 16:09       ` Stephen C. Tweedie
  0 siblings, 1 reply; 3+ messages in thread
From: Andrea Arcangeli @ 2000-07-03 14:56 UTC
  To: Stephen C. Tweedie
  Cc: Rik van Riel, Marcelo Tosatti, Jens Axboe, Alan Cox,
	Derek Martin, Linux Kernel, linux-mm

On Mon, 3 Jul 2000, Stephen C. Tweedie wrote:

>That has never been a requirement, and I'd think it would be
>dangerous to make such a new rule so close to 2.4.  

Note that 2.4.0-test* doesn't have kpiod right now (it was dropped during
the 2.3.x cycle). I was only wondering about 2.2.x here.

>Certainly, in 2.2 we would do all sorts of stuff while holding
>filesystem locks (in particular the inode and superblock locks).  Any
>file write, including mm page writes, would take the inode lock.  
>
>Right now, in 2.4, the mm locks less but write(2) still takes the
>inode lock.  That means that we _must_ be able to allocate with GFP_IO
>while holding the inode lock, since (a) write()s go through the page
>cache, and (b) touching the user's buffer during the write() can cause

Writes in 2.2.x go through the buffer cache, which for this reason is
allocated with GFP_BUFFER.

>pages to be swapped in, invoking parts of the VM which assume they are
>able to use GFP_IO safely.

Argh, (b) is a problem. I could work around that with a per-process bitflag,
set before down(&inode->i_sem), that reminds me not to write to any fs
because I would risk recursing on inode->i_sem.

The main problem I have with kpiod is that, while it obviously avoids any
kind of deadlock on the fs since make_pio_request is completely
asynchronous, it also introduces a problem in the swap_out code: we have
no way to know whether we made any progress and whether we should wait
for some buffer to be written to disk. Waiting in shrink_mmap isn't even
enough, since kpiod may not have started writing yet, so there may be no
locked or dirty buffer in the first place... We can't even wait on kpiod
a la wakeup_bdflush(1), since that would reintroduce the same deadlock
problem, making kpiod useless for its anti-deadlock purpose.

So the only sane solution seemed to me to be to skip the filemap_swapout
when we can't do it, and to do it ourselves when we can. The problem is the
"when we can". Do you think it would be insane to implement the workaround
I mentioned above around the write semaphore?
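
A rough user-space sketch of that workaround, with loud caveats: the flag
name, the task structure and every toy_* function are hypothetical, and
nothing here is kernel code. It only shows the shape of the idea: mark the
process before it takes the write semaphore, and let the swapout path skip
the write while the mark is set, so the caller knows synchronously whether
any progress was made.

```c
#include <assert.h>

#define TOY_PF_FSWRITE 0x1u     /* hypothetical per-process flag */

struct toy_task {
    unsigned int flags;
};

enum toy_swapout_result {
    TOY_SWAPOUT_SKIPPED,        /* writing now could recurse on the semaphore */
    TOY_SWAPOUT_WRITTEN         /* page written synchronously by the caller */
};

/* Called just before the point where write() would do down(&inode->i_sem). */
static void toy_write_begin(struct toy_task *t)
{
    assert(!(t->flags & TOY_PF_FSWRITE));   /* the model does not nest writes */
    t->flags |= TOY_PF_FSWRITE;
}

/* Called just after the point where write() would do up(&inode->i_sem). */
static void toy_write_end(struct toy_task *t)
{
    t->flags &= ~TOY_PF_FSWRITE;
}

/* Swapout path: skip the page when the current task is inside a write. */
static enum toy_swapout_result toy_filemap_swapout(const struct toy_task *t,
                                                   int *pages_written)
{
    if (t->flags & TOY_PF_FSWRITE)
        return TOY_SWAPOUT_SKIPPED;
    (*pages_written)++;         /* stands in for the real page write */
    return TOY_SWAPOUT_WRITTEN;
}
```

The return value is the point of the exercise: unlike an asynchronous
hand-off, the caller learns immediately whether the page was written.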

>Sure, you could audit every single path through every filesystem to
>make sure that there are no possible deadlocks here, but the whole
>point of kpiod is to separate out the pageout from the process doing
>the write() in such situations to make deadlock impossible.

I'll think some more on it.

Andrea


* Re: More 2.2.17pre9 VM issues
  2000-07-03 14:56     ` Andrea Arcangeli
@ 2000-07-03 16:09       ` Stephen C. Tweedie
  0 siblings, 0 replies; 3+ messages in thread
From: Stephen C. Tweedie @ 2000-07-03 16:09 UTC
  To: Andrea Arcangeli
  Cc: Stephen C. Tweedie, Rik van Riel, Marcelo Tosatti, Jens Axboe,
	Alan Cox, Derek Martin, Linux Kernel, linux-mm

Hi,

On Mon, Jul 03, 2000 at 04:56:46PM +0200, Andrea Arcangeli wrote:
> 
> >pages to be swapped in, invoking parts of the VM which assume they are
> >able to use GFP_IO safely.
> 
> Argh, (b) is a problem. I could work around that with a per-process bitflag,
> set before down(&inode->i_sem), that reminds me not to write to any fs
> because I would risk recursing on inode->i_sem.

It's not necessarily a problem, as the file paging routines don't take
the inode semaphore any more (at least on ext2).  But unless we want
to explicitly ban the read/writepage routines from invoking that
semaphore, we have to be prepared for this to happen.

Given that a write() syscall takes the semaphore for its whole
duration, that's an *awfully* long time to be preventing paging on
that inode.  So this is in fact probably the way forward --- document
that only write() can use the semaphore, but VM-invoked functions like
*writepage must not.

> The main problem I have with kpiod is that, while it obviously avoids any
> kind of deadlock on the fs since make_pio_request is completely
> asynchronous, it also introduces a problem in the swap_out code: we have
> no way to know whether we made any progress and whether we should wait
> for some buffer to be written to disk.

Sure, but I've already said that I think we need multiple separate
paging queues, with the process of aging and cleaning pages made
separate from the process of evicting pages.  If you do that, then you
can always tell, from the length of the queues, whether or not you
still have work to do.
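
As an illustration only, here is a toy model of separate queues, assuming
just two of them (aged pages that still need cleaning, and clean pages
ready for eviction). The structure and all toy_* names are hypothetical
and do not describe any existing kernel code; the sketch simply shows how
queue lengths could answer the "is there still work to do" question.

```c
#include <assert.h>
#include <stddef.h>

struct toy_queues {
    size_t dirty;   /* aged pages that still have to be written out */
    size_t clean;   /* pages that can be evicted immediately */
};

/* Aging: a page leaves the active set and lands on one of the queues. */
static void toy_age_page(struct toy_queues *q, int is_dirty)
{
    if (is_dirty)
        q->dirty++;
    else
        q->clean++;
}

/* Cleaning: write out up to max dirty pages; they become evictable. */
static size_t toy_clean_pages(struct toy_queues *q, size_t max)
{
    size_t n = q->dirty < max ? q->dirty : max;

    q->dirty -= n;
    q->clean += n;
    return n;
}

/* Eviction: free up to want pages, but only from the clean queue. */
static size_t toy_evict_pages(struct toy_queues *q, size_t want)
{
    size_t n = q->clean < want ? q->clean : want;

    assert(n <= q->clean);
    q->clean -= n;
    return n;
}

/* The queue length itself says whether cleaning work is outstanding. */
static int toy_cleaning_pending(const struct toy_queues *q)
{
    return q->dirty > 0;
}
```

If toy_evict_pages() returns fewer pages than requested while
toy_cleaning_pending() is true, the caller knows that waiting for cleaning
can help; if both queues are empty, more aging is needed instead.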

But I still agree that getting rid of kpiod is probably a good thing.
We just can't do it in 2.2.  In 2.4, keep it away by all means, but we
have to be aware of the implications when you do write() to an mmaped
file.

Cheers,
 Stephen
