* Re: More 2.2.17pre9 VM issues
From: Stephen C. Tweedie @ 2000-07-03 13:56 UTC
To: Andrea Arcangeli
Cc: Stephen C. Tweedie, Rik van Riel, Marcelo Tosatti, Jens Axboe,
    Alan Cox, Derek Martin, Linux Kernel, linux-mm

Hi,

On Mon, Jul 03, 2000 at 01:28:48PM +0200, Andrea Arcangeli wrote:

> Stephen, are we really sure we still need kpiod?

Yes.

> Isn't GFP_IO meant to be
> clear if anybody is holding any filesystem lock (like superblock lock)?

That has never been a requirement, and I'd think it would be
dangerous to make such a new rule so close to 2.4.

Certainly, in 2.2 we would do all sorts of stuff while holding
filesystem locks (in particular the inode and superblock locks).  Any
file write, including mm page writes, would take the inode lock.

Right now, in 2.4, the mm locks less but write(2) still takes the
inode lock.  That means that we _must_ be able to allocate with GFP_IO
while holding the inode lock, since (a) write()s go through the page
cache, and (b) touching the user's buffer during the write() can cause
pages to be swapped in, invoking parts of the VM which assume they are
able to use GFP_IO safely.

Sure, you could audit every single path through every filesystem to
make sure that there are no possible deadlocks here, but the whole
point of kpiod is to separate out the pageout from the process doing
the write() in such situations to make deadlock impossible.

Cheers,
 Stephen
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
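
The deadlock argument in the message above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: the names
`sketch_inode`, `pio_queue`, `pageout_direct`, `pageout_deferred` and
`pio_daemon_run` are hypothetical and are not the 2.2 kernel's structures or
functions. The lock is modelled as a boolean in a single thread, so a
"deadlock" shows up as a failed attempt rather than an actual hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real 2.2 structures. */
struct sketch_inode {
    bool locked;       /* models the inode lock that write() holds */
    int dirty_pages;   /* dirty page-cache pages backed by this inode */
};

#define PIO_QUEUE_MAX 16

struct pio_queue {
    struct sketch_inode *req[PIO_QUEUE_MAX];
    size_t len;
};

/*
 * Pageout performed directly by the allocating process. It needs the
 * inode lock, so it reports failure where a real kernel would sleep
 * forever on a lock the caller already holds.
 */
static bool pageout_direct(struct sketch_inode *ino)
{
    if (ino->locked)
        return false;              /* would self-deadlock */
    ino->locked = true;
    if (ino->dirty_pages > 0)
        ino->dirty_pages--;        /* "write" one page */
    ino->locked = false;
    return true;
}

/* kpiod-style pageout: only record the request, never touch the lock. */
static bool pageout_deferred(struct pio_queue *q, struct sketch_inode *ino)
{
    assert(q->len <= PIO_QUEUE_MAX);
    if (q->len == PIO_QUEUE_MAX)
        return false;              /* queue full, request dropped */
    q->req[q->len++] = ino;
    return true;
}

/*
 * The daemon runs in its own context. Requests whose inode is still
 * locked stay queued (a real daemon would block instead); the return
 * value is the number of requests completed in this pass.
 */
static size_t pio_daemon_run(struct pio_queue *q)
{
    size_t done = 0, kept = 0;

    for (size_t i = 0; i < q->len; i++) {
        if (pageout_direct(q->req[i]))
            done++;
        else
            q->req[kept++] = q->req[i];
    }
    q->len = kept;
    return done;
}
```

In this model a writer that holds the lock can always hand the page to the
queue and carry on, which is the separation the message describes. The cost
is that the writer learns nothing about when the page is actually written.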

* Re: More 2.2.17pre9 VM issues
From: Andrea Arcangeli @ 2000-07-03 14:56 UTC
To: Stephen C. Tweedie
Cc: Rik van Riel, Marcelo Tosatti, Jens Axboe, Alan Cox, Derek Martin,
    Linux Kernel, linux-mm

On Mon, 3 Jul 2000, Stephen C. Tweedie wrote:

>That has never been a requirement, and I'd think it would be
>dangerous to make such a new rule so close to 2.4.

Note that 2.4.0-test* doesn't have kpiod just now (it's been dropped in
the 2.3.x cycle). Here I was wondering only about 2.2.x.

>Certainly, in 2.2 we would do all sorts of stuff while holding
>filesystem locks (in particular the inode and superblock locks).  Any
>file write, including mm page writes, would take the inode lock.
>
>Right now, in 2.4, the mm locks less but write(2) still takes the
>inode lock.  That means that we _must_ be able to allocate with GFP_IO
>while holding the inode lock, since (a) write()s go through the page
>cache, and (b) touching the user's buffer during the write() can cause

Writes in 2.2.x go through the buffer cache, which for this reason is
allocated with GFP_BUFFER.

>pages to be swapped in, invoking parts of the VM which assume they are
>able to use GFP_IO safely.

Arghh, (b) is a problem. I could work around that with a per-process
bitflag set before down(&inode->i_sem) that reminds me not to write to
any fs, because I would risk recursing on inode->i_sem.

The main problem I have with kpiod is that while it obviously avoids any
kind of deadlock on the fs, since make_pio_request is completely
asynchronous, it also introduces a problem in the swap_out code: we have
no way to know whether we made progress and whether we should wait for
some buffer to be written to disk.
Waiting in shrink_mmap isn't even enough, since kpiod may not have
started writing yet, so there may be no locked or dirty buffer in the
first place... We can't even wait on kpiod a la wakeup_bdflush(1), since
that would reintroduce the same deadlock problem, making kpiod useless
for its anti-deadlock purpose.

So the only sane solution, it seemed to me, is to skip the
filemap_swapout when we can't do it, and to do it ourselves when we can.
The problem is the "when we can". Do you think it would be insane to
implement the workaround I mentioned above around the write semaphore?

>Sure, you could audit every single path through every filesystem to
>make sure that there are no possible deadlocks here, but the whole
>point of kpiod is to separate out the pageout from the process doing
>the write() in such situations to make deadlock impossible.

I'll think some more on it.

Andrea
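
The per-process flag workaround proposed in the message above could look
roughly like the following userspace model. It is a sketch only: the thread
gives no code, so every name here (`flag_task`, `in_fs_write`,
`sketch_write_begin`, `sketch_file_swapout`, and the rest) is an assumption
made for illustration, and the semaphore is modelled as a plain counter in a
single thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; only the idea comes from the thread. */
struct flag_task {
    bool in_fs_write;     /* the proposed per-process bitflag */
};

struct flag_inode {
    int sem_count;        /* 1 = i_sem free, 0 = i_sem held */
    int dirty_pages;
};

enum swapout_result {
    SWAPOUT_SKIPPED,      /* caller is inside write(): do not touch the fs */
    SWAPOUT_WOULD_SLEEP,  /* another task holds i_sem: a kernel would wait */
    SWAPOUT_DONE
};

static void sketch_write_begin(struct flag_task *t, struct flag_inode *ino)
{
    t->in_fs_write = true;         /* set before taking the semaphore */
    assert(ino->sem_count == 1);   /* single-threaded model: must be free */
    ino->sem_count = 0;            /* models down(&inode->i_sem) */
}

static void sketch_write_end(struct flag_task *t, struct flag_inode *ino)
{
    ino->sem_count = 1;            /* models up(&inode->i_sem) */
    t->in_fs_write = false;
}

/* Pageout path: consult the flag before doing any filesystem write. */
static enum swapout_result sketch_file_swapout(const struct flag_task *t,
                                               struct flag_inode *ino)
{
    if (t->in_fs_write)
        return SWAPOUT_SKIPPED;    /* would recurse on i_sem */
    if (ino->sem_count == 0)
        return SWAPOUT_WOULD_SLEEP;
    ino->sem_count = 0;
    if (ino->dirty_pages > 0)
        ino->dirty_pages--;        /* write one page synchronously */
    ino->sem_count = 1;
    return SWAPOUT_DONE;
}
```

Because the write is done by the caller when it is allowed, the caller knows
whether progress was made, which is the property the message says kpiod
lacks. Whether a single flag covers every filesystem lock is exactly the
open question raised in the thread.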

* Re: More 2.2.17pre9 VM issues
From: Stephen C. Tweedie @ 2000-07-03 16:09 UTC
To: Andrea Arcangeli
Cc: Stephen C. Tweedie, Rik van Riel, Marcelo Tosatti, Jens Axboe,
    Alan Cox, Derek Martin, Linux Kernel, linux-mm

Hi,

On Mon, Jul 03, 2000 at 04:56:46PM +0200, Andrea Arcangeli wrote:
>
> >pages to be swapped in, invoking parts of the VM which assume they are
> >able to use GFP_IO safely.
>
> Arghh, (b) is a problem. I could work around that with a per-process
> bitflag set before down(&inode->i_sem) that reminds me not to write to
> any fs, because I would risk recursing on inode->i_sem.

It's not necessarily a problem, as the file paging routines don't take
the inode semaphore any more (at least on ext2).  But unless we want
to explicitly ban the read/writepage routines from taking that
semaphore, we have to be prepared for this to happen.

Given that a write() syscall takes the semaphore for its whole
duration, that's an *awfully* long time to be preventing paging on
that inode.  So this is in fact probably the way forward: document
that only write() can use the semaphore, but VM-invoked functions like
*writepage must not.

> The main problem I have with kpiod is that while it obviously avoids any
> kind of deadlock on the fs, since make_pio_request is completely
> asynchronous, it also introduces a problem in the swap_out code: we have
> no way to know whether we made progress and whether we should wait for
> some buffer to be written to disk.

Sure, but I've already said that I think we need multiple separate
paging queues, with the process of aging and cleaning pages made
separate from the process of evicting pages.  If you do that, then you
can always tell, from the length of the queues, whether or not you
still have work to do.

But I still agree that getting rid of kpiod is probably a good thing.
We just can't do it in 2.2.  In 2.4, keep it away by all means, but we
have to be aware of the implications when you do a write() to an
mmapped file.

Cheers,
 Stephen
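
The "multiple separate paging queues" idea in the message above can be
sketched as two counters with separate cleaning and eviction passes. This is
a minimal model under assumptions: the thread names no queues or functions,
so `paging_queues`, `clean_pages`, `evict_pages` and `cleaning_needed` are
hypothetical, and real queues would hold pages rather than counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue names; the thread only says "multiple separate
 * paging queues". Each field is the length of one queue. */
struct paging_queues {
    size_t dirty;   /* aged pages still waiting to be written back */
    size_t clean;   /* pages already written back, safe to evict */
};

/* Cleaning pass: write back up to n pages, moving them dirty -> clean. */
static size_t clean_pages(struct paging_queues *q, size_t n)
{
    size_t moved = n < q->dirty ? n : q->dirty;

    assert(moved <= q->dirty);     /* guards the subtraction below */
    q->dirty -= moved;
    q->clean += moved;
    return moved;
}

/* Eviction pass: free up to n clean pages; it never starts I/O itself. */
static size_t evict_pages(struct paging_queues *q, size_t n)
{
    size_t freed = n < q->clean ? n : q->clean;

    q->clean -= freed;
    return freed;
}

/* The queue lengths alone say whether cleaning work is still pending
 * for a caller that wants want_free evictable pages. */
static bool cleaning_needed(const struct paging_queues *q, size_t want_free)
{
    return q->clean < want_free && q->dirty > 0;
}
```

With this split, a caller that cannot free enough pages can look at the
`dirty` length to decide whether waiting for writeback is worthwhile, which
is the progress signal that the asynchronous kpiod path does not provide.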