* Re: (reiserfs) Re: More on Re: (reiserfs) Reiserfs and ext2fs (was Re: (reiserfs) Sum Benchmarks (these look typical?))
From: Stephen C. Tweedie @ 1998-06-25 11:00 UTC
To: Eric W. Biederman
Cc: Stephen C. Tweedie, Hans Reiser, Shawn Leas, Reiserfs, Ken Tetrick, linux-mm

Hi,

[CC:ed to linux-mm, who also have a great deal of interest in this
stuff.]

On 24 Jun 1998 09:53:03 -0500, ebiederm+eric@npwt.net (Eric
W. Biederman) said:

ST> However, there's a lot of overlap, so I'd like to look at what we can do
ST> with this for 2.3. In particular, I'd like 2.3's standard file writing
ST> mechanism to work essentially as write-through from the page cache,

> The current system is write-through. I hope you mean write back.

The current system is write-through from the buffer cache. The data
is copied into the page cache only if there is already a page mapping
that data. That is really ugly, using the buffer cache both as an IO
buffer and as a data cache. THAT is what we need to fix.

The ideal solution IMHO would be something which does write-through
from the page cache to the buffer cache and write-back from the buffer
cache to disk; in other words, when you write to a page, buffers are
generated to map that dirty data (without copying) there and then.
The IO is then left to the buffer cache, as currently happens, but the
buffer is deleted after IO (just like other temporary buffer_heads
behave right now). That leaves the IO buffering to the buffer cache
and the caching to the page cache, which is the distinction that the
current scheme approaches but does not quite achieve.

> This functionality is essentially what is implemented with brw_page,
> and I have written the generic_page_write that does essentially
> this. There is no data copying however. The fun angle is mapped
> pages need to be unmapped (or at least read only mapped) for a write
> to be successful.

Indeed; however, it might be a reasonable compromise to do a copy out
from the page cache to the buffer cache in this situation (we already
have a copy in there, so this would not hurt performance relative to
the current system).

Doing COW at the page cache level is something we can implement later;
there are other reasons for it to be desirable anyway. For example,
it lets you convert all read(2) and write(2) requests on whole pages
into mmap()s, transparently, giving automatic zero-copy IO to user
space.

> I should have a working patch this weekend (the code compiles now, I
> just need to make sure it works) and we can discuss it more when that
> has been released.

Excellent. I look forward to seeing it.

--Stephen

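The "write-through from the page cache, write-back from the buffer cache" idea in the message above can be illustrated with a small user-space sketch. It is only a toy model under stated assumptions: `toy_page`, `toy_buffer`, `toy_page_write` and `toy_buffer_writeback` are hypothetical names invented here, not kernel structures or functions, and the "disk" is just a byte array. The point it tries to show is that the temporary buffer aliases the page's data instead of holding a private copy, and that it is freed once the IO is done.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical stand-in for a page cache page (not the kernel's struct). */
struct toy_page {
    char data[TOY_PAGE_SIZE];
    int  nr_buffers;            /* temporary buffers currently mapping this page */
};

/* Hypothetical stand-in for a temporary buffer_head. */
struct toy_buffer {
    struct toy_page *page;      /* page that owns the data */
    char            *b_data;    /* points into page->data: no private copy */
    size_t           len;
    int              dirty;
};

/*
 * Write into the page, then immediately create a buffer that maps the
 * dirty bytes in place ("write-through" from page cache to buffer cache).
 * Returns NULL if the range does not fit in the page or allocation fails.
 */
struct toy_buffer *toy_page_write(struct toy_page *pg, size_t off,
                                  const char *src, size_t len)
{
    struct toy_buffer *bh;

    if (off > TOY_PAGE_SIZE || len > TOY_PAGE_SIZE - off)
        return NULL;
    bh = malloc(sizeof *bh);
    if (!bh)
        return NULL;

    memcpy(pg->data + off, src, len);   /* the only copy: caller -> page */
    bh->page   = pg;
    bh->b_data = pg->data + off;        /* alias, not a second copy */
    bh->len    = len;
    bh->dirty  = 1;
    pg->nr_buffers++;
    return bh;
}

/*
 * "Write back" the buffer to a fake disk, then delete the temporary
 * buffer. The cached data stays in the page.
 */
void toy_buffer_writeback(struct toy_buffer *bh, char *disk, size_t disk_off)
{
    memcpy(disk + disk_off, bh->b_data, bh->len);
    bh->dirty = 0;
    bh->page->nr_buffers--;
    free(bh);
}
```

In this model the page remains the only long-lived holder of the data; the buffer exists just long enough to describe one pending IO.
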
* Re: (reiserfs) Re: More on Re: (reiserfs) Reiserfs and ext2fs (was Re: (reiserfs) Sum Benchmarks (these look typical?))
From: Eric W. Biederman @ 1998-06-26 15:56 UTC
To: Stephen C. Tweedie
Cc: Hans Reiser, Shawn Leas, Reiserfs, Ken Tetrick, linux-mm

>>>>> "ST" == Stephen C Tweedie <sct@dcs.ed.ac.uk> writes:

ST> Hi,
ST> [CC:ed to linux-mm, who also have a great deal of interest in this
ST> stuff.]

ST> On 24 Jun 1998 09:53:03 -0500, ebiederm+eric@npwt.net (Eric
ST> W. Biederman) said:

ST> However, there's a lot of overlap, so I'd like to look at what we can do
ST> with this for 2.3. In particular, I'd like 2.3's standard file writing
ST> mechanism to work essentially as write-through from the page cache,

>> The current system is write-through. I hope you mean write back.

ST> The current system is write-through from the buffer cache. The data
ST> is copied into the page cache only if there is already a page mapping
ST> that data. That is really ugly, using the buffer cache both as an IO
ST> buffer and as a data cache. THAT is what we need to fix.

You're right. But if you implement the appropriate routines so you
can use generic_file_write we do a proper write through the page cache
now.

ST> The ideal solution IMHO would be something which does write-through
ST> from the page cache to the buffer cache and write-back from the buffer
ST> cache to disk; in other words, when you write to a page, buffers are
ST> generated to map that dirty data (without copying) there and then.
ST> The IO is then left to the buffer cache, as currently happens, but the
ST> buffer is deleted after IO (just like other temporary buffer_heads
ST> behave right now). That leaves the IO buffering to the buffer cache
ST> and the caching to the page cache, which is the distinction that the
ST> current scheme approaches but does not quite achieve.

Unless I have missed something write-back from the page cache is
important, because then when you delete a file you haven't written yet
you can completely avoid I/O. For short lived files this should be a
performance win.

Copying the few pages that are actively engaged in being written into
the buffer cache may not be a bad idea, as it removes the lock from
the page cache page much sooner, and frees it for use again.

>> This functionality is essentially what is implemented with brw_page,
>> and I have written the generic_page_write that does essentially
>> this. There is no data copying however. The fun angle is mapped
>> pages need to be unmapped (or at least read only mapped) for a write
>> to be successful.

ST> Indeed; however, it might be a reasonable compromise to do a copy out
ST> from the page cache to the buffer cache in this situation (we already
ST> have a copy in there, so this would not hurt performance relative to
ST> the current system).

Agreed. But it takes more work to write.

ST> Doing COW at the page cache level is something we can implement later;
ST> there are other reasons for it to be desirable anyway. For example,
ST> it lets you convert all read(2) and write(2) requests on whole pages
ST> into mmap()s, transparently, giving automatic zero-copy IO to user
ST> space.

Sounds neat but I wasn't advocating it, in this context.

>> I should have a working patch this weekend (the code compiles now, I
>> just need to make sure it works) and we can discuss it more when that
>> has been released.

ST> Excellent. I look forward to seeing it.

I need to clean the patch up a bit (I built it on top of a patched
kernel, but I have it working right now!). I have successfully
performed two simultaneous kernel compiles which is a pretty good
test for races ;). Hopefully I'll have a little time this weekend, to
make a good patch, otherwise I'll just release my mess.

Eric

* Re: (reiserfs) Re: More on Re: (reiserfs) Reiserfs and ext2fs (was Re: (reiserfs) Sum Benchmarks (these look typical?))
From: Stephen C. Tweedie @ 1998-06-29 10:35 UTC
To: Eric W. Biederman
Cc: Stephen C. Tweedie, Hans Reiser, Shawn Leas, Reiserfs, Ken Tetrick, linux-mm

Hi,

In article <m1emwcf97d.fsf@flinx.npwt.net>, ebiederm+eric@npwt.net (Eric
W. Biederman) writes:

>>>>>> "ST" == Stephen C Tweedie <sct@dcs.ed.ac.uk> writes:

ST> The ideal solution IMHO would be something which does write-through
ST> from the page cache to the buffer cache and write-back from the buffer
ST> cache to disk; in other words, when you write to a page, buffers are
ST> generated to map that dirty data (without copying) there and then.
ST> The IO is then left to the buffer cache, as currently happens, but the
ST> buffer is deleted after IO (just like other temporary buffer_heads
ST> behave right now). That leaves the IO buffering to the buffer cache
ST> and the caching to the page cache, which is the distinction that the
ST> current scheme approaches but does not quite achieve.

> Unless I have missed something write-back from the page cache is
> important, because then when you delete a file you haven't written yet
> you can completely avoid I/O. For short lived files this should be a
> performance win.

We already do bforget() to deal with this in the buffer cache. Having
the outstanding IO labelled in the buffer cache will not result in
redundant writes in this case.

>>> This functionality is essentially what is implemented with brw_page,
>>> and I have written the generic_page_write that does essentially
>>> this. There is no data copying however. The fun angle is mapped
>>> pages need to be unmapped (or at least read only mapped) for a write
>>> to be successful.

ST> Indeed; however, it might be a reasonable compromise to do a copy out
ST> from the page cache to the buffer cache in this situation (we already
ST> have a copy in there, so this would not hurt performance relative to
ST> the current system).

> Agreed. But it takes more work to write.

On reflection, it's not an issue. Mapped pages do not have to be
unmapped at all. We can continue to share between cache and buffers as
long as we want. Later modifications to the data in the cache page will
update the buffer contents, true, but that's irrelevant as we will still
be writing valid file contents to disk when the IO arrives. Those
semantics are just fine.

--Stephen

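The bforget() point above, that a dirty buffer belonging to a deleted file never needs to reach the disk, can be sketched with a toy model. This is not the kernel's bforget() implementation; `toy_buf`, `toy_bforget` and `toy_flush` are hypothetical names used only to show the effect: forgetting a buffer drops its pending write, so a later flush issues no IO for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer with a pending write. */
struct toy_buf {
    int block;   /* block number on the fake device */
    int dirty;   /* nonzero while a write is still outstanding */
};

/*
 * Discard a buffer whose block no longer belongs to any file:
 * clearing the dirty flag cancels the outstanding write.
 */
void toy_bforget(struct toy_buf *b)
{
    b->dirty = 0;
}

/*
 * Flush every buffer that is still dirty and return how many writes
 * were issued. A real system would submit each block to the device
 * where the comment below sits.
 */
int toy_flush(struct toy_buf *bufs, size_t n)
{
    int writes = 0;

    for (size_t i = 0; i < n; i++) {
        if (!bufs[i].dirty)
            continue;           /* forgotten or already clean: no IO */
        /* device write for bufs[i].block would happen here */
        bufs[i].dirty = 0;
        writes++;
    }
    return writes;
}
```

With three dirty buffers, forgetting one before the flush leaves two writes, which is the short-lived-file saving discussed in the thread.
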
* Re: (reiserfs) Re: More on Re: (reiserfs) Reiserfs and ext2fs (was Re: (reiserfs) Sum Benchmarks (these look typical?))
From: Eric W. Biederman @ 1998-06-29 19:59 UTC
To: Stephen C. Tweedie
Cc: Hans Reiser, Shawn Leas, Reiserfs, Ken Tetrick, linux-mm

>>>>> "ST" == Stephen C Tweedie <sct@dcs.ed.ac.uk> writes:

ST> Hi,
ST> In article <m1emwcf97d.fsf@flinx.npwt.net>, ebiederm+eric@npwt.net (Eric
ST> W. Biederman) writes:

>> Unless I have missed something write-back from the page cache is
>> important, because then when you delete a file you haven't written yet
>> you can completely avoid I/O. For short lived files this should be a
>> performance win.

ST> We already do bforget() to deal with this in the buffer cache. Having
ST> the outstanding IO labelled in the buffer cache will not result in
ST> redundant writes in this case.

That's good to know. It doesn't surprise me but I hadn't been through
the code enough to see that one. I knew about bforget I just hadn't
seen it used.

>>>> This functionality is essentially what is implemented with brw_page,
>>>> and I have written the generic_page_write that does essentially
>>>> this. There is no data copying however. The fun angle is mapped
>>>> pages need to be unmapped (or at least read only mapped) for a write
>>>> to be successful.

ST> Indeed; however, it might be a reasonable compromise to do a copy out
ST> from the page cache to the buffer cache in this situation (we already
ST> have a copy in there, so this would not hurt performance relative to
ST> the current system).

>> Agreed. But it takes more work to write.

ST> On reflection, it's not an issue. Mapped pages do not have to be
ST> unmapped at all. We can continue to share between cache and buffers as
ST> long as we want. Later modifications to the data in the cache page will
ST> update the buffer contents, true, but that's irrelevant as we will still
ST> be writing valid file contents to disk when the IO arrives. Those
ST> semantics are just fine.

There are two problems I see.

1) A DMA controller actively accessing the same memory the CPU is
   accessing could be a problem. Recall video flicker on old video
   cards.

2) More importantly the cpu writes to the _cache_, and the DMA
   controller reads from the RAM. I don't see any consistency
   guarantees there.

We may be able to solve these problems on a per architecture or device
basis however.

Eric

* Re: (reiserfs) Re: More on Re: (reiserfs) Reiserfs and ext2fs (was Re: (reiserfs) Sum Benchmarks (these look typical?))
From: Stephen C. Tweedie @ 1998-06-30 16:10 UTC
To: Eric W. Biederman
Cc: Stephen C. Tweedie, Hans Reiser, Shawn Leas, Reiserfs, Ken Tetrick, linux-mm

Hi,

On 29 Jun 1998 14:59:37 -0500, ebiederm+eric@npwt.net (Eric
W. Biederman) said:

> There are two problems I see.

> 1) A DMA controller actively accessing the same memory the CPU is
> accessing could be a problem. Recall video flicker on old video
> cards.

Shouldn't be a problem.

> 2) More importantly the cpu writes to the _cache_, and the DMA
> controller reads from the RAM. I don't see any consistency guarantees
> there. We may be able to solve these problems on a per architecture or
> device basis however.

Again, not important. If we ever modify a page which is already being
written out to a device, then we mark that page dirty. On write, we
mark it clean (but locked) _before_ starting the IO, not after. So, if
there is ever an overlap of a filesystem/mmap write with an IO to disk,
we will always schedule another IO later to clean the re-dirtied
buffers.

--Stephen

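The ordering argument in the message above (clear the dirty flag before starting the IO, so any overlapping modification re-dirties the page and forces a second write) can be checked with a small user-space model. Everything here is a hypothetical sketch: `toy_page` and the `toy_*` functions are invented names, the "device" copies the page in two steps to make an overlap possible, and none of it says anything about the CPU cache or DMA coherency question also raised in the thread.

```c
#include <assert.h>
#include <string.h>

#define TOY_BLOCK 16

/* Hypothetical page with the two flags the argument relies on. */
struct toy_page {
    char data[TOY_BLOCK];
    int  dirty;    /* contents differ from what has been queued for disk */
    int  locked;   /* IO in flight */
};

/* Any modification, whether via write() or a mapping, marks the page dirty. */
void toy_modify(struct toy_page *pg, size_t off, char c)
{
    pg->data[off] = c;
    pg->dirty = 1;
}

/* Mark the page clean (but locked) BEFORE the device reads any data. */
void toy_start_io(struct toy_page *pg)
{
    pg->locked = 1;
    pg->dirty = 0;
}

/* The fake device copies bytes [from, to) of the page to the fake disk. */
void toy_io_copy(const struct toy_page *pg, char *disk, size_t from, size_t to)
{
    assert(pg->locked && from <= to && to <= TOY_BLOCK);
    memcpy(disk + from, pg->data + from, to - from);
}

/* IO completion only unlocks; it must not touch the dirty flag. */
void toy_end_io(struct toy_page *pg)
{
    pg->locked = 0;
}

/* A page needs (another) write-out if it is dirty and not under IO. */
int toy_needs_writeout(const struct toy_page *pg)
{
    return pg->dirty && !pg->locked;
}
```

If a byte is modified after the device has already copied it, the disk holds stale data once the first IO ends, but the page is dirty again, so a second write-out is scheduled and brings the disk up to date. Clearing the flag after the IO instead would lose that information.
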
* Re: (reiserfs) Re: More on Re: (reiserfs) Reiserfs and ext2fs (was Re: (reiserfs) Sum Benchmarks (these look typical?))
From: Eric W. Biederman @ 1998-07-01 0:17 UTC
To: Stephen C. Tweedie
Cc: Hans Reiser, Shawn Leas, Reiserfs, Ken Tetrick, linux-mm

>>>>> "ST" == Stephen C Tweedie <sct@dcs.ed.ac.uk> writes:

ST> Hi,
ST> On 29 Jun 1998 14:59:37 -0500, ebiederm+eric@npwt.net (Eric
ST> W. Biederman) said:

>> There are two problems I see.

>> 1) A DMA controller actively accessing the same memory the CPU is
>> accessing could be a problem. Recall video flicker on old video
>> cards.

ST> Shouldn't be a problem.

When either I trace through the code, or a hardware guy convinces me,
that it is safe to both write to a page, and do DMA from a page
simultaneously I'll believe it.

>> 2) More importantly the cpu writes to the _cache_, and the DMA
>> controller reads from the RAM. I don't see any consistency guarantees
>> there. We may be able to solve these problems on a per architecture or
>> device basis however.

ST> Again, not important. If we ever modify a page which is already being
ST> written out to a device, then we mark that page dirty. On write, we
ST> mark it clean (but locked) _before_ starting the IO, not after. So, if
ST> there is ever an overlap of a filesystem/mmap write with an IO to disk,
ST> we will always schedule another IO later to clean the re-dirtied
ST> buffers.

Duh. I wonder what I was thinking...

Anyhow I've implemented the conservative version. The only change
needed is to change from unmapping pages to removing the dirty bit, and
the basic code stands.

The most important change needed would be to tell unuse_page it can't
remove a locked page from the page cache. Either that or I need to
worry about incrementing the count for page writes, which wouldn't be a
bad idea either.

Eric

* Re: (reiserfs) Re: More on Re: (reiserfs) Reiserfs and ext2fs (was Re: (reiserfs) Sum Benchmarks (these look typical?))
From: Stephen C. Tweedie @ 1998-07-01 9:12 UTC
To: Eric W. Biederman
Cc: Stephen C. Tweedie, Hans Reiser, Shawn Leas, Reiserfs, Ken Tetrick, linux-mm

Hi,

On 30 Jun 1998 19:17:15 -0500, ebiederm+eric@npwt.net (Eric
W. Biederman) said:

> When either I trace through the code, or a hardware guy convinces me,
> that it is safe to both write to a page, and do DMA from a page
> simultaneously I'll believe it.

Read the source code! We already do this. If one process or thread
msync()s a mapped file, its dirty pages get written to disk,
independently of any other processes on the same or other CPUs which
may still have the pages mapped and may still be writing to them. We
don't unmap pages for write; we just mark them non-dirty around all
ptes.

--Stephen

* Re: (reiserfs) Re: More on Re: (reiserfs) Reiserfs and ext2fs (was Re: (reiserfs) Sum Benchmarks (these look typical?))
From: Eric W. Biederman @ 1998-07-01 12:45 UTC
To: Stephen C. Tweedie
Cc: Hans Reiser, Shawn Leas, Reiserfs, Ken Tetrick, linux-mm

>>>>> "ST" == Stephen C Tweedie <sct@redhat.com> writes:

ST> Hi,
ST> On 30 Jun 1998 19:17:15 -0500, ebiederm+eric@npwt.net (Eric
ST> W. Biederman) said:

>> When either I trace through the code, or a hardware guy convinces me,
>> that it is safe to both write to a page, and do DMA from a page
>> simultaneously I'll believe it.

ST> Read the source code! We already do this. If one process or thread
ST> msync()s a mapped file, its dirty pages get written to disk,
ST> independently of any other processes on the same or other CPUs which
ST> may still have the pages mapped and may still be writing to them. We
ST> don't unmap pages for write; we just mark them non-dirty around all
ST> ptes.

Which is fine but, it still (currently) gets copied to the buffer cache.
As the buffer cache leaves the picture...

Eric

* Re: (reiserfs) Re: More on Re: (reiserfs) Reiserfs and ext2fs (was Re: (reiserfs) Sum Benchmarks (these look typical?))
From: Eric W. Biederman @ 1998-07-01 13:11 UTC
To: Stephen C. Tweedie
Cc: Hans Reiser, Shawn Leas, Reiserfs, Ken Tetrick, linux-mm

>>>>> "ST" == Stephen C Tweedie <sct@redhat.com> writes:

ST> Hi,
ST> On 30 Jun 1998 19:17:15 -0500, ebiederm+eric@npwt.net (Eric
ST> W. Biederman) said:

>> When either I trace through the code, or a hardware guy convinces me,
>> that it is safe to both write to a page, and do DMA from a page
>> simultaneously I'll believe it.

ST> Read the source code! We already do this. If one process or thread
ST> msync()s a mapped file, its dirty pages get written to disk,
ST> independently of any other processes on the same or other CPUs which
ST> may still have the pages mapped and may still be writing to them. We
ST> don't unmap pages for write; we just mark them non-dirty around all
ST> ptes.

I just took the time and looked.

And in buffer.c in get_hash_table if we are returning a locked buffer,
we always wait on that buffer until it is unlocked. So to date I
don't see us tempting fate, with writing to locked buffers.

It may be harmless but I haven't seen that yet.

Eric

* Re: (reiserfs) Re: More on Re: (reiserfs) Reiserfs and ext2fs (was Re: (reiserfs) Sum Benchmarks (these look typical?))
From: Stephen C. Tweedie @ 1998-07-01 20:07 UTC
To: Eric W. Biederman
Cc: Stephen C. Tweedie, Hans Reiser, Shawn Leas, Reiserfs, Ken Tetrick, linux-mm

Hi,

On 01 Jul 1998 08:11:46 -0500, ebiederm+eric@npwt.net (Eric
W. Biederman) said:

ST> Read the source code! We already do this. If one process or thread
ST> msync()s a mapped file, its dirty pages get written to disk,
ST> independently of any other processes on the same or other CPUs which
ST> may still have the pages mapped and may still be writing to them. We
ST> don't unmap pages for write; we just mark them non-dirty around all
ST> ptes.

> I just took the time and looked.

> And in buffer.c in get_hash_table if we are returning a locked buffer,
> we always wait on that buffer until it is unlocked. So to date I
> don't see us tempting fate, with writing to locked buffers.

Whoops, yes, we do currently do copies for msync(). It's been too long
since I was digging in that code...

--Stephen

* Re: (reiserfs) Re: More on Re: (reiserfs) Reiserfs and ext2fs (was Re: (reiserfs) Sum Benchmarks (these look typical?))
From: Eric W. Biederman @ 1998-07-02 15:17 UTC
To: Stephen C. Tweedie
Cc: Hans Reiser, Shawn Leas, Reiserfs, Ken Tetrick, linux-mm

>>>>> "ST" == Stephen C Tweedie <sct@redhat.com> writes:

>> I just took the time and looked.

>> And in buffer.c in get_hash_table if we are returning a locked buffer,
>> we always wait on that buffer until it is unlocked. So to date I
>> don't see us tempting fate, with writing to locked buffers.

ST> Whoops, yes, we do currently do copies for msync(). It's been too long
ST> since I was digging in that code...

Well I asked on Linux kernel and talked a little bit about this with
Alan Cox. He figures if we try and stop something like DMA half way
through we are in trouble but otherwise we should be o.k.

So for the next round I'll implement the cheap trick of clearing the
dirty bit on the page tables.

Eric

Thread overview: 11+ messages
[not found] <Pine.HPP.3.96.980617035608.29950A-100000@ixion.honeywell.com>
[not found] ` <199806221138.MAA00852@dax.dcs.ed.ac.uk>
[not found] ` <358F4FBE.821B333C@ricochet.net>
[not found] ` <m11zsgrvnf.fsf@flinx.npwt.net>
[not found] ` <199806241154.MAA03544@dax.dcs.ed.ac.uk>
[not found] ` <m11zse6ecw.fsf@flinx.npwt.net>
1998-06-25 11:00 ` (reiserfs) Re: More on Re: (reiserfs) Reiserfs and ext2fs (was Re: (reiserfs) Sum Benchmarks (these look typical?)) Stephen C. Tweedie
1998-06-26 15:56 ` Eric W. Biederman
1998-06-29 10:35 ` Stephen C. Tweedie
1998-06-29 19:59 ` Eric W. Biederman
1998-06-30 16:10 ` Stephen C. Tweedie
1998-07-01 0:17 ` Eric W. Biederman
1998-07-01 9:12 ` Stephen C. Tweedie
1998-07-01 12:45 ` Eric W. Biederman
1998-07-01 13:11 ` Eric W. Biederman
1998-07-01 20:07 ` Stephen C. Tweedie
1998-07-02 15:17 ` Eric W. Biederman