(reiserfs) Re: Maybe we can do 40 bits in June/July. (fwd)

linux-mm.kvack.org archive mirror

* (reiserfs) Re: Maybe we can do 40 bits in June/July. (fwd)
@ 1998-04-22 13:18 Rik van Riel
  1998-04-22 17:57 ` Eric W. Biederman
  0 siblings, 1 reply; 4+ messages in thread
From: Rik van Riel @ 1998-04-22 13:18 UTC
  To: linux-mm

Hi guys,

I just got this message from Hans Reiser (the main
ReiserFS coordinator), who says that ReiserFS will
be 40-bit (1TB file size) ready by June/July this
year.
Now we (the MM guys) need to get together and make
the MM layer 40-bit transparent too (or 41-bit).

Any takers?

Rik.

---------- Forwarded message ----------
Date: Wed, 22 Apr 1998 00:53:03 -0700
From: Hans Reiser <reiser@ricochet.net>
To: H.H.vanRiel@phys.uu.nl
Cc: reiserfs <reiserfs@devlinux.com>
Subject: (reiserfs) Re: Maybe we can do 40 bits in June/July.

Hi Rik,

Ok, I propose the following.  After we stabilize reiserfs but before we
ship it to users we will send you an email saying we are ready to move
to 40 bits.  Then, working in parallel, we will convert both mm and
reiserfs to 40 bits.  You (or somebody you name) will coordinate the mm
portion, and I (or Vladimir) will coordinate the reiserfs portion. 

I anticipate that we will be able to convert ~June, not later than
July.  I anticipate that it will be easy for reiserfs to convert, and
that it will not take long to debug any reiserfs problems that occur.  Since changing
mm will inconvenience other things besides reiserfs, I imagine that you
will want it to be deferred until reiserfs is stable enough for users to
benefit from using 40 bit reiserfs.  Maybe we can implement 40 bits as a
#define in the reiserfs code.
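
As a rough illustration of the "#define" idea above, the offset width could be a single compile-time constant from which the limit is derived. This is only a sketch under stated assumptions: the SKETCH_* names and the helper are hypothetical and do not come from the reiserfs sources.

```c
#include <stdint.h>

/* Hypothetical compile-time switch: number of bits in a file offset. */
#ifndef SKETCH_OFFSET_BITS
#define SKETCH_OFFSET_BITS 40
#endif

/* Largest byte offset representable with SKETCH_OFFSET_BITS bits. */
#define SKETCH_MAX_OFFSET ((((uint64_t)1) << SKETCH_OFFSET_BITS) - 1)

/* Returns nonzero if `off` fits in the configured offset width. */
int sketch_offset_ok(uint64_t off)
{
    return off <= SKETCH_MAX_OFFSET;
}
```

With the default of 40 bits the limit works out to 2^40 bytes, i.e. the 1TB file size mentioned earlier in the thread; building with a different SKETCH_OFFSET_BITS would change the limit in one place.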

Incidentally, I prefer 40 bits for reiserfs for yet another reason: 64
bits would make our keys overly large.

I am sure that there are a lot of details which we can work out in
June.   

Does this plan sound good to you?

Hans

Rik van Riel wrote:
> 
> On Tue, 21 Apr 1998, Hans Reiser wrote:
> 
> > My current thinking is that we should only worry about 2GB files when
> > Linus and the MM guys indicate they want to deal with making offsets
> > 40bits.  I think it is more work for them than for us, so we should let
> > them tell us when they want it.  I will have us do it whenever they
> > decide they want it.
> 
> I know Linus doesn't mind 40bit offsets. Mj and davem are
> likely to work on it when some FS supports it (they both
> work with large server systems).
> 
> Rik.
> +-------------------------------------------+--------------------------+
> | Linux: - LinuxHQ MM-patches page          | Scouting       webmaster |
> |        - kswapd ask-him & complain-to guy | Vries    cubscout leader |
> |     http://www.phys.uu.nl/~riel/          | <H.H.vanRiel@phys.uu.nl> |
> +-------------------------------------------+--------------------------+


* Re: (reiserfs) Re: Maybe we can do 40 bits in June/July. (fwd)
  1998-04-22 13:18 (reiserfs) Re: Maybe we can do 40 bits in June/July. (fwd) Rik van Riel
@ 1998-04-22 17:57 ` Eric W. Biederman
  1998-04-22 18:29   ` Benjamin C.R. LaHaise
  0 siblings, 1 reply; 4+ messages in thread
From: Eric W. Biederman @ 1998-04-22 17:57 UTC
  To: H.H.vanRiel; +Cc: linux-mm

>>>>> "RR" == Rik van Riel <H.H.vanRiel@phys.uu.nl> writes:

RR> Hi guys,
RR> I just got this message from Hans Reiser (the main
RR> ReiserFS coordinator), who says that ReiserFS will
RR> be 40-bit (1TB file size) ready by June/July this
RR> year.
RR> Now we (the MM guys) need to get together and make
RR> the MM layer 40-bit transparent too (or 41-bit).

RR> Any takers?

I will make at least a preliminary patch.  

I have already started.

My design:
As I understand it the buffer cache is fine, so it is just a matter
of getting the page cache, the vma, and the glue working.

My thought is to make the page cache use generic keys. 
This should help support things like the swapper inode a little
better.  We still need a bit somewhere so we can coalesce VMAs that have
an inode but don't need continuous keys.  That's for later.

For the common case of inodes, the keys would be:
page->key == page->offset >> PAGE_SHIFT.

And of course get rid of page->offset.  The field name change will
catch any old code that is out there.
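
A minimal sketch of the key computation above, assuming 4 KiB pages (PAGE_SHIFT of 12) and a 32-bit key. The type and function names are hypothetical and are not taken from any kernel tree.

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Hypothetical: byte offset within a file, wide enough for 40 bits. */
typedef uint64_t file_off_t;
/* Hypothetical: page cache key, 32 bits wide. */
typedef uint32_t page_key_t;

/* Key of the page that contains byte `off`. */
page_key_t offset_to_key(file_off_t off)
{
    return (page_key_t)(off >> PAGE_SHIFT);
}

/* First byte covered by the page with key `key`. */
file_off_t key_to_offset(page_key_t key)
{
    return (file_off_t)key << PAGE_SHIFT;
}

/* Bytes addressable by a 32-bit key: 2^(32 + PAGE_SHIFT). */
file_off_t max_file_bytes(void)
{
    return (file_off_t)1 << (32 + PAGE_SHIFT);
}
```

Under these assumptions a 32-bit key reaches 2^44 bytes, so a 40-bit file offset fits with room to spare.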

Eric


* Re: (reiserfs) Re: Maybe we can do 40 bits in June/July. (fwd)
  1998-04-22 17:57 ` Eric W. Biederman
@ 1998-04-22 18:29   ` Benjamin C.R. LaHaise
  1998-04-22 21:08     ` Eric W. Biederman
  0 siblings, 1 reply; 4+ messages in thread
From: Benjamin C.R. LaHaise @ 1998-04-22 18:29 UTC
  To: Eric W. Biederman; +Cc: linux-mm

On 22 Apr 1998, Eric W. Biederman wrote:
...
> My design:
> As I understand it the buffer cache is fine, so it is just a matter
> of getting the page cache, the vma, and the glue working.

The buffer cache is currently fine, but we do want to get rid of it...

> My thought is to make the page cache use generic keys. 
> This should help support things like the swapper inode a little
> better.  We still need a bit somewhere so we can coalesce VMAs that have
> an inode but don't need continuous keys.  That's for later.

Hmmm, if you've seen my rev_pte patch then you'll notice that *all* vmas
will soon need continuous keys... 

> For the common case of inodes, the keys would be:
> page->key == page->offset >> PAGE_SHIFT.

Not a good idea unless support for a.out is dropped completely -- a better
choice would be to use 512 as a divisor; then pages can at least be at the
block offset as needed by a.out.
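
To illustrate the trade-off, here is a small sketch comparing a 512-byte key unit with a page-sized one. It assumes 4 KiB pages and a 32-bit key; the function names are hypothetical, and the offset 1024 used in the note below is just an example of a block-aligned, non-page-aligned value.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9   /* 512-byte units, the divisor suggested above */
#define PAGE_SHIFT   12  /* assumption: 4 KiB pages */

/* Key in 512-byte units for the byte offset `off`. */
uint32_t offset_to_sector_key(uint64_t off)
{
    return (uint32_t)(off >> SECTOR_SHIFT);
}

/* Nonzero if `off` is an exact multiple of the key unit (1 << shift),
 * i.e. a mapping starting there can be keyed without losing low bits. */
int offset_representable(uint64_t off, unsigned shift)
{
    return (off & (((uint64_t)1 << shift) - 1)) == 0;
}

/* Bytes addressable by a 32-bit key in 512-byte units: 2^41. */
uint64_t sector_key_reach(void)
{
    return (uint64_t)1 << (32 + SECTOR_SHIFT);
}
```

A mapping that starts at file offset 1024 is representable with SECTOR_SHIFT but not with PAGE_SHIFT. The cost is reach: a 32-bit key in 512-byte units covers 2^41 bytes rather than 2^44.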

Something else to keep in mind is that we also need a mechanism to keep
metadata in the page cache (rather, per-inode metadata; fixed metadata can
just use its own inode).

> And of course get rid of page->offset.  The field name change will
> catch any old code that is out there.

That's a good idea.

		-ben


* Re: (reiserfs) Re: Maybe we can do 40 bits in June/July. (fwd)
  1998-04-22 18:29   ` Benjamin C.R. LaHaise
@ 1998-04-22 21:08     ` Eric W. Biederman
  0 siblings, 0 replies; 4+ messages in thread
From: Eric W. Biederman @ 1998-04-22 21:08 UTC
  To: Benjamin C.R. LaHaise; +Cc: linux-mm

>>>>> "BL" == Benjamin C R LaHaise <blah@kvack.org> writes:

BL> On 22 Apr 1998, Eric W. Biederman wrote:
BL> ...
>> My design:
>> As I understand it the buffer cache is fine, so it is just a matter
>> of getting the page cache, the vma, and the glue working.

BL> The buffer cache is currently fine, but we do want to get rid of it...

We already don't use it for reading files.  
We just need to get the writing working better.  Hopefully my work on
writing dirty pages from the page cache can be merged with the
revamp of the current swapping code, resulting in good write
performance.

Writing pages out of the page cache on demand, when we need the page,
works, but it would be better to write the page first.

We probably also need a filesystem sync system call...

>> My thought is to make the page cache use generic keys. 
>> This should help support things like the swapper inode a little
>> better.  We still need a bit somewhere so we can coalesce VMAs that have
>> an inode but don't need continuous keys.  That's for later.

BL> Hmmm, if you've seen my rev_pte patch then you'll notice that *all* vmas
BL> will soon need continuous keys... 

I was recalling something about not putting an inode in private shared
mappings so the VMAs can be merged...  That would be my definition of
non-continuous keys.  The keys don't have to be continuous.

It would also be worth checking on your patch to see what happens when
someone calls mremap on a private shared area.  I think you may have
key assignment problems...

This is from memory so I may be a little off.  I think I also read an
older patch so I may be mixing my memories.

>> For the common case of inodes, the keys would be:
>> page->key == page->offset >> PAGE_SHIFT.

BL> Not a good idea unless support for a.out is dropped completely -- a better
BL> choice would be to use 512 as a divisor; then pages can at least be at the
BL> block offset as needed by a.out.

How is a.out read in?  Private mappings of any alignment can still
(theoretically) be handled.  Sharing private mappings is more
complicated; we need a COW scheme (for the page cache).  Currently
generic_file_mmap is broken with regard to private mappings...

BL> Something else to keep in mind is that we also need a mechanism to keep
BL> metadata in the page cache (rather, per-inode metadata; fixed metadata can
BL> just use its own inode).

In the short run for large file support I'd like to keep the changes
as small as possible. 

In the long run I want a generic backing store object (which an inode
could be a superset of) being the backing store for the page cache.
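
One way to picture "an inode as a superset of a backing store" is to embed the generic object in the inode and recover the inode from it when needed. This is purely an illustrative sketch: every name below (backing_store, backing_store_ops, sketch_inode, store_to_inode) is hypothetical and none of it is from an existing patch.

```c
#include <stddef.h>
#include <stdint.h>

struct backing_store;

/* Hypothetical operations the page cache would call on any store. */
struct backing_store_ops {
    int (*readpage)(struct backing_store *store, uint32_t key, void *buf);
    int (*writepage)(struct backing_store *store, uint32_t key, const void *buf);
};

/* Hypothetical generic object that backs pages in the page cache. */
struct backing_store {
    const struct backing_store_ops *ops;
};

/* Hypothetical inode: a backing store plus file-specific fields. */
struct sketch_inode {
    unsigned long ino;
    struct backing_store store;
};

/* Recover the enclosing inode from its embedded backing store. */
struct sketch_inode *store_to_inode(struct backing_store *store)
{
    return (struct sketch_inode *)
        ((char *)store - offsetof(struct sketch_inode, store));
}
```

With this layout the page cache would only see struct backing_store, so something like a swap area or per-inode metadata could supply its own store without pretending to be a file.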

Per-inode metadata...  The indirect blocks... ACL...  Got it.

Eric
