From: Linus Torvalds <torvalds@transmeta.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: Ben LaHaise <bcrl@redhat.com>
Subject: Re: Large PAGE_SIZE
Date: Thu, 5 Jul 2001 10:13:25 -0700 (PDT)
Message-ID: <Pine.LNX.4.33.0107050957010.22305-100000@penguin.transmeta.com>
In-Reply-To: <Pine.LNX.4.21.0107051737340.1577-100000@localhost.localdomain>

On Thu, 5 Jul 2001, Hugh Dickins wrote:
>
> I'm interested in larger pages, but wary of multipage PAGE_CACHE_SIZE:
> partly because it relies on non-0-order page allocations, partly because
> it seems a shame then to break I/O into smaller units below the cache.

Note that once PAGE_CACHE_SIZE is of a higher order, those higher-order
pages effectively become the same as the current order-0 pages - it's just
that the buddy system can always allocate "fractional" (order-0) pages too.

We shouldn't get the usual fragmentation issues, as the new order-N
allocation would be the common case, and the sub-order-N fragments should
clump together nicely.

Also note that the I/O _would_ happen in PAGE_CACHE_SIZE units - you'd
never break it into smaller chunks. That's the whole point of having a
bigger PAGE_CACHE_SIZE.

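To make that concrete, here is a rough sketch of what the defines could
look like - the PAGE_CACHE_ORDER name and the order argument to
page_cache_alloc() are purely illustrative, nothing in the current tree
spells it this way:

	/* a page-cache page is just one order-N buddy allocation */
	#define PAGE_CACHE_SHIFT	(PAGE_SHIFT + 2)	/* e.g. 16kB on x86 */
	#define PAGE_CACHE_SIZE		(1UL << PAGE_CACHE_SHIFT)
	#define PAGE_CACHE_ORDER	(PAGE_CACHE_SHIFT - PAGE_SHIFT)

	/* the buddy system can still hand out the order-0 "fractions" */
	#define page_cache_alloc()	alloc_pages(GFP_HIGHUSER, PAGE_CACHE_ORDER)
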
Now, I actually think your approach basically does the very same thing,
and I don't think there are necessarily any real differences between the
two. It's more of a perception issue: which "direction" you look at it
from.

You take the approach that pages are bigger, but that you can map partial
pages into VM spaces. That is 100% equivalent to saying that the caching
fragment size is an order-N page, I think.

Obviously your world-view ends up very much impacting how you actually
implement it, so in that sense perception certainly does matter.

> * One subpage is represented by one Page Table Entry at the MMU level,
> * and corresponds to one page at the user process level: its size is
> * the same as param.h EXEC_PAGESIZE (for getpagesize(2) and mmap(2)).
> */
> #define SUBPAGE_SHIFT 12
> #define SUBPAGE_SIZE (1UL << SUBPAGE_SHIFT)
> #define SUBPAGE_MASK (~(SUBPAGE_SIZE-1))

I would _really_ prefer to make it explicit that "SUBPAGE" is a VM mapping
issue and nothing more (which is your approach). So I'd not call them
"SUBPAGES", but something like

#define VM_PAGE_SHIFT 12
#define VM_PAGE_SIZE ..
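
Spelled out in full, just mirroring the SUBPAGE pattern quoted above (the
SIZE and MASK defines are the obvious analogues, not anything that exists
anywhere yet):

	/* VM mapping granularity: what one pte / one user-visible page covers */
	#define VM_PAGE_SHIFT	12
	#define VM_PAGE_SIZE	(1UL << VM_PAGE_SHIFT)
	#define VM_PAGE_MASK	(~(VM_PAGE_SIZE-1))
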
However, once you do this, who cares about "PAGE_SIZE" at all? In the end,
PAGE_SIZE has no meaning except for internal VM memory management: it's
nothing but the smallest fragment size that the buddy system works with.

Where does that matter? Page accounting - that's really the only thing
that should care about PAGE_SIZE. And even there, the difference between
the two approaches isn't all that big:

- in your approach, PAGE_SIZE equals PAGE_CACHE_SIZE, so a page-cache
page only has one page count associated with it. That's good, because
it simplifies "page_cache_release()" and friends.

- going the other way, each VM "dirty" entity has a "struct page *"
associated with it. That makes page count handling a bit nastier, but
on the other hand it makes VM attributes much easier to handle, notably
things like "dirty" bits.

Which is the right one? Frankly, I don't know. It may be quite acceptable
to have just a single dirty bit for bigger regions. That would simplify
things, for sure.

On the other hand, maybe we will eventually have a per-mapping "page
size". That would be pretty much impossible with your approach, while the
"page size is the smallest VM granularity, PAGE_CACHE_SIZE is something
else" approach lends itself to that extension (just add a "size_shift" to
"struct address_space", and make the #defines use that instead - a "small
matter of programming").

> I've said enough for now: either you're already disgusted, and will
> reply "Never!", or you'll sometime want to cast an eye over the patch
> itself (or nominate someone else to do so), to get the measure of it.

I'd really like both of you to think about both of the approaches as the
same thing, but with different mindsets. Maybe there is something that
clearly makes one mindset better. And maybe there is some way to just make
the two be completely equivalent..

Linus