From: Hugh Dickins <hugh@veritas.com>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: Ben LaHaise <bcrl@redhat.com>, linux-mm@kvack.org
Subject: Re: Large PAGE_SIZE
Date: Wed, 18 Jul 2001 01:02:52 +0100 (BST)
Message-ID: <Pine.LNX.4.21.0107172337340.1015-100000@localhost.localdomain>
In-Reply-To: <Pine.LNX.4.21.0107051737340.1577-100000@localhost.localdomain>
The promised Large PAGE_SIZE patch against 2.4.6 is now ready. If you'd
like to try these large pages, you'll have to edit PAGE_MMUSHIFT in
include/asm-i386/page.h from 0 to 1, 2 or 3: no config option yet. There's
a sense in which the patch is now complete, but I'll probably be
ashamed of that claim tomorrow (several of the drivers haven't even
been compiled yet, and much more remains untested). I'll update to 2.4.7
once it appears, but probably have to skip the -pres.
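
For example, for 16k pages the one-line edit is:

	#define PAGE_MMUSHIFT 2

(0, 1, 2 and 3 are the tested values, giving 4k, 8k, 16k, 32k pages.)
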
My original mail repeated below, to give a little explanation of what
you'll find; but I've changed it to match the current patch, saying
"MMU" where originally it said "SUB". You did suggest VM_PAGE_SIZE
to match vm_pgoff, but I soon found even that too ambiguous.
I've not merged Ben's multipage PAGE_CACHE_SIZE into this version:
I couldn't think coolly enough to decide how page_cluster readahead
should behave as PAGE_SIZE and PAGE_CACHE_SIZE vary; and there are some
other issues I'll need to settle with Ben first.
Hugh
On Thu, 5 Jul 2001, Hugh Dickins wrote:
>
> Linus,
>
> Ben's mail on multipage PAGE_CACHE_SIZE support prompts me to let you
> know now what I've been doing, and ask your opinion on this direction.
>
> Congratulations to Ben for working out multipage PAGE_CACHE_SIZE.
> I couldn't see where it was headed, and PAGE_CACHE_SIZE has been
> PAGE_SIZE for so long that I assumed everyone had given up on it.
>
> I'm interested in larger pages, but wary of multipage PAGE_CACHE_SIZE:
> partly because it relies on non-0-order page allocations, partly because
> it seems a shame then to break I/O into smaller units below the cache.
>
> So instead I'm using a larger PAGE_SIZE throughout the kernel: here's an
> extract from include/asm-i386/page.h (currently edited, not configured):
>
> /*
> * One mmupage is represented by one Page Table Entry at the MMU level,
> * and corresponds to one page at the user process level: its size is
> * the same as param.h EXEC_PAGESIZE (for getpagesize(2) and mmap(2)).
> */
> #define MMUPAGE_SHIFT 12
> #define MMUPAGE_SIZE (1UL << MMUPAGE_SHIFT)
> #define MMUPAGE_MASK (~(MMUPAGE_SIZE-1))
>
> /*
> * 2**N adjacent mmupages may be clustered to make up one kernel page.
> * Reasonable and tested values for PAGE_MMUSHIFT are 0 (4k page),
> * 1 (8k page), 2 (16k page), 3 (32k page). Higher values will not
> * work without further changes e.g. to unsigned short b_size.
> */
> #define PAGE_MMUSHIFT 0
> #define PAGE_MMUCOUNT (1UL << PAGE_MMUSHIFT)
>
> /*
> * One kernel page is represented by one struct page (see mm.h),
> * and is the kernel's principal unit of memory allocation.
> */
> #define PAGE_SHIFT (PAGE_MMUSHIFT + MMUPAGE_SHIFT)
> #define PAGE_SIZE (1UL << PAGE_SHIFT)
> #define PAGE_MASK (~(PAGE_SIZE-1))
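>
> To be concrete (my own tabulation from the definitions above):
>
>       PAGE_MMUSHIFT   PAGE_SHIFT   PAGE_SIZE
>             0             12          4096
>             1             13          8192
>             2             14         16384
>             3             15         32768
>
> with getpagesize(2) still returning MMUPAGE_SIZE, 4096, in every case.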
>
> The kernel patch which applies these definitions is, of course, much
> larger than Ben's multipage PAGE_CACHE_SIZE patch. Currently against
> 2.4.4 (I'm rebasing to 2.4.6 in the next week) plus some other patches
> we're using inhouse, it's about 350KB touching 160 files. Not quite
> complete yet (trivial macros still to be added to non-i386 arches; md
> readahead size not yet resolved; num_physpages in tuning to be checked;
> vmscan algorithms probably mis-scaled) and certainly undertested, but
> both a 2GB SMP machine and a 256MB laptop run stably with 32k pages
> (though 4k pages are better on the laptop, to keep the kernel source
> tree in cache).
>
> Most of the patch is simple and straightforward, replacing PAGE_SIZE
> by MMUPAGE_SIZE where appropriate (in drivers that's usually only when
> handling vm_pgoff).
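>
> For instance, a typical driver mmap ends up with just a one-line change:
> vm_pgoff counts mmupages, so it must be scaled by MMUPAGE_SHIFT where
> it previously scaled by PAGE_SHIFT. This fragment is my illustration,
> not from the patch (foo_mmap() and FOO_PHYS_BASE are made up):
>
> 	#define FOO_PHYS_BASE 0xf0000000	/* made-up device base */
>
> 	static int foo_mmap(struct file *file, struct vm_area_struct *vma)
> 	{
> 		/* was: vma->vm_pgoff << PAGE_SHIFT */
> 		unsigned long off = vma->vm_pgoff << MMUPAGE_SHIFT;
> 		unsigned long size = vma->vm_end - vma->vm_start;
>
> 		if (remap_page_range(vma->vm_start, FOO_PHYS_BASE + off,
> 				     size, vma->vm_page_prot))
> 			return -EAGAIN;
> 		return 0;
> 	}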
>
> Some of the patch is rather tangential: it seemed right to implement
> proper flush_tlb_range() and flush_tlb_range_k() for flushing mmupages
> together; and hard to resist tidyups like changing the zap_page_range()
> arg from size to end, when it's always sandwiched between start,end
> functions. Unless the PAGE_CACHE_SIZE definition were to be removed
> too, there's no change at all to most filesystems (cramfs, ncpfs and
> proc being exceptions).
>
> Kernel physical and virtual address space is mostly in PAGE_SIZE units:
> __get_free_page(), vmalloc(), ioremap(), kmap_atomic(), kmap() pages;
> but early alloc_bootmem_pages() and the fixmap.h slots are in
> MMUPAGE_SIZE units.
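>
> (One kernel page spans PAGE_MMUCOUNT mmupages, so converting between
> the two units is just a shift; these two helpers are my sketch of the
> idea, not macros from the patch:
>
> 	#define PAGES_TO_MMUPAGES(x)	((x) << PAGE_MMUSHIFT)
> 	#define MMUPAGES_TO_PAGES(x)	((x) >> PAGE_MMUSHIFT)
>
> which compile away entirely in the PAGE_MMUSHIFT 0 case.)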
>
> User address space has to be in MMUPAGE_SIZE units (unless I want to
> rebuild all my userspace): so the difficult part of the patch is in
> the mm/memory.c fault handlers: preventing the anonymous MMUPAGE_SIZE
> pieces from degenerating into needing a PAGE_SIZE physical page each,
> and working out how to translate exclusive_swap_page().
>
> These page fault handlers now prepare and operate upon a
> pte_t *folio[PAGE_MMUCOUNT], with the different parts of the same large
> page expected at their respective virtual offsets (yes, mremap() can
> spoil that, but it's exceptional). Anon mappings may have non-0
> vm_pgoff, to share a page with adjacent private mappings: e.g. bss
> shares a large page with data, so KIO across the data-bss boundary
> works (KIO's page granularity was troublesome, but it would have been
> a shame to revert to the easier MMUPAGE_SIZE there). It was hard to
> get the macros right, so that they melt away to efficient code in the
> PAGE_MMUSHIFT 0 case: I've done the best I can for now; you'll
> probably find them clunky and suggest better.
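>
> To show the shape of it, a minimal sketch of the folio preparation (my
> illustration, not code from the patch: it assumes the large page is
> naturally aligned and lies within one page table, and omits the
> pgd/pmd validity checks a real handler needs):
>
> 	static void prepare_folio(pte_t *folio[PAGE_MMUCOUNT],
> 				  struct mm_struct *mm, unsigned long address)
> 	{
> 		unsigned long base = address & PAGE_MASK;
> 		pgd_t *pgd = pgd_offset(mm, base);
> 		pmd_t *pmd = pmd_offset(pgd, base);
> 		int i;
>
> 		/* one pte for each MMUPAGE_SIZE piece of the large page */
> 		for (i = 0; i < PAGE_MMUCOUNT; i++)
> 			folio[i] = pte_offset(pmd, base + i * MMUPAGE_SIZE);
> 	}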
>
> Performance? Not yet determined, we're just getting around to that.
> Unless it performs significantly better than multipage PAGE_CACHE_SIZE,
> it should be forgotten: no point in extensive change for no gain.
>
> I've said enough for now: either you're already disgusted, and will
> reply "Never!", or you'll sometime want to cast an eye over the patch
> itself (or nominate someone else to do so), to get the measure of it.
> If the latter, please give me a few days to put it together against
> 2.4.6, minus our other inhouse pieces, then I can put the result on
> an ftp site for you.
>
> I would have preferred to wait a little longer before unveiling this,
> but it's appropriate to consider it with multipage PAGE_CACHE_SIZE.
>
> Thanks for your time!
> Hugh