From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: npiggin@nick.local0.net,
Linus Torvalds <torvalds@linux-foundation.org>,
Jared Hulbert <jaredeh@gmail.com>,
Carsten Otte <cotte@de.ibm.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
linux-mm@kvack.org
Subject: Re: [patch 0/7] [rfc] VM_MIXEDMAP, pte_special, xip work
Date: Wed, 12 Mar 2008 16:33:34 +1100
Message-ID: <200803121633.34539.nickpiggin@yahoo.com.au>
In-Reply-To: <20080311213525.a5994894.akpm@linux-foundation.org>

On Wednesday 12 March 2008 15:35, Andrew Morton wrote:
> On Tue, 11 Mar 2008 21:46:53 +1100 npiggin@nick.local0.net wrote:
> > --
> >
> > (doh, please ignore the previous "x/6" patches, they're old. The
> > new ones are this x/7 set)
> >
> > Hi,
> >
> > I'm sorry for neglecting these patches for a few weeks :(
> >
> > I'd like to still get them into -mm and aim for the next merge window --
> > they've been gradually getting a pretty reasonable amount of review and
> > testing. I think the implementation of the pte_special path in
> > vm_normal_page and vm_insert_mixed was the only point left unresolved
> > since last time.
> >
> > I've included the dual kaddr/pfn API that we worked out with Jared, but
> > he hasn't yet tested my patch rollup... so this is an RFC only. If we all
> > agree on it, then I'll rebase to -mm and submit.
>
> umm, could we have some executive summary about what this is all supposed
> to achieve? I can see what each patch does, but what's the overall result?

The overall result is that:

1. We now support XIP-backed filesystems using memory that has no
   struct page allocated for it. Patches 6 and 7 actually implement
   this for s390.

   This is pretty important in a number of cases. As far as I understand,
   in the case of virtualisation (eg. s390), each guest may mount a
   read-only copy of the same filesystem (eg. the distro). Currently,
   each guest needs to allocate struct pages for this image. So if you
   have 100 guests, you already need to allocate more memory for the
   struct pages than the size of the image. I think. (Carsten?)

   For other (eg. embedded) systems, you may have a very large
   non-volatile filesystem. If you have to have struct pages for this,
   then your RAM consumption goes up in proportion to the fs size. Even
   though that is only a small fraction of the fs size, RAM can be much
   more costly, eg. in terms of power.

2. VM_MIXEDMAP allows us to support mappings where you actually do want
   to refcount _some_ pages in the mapping, but not others. I haven't
   actually seen his code, but I understand Jared requires this for his
   filesystem, which can migrate pages between RAM and XIP/NVRAM
   transparently. Obviously the filesystem isn't finished yet, but
   Jared is relying on these changes for it to work.

3. pte_special also has a peripheral use that I need for my lockless
   get_user_pages patch. That was shown to speed up "oltp" on db2 by
   10% on a 2 socket system, which is kind of significant because they
   scrounge for months to try to find a 0.1% improvement on these
   workloads. I'm hoping we might finally be faster than AIX on
   pSeries with that patch. This is not meant to justify the whole
   patchset of course, but just to show that pte_special is not some
   s390-specific thing that should be hidden in arch code or xip code:
   I want to use it on x86 and powerpc as well, and in that case I
   need to use it for VM_PFNMAP, not only VM_MIXEDMAP.
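
The struct page cost in point 1 can be checked with a little arithmetic. The sketch below is only an illustration: the 4 KiB page size, the 56-byte struct page, and all function names are assumptions that do not come from the thread, so plug in the numbers for your own architecture and config.

```c
#include <stdint.h>

/*
 * Assumed figures, NOT taken from the thread: a 4 KiB page and a 56-byte
 * struct page (a plausible 64-bit size; check your own build).
 */
#define PAGE_SIZE_BYTES   4096u
#define STRUCT_PAGE_BYTES 56u

/* Bytes of struct page one guest needs in order to describe the image. */
uint64_t struct_page_bytes_per_guest(uint64_t image_bytes)
{
	/* Round up to whole pages. */
	uint64_t pages = (image_bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;

	return pages * STRUCT_PAGE_BYTES;
}

/* Total struct page bytes when every guest describes the same image. */
uint64_t struct_page_bytes_total(uint64_t image_bytes, unsigned int guests)
{
	return struct_page_bytes_per_guest(image_bytes) * guests;
}

/*
 * Smallest guest count at which the combined struct page memory exceeds
 * the size of the image itself. Returns 0 for an empty image.
 */
unsigned int guests_to_exceed_image(uint64_t image_bytes)
{
	uint64_t per_guest = struct_page_bytes_per_guest(image_bytes);

	if (per_guest == 0)
		return 0;
	return (unsigned int)(image_bytes / per_guest) + 1;
}
```

Under these assumed sizes, a 1 GiB image costs each guest 14 MiB of struct pages, and the combined total passes the image size at 74 guests, which is consistent with the "100 guests" estimate above.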
> [1/7] says:
> > VM_MIXEDMAP achieves this by refcounting all pfn_valid pages, and not
> > refcounting !pfn_valid pages (which is not an option for VM_PFNMAP,
> > because it needs to avoid refcounting pfn_valid pages eg. for /dev/mem
> > mappings).
>
> I have this vague feeling that pfn_valid() isn't reliable - it can
> sometimes lie, and that making it truthful was considered too expensive.
>
> But maybe I'm thinking of something else?

As far as I'm aware, if pfn_valid is true, then we can refcount the page.
pfn_valid is the condition used by the page allocator to initialize the
page arrays, so this should hold if we're using one of the standard
memory models.

s390 is slightly different because it doesn't use a standard memory model
but something more dynamic. Its pfn_valid doesn't quite do the right thing
here, so s390 uses pte_special instead. It could possibly tighten up
pfn_valid, however I think there are various reasons why they don't want
to (one is that they would need to take a global lock in order to search
their list of extents, which would suck for VM_MIXEDMAP performance).
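
The refcounting rule discussed above can be sketched as a small userspace toy model. This is not the kernel code from the patches: every name here (toy_vma_kind, toy_pfn_valid, toy_page_is_refcounted, TOY_MAX_PFN) is hypothetical, the "memory model" is just a fixed pfn limit, and the VM_PFNMAP case is deliberately simplified.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the mapping types discussed in the thread. */
enum toy_vma_kind { TOY_VMA_NORMAL, TOY_VMA_PFNMAP, TOY_VMA_MIXEDMAP };

/* Assumed toy memory model: pfns below this limit have a struct page. */
#define TOY_MAX_PFN 1024ul

/* Toy pfn_valid: true if the pfn is covered by the (toy) page array. */
bool toy_pfn_valid(unsigned long pfn)
{
	return pfn < TOY_MAX_PFN;
}

/*
 * Decide whether a mapped page should be refcounted.
 *
 * arch_has_pte_special models an architecture that records the decision
 * in the pte itself (the s390 approach described above); in that case
 * toy_pfn_valid() is never consulted.
 */
bool toy_page_is_refcounted(enum toy_vma_kind kind, unsigned long pfn,
			    bool pte_is_special, bool arch_has_pte_special)
{
	if (arch_has_pte_special)
		return !pte_is_special;

	switch (kind) {
	case TOY_VMA_MIXEDMAP:
		/* Refcount pfn_valid pages, skip the rest. */
		return toy_pfn_valid(pfn);
	case TOY_VMA_PFNMAP:
		/* Simplification: never refcount, even if pfn_valid. */
		return false;
	default:
		return true;
	}
}
```

With the pte bit available, the decision is made once at insert time and read back from the pte, so the lookup never has to search a list of extents under a lock.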