From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: npiggin@nick.local0.net,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jared Hulbert <jaredeh@gmail.com>,
	Carsten Otte <cotte@de.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	linux-mm@kvack.org
Subject: Re: [patch 0/7] [rfc] VM_MIXEDMAP, pte_special, xip work
Date: Wed, 12 Mar 2008 16:33:34 +1100
Message-ID: <200803121633.34539.nickpiggin@yahoo.com.au>
In-Reply-To: <20080311213525.a5994894.akpm@linux-foundation.org>

On Wednesday 12 March 2008 15:35, Andrew Morton wrote:
> On Tue, 11 Mar 2008 21:46:53 +1100 npiggin@nick.local0.net wrote:
> > --
> >
> > (doh, please ignore the previous "x/6" patches, they're old. The
> > new ones are these x/7 set)
> >
> > Hi,
> >
> > I'm sorry for neglecting these patches for a few weeks :(
> >
> > I'd like to still get them into -mm and aim for the next merge window --
> > they've been gradually getting a pretty reasonable amount of review and
> > testing. I think the implementation of the pte_special path in
> > vm_normal_page and vm_insert_mixed was the only point left unresolved
> > since last time.
> >
> > I've included the dual kaddr/pfn API that we worked out with Jared, but
> > he hasn't yet tested my patch rollup... so this is an RFC only. If we all
> > agree on it, then I'll rebase to -mm and submit.
>
> umm, could we have some executive summary about what this is all supposed
> to achieve?  I can see what each patch does, but what's the overall result?

The overall result is that:
1. We now support XIP-backed filesystems on memory that has no
   struct page allocated for it. Patches 6 and 7 actually implement
   this for s390.

   This is pretty important in a number of cases. As far as I understand,
   in the case of virtualisation (eg. s390), each guest may mount a
   read-only copy of the same filesystem (eg. the distro). Currently,
   every guest needs to allocate struct pages for this image. So if you
   have 100 guests, together they already allocate more memory for the
   struct pages than the size of the image itself. I think. (Carsten?)

   For other (eg. embedded) systems, you may have a very large
   non-volatile filesystem. If you have to have struct pages for this,
   then your RAM consumption goes up in proportion to the fs size. Even
   though that is only a small fraction of the fs size, RAM can be much
   more costly, eg. in terms of power.

2. VM_MIXEDMAP allows us to support mappings where you actually do want
   to refcount _some_ pages in the mapping, but not others. I haven't
   actually seen his code, but I understand Jared requires this for his
   filesystem that can migrate pages between RAM and XIP/NVRAM
   transparently. Obviously the filesystem isn't finished yet, but
   Jared is relying on these changes for it to work.

3. pte_special also has a peripheral usage that I need for my lockless
   get_user_pages patch. That was shown to speed up "oltp" on db2 by
   10% on a 2 socket system, which is kind of significant because they
   scrounge for months to try to find 0.1% improvement on these
   workloads. I'm hoping we might finally be faster than AIX on
   pSeries with that patch. This is not meant to justify the whole
   patchset of course, but just to show that pte_special is not some
   s390 specific thing that should be hidden in arch code or xip code:
   I want to use it on x86 and powerpc as well, and in that case I
   need to use it for VM_PFNMAP not only VM_MIXEDMAP.
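
As a rough illustration of the arithmetic behind point 1 (these are not
figures from this thread): the sketch below assumes 4 KiB pages and a
64-byte struct page. Both values are assumptions and vary by architecture
and kernel configuration, and the helper names are made up for the example.

```c
#include <stdint.h>

/* Assumed values for illustration only; not taken from the patches. */
#define TOY_PAGE_SIZE        4096u  /* assumption: 4 KiB pages */
#define TOY_STRUCT_PAGE_SIZE 64u    /* assumption: sizeof(struct page) */

/* Bytes of struct page one guest needs to cover an image of image_bytes. */
static uint64_t struct_page_bytes(uint64_t image_bytes)
{
	/* Round up to whole pages. */
	uint64_t pages = (image_bytes + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;

	return pages * TOY_STRUCT_PAGE_SIZE;
}

/* Total struct page bytes when nr_guests guests each map the same image. */
static uint64_t total_struct_page_bytes(uint64_t image_bytes,
					unsigned int nr_guests)
{
	return struct_page_bytes(image_bytes) * nr_guests;
}
```

Under those assumptions one guest spends 1/64 of the image size on struct
pages, so the combined overhead passes the image size once there are more
than 64 guests. For a 1 GiB image and 100 guests,
total_struct_page_bytes(1ULL << 30, 100) comes to 1600 MiB.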


> [1/7] says:
> > VM_MIXEDMAP achieves this by refcounting all pfn_valid pages, and not
> > refcounting !pfn_valid pages (which is not an option for VM_PFNMAP,
> > because it needs to avoid refcounting pfn_valid pages eg. for /dev/mem
> > mappings).
>
> I have this vague feeling that pfn_valid() isn't reliable - it can
> sometimes lie, and that making it truthful was considered too expensive.
>
> But maybe I'm thinking of something else?

As far as I'm aware, if pfn_valid is true, then we can refcount the page.
This is the condition used by the page allocator to initialize the page
arrays, and should be the case if we're using one of the standard memory
models.

s390 is slightly different because it doesn't use a standard memory model
but something more dynamic. Its pfn_valid doesn't quite do the right thing
here, so it uses pte_special instead. It could possibly tighten up
pfn_valid, however I think there are various reasons why they don't want
to (one is that they would need to take a global lock in order to search
their list of extents, which will suck for VM_MIXEDMAP performance).
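
The refcounting rule discussed above can be sketched as a small userspace
model. This is a toy, not the code from the patches: the type and function
names are hypothetical, and the VM_PFNMAP case is simplified to "never
refcounted", which ignores details such as COW mappings.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the vma flags discussed in this thread. */
enum toy_mapping { TOY_NORMAL, TOY_PFNMAP, TOY_MIXEDMAP };

/* Hypothetical summary of what the VM can learn from one pte. */
struct toy_pte {
	bool pfn_valid;	/* would pfn_valid(pfn) return true? */
	bool special;	/* pte_special bit, if the arch provides one */
};

/*
 * Decide whether the page behind a pte should be refcounted.
 * With pte_special available, the bit alone decides (the s390 case).
 * Without it, VM_MIXEDMAP falls back to pfn_valid, and VM_PFNMAP never
 * refcounts, even for pfn_valid pages (eg. /dev/mem mappings).
 */
static bool toy_page_is_refcounted(enum toy_mapping kind, struct toy_pte pte,
				   bool arch_has_pte_special)
{
	if (arch_has_pte_special)
		return !pte.special;

	switch (kind) {
	case TOY_MIXEDMAP:
		return pte.pfn_valid;
	case TOY_PFNMAP:
		return false;	/* simplification, see note above */
	default:
		return true;
	}
}
```

The point of the model is only that the pte_special path never consults
pfn_valid, so an arch whose pfn_valid is loose or expensive can still
support VM_MIXEDMAP.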
