RE: large page patch - Seth, Rohit

From: "Seth, Rohit" <rohit.seth@intel.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: gh@us.ibm.com, riel@conectiva.com.br, akpm@zip.com.au, "Seth,
	Rohit" <rohit.seth@intel.com>,
	"Saxena, Sunil" <sunil.saxena@intel.com>,
	"Mallick, Asit K" <asit.k.mallick@intel.com>,
	"David S. Miller" <davem@redhat.com>,
	"'davidm@hpl.hp.com'" <davidm@hpl.hp.com>
Subject: RE: large page patch
Date: Fri, 2 Aug 2002 12:31:58 -0700
Message-ID: <25282B06EFB8D31198BF00508B66D4FA03EA56C0@fmsmsx114.fm.intel.com>

We agree that there are a few different ways to get this support implemented
in the base kernel.  The extent to which this support needs to go is also
debatable (for example, whether large_pages could be made swappable).  Just
to give a little history, we also started by prototyping kernel changes
that would make large page support transparent to the end user (as we wanted
to see the benefit to large apps like databases, SPEC benchmarks and HPC
applications of using different page sizes on IA-64).  Under some
conditions the user would automagically start using large pages for shm and
private anonymous pages.  But we would call this at best a kludge, because
there are quite a number of conditions in these execution paths that one has
to handle differently for large_pages.  For example,
make_pages_present/handle_mm_fault for anonymous or shmem pages need
to be modified to embed knowledge of different page sizes in the generic
kernel.  Also, there are places where the semantics may not completely
match.  For example, shm_lock/unlock on these segments did not do
exactly what was expected.  All those extra changes add cost to the normal
execution path (the severity could differ from app to app).

So, we needed to treat large pages as a special case, and we want to make
sure that an application using large pages understands that
these pages are special (avoiding the transparent usage model until large
pages are treated the same way as normal pages).  This led to a cleaner
solution (input for which also came from Linus himself).  The new APIs let
the changes stay mostly architecture specific, with very few changes to
generic kernel code.  And above all it looks quite portable.  In fact,
the initial implementation was done for IA-64 and porting to x86 took a
couple of hours.  Another key advantage is that this design does not tie
the supported large_page size(s) to any specific size in the generic mm
code.  It supports all the page sizes the underlying architecture supports,
quite independently of generic code.  And architecture dependent code could
support multiple large_page sizes in the same kernel.

We presented our work to Oracle and they found the new APIs acceptable
(not saying Oracle is the only DB in the world that one has to worry about,
but it clearly indicates that the move from the shm APIs to these new APIs is
easy.  Obviously, input from other big app vendors will be highly
appreciated).

Scientific apps people who have the sources should also like this approach,
as their changes will be even more trivial (changes to malloc).  And above
all, for those people who really want to get this extra buck transparently,
the changes could be made to user land libraries to selectively map to these
new APIs.  LD_PRELOAD could be another way to do it.  Of course, there will
be changes that need to be done in user land, but they are self-contained
changes.  And one of the key points is that the application knows what it is
demanding/getting from the kernel.
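
To make the malloc and LD_PRELOAD remarks concrete, here is a minimal sketch
of what such a self-contained user land change might look like.  It does not
use the APIs from the patch under discussion, which this message does not
spell out.  Instead it assumes a present-day Linux, where an explicit large
page mapping is requested with the MAP_HUGETLB flag to mmap().  The names
big_alloc, big_alloc_map, big_alloc_unmap and round_up are hypothetical, and
the large page size is simply passed in by the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical result type: tells the caller what it actually got. */
struct big_alloc {
    void  *addr;   /* start of the mapping, or NULL on failure */
    size_t len;    /* number of bytes actually mapped */
    int    large;  /* 1 if backed by large pages, 0 if normal pages */
};

/* Round len up to a multiple of align; align must be a power of two. */
static size_t round_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/*
 * Try an explicit large page mapping first (assumption: MAP_HUGETLB on a
 * modern kernel with a configured pool), then fall back to normal pages.
 */
static struct big_alloc big_alloc_map(size_t len, size_t large_page_size)
{
    struct big_alloc r = { NULL, 0, 0 };
    size_t want = round_up(len, large_page_size);
    void *p;

#ifdef MAP_HUGETLB
    p = mmap(NULL, want, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        r.addr = p;
        r.len = want;
        r.large = 1;
        return r;
    }
#endif
    /* No large pages available: use an ordinary anonymous mapping. */
    p = mmap(NULL, want, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED) {
        r.addr = p;
        r.len = want;
    }
    return r;
}

/* Release a mapping obtained from big_alloc_map() and clear the record. */
static void big_alloc_unmap(struct big_alloc *a)
{
    if (a->addr != NULL)
        munmap(a->addr, a->len);
    a->addr = NULL;
    a->len = 0;
    a->large = 0;
}
```

An allocator or an LD_PRELOAD shim could call big_alloc_map() for requests
above some threshold and check the large field, so the application still
knows what it asked for and what it got.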

Now to the point of whether the large_pages themselves could be made
swappable.  In our opinion (and this may not depend on this API), it is not a
good idea to look at these pages as swap candidates.  Most of the big apps
that are going to use this feature will use them for data that they really
need available all the time (preferably in RAM if not in caches :-)).  And
the sysadmin could easily configure the size of the large memory pool as per
the needs of a specific environment.
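
As an illustration of an application checking the pool the sysadmin has
configured, here is a small sketch.  It assumes a present-day Linux, where
/proc/meminfo reports the pool in fields such as HugePages_Total and
HugePages_Free; that interface is not part of the patch discussed here.  The
helper names meminfo_field and large_pool_pages are hypothetical, and the
8 KB read buffer is an assumption about the size of /proc/meminfo.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the number that follows "key:" at the start of a line in
 * meminfo-style text, or -1 if the key is absent or has no number.
 */
static long meminfo_field(const char *text, const char *key)
{
    size_t klen;
    const char *line = text;

    assert(text != NULL && key != NULL);
    klen = strlen(key);

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long v;
            if (sscanf(line + klen + 1, "%ld", &v) == 1)
                return v;
            return -1;
        }
        line = strchr(line, '\n');   /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Read /proc/meminfo and report the configured pool size in large pages. */
static long large_pool_pages(void)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen("/proc/meminfo", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_field(buf, "HugePages_Total");
}
```

An application could call large_pool_pages() at startup and size its large
page requests to the configured pool, or fall back to normal pages when the
result is 0 or -1.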

As for the point where the whole kernel starts supporting superpages (as
David Mosberger referred to), with support built into the kernel to treat
superpages as just another page size the whole kernel supports: that will be
great too.  But it needs quite a lot of exhaustive changes in kernel layers
as well as a lot of tuning..... maybe a little further away in the future.

thanks,
asit & rohit
