From: Shaohua Li <shli@kernel.org>
To: Minchan Kim <minchan@kernel.org>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, hughd@google.com, riel@redhat.com
Subject: Re: [patch]mm: make madvise(MADV_WILLNEED) support swap file prefetch
Date: Tue, 8 Jan 2013 17:13:24 +0800
Message-ID: <20130108091324.GA7966@kernel.org>
In-Reply-To: <20130108083853.GC4714@blaptop>

On Tue, Jan 08, 2013 at 05:38:53PM +0900, Minchan Kim wrote:
> On Tue, Jan 08, 2013 at 03:32:29PM +0800, Shaohua Li wrote:
> > On Tue, Jan 08, 2013 at 02:38:56PM +0900, Minchan Kim wrote:
> > > Hi Shaohua,
> > > 
> > > On Tue, Jan 08, 2013 at 12:26:09PM +0800, Shaohua Li wrote:
> > > > On Tue, Jan 08, 2013 at 10:16:07AM +0800, Wanpeng Li wrote:
> > > > > On Mon, Jan 07, 2013 at 12:06:30PM -0800, Andrew Morton wrote:
> > > > > >On Mon, 7 Jan 2013 16:12:37 +0800
> > > > > >Shaohua Li <shli@kernel.org> wrote:
> > > > > >
> > > > > >> 
> > > > > > >> Make madvise(MADV_WILLNEED) support swap file prefetch. If memory is
> > > > > > >> swapped out, this syscall can do swapin prefetch. It has no impact if
> > > > > > >> the memory isn't swapped out.
> > > > > >
> > > > > >Seems sensible.
> > > > > 
> > > > > Hi Andrew and Shaohua,
> > > > > 
> > > > > > What's the performance under serious memory pressure? In that case
> > > > > > pages in swap are highly fragmented and a cache hit is almost
> > > > > > impossible. Should the WILLNEED path add a check to skip readahead in
> > > > > > this case, since swapin only leads to unnecessary memory allocation?
> > > > 
> > > > > Pages in swap are not highly fragmented if you access memory sequentially. In
> > > > > that case, the pages you accessed will be added to the LRU list side by side.
> > > > > So if the app does swap prefetch, we can do sequential disk access and merge
> > > > > small requests into big ones.
> > > 
> > > > How can you make sure that the range of WILLNEED was always sequentially accessed?
> > 
> > you can't guarantee this even for file access.
> 
> Indeed.
> 
> > 
> > > > Another advantage is prefetch can drive high disk iodepth.  For sequential
> > > 
> > > What does it mean 'iodepth'? I failed to grep it in google. :(
> > 
> > > I/O depth: how many requests are in flight at a given time.
> 
> Thanks for the info!
> 
> > 
> > > > access, this can cause big request. Even for random access, high iodepth has
> > > > much better performance especially for SSD.
> > > 
> > > > So you mean WILLNEED is always good for both random and sequential access on
> > > > an SSD? Then how about a hard disk?
> > 
> > > Hmm, even for a hard disk, high-iodepth random access is faster than
> > > single-iodepth access. Today's disks support NCQ. But the speedup isn't as
> > > significant as on an SSD. For sequential access, both hard disks and SSDs
> > > perform better with higher iodepth.
> > 
> > > Wanpeng's comment makes sense to me so I guess others can have a same question
> > > about this patch. So it would be better to write your rationale in changelog.
> > 
> > > I would, but the question is just like asking why an app wants to prefetch file
> > > pages. I thought it was common sense. Problems like memory allocation exist in
> > > file prefetch too. The advantages (better IO access, CPU and disk can operate
> > > in parallel, and so on) apply to both file and swap prefetch.
> 
> > Agreed. But I have a question about the semantics of madvise(WILLNEED) on an
> > anon vma. If Linux starts to support it for anon, users can misunderstand it
> > as follows.
> > 
> > A user might think: we will start to use the anonymous pages in that range
> > soon, so give the kernel a hint to map all pages of the range into the page
> > table in advance (i.e., pre-fault them like MAP_POPULATE), and if any of the
> > pages is swapped out, read it ahead. What do you think about it?
> > For clarification, it would be better to add a man page description and Cc
> > the man page maintainer.

There is no confusion if the page exists or is swapped out. I think the case
you have in mind is a page that isn't populated yet. The man page describes
WILLNEED as "it might be a good idea to read some pages ahead." That makes it
clear this isn't about populating memory, and it matches what we did. But I'm
not sure what the precise description should be.
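
To make the userspace side concrete, here is a minimal sketch of an app giving
the hint on anonymous memory. It is not part of the patch. Python's mmap module
is used only for brevity, and the helper name hint_willneed is made up for
illustration. The call only passes advice to the kernel; whether any swapin
readahead happens depends on the kernel and on whether the pages are actually
in swap.

```python
import mmap

PAGE = mmap.PAGESIZE


def hint_willneed(region: mmap.mmap, offset: int, length: int) -> None:
    """Advise the kernel that [offset, offset + length) will be used soon.

    This is only a hint. It does not write to the pages, and this sketch
    makes no claim that it populates them the way MAP_POPULATE would.
    """
    # madvise(2) wants a page-aligned start address, so round down.
    start = offset - (offset % PAGE)
    # Clamp the end of the range to the size of the mapping.
    end = min(offset + length, len(region))
    region.madvise(mmap.MADV_WILLNEED, start, end - start)


# Private anonymous mapping: the kind of memory that can end up in swap.
region = mmap.mmap(-1, 16 * PAGE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
region[:5] = b"hello"  # the first write faults this page in

# Later, before touching the range again, give the hint for the whole range.
hint_willneed(region, 0, len(region))
print(region[:5])
```

The contents of the mapping are the same before and after the call, which is
the point: the app only tells the kernel what it is about to touch.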

Thanks,
Shaohua

