From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: kirill@shutemov.name
Cc: Konstantin Khlebnikov <koct9i@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Wu Fengguang <fengguang.wu@intel.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	gorcunov@openvz.org
Subject: Re: [PATCH 0/4] pagecache scanning with /proc/kpagecache
Date: Thu, 22 May 2014 13:47:48 -0400
Message-ID: <537e385d.8764b40a.0a1f.ffffabccSMTPIN_ADDED_BROKEN@mx.google.com>
In-Reply-To: <20140522103632.GA23680@node.dhcp.inet.fi>

On Thu, May 22, 2014 at 01:36:32PM +0300, Kirill A. Shutemov wrote:
> On Thu, May 22, 2014 at 01:50:22PM +0400, Konstantin Khlebnikov wrote:
> > On Thu, May 22, 2014 at 6:33 AM, Andrew Morton
> > <akpm@linux-foundation.org> wrote:
> > > On Wed, 21 May 2014 22:19:55 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> > >
> > >> > A much nicer interface would be for us to (finally!) implement
> > >> > fincore(), perhaps with an enhanced per-present-page payload which
> > >> > presents the info which you need (although we don't actually know what
> > >> > that info is!).
> > >>
> > >> page/pfn of each page slot and its page cache tag as shown in patch 4/4.
> > >>
> > >> > This would require open() - it appears to be a requirement that the
> > >> > caller not open the file, but no reason was given for this.
> > >> >
> > >> > Requiring open() would address some of the obvious security concerns,
> > >> > but it will still be possible for processes to poke around and get some
> > >> > understanding of the behaviour of other processes.  Careful attention
> > >> > should be paid to this aspect of any such patchset.
> > >>
> > >> Sorry if I missed your point, but this interface defines fixed mapping
> > >> between file position in /proc/kpagecache and in-file page offset of
> > >> the target file. So we do not need to use seq_file mechanism, that's
> > >> why open() is not defined and default one is used.
> > >> The same thing is true for /proc/{kpagecount,kpageflags}, from which
> > >> I copied/pasted some basic code.
> > >
> > > I think you did miss my point ;) Please do a web search for fincore -
> > > it's a syscall similar to mincore(), only it queries pagecache:
> > > fincore(int fd, loff_t offset, ...).  In its simplest form it queries
> > > just for present/absent, but we could increase the query payload to
> > > incorporate additional per-page info.
> > >
> > > It would take a lot of thought and discussion to nail down the
> > > fincore() interface (we've already tried a couple of times).  But
> > > unfortunately, fincore() is probably going to be implemented one day
> > > and it will (or at least could) make /proc/kpagecache obsolete.
> > >
> > 
> > It seems fincore() might also obsolete /proc/kpageflags and /proc/pid/pagemap,
> > because it could be implemented for /dev/mem and /proc/pid/mem as well
> > as for normal files.
> 
> > Something like this:
> > int fincore(int fd, u64 *kpf, u64 *pfn, size_t length, off_t offset)
> 
> As always with new syscalls flags are missing ;)
> 
> u64 for kpf doesn't sound future proof enough. What about this:
> 
> int fincore(int fd, size_t length, off_t offset,
> 	unsigned long flags, void *records);
> 
> Format of records is defined by what user asks in flags. Like:
> 
>  - FINCORE_PFN: records are 64-bit each with pfn;
>  - FINCORE_PAGE_FLAGS: records are 64-bit each with flags;

I hope that the flags we get from this mode contain pagecache tag info
as well as KPF_*.

>  - FINCORE_PFN | FINCORE_PAGE_FLAGS: records are 128-bit each with pfns
>    followed by flags (or vice versa);
> 
> New flags can extend the format if we would want to expose more info.
> 
> Comments?

Maybe a mincore()-like bitmap mode (FINCORE_BMAP) would also be helpful for
callers who want a minimal memory footprint?

Anyway, I like the extensible interface you're suggesting.
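
To make the record layout concrete, here is a minimal userspace sketch of how
a caller might size and decode the records buffer under this proposal. It is
only an illustration: the flag values are made up (the thread names the flags
but assigns no numbers), the helper names fincore_buffer_size(),
fincore_decode() and fincore_bmap_test() are hypothetical, the pfn is assumed
to come before the flags in the combined mode (the proposal leaves the order
open), and FINCORE_BMAP is assumed to be exclusive with the other modes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values; nothing in the thread fixes these numbers. */
#define FINCORE_PFN        (1UL << 0)
#define FINCORE_PAGE_FLAGS (1UL << 1)
#define FINCORE_BMAP       (1UL << 2)

/* Bytes needed to describe npages pages in the given mode, 0 if invalid. */
static size_t fincore_buffer_size(unsigned long flags, size_t npages)
{
	size_t nfields = 0;

	/* Assumption: bitmap mode is one bit per page and excludes other modes. */
	if (flags & FINCORE_BMAP)
		return flags == FINCORE_BMAP ? (npages + 7) / 8 : 0;

	if (flags & FINCORE_PFN)
		nfields++;
	if (flags & FINCORE_PAGE_FLAGS)
		nfields++;

	/* Each requested field is one 64-bit word per page. */
	return npages * nfields * sizeof(uint64_t);
}

/* Decode record idx; assumes pfn precedes flags when both are requested. */
static int fincore_decode(const void *records, unsigned long flags, size_t idx,
			  uint64_t *pfn, uint64_t *page_flags)
{
	const unsigned char *p = records;
	size_t nfields = 0, field = 0;

	assert(records != NULL);

	if (flags & FINCORE_PFN)
		nfields++;
	if (flags & FINCORE_PAGE_FLAGS)
		nfields++;
	if (nfields == 0 || (flags & FINCORE_BMAP))
		return -1;

	p += idx * nfields * sizeof(uint64_t);
	if (flags & FINCORE_PFN)
		memcpy(pfn, p + field++ * sizeof(uint64_t), sizeof(*pfn));
	if (flags & FINCORE_PAGE_FLAGS)
		memcpy(page_flags, p + field++ * sizeof(uint64_t),
		       sizeof(*page_flags));
	return 0;
}

/* Bitmap mode: returns 1 if page idx is marked present, 0 otherwise. */
static int fincore_bmap_test(const unsigned char *bmap, size_t idx)
{
	return (bmap[idx / 8] >> (idx % 8)) & 1;
}
```

With this layout, describing 4 pages costs 32 bytes in FINCORE_PFN mode,
64 bytes in FINCORE_PFN | FINCORE_PAGE_FLAGS mode, and a single byte in
bitmap mode, which is the footprint argument for FINCORE_BMAP.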

> BTW, is everybody happy with the mincore() interface? We report 1 there if
> the pte is present, but it doesn't really say much about the page for cases
> like the zero page...

According to the mincore(2) man page,
  mincore() returns a vector that indicates whether pages of the calling
  process's virtual memory are resident in core (RAM), and so will not cause
  a disk access (page fault) if referenced.  ...

so we can assume that callers want to predict whether they will take
page faults. But that depends on whether the access is a read or a write.
So I think the current mincore() is not enough to make this prediction
precisely for privately shared pages (including the zero page and KSM pages).
Maybe we need a new syscall to solve this problem.
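
For anyone who wants to check this from userspace, here is a small sketch
using the existing mincore(2) on a private anonymous page. The helper names
page_resident() and map_private_page() are made up for this example, and the
exact result for a page that has only been read is kernel-dependent, so no
particular output is claimed for that case.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if mincore() reports the page holding addr resident,
 * 0 if it does not, and -1 on error. */
static int page_resident(void *addr)
{
	long psz = sysconf(_SC_PAGESIZE);
	unsigned char vec = 0;
	void *page;

	if (psz <= 0)
		return -1;
	/* The mask below relies on a power-of-two page size. */
	assert((psz & (psz - 1)) == 0);

	/* mincore() requires a page-aligned start address. */
	page = (void *)((uintptr_t)addr & ~((uintptr_t)psz - 1));
	if (mincore(page, (size_t)psz, &vec) != 0)
		return -1;

	/* Only the least significant bit of each vector byte is defined. */
	return vec & 1;
}

/* Maps one private anonymous read/write page; returns NULL on failure. */
static void *map_private_page(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	void *p;

	if (psz <= 0)
		return NULL;
	p = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

Calling page_resident() after only reading the page and again after writing
to it shows the gap: if the read-only touch is already reported as 1, the
caller still cannot tell that the first write will fault to allocate a
private copy.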

Thanks,
Naoya


