From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
Fengguang Wu <fengguang.wu@intel.com>,
David Cohen <david.a.cohen@linux.intel.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Damien Ramonda <damien.ramonda@intel.com>,
Jan Kara <jack@suse.cz>, Linus <torvalds@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH V5] mm readahead: Fix readahead fail for no local memory and limit readahead pages
Date: Thu, 6 Feb 2014 15:48:22 -0800 (PST)
Message-ID: <alpine.DEB.2.02.1402061537180.3441@chino.kir.corp.google.com>
In-Reply-To: <20140206152219.45c2039e5092c8ea1c31fd38@linux-foundation.org>

On Thu, 6 Feb 2014, Andrew Morton wrote:
> On Thu, 6 Feb 2014 14:58:21 -0800 (PST) David Rientjes <rientjes@google.com> wrote:
>
> > > > +#define MAX_REMOTE_READAHEAD   4096UL
> > > >  /*
> > > >   * Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
> > > >   * sensible upper limit.
> > > >   */
> > > >  unsigned long max_sane_readahead(unsigned long nr)
> > > >  {
> > > > -       return min(nr, (node_page_state(numa_node_id(), NR_INACTIVE_FILE)
> > > > -               + node_page_state(numa_node_id(), NR_FREE_PAGES)) / 2);
> > > > +       unsigned long local_free_page;
> > > > +       int nid;
> > > > +
> > > > +       nid = numa_node_id();
> >
> > If you're intending this to be cached for your calls into
> > node_page_state() you need nid = ACCESS_ONCE(numa_node_id()).
>
> ugh. That's too subtle and we didn't even document it.
>
> We could put the ACCESS_ONCE inside numa_node_id() I assume but we
> still have the same problem as smp_processor_id(): the numa_node_id()
> return value is wrong as soon as you obtain it if running preemptibly.
>
> We could plaster Big Fat Warnings all over the place or we could treat
> numa_node_id() and derivatives in the same way as smp_processor_id()
> (which is a huge pain). Or something else, but we've left a big hand
> grenade here and Raghavendra won't be the last one to pull the pin?
>
Normally it wouldn't matter because there's no significant downside to it
racing: things like mempolicies, which use numa_node_id() extensively, would
result in, oops, a page allocation on the wrong node.

This stands out to me, though, because you're expecting the calculation to
be correct for a specific node.

The patch is still wrong, though. It should just do

	int node = ACCESS_ONCE(numa_mem_id());
	return min(nr, (node_page_state(node, NR_INACTIVE_FILE) +
			node_page_state(node, NR_FREE_PAGES)) / 2);

since we want to read ahead based on the cpu's local node. The comment
saying we're reading ahead onto "remote memory" is wrong, since a
memoryless node has local affinity to numa_mem_id().
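
For anyone following along, the idea of reading the node id once and using
that same value for both counter lookups can be modelled in userspace. This
is only a rough sketch, not kernel code: struct node_stats, current_mem_node,
min_ul() and sane_readahead_sketch() are all hypothetical stand-ins invented
for illustration, and a plain volatile read is assumed to be a good enough
model of the single access.

```c
#include <stddef.h>

/* Hypothetical per-node counters standing in for node_page_state(). */
struct node_stats {
	unsigned long inactive_file;
	unsigned long free_pages;
};

/*
 * Hypothetical stand-in for the node id the task would observe.
 * volatile models that it may change at any time when preemptible.
 */
static volatile int current_mem_node;

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Snapshot the node id exactly once, then use the snapshot for both
 * lookups so the two counters always come from the same node.
 */
static unsigned long sane_readahead_sketch(const struct node_stats *stats,
					   unsigned long nr)
{
	int node = current_mem_node;	/* single read of the node id */

	return min_ul(nr, (stats[node].inactive_file +
			   stats[node].free_pages) / 2);
}
```

The snapshot can still be stale by the time it is used, as discussed above;
what it guarantees is only that both counters are read from one node.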