From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Jiang Liu <jiang.liu@linux.intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>
Cc: "Patil, Kiran" <kiran.patil@intel.com>,
Mel Gorman <mgorman@suse.de>,
Mike Galbraith <umgwanakikbuti@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
"Wysocki, Rafael J" <rafael.j.wysocki@intel.com>,
Tang Chen <tangchen@cn.fujitsu.com>, Tejun Heo <tj@kernel.org>,
"Kirsher, Jeffrey T" <jeffrey.t.kirsher@intel.com>,
"Brandeburg, Jesse" <jesse.brandeburg@intel.com>,
"Nelson, Shannon" <shannon.nelson@intel.com>,
"Wyborny, Carolyn" <carolyn.wyborny@intel.com>,
"Skidmore, Donald C" <donald.c.skidmore@intel.com>,
"Vick, Matthew" <matthew.vick@intel.com>,
"Ronciak, John" <john.ronciak@intel.com>,
"Williams, Mitch A" <mitch.a.williams@intel.com>,
"Luck, Tony" <tony.luck@intel.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"x86@kernel.org" <x86@kernel.org>,
"linux-hotplug@vger.kernel.org" <linux-hotplug@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"intel-wired-lan@lists.osuosl.org"
<intel-wired-lan@lists.osuosl.org>
Subject: Re: [Intel-wired-lan] [Patch V3 5/9] i40e: Use numa_mem_id() to better support memoryless node
Date: Fri, 9 Oct 2015 18:08:41 +0900
Message-ID: <56178419.6090503@jp.fujitsu.com>
In-Reply-To: <56175637.50102@linux.intel.com>
On 2015/10/09 14:52, Jiang Liu wrote:
> On 2015/10/9 4:20, Andrew Morton wrote:
>> On Wed, 19 Aug 2015 17:18:15 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
>>
>>> On Wed, 19 Aug 2015, Patil, Kiran wrote:
>>>
>>>> Acked-by: Kiran Patil <kiran.patil@intel.com>
>>>
>>> Where's the call to preempt_disable() to prevent kernels with preemption
>>> from making numa_node_id() invalid during this iteration?
>>
>> David asked this question twice, received no answer and now the patch
>> is in the maintainer tree, destined for mainline.
>>
>> If I was asked this question I would respond
>>
>> The use of numa_mem_id() is racy and best-effort. If the unlikely
>> race occurs, the memory allocation will occur on the wrong node, the
>> overall result being very slightly suboptimal performance. The
>> existing use of numa_node_id() suffers from the same issue.
>>
>> But I'm not the person proposing the patch. Please don't just ignore
>> reviewer comments!
> Hi Andrew,
> Apologize for the slow response due to personal reasons!
> And thanks for answering the question from David. To be honest,
> I didn't know how to answer this question before. Actually this
> question has puzzled me for a long time when dealing with memory
> hot-removal. For normal cases, it only causes sub-optimal memory
> allocation if schedule event happens between querying NUMA node id
> and calling alloc_pages_node(). But what happens if system run into
> following execution sequence?
> 1) node = numa_mem_id();
> 2) memory hot-removal event triggers
> 2.1) remove affected memory
> 2.2) reset pgdat to zero if node becomes empty after memory removal
I'm sorry if I'm misunderstanding something.
Since commit b0dc3a342af36f95a68fe229b8f0f73552c5ca08, there is no memset() any more.
> 3) alloc_pages_node(), which may access zero-ed pgdat structure.
?
>
> I haven't found a mechanism to protect system from above sequence yet,
> so puzzled for a long time already:(. Does stop_machine() protect
> system from such a execution sequence?
For a pgdat to be reached, one of its zones has to be on a per-pgdat zonelist.
Currently, __build_all_zonelists() is called under stop_machine(). I guess that
is why you are asking what stop_machine() does. And, as you know, stop_machine()
does not protect anything here: the caller may still fall back into a removed zone.
So, let's think it through.
First, please note that the "pgdat" itself is not removed (and cannot be removed),
so accessing the pgdat's memory will not cause a segmentation fault.
Only its contents are a problem. At removal, the page-related information of the
zone and of the pgdat is cleared.
alloc_pages() uses the zonelist/zoneref/cache to walk the zones without accessing
the pgdat itself. I think accessing the zonelist is safe because it is an array
that is updated under stop_machine().
So, the question is whether alloc_pages() can work correctly even if a zone
contains no pages. I think it should work.
(Note: zones are embedded in the pgdat, so zeroing the pgdat would mean zeroing
the zones and other structures. That would not work.)
So, what problem do you see now?
I'm sorry, I can't chase the old discussions.
Thanks,
-Kame