From: Toshi Kani <toshi.kani@hp.com>
To: Jiang Liu <jiang.liu@huawei.com>
Cc: Jaegeuk Hanse <jaegeuk.hanse@gmail.com>,
Tang Chen <tangchen@cn.fujitsu.com>,
hpa@zytor.com, akpm@linux-foundation.org, rob@landley.net,
isimatu.yasuaki@jp.fujitsu.com, laijs@cn.fujitsu.com,
wency@cn.fujitsu.com, linfeng@cn.fujitsu.com, yinghai@kernel.org,
kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
mgorman@suse.de, rientjes@google.com, rusty@rustcorp.com.au,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-doc@vger.kernel.org, Len Brown <lenb@kernel.org>,
Tony Luck <tony.luck@intel.com>,
"Wang, Frank" <frank.wang@intel.com>
Subject: Re: [PATCH v2 0/5] Add movablecore_map boot option
Date: Fri, 30 Nov 2012 15:27:15 -0700 [thread overview]
Message-ID: <1354314435.20085.55.camel@misato.fc.hp.com> (raw)
In-Reply-To: <50B6C7A4.806@huawei.com>
On Thu, 2012-11-29 at 10:25 +0800, Jiang Liu wrote:
> On 2012-11-29 9:42, Jaegeuk Hanse wrote:
> > On Wed, Nov 28, 2012 at 04:47:42PM +0800, Jiang Liu wrote:
> >> Hi all,
> >> Seems it's a great chance to discuss about the memory hotplug feature
> >> within this thread. So I will try to give some high level thoughts about memory
> >> hotplug feature on x86/IA64. Any comments are welcomed!
> >> First of all, I think usability really matters. Ideally, memory hotplug
> >> feature should just work out of box, and we shouldn't expect administrators to
> >> add several extra platform dependent parameters to enable memory hotplug.
> >> But how to enable memory (or CPU/node) hotplug out of box? I think the key point
> >> is to cooperate with BIOS/ACPI/firmware/device management teams.
> >> I still position memory hotplug as an advanced feature for high end
> >> servers and those systems may/should provide some management interfaces to
> >> configure CPU/memory/node hotplug features. The configuration UI may be provided
> >> by BIOS, BMC or centralized system management suite. Once administrator enables
> >> hotplug feature through those management UI, OS should support system device
> >> hotplug out of box. For example, HP SuperDome2 management suite provides interface
> >> to configure a node as floating node(hot-removable). And OpenSolaris supports
> >> CPU/memory hotplug out of box without any extra configurations. So we should
> >> shape interfaces between firmware and OS to better support system device hotplug.
Well described. I agree with you. I am also OK to have the boot option
for the time being, but we should be able to get the info from ACPI for
better TCE.
> >> On the other hand, I think there are no commercial available x86/IA64
> >> platforms with system device hotplug capabilities in the field yet, at least only
> >> limited quantity if any. So backward compatibility is not a big issue for us now.
HP SuperDome is IA64-based and supports node hotplug when running with
HP-UX. It implements vendor-unique ACPI interface to describe movable
memory ranges.
> >> So I think it's doable to rely on firmware to provide better support for system
> >> device hotplug.
> >> Then what should be enhanced to better support system device hotplug?
> >>
> >> 1) ACPI specification should be enhanced to provide a static table to describe
> >> components with hotplug features, so OS could reserve special resources for
> >> hotplug at early boot stages. For example, to reserve enough CPU ids for CPU
> >> hot-add. Currently we guess maximum number of CPUs supported by the platform
> >> by counting CPU entries in APIC table, that's not reliable.
Right. HP SuperDome implements vendor-unique ACPI interface for this as
well. For Linux, it is nice to have a standard interface defined.
> >> 2) BIOS should implement SRAT, MPST and PMTT tables to better support memory
> >> hotplug. SRAT associates memory ranges with proximity domains with an extra
> >> "hotpluggable" flag. PMTT provides memory device topology information, such
> >> as "socket->memory controller->DIMM". MPST is used for memory power management
> >> and provides a way to associate memory ranges with memory devices in PMTT.
> >> With all information from SRAT, MPST and PMTT, OS could figure out hotplug
> >> memory ranges automatically, so no extra kernel parameters needed.
I agree that using SRAT is a good compromise. The hotpluggable flag is
supposed to indicate the platform's capability, but could use for this
purpose until we have a better interface defined.
> >> 3) Enhance ACPICA to provide a method to scan static ACPI tables before
> >> memory subsystem has been initialized because OS need to access SRAT,
> >> MPST and PMTT when initializing memory subsystem.
I do not think this is an ACPICA issue. HP-UX also uses ACPICA, and can
access ACPI tables and walk ACPI namespace during early boot-time. This
is achieved by the acpi_os layer to use special early boot-time memory
allocator at early boot-time. Therefore, boot-time and hot-add config
code are very consistent in HP-UX.
> >> 4) The last and the most important issue is how to minimize performance
> >> drop caused by memory hotplug. As proposed by this patchset, once we
> >> configure all memory of a NUMA node as movable, it essentially disable
> >> NUMA optimization of kernel memory allocation from that node. According
> >> to experience, that will cause huge performance drop. We have observed
> >> 10-30% performance drop with memory hotplug enabled. And on another
> >> OS the average performance drop caused by memory hotplug is about 10%.
> >> If we can't resolve the performance drop, memory hotplug is just a feature
> >> for demo:( With help from hardware, we do have some chances to reduce
> >> performance penalty caused by memory hotplug.
> >> As we know, Linux could migrate movable page, but can't migrate
> >> non-movable pages used by kernel/DMA etc. And the most hard part is how
> >> to deal with those unmovable pages when hot-removing a memory device.
> >> Now hardware has given us a hand with a technology named memory migration,
> >> which could transparently migrate memory between memory devices. There's
> >> no OS visible changes except NUMA topology before and after hardware memory
> >> migration.
> >> And if there are multiple memory devices within a NUMA node,
> >> we could configure some memory devices to host unmovable memory and the
> >> other to host movable memory. With this configuration, there won't be
> >> bigger performance drop because we have preserved all NUMA optimizations.
> >> We also could achieve memory hotplug remove by:
> >> 1) Use existing page migration mechanism to reclaim movable pages.
> >> 2) For memory devices hosting unmovable pages, we need:
> >> 2.1) find a movable memory device on other nodes with enough capacity
> >> and reclaim it.
> >> 2.2) use hardware migration technology to migrate unmovable memory to
> >> the just reclaimed memory device on other nodes.
>>>
> >> I hope we could expect users to adopt memory hotplug technology
> >> with all these implemented.
> >>
> >> Back to this patch, we could rely on the mechanism provided
> >> by it to automatically mark memory ranges as movable with information
> >>from ACPI SRAT/MPST/PMTT tables. So we don't need administrator to
> >> manually configure kernel parameters to enable memory hotplug.
Right.
Thanks,
-Toshi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
prev parent reply other threads:[~2012-11-30 22:35 UTC|newest]
Thread overview: 86+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-11-23 10:44 Tang Chen
2012-11-23 10:44 ` [PATCH v2 1/5] x86: get pg_data_t's memory from other node Tang Chen
2012-11-24 1:19 ` Jiang Liu
2012-11-26 1:19 ` Tang Chen
2012-12-02 15:11 ` Jiang Liu
2012-11-23 10:44 ` [PATCH v2 2/5] page_alloc: add movable_memmap kernel parameter Tang Chen
2012-11-23 10:44 ` [PATCH v2 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes Tang Chen
2012-12-05 15:46 ` Jiang Liu
2012-12-06 1:20 ` Tang Chen
2012-11-23 10:44 ` [PATCH v2 4/5] page_alloc: Make movablecore_map has higher priority Tang Chen
2012-12-05 15:43 ` Jiang Liu
2012-12-06 1:26 ` Tang Chen
2012-12-06 2:26 ` Jiang Liu
2012-12-06 2:51 ` Jianguo Wu
2012-12-06 2:57 ` Tang Chen
2012-12-09 8:10 ` Tang Chen
2012-12-10 2:15 ` Jiang Liu
2012-11-23 10:44 ` [PATCH v2 5/5] page_alloc: Bootmem limit with movablecore_map Tang Chen
2012-11-26 12:22 ` wujianguo
2012-11-26 12:53 ` Tang Chen
2012-11-26 12:40 ` wujianguo
2012-11-26 13:15 ` Tang Chen
2012-11-26 15:48 ` H. Peter Anvin
2012-11-27 0:58 ` Jianguo Wu
2012-11-27 3:19 ` Wen Congyang
2012-11-27 3:22 ` Jianguo Wu
2012-11-27 3:34 ` Wen Congyang
2012-11-27 1:12 ` Jiang Liu
2012-11-27 1:20 ` H. Peter Anvin
2012-11-27 3:15 ` Wen Congyang
2012-11-27 5:31 ` H. Peter Anvin
2012-12-06 17:28 ` Jiang Liu
2012-12-06 17:41 ` H. Peter Anvin
2012-12-07 0:18 ` Jiang Liu
2012-12-19 9:17 ` Tang Chen
2012-11-27 3:10 ` [PATCH v2 0/5] Add movablecore_map boot option wujianguo
2012-11-27 5:43 ` Tang Chen
2012-11-27 6:20 ` H. Peter Anvin
2012-11-27 6:47 ` Jianguo Wu
2012-11-28 3:47 ` Tang Chen
2012-11-28 4:01 ` Jiang Liu
2012-11-28 5:21 ` Wen Congyang
2012-11-28 5:17 ` Jiang Liu
2012-11-28 4:53 ` Jianguo Wu
2012-11-27 8:00 ` Bob Liu
2012-11-27 8:29 ` Tang Chen
2012-11-27 8:49 ` H. Peter Anvin
2012-11-27 9:47 ` Wen Congyang
2012-11-27 9:53 ` H. Peter Anvin
2012-11-27 9:59 ` Yasuaki Ishimatsu
2012-11-27 12:09 ` Bob Liu
2012-11-27 12:49 ` Tang Chen
2012-11-28 3:24 ` Bob Liu
2012-11-28 4:08 ` Jiang Liu
2012-11-28 6:16 ` Tang Chen
2012-11-28 7:03 ` Jiang Liu
2012-11-28 8:29 ` Wen Congyang
2012-11-28 8:28 ` Jiang Liu
2012-11-28 8:38 ` Wen Congyang
2012-11-29 0:43 ` Jaegeuk Hanse
2012-11-29 1:24 ` Tang Chen
2012-11-30 9:20 ` Lai Jiangshan
2012-11-28 8:47 ` Jiang Liu
2012-11-28 21:34 ` Luck, Tony
2012-11-28 21:38 ` H. Peter Anvin
2012-11-29 11:00 ` Mel Gorman
2012-11-29 16:07 ` H. Peter Anvin
2012-11-29 22:41 ` Luck, Tony
2012-11-29 22:45 ` H. Peter Anvin
2012-11-30 2:56 ` Jiang Liu
2012-11-30 3:15 ` Yasuaki Ishimatsu
2012-11-30 15:36 ` Jiang Liu
2012-11-30 2:58 ` Luck, Tony
2012-11-30 3:28 ` H. Peter Anvin
2012-11-30 10:19 ` Glauber Costa
2012-11-30 10:52 ` Mel Gorman
2012-11-29 10:38 ` Yasuaki Ishimatsu
2012-11-29 11:05 ` Mel Gorman
2012-11-29 15:47 ` Jiang Liu
2012-11-29 15:53 ` Jiang Liu
2012-11-29 1:42 ` Jaegeuk Hanse
2012-11-29 2:25 ` Jiang Liu
2012-11-29 2:49 ` Wanpeng Li
2012-11-29 2:59 ` Jiang Liu
2012-11-29 2:49 ` Wanpeng Li
2012-11-30 22:27 ` Toshi Kani [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1354314435.20085.55.camel@misato.fc.hp.com \
--to=toshi.kani@hp.com \
--cc=akpm@linux-foundation.org \
--cc=frank.wang@intel.com \
--cc=hpa@zytor.com \
--cc=isimatu.yasuaki@jp.fujitsu.com \
--cc=jaegeuk.hanse@gmail.com \
--cc=jiang.liu@huawei.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=laijs@cn.fujitsu.com \
--cc=lenb@kernel.org \
--cc=linfeng@cn.fujitsu.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=minchan.kim@gmail.com \
--cc=rientjes@google.com \
--cc=rob@landley.net \
--cc=rusty@rustcorp.com.au \
--cc=tangchen@cn.fujitsu.com \
--cc=tony.luck@intel.com \
--cc=wency@cn.fujitsu.com \
--cc=yinghai@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox