linux-mm.kvack.org archive mirror
From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: "Jérôme Glisse" <jglisse@redhat.com>,
	"Logan Gunthorpe" <logang@deltatee.com>,
	"Toshi Kani" <toshi.kani@hpe.com>,
	"Jeff Moyer" <jmoyer@redhat.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	stable@vger.kernel.org, linux-mm@kvack.org,
	linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org
Subject: [PATCH v5 00/10] mm: Sub-section memory hotplug support
Date: Fri, 22 Mar 2019 09:57:54 -0700
Message-ID: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com>

Changes since v4 [1]:
- Given that v4 was from March 2017, the bulk of the changes result from
  rebasing the patch set from a v4.11-rc2 baseline to v5.1-rc1.

- A unit test is added to ndctl to exercise the creation and dax
  mounting of multiple independent namespaces in a single 128M section.

[1]: https://lwn.net/Articles/717383/

---

Quote patch7:

"The libnvdimm sub-system has suffered a series of hacks and broken
 workarounds for the memory-hotplug implementation's awkward
 section-aligned (128MB) granularity. For example, the following backtrace
 is emitted when attempting arch_add_memory() with physical address
 ranges that intersect 'System RAM' (RAM) with 'Persistent Memory' (PMEM)
 within a given section:
 
  WARNING: CPU: 0 PID: 558 at kernel/memremap.c:300 devm_memremap_pages+0x3b5/0x4c0
  devm_memremap_pages attempted on mixed region [mem 0x200000000-0x2fbffffff flags 0x200]
  [..]
  Call Trace:
    dump_stack+0x86/0xc3
    __warn+0xcb/0xf0
    warn_slowpath_fmt+0x5f/0x80
    devm_memremap_pages+0x3b5/0x4c0
    __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
    pmem_attach_disk+0x19a/0x440 [nd_pmem]
 
 Recently it was discovered that the problem goes beyond RAM vs PMEM
 collisions, as some platforms produce PMEM vs PMEM collisions within a
 given section. The libnvdimm workaround for that case revealed that the
 libnvdimm section-alignment-padding implementation has been broken for a
 long while. A fix for that long-standing breakage introduces as many
 problems as it solves, as it would require a backward-incompatible change
 to the namespace metadata interpretation. Instead of that dubious route
 [2], address the root problem in the memory-hotplug implementation."

The approach taken is to observe that each section already maintains
an array of 'unsigned long' values to hold the pageblock_flags. A single
additional 'unsigned long' is added to house a 'sub-section active'
bitmask. Each bit tracks the mapped state of one sub-section's worth of
capacity, which is SECTION_SIZE / BITS_PER_LONG, or 2MB on x86-64.

The implication of allowing sections to be mapped and unmapped piecemeal
is that the valid_section() helper is no longer authoritative for
determining whether a section is fully mapped. Instead, pfn_valid() is
updated to consult the section-active bitmask. Given that typical memory
hotplug still has deep "section" dependencies, the sub-section capability
is limited to 'want_memblock=false' invocations of arch_add_memory(),
effectively only devm_memremap_pages() users for now.

With this in place, the hacks in the libnvdimm sub-system can be
dropped, and other devm_memremap_pages() users no longer need to be
constrained to 128MB mapping granularity.

[2]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com

---

Dan Williams (10):
      mm/sparsemem: Introduce struct mem_section_usage
      mm/sparsemem: Introduce common definitions for the size and mask of a section
      mm/sparsemem: Add helpers track active portions of a section at boot
      mm/hotplug: Prepare shrink_{zone,pgdat}_span for sub-section removal
      mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap()
      mm/sparsemem: Prepare for sub-section ranges
      mm/sparsemem: Support sub-section hotplug
      mm/devm_memremap_pages: Enable sub-section remap
      libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields
      libnvdimm/pfn: Stop padding pmem namespaces to section alignment


 arch/x86/mm/init_64.c          |   15 +-
 drivers/nvdimm/dax_devs.c      |    2 
 drivers/nvdimm/pfn.h           |   12 -
 drivers/nvdimm/pfn_devs.c      |   93 +++-------
 include/linux/memory_hotplug.h |    7 -
 include/linux/mm.h             |    4 
 include/linux/mmzone.h         |   60 ++++++
 kernel/memremap.c              |   57 ++----
 mm/hmm.c                       |    2 
 mm/memory_hotplug.c            |  119 +++++++-----
 mm/page_alloc.c                |    6 -
 mm/sparse-vmemmap.c            |   21 +-
 mm/sparse.c                    |  382 ++++++++++++++++++++++++++++------------
 13 files changed, 476 insertions(+), 304 deletions(-)


