linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Pavel Tatashin <pasha.tatashin@oracle.com>
To: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	kirill.shutemov@linux.intel.com, mhocko@suse.com,
	linux-mm@kvack.org, dan.j.williams@intel.com, jack@suse.cz,
	jglisse@redhat.com, jrdr.linux@gmail.com, bhe@redhat.com,
	gregkh@linuxfoundation.org, vbabka@suse.cz,
	richard.weiyang@gmail.com, dave.hansen@intel.com,
	rientjes@google.com, mingo@kernel.org,
	osalvador@techadventures.net, pasha.tatashin@oracle.com,
	abdhalee@linux.vnet.ibm.com, mpe@ellerman.id.au
Subject: [PATCH v5 4/5] mm/sparse: add new sparse_init_nid() and sparse_init()
Date: Thu, 12 Jul 2018 16:37:29 -0400	[thread overview]
Message-ID: <20180712203730.8703-5-pasha.tatashin@oracle.com> (raw)
In-Reply-To: <20180712203730.8703-1-pasha.tatashin@oracle.com>

sparse_init() requires to temporary allocate two large buffers:
usemap_map and map_map. Baoquan He has identified that these buffers are so
large that Linux is not bootable on small memory machines, such as a kdump
boot. The buffers are especially large when CONFIG_X86_5LEVEL is set, as
they are scaled to the maximum physical memory size.

Baoquan provided a fix, which reduces these sizes of these buffers, but it
is much better to get rid of them entirely.

Add a new way to initialize sparse memory: sparse_init_nid(), which only
operates within one memory node, and thus allocates memory either in large
contiguous block or allocates section by section. This eliminates the need
for use of temporary buffers.

For simplified bisecting and review temporarly call sparse_init()
new_sparse_init(), the new interface is going to be enabled as well as old
code removed in the next patch.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 mm/sparse.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/mm/sparse.c b/mm/sparse.c
index 01c616342909..4087b94afddf 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -200,6 +200,11 @@ static inline int next_present_section_nr(int section_nr)
 	      (section_nr <= __highest_present_section_nr));	\
 	     section_nr = next_present_section_nr(section_nr))
 
+static inline unsigned long first_present_section_nr(void)
+{
+	return next_present_section_nr(-1);
+}
+
 /*
  * Record how many memory sections are marked as present
  * during system bootup.
@@ -668,6 +673,86 @@ void __init sparse_init(void)
 	memblock_free_early(__pa(usemap_map), size);
 }
 
+/*
+ * Initialize sparse on a specific node. The node spans [pnum_begin, pnum_end)
+ * And number of present sections in this node is map_count.
+ */
+static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
+				   unsigned long pnum_end,
+				   unsigned long map_count)
+{
+	unsigned long pnum, usemap_longs, *usemap;
+	struct page *map;
+
+	usemap_longs = BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS);
+	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid),
+							  usemap_size() *
+							  map_count);
+	if (!usemap) {
+		pr_err("%s: node[%d] usemap allocation failed", __func__, nid);
+		goto failed;
+	}
+	sparse_buffer_init(map_count * section_map_size(), nid);
+	for_each_present_section_nr(pnum_begin, pnum) {
+		if (pnum >= pnum_end)
+			break;
+
+		map = sparse_mem_map_populate(pnum, nid, NULL);
+		if (!map) {
+			pr_err("%s: node[%d] memory map backing failed. Some memory will not be available.",
+			       __func__, nid);
+			pnum_begin = pnum;
+			goto failed;
+		}
+		check_usemap_section_nr(nid, usemap);
+		sparse_init_one_section(__nr_to_section(pnum), pnum, map, usemap);
+		usemap += usemap_longs;
+	}
+	sparse_buffer_fini();
+	return;
+failed:
+	/* We failed to allocate, mark all the following pnums as not present */
+	for_each_present_section_nr(pnum_begin, pnum) {
+		struct mem_section *ms;
+
+		if (pnum >= pnum_end)
+			break;
+		ms = __nr_to_section(pnum);
+		ms->section_mem_map = 0;
+	}
+}
+
+/*
+ * Allocate the accumulated non-linear sections, allocate a mem_map
+ * for each and record the physical to section mapping.
+ */
+void __init new_sparse_init(void)
+{
+	unsigned long pnum_begin = first_present_section_nr();
+	int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
+	unsigned long pnum_end, map_count = 1;
+
+	/* Setup pageblock_order for HUGETLB_PAGE_SIZE_VARIABLE */
+	set_pageblock_order();
+
+	for_each_present_section_nr(pnum_begin + 1, pnum_end) {
+		int nid = sparse_early_nid(__nr_to_section(pnum_end));
+
+		if (nid == nid_begin) {
+			map_count++;
+			continue;
+		}
+		/* Init node with sections in range [pnum_begin, pnum_end) */
+		sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
+		nid_begin = nid;
+		pnum_begin = pnum_end;
+		map_count = 1;
+	}
+	/* cover the last node */
+	sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
+	vmemmap_populate_print_last();
+}
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 
 /* Mark all memory sections within the pfn range as online */
-- 
2.18.0

  parent reply	other threads:[~2018-07-12 20:38 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-12 20:37 [PATCH v5 0/5] sparse_init rewrite Pavel Tatashin
2018-07-12 20:37 ` [PATCH v5 1/5] mm/sparse: abstract sparse buffer allocations Pavel Tatashin
2018-07-12 22:45   ` Andrew Morton
2018-07-13 13:17   ` Oscar Salvador
2018-07-13 13:24     ` Pavel Tatashin
2018-07-13 20:02       ` Andrew Morton
2018-07-12 20:37 ` [PATCH v5 2/5] mm/sparse: use the new sparse buffer functions in non-vmemmap Pavel Tatashin
2018-07-12 20:37 ` [PATCH v5 3/5] mm/sparse: move buffer init/fini to the common place Pavel Tatashin
2018-07-12 20:37 ` Pavel Tatashin [this message]
2018-07-13 12:03   ` [PATCH v5 4/5] mm/sparse: add new sparse_init_nid() and sparse_init() Oscar Salvador
2018-07-13 12:37     ` Pavel Tatashin
2018-07-12 20:37 ` [PATCH v5 5/5] mm/sparse: delete old sprase_init and enable new one Pavel Tatashin
2018-07-13  9:09   ` Oscar Salvador
2018-07-13 11:15     ` Pavel Tatashin
2018-07-13  9:59 ` [PATCH v5 0/5] sparse_init rewrite Oscar Salvador
2018-07-13 11:10   ` Pavel Tatashin
2018-07-16  6:40     ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180712203730.8703-5-pasha.tatashin@oracle.com \
    --to=pasha.tatashin@oracle.com \
    --cc=abdhalee@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=dave.hansen@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jack@suse.cz \
    --cc=jglisse@redhat.com \
    --cc=jrdr.linux@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=osalvador@techadventures.net \
    --cc=richard.weiyang@gmail.com \
    --cc=rientjes@google.com \
    --cc=steven.sistare@oracle.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox