From: Dave Hansen <dave.hansen@intel.com>
To: Baoquan He <bhe@redhat.com>, linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
kirill.shutemov@linux.intel.com, mhocko@suse.com,
tglx@linutronix.de
Subject: Re: [PATCH v2 3/3] mm/sparse: Optimize memmap allocation during sparse_init()
Date: Thu, 22 Feb 2018 14:22:43 -0800
Message-ID: <34593e3f-879b-cdf9-9dc4-a114e4bfab52@intel.com>
In-Reply-To: <20180222091130.32165-4-bhe@redhat.com>

First of all, this is a much-improved changelog. Thanks for that!

On 02/22/2018 01:11 AM, Baoquan He wrote:
> In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> are allocated with the size of NR_MEM_SECTIONS. They are used to store
> each memory section's usemap and mem map if marked as present. With
> the help of these two arrays, continuous memory chunk is allocated for
> usemap and memmap for memory sections on one node. This avoids too many
> memory fragmentations. Like below diagram, '1' indicates the present
> memory section, '0' means absent one. The number 'n' could be much
> smaller than NR_MEM_SECTIONS on most of systems.
>
> |1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0|
> -------------------------------------------------------
> 0 1 2 3 4 5 i i+1 n-1 n
>
> If fail to populate the page tables to map one section's memmap, its
> ->section_mem_map will be cleared finally to indicate that it's not present.
> After use, these two arrays will be released at the end of sparse_init().

Let me see if I understand this. The tl;dr version of this changelog:

Today, we allocate usemap and mem_map for all sections up front and then
free them later if they are not needed. With 5-level paging, this eats
all memory and we fall over before we can free them. Fix it by only
allocating what we _need_ (nr_present_sections).
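
To put rough numbers on the summary above, here is a minimal userspace
sketch of the size of one temporary pointer array. The constants are
assumptions (x86_64 values of SECTION_SIZE_BITS and MAX_PHYSMEM_BITS,
8-byte pointers), and the helper names are hypothetical; nothing here is
taken from the patch itself.

```c
#include <stdint.h>

/* Assumed x86_64 values, for illustration only. */
#define SECTION_SIZE_BITS	27	/* 128 MiB per memory section */
#define MAX_PHYSMEM_BITS_4LVL	46	/* 4-level paging */
#define MAX_PHYSMEM_BITS_5LVL	52	/* 5-level paging */
#define POINTER_BYTES		8	/* assumed 64-bit pointers */

/* Hypothetical helper: number of possible sections for an address width. */
static uint64_t nr_mem_sections(unsigned int max_physmem_bits)
{
	return UINT64_C(1) << (max_physmem_bits - SECTION_SIZE_BITS);
}

/* Hypothetical helper: bytes for one array holding nr_entries pointers. */
static uint64_t ptr_array_bytes(uint64_t nr_entries)
{
	return nr_entries * POINTER_BYTES;
}
```

Under these assumptions, ptr_array_bytes(nr_mem_sections(MAX_PHYSMEM_BITS_5LVL))
is 256 MiB per array, versus 4 MiB with 4-level paging. Sizing the arrays
by the number of present sections instead makes the cost track the memory
that is actually installed.
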
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 640e68f8324b..f83723a49e47 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -281,6 +281,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> unsigned long pnum;
> unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
> void *vmemmap_buf_start;
> + int i = 0;

'i' is a criminally negligent variable name for how it is used here.

> size = ALIGN(size, PMD_SIZE);
> vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
> @@ -291,14 +292,15 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> vmemmap_buf_end = vmemmap_buf_start + size * map_count;
> }
>
> - for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> + for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
> struct mem_section *ms;
>
> if (!present_section_nr(pnum))
> continue;
>
> - map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
> - if (map_map[pnum])
> + i++;
> + map_map[i-1] = sparse_mem_map_populate(pnum, nodeid, NULL);
> + if (map_map[i-1])
> continue;

The i-1 stuff here looks pretty funky. Isn't this much more readable?

	map_map[i] = sparse_mem_map_populate(pnum, nodeid, NULL);
	if (map_map[i]) {
		i++;
		continue;
	}

> diff --git a/mm/sparse.c b/mm/sparse.c
> index e9311b44e28a..aafb6d838872 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -405,6 +405,7 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
> unsigned long pnum;
> unsigned long **usemap_map = (unsigned long **)data;
> int size = usemap_size();
> + int i = 0;

Ditto on the naming. Shouldn't it be nr_consumed_maps or something?

> usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
> size * usemap_count);
> @@ -413,12 +414,13 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
> return;
> }
>
> - for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> + for (pnum = pnum_begin; pnum < pnum_end && i < usemap_count; pnum++) {
> if (!present_section_nr(pnum))
> continue;
> - usemap_map[pnum] = usemap;
> + usemap_map[i] = usemap;
> usemap += size;
> - check_usemap_section_nr(nodeid, usemap_map[pnum]);
> + check_usemap_section_nr(nodeid, usemap_map[i]);
> + i++;
> }
> }

How would 'i' ever exceed usemap_count?

Also, are there any other side-effects from changing map_map[] to be
indexed by something other than the section number?
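
One way to think about the indexing question is a toy userspace model.
This is only a sketch: the section layout, the payload values, and the
names fill_compact() and lookup_compact() are all hypothetical. It shows
that once the array is packed, a reader can no longer use the section
number as the index and has to derive the slot from the count of present
sections that precede it.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_SECTIONS 8	/* toy value, not NR_MEM_SECTIONS */

/* Hypothetical stand-in for present_section_nr(). */
static const bool section_present[NR_SECTIONS] = {
	true, true, false, false, true, false, true, false,
};

/* Producer: store one value per present section, packed from slot 0. */
static size_t fill_compact(int *compact, size_t capacity)
{
	size_t nr_consumed_maps = 0;

	for (size_t pnum = 0; pnum < NR_SECTIONS; pnum++) {
		if (!section_present[pnum])
			continue;
		if (nr_consumed_maps == capacity)
			break;
		/* Fake payload so each slot can be traced back to its pnum. */
		compact[nr_consumed_maps++] = (int)pnum * 10;
	}
	return nr_consumed_maps;
}

/*
 * Consumer: the slot for pnum is the number of present sections before
 * it. Returns -1 if pnum is absent or its slot was never filled.
 */
static int lookup_compact(const int *compact, size_t nr_filled, size_t pnum)
{
	size_t slot = 0;

	if (pnum >= NR_SECTIONS || !section_present[pnum])
		return -1;
	for (size_t i = 0; i < pnum; i++)
		if (section_present[i])
			slot++;
	return slot < nr_filled ? compact[slot] : -1;
}
```

In this model the producer and every consumer must agree on exactly when
the counter advances. Any code that still reads the packed array with
the section number as the index would pick up the wrong entry.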