From: Pingfan Liu <kernelfans@gmail.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Andy Lutomirski <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
Len Brown <lenb@kernel.org>, Yinghai Lu <yinghai@kernel.org>,
Tejun Heo <tj@kernel.org>, Chao Fan <fanc.fnst@cn.fujitsu.com>,
Baoquan He <bhe@redhat.com>, Juergen Gross <jgross@suse.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
Vlastimil Babka <vbabka@suse.cz>, Michal Hocko <mhocko@suse.com>,
x86@kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info
Date: Tue, 15 Jan 2019 14:06:18 +0800 [thread overview]
Message-ID: <CAFgQCTtsw9xj3M85HU2GBk5iPSF4h_H43do-rfpXMo8svmgoJg@mail.gmail.com> (raw)
In-Reply-To: <fe88d6ff-00e1-b65d-f411-64b03227bd17@intel.com>
On Tue, Jan 15, 2019 at 7:02 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/10/19 9:12 PM, Pingfan Liu wrote:
> > Background
> > When kaslr kernel can be guaranteed to sit inside unmovable node
> > after [1].
>
> What does this "[1]" refer to?
>
https://lore.kernel.org/patchwork/patch/1029376/
> Also, can you clarify your terminology here a bit. By "kaslr kernel",
> do you mean the base address?
>
It should be the randomization of load address. Googled, and found out
that it is "base address".
> > But if kaslr kernel is located near the end of the movable node,
> > then bottom-up allocator may create pagetable which crosses the boundary
> > between unmovable node and movable node.
>
> Again, I'm confused. Do you literally mean a single page table page? I
> think you mean the page tables, but it would be nice to clarify this,
> and also explicitly state which page tables these are.
>
It should be page table pages. The page table is built by init_mem_mapping().
> > It is a probability issue,
> > two factors include -1. how big the gap between kernel end and
> > unmovable node's end. -2. how many memory does the system own.
> > Alternative way to fix this issue is by increasing the gap by
> > boot/compressed/kaslr*.
>
> Oh, you mean the KASLR code in arch/x86/boot/compressed/kaslr*.[ch]?
>
Sorry, and yes, code in arch/x86/boot/compressed/kaslr_64.c and kaslr.c
> It took me a minute to figure out you were talking about filenames.
>
> > But taking the scenario of PB level memory, the pagetable will take
> > server MB even if using 1GB page, different page attr and fragment
> > will make things worse. So it is hard to decide how much should the
> > gap increase.
> I'm not following this. If we move the image around, we leave holes.
> Why do we need page table pages allocated to cover these holes?
>
I means in arch/x86/boot/compressed/kaslr.c, store_slot_info() {
slot_area.num = (region->size - image_size) /CONFIG_PHYSICAL_ALIGN + 1
}. Let us denote the size of page table as "X", then the formula is
changed to slot_area.num = (region->size - image_size -X)
/CONFIG_PHYSICAL_ALIGN + 1. And it is hard to decide X due to the
above factors.
> > The following figure show the defection of current bottom-up style:
> > [startA, endA][startB, "kaslr kernel verly close to" endB][startC, endC]
>
> "defection"?
>
Oh, defect.
> > If nodeA,B is unmovable, while nodeC is movable, then init_mem_mapping()
> > can generate pgtable on nodeC, which stain movable node.
>
> Let me see if I can summarize this:
> 1. The kernel ASLR decompression code picks a spot to place the kernel
> image in physical memory.
> 2. Some page tables are dynamically allocated near (after) this spot.
> 3. Sometimes, based on the random ASLR location, these page tables fall
> over into the "movable node" area. Being unmovable allocations, this
> is not cool.
> 4. To fix this (on 64-bit at least), we stop allocating page tables
> based on the location of the kernel image. Instead, we allocate
> using the memblock allocator itself, which knows how to avoid the
> movable node.
>
Yes, you get my idea exactly. Thanks for your help to summary it. Hard
for me to express it clearly in English.
> > This patch makes it certainty instead of a probablity problem. It achieves
> > this by pushing forward the parsing of mem hotplug info ahead of init_mem_mapping().
>
> What does memory hotplug have to do with this? I thought this was all
> about early boot.
Put the info about memory hot plugable to memblock allocator,
initmem_init()->...->acpi_numa_memory_affinity_init(), where
memblock_mark_hotplug() does it. Later when memory allocator works, in
__next_mem_range(), it will check this info by
memblock_is_hotpluggable().
Thanks and regards,
Pingfan
next prev parent reply other threads:[~2019-01-15 6:06 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-11 5:12 Pingfan Liu
2019-01-11 5:12 ` [PATCHv2 1/7] x86/mm: concentrate the code to memblock allocator enabled Pingfan Liu
2019-01-11 6:12 ` Chao Fan
2019-01-11 6:12 ` Chao Fan
2019-01-11 10:06 ` Pingfan Liu
2019-01-11 10:06 ` Pingfan Liu
2019-01-14 23:07 ` Dave Hansen
2019-01-15 7:06 ` Pingfan Liu
2019-01-15 7:06 ` Pingfan Liu
2019-01-11 5:12 ` [PATCHv2 2/7] acpi: change the topo of acpi_table_upgrade() Pingfan Liu
2019-01-11 5:30 ` Chao Fan
2019-01-11 5:30 ` Chao Fan
2019-01-11 10:08 ` Pingfan Liu
2019-01-11 10:08 ` Pingfan Liu
2019-01-14 23:12 ` Dave Hansen
2019-01-15 7:28 ` Pingfan Liu
2019-01-15 7:28 ` Pingfan Liu
2019-01-11 5:12 ` [PATCHv2 3/7] mm/memblock: introduce allocation boundary for tracing purpose Pingfan Liu
2019-01-14 7:51 ` Mike Rapoport
2019-01-14 8:33 ` Pingfan Liu
2019-01-14 8:33 ` Pingfan Liu
2019-01-14 8:50 ` Mike Rapoport
2019-01-14 9:13 ` Pingfan Liu
2019-01-14 9:13 ` Pingfan Liu
2019-01-11 5:12 ` [PATCHv2 4/7] x86/setup: parse acpi to get hotplug info before init_mem_mapping() Pingfan Liu
2019-01-11 5:12 ` [PATCHv2 5/7] x86/mm: set allowed range for memblock allocator Pingfan Liu
2019-01-11 5:12 ` [PATCHv2 6/7] x86/mm: remove bottom-up allocation style for x86_64 Pingfan Liu
2019-01-14 23:27 ` Dave Hansen
2019-01-15 7:38 ` Pingfan Liu
2019-01-15 7:38 ` Pingfan Liu
2019-01-11 5:12 ` [PATCHv2 7/7] x86/mm: isolate the bottom-up style to init_32.c Pingfan Liu
2019-01-14 23:02 ` [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info Dave Hansen
2019-01-15 6:06 ` Pingfan Liu [this message]
2019-01-15 6:06 ` Pingfan Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAFgQCTtsw9xj3M85HU2GBk5iPSF4h_H43do-rfpXMo8svmgoJg@mail.gmail.com \
--to=kernelfans@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=bhe@redhat.com \
--cc=bp@alien8.de \
--cc=dave.hansen@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=fanc.fnst@cn.fujitsu.com \
--cc=hpa@zytor.com \
--cc=jgross@suse.com \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=mhocko@suse.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rjw@rjwysocki.net \
--cc=rppt@linux.vnet.ibm.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=vbabka@suse.cz \
--cc=x86@kernel.org \
--cc=yinghai@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox