From: Michal Hocko <mhocko@kernel.org>
To: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: linux-kernel@vger.kernel.org, sparclinux@vger.kernel.org,
linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
linux-s390@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
x86@kernel.org, kasan-dev@googlegroups.com,
borntraeger@de.ibm.com, heiko.carstens@de.ibm.com,
davem@davemloft.net, willy@infradead.org,
ard.biesheuvel@linaro.org, mark.rutland@arm.com,
will.deacon@arm.com, catalin.marinas@arm.com, sam@ravnborg.org,
mgorman@techsingularity.net, steven.sistare@oracle.com,
daniel.m.jordan@oracle.com, bob.picco@oracle.com
Subject: Re: [PATCH v11 5/9] mm: zero reserved and unavailable struct pages
Date: Tue, 10 Oct 2017 15:44:41 +0200 [thread overview]
Message-ID: <20171010134441.pjemi7ytaqcfm372@dhcp22.suse.cz> (raw)
In-Reply-To: <20171009221931.1481-6-pasha.tatashin@oracle.com>
On Mon 09-10-17 18:19:27, Pavel Tatashin wrote:
> Some memory is reserved but unavailable: not present in memblock.memory
> (because not backed by physical pages), but present in memblock.reserved.
> Such memory has backing struct pages, but they are not initialized by going
> through __init_single_page().
>
> In some cases these struct pages are accessed even if they do not contain
> any data. One example: page_to_pfn() may access page->flags if that is
> where the section information is stored (CONFIG_SPARSEMEM with
> SECTION_IN_PAGE_FLAGS).
>
> One example of such memory: trim_low_memory_range() unconditionally
> reserves from pfn 0, but e820__memblock_setup() might provide the
> existing memory starting from pfn 1 (e.g. under KVM).
>
> Since struct pages are zeroed in __init_single_page() and not at
> allocation time, we must zero such struct pages explicitly.
>
> The patch adds a new memblock iterator:
> 	for_each_resv_unavail_range(i, p_start, p_end)
>
> which iterates over the reserved && !memory ranges; the struct pages in
> those ranges are zeroed explicitly by calling mm_zero_struct_page().
>
> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Reviewed-by: Bob Picco <bob.picco@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
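
For context, the page_to_pfn() point in the changelog can be illustrated
with a rough, self-contained userspace sketch of how a section id is
decoded from page->flags. The struct layout, shift and mask below are
made up purely for illustration and do not match any real kernel config:

	#include <stdio.h>

	/* Simplified stand-in for struct page; only flags matters here. */
	struct page {
		unsigned long flags;
	};

	/* Illustrative layout: pretend the section id lives in bits 20..31. */
	#define SECTIONS_PGSHIFT	20
	#define SECTIONS_MASK		0xfffUL

	/* Mirrors the idea of page_to_section(): read the section id from flags. */
	static unsigned long page_to_section(const struct page *page)
	{
		return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
	}

	int main(void)
	{
		struct page garbage = { .flags = 0xdeadbeefUL };
		struct page zeroed  = { .flags = 0 };

		/*
		 * With SECTION_IN_PAGE_FLAGS, page_to_pfn() has to look the
		 * section up via flags, so uninitialized flags yield a bogus
		 * section index (and, in the kernel, a wild mem_map pointer),
		 * while an explicitly zeroed struct page decodes predictably.
		 */
		printf("garbage flags -> section %lu\n", page_to_section(&garbage));
		printf("zeroed flags  -> section %lu\n", page_to_section(&zeroed));
		return 0;
	}
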
> ---
> include/linux/memblock.h | 16 ++++++++++++++++
> include/linux/mm.h | 15 +++++++++++++++
> mm/page_alloc.c | 38 ++++++++++++++++++++++++++++++++++++++
> 3 files changed, 69 insertions(+)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index bae11c7e7bf3..ce8bfa5f3e9b 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -237,6 +237,22 @@ unsigned long memblock_next_valid_pfn(unsigned long pfn, unsigned long max_pfn);
> for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved, \
> nid, flags, p_start, p_end, p_nid)
>
> +/**
> + * for_each_resv_unavail_range - iterate through reserved and unavailable memory
> + * @i: u64 used as loop variable
> + * @flags: pick from blocks based on memory attributes
> + * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
> + * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
> + *
> + * Walks over unavailable but reserved (reserved && !memory) areas of memblock.
> + * Available as soon as memblock is initialized.
> + * Note: because this memory does not belong to any physical node, flags and
> + * nid arguments do not make sense and thus are not exported as arguments.
> + */
> +#define for_each_resv_unavail_range(i, p_start, p_end) \
> + for_each_mem_range(i, &memblock.reserved, &memblock.memory, \
> + NUMA_NO_NODE, MEMBLOCK_NONE, p_start, p_end, NULL)
> +
> static inline void memblock_set_region_flags(struct memblock_region *r,
> unsigned long flags)
> {
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 065d99deb847..04c8b2e5aff4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -94,6 +94,15 @@ extern int mmap_rnd_compat_bits __read_mostly;
> #define mm_forbids_zeropage(X) (0)
> #endif
>
> +/*
> + * On some architectures it is expensive to call memset() for small sizes.
> + * Those architectures should provide their own implementation of "struct page"
> + * zeroing by defining this macro in <asm/pgtable.h>.
> + */
> +#ifndef mm_zero_struct_page
> +#define mm_zero_struct_page(pp) ((void)memset((pp), 0, sizeof(struct page)))
> +#endif
> +
> /*
> * Default maximum number of active map areas, this limits the number of vmas
> * per mm struct. Users can overwrite this number by sysctl but there is a
> @@ -2001,6 +2010,12 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn,
> struct mminit_pfnnid_cache *state);
> #endif
>
> +#ifdef CONFIG_HAVE_MEMBLOCK
> +void zero_resv_unavail(void);
> +#else
> +static inline void zero_resv_unavail(void) {}
> +#endif
> +
> extern void set_dma_reserve(unsigned long new_dma_reserve);
> extern void memmap_init_zone(unsigned long, int, unsigned long,
> unsigned long, enum memmap_context);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 20b0bace2235..5f0013bbbe9d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6209,6 +6209,42 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> free_area_init_core(pgdat);
> }
>
> +#ifdef CONFIG_HAVE_MEMBLOCK
> +/*
> + * Only struct pages that are backed by physical memory are zeroed and
> + * initialized by going through __init_single_page(). But, there are some
> + * struct pages which are reserved in the memblock allocator and their fields
> + * may be accessed (for example, page_to_pfn() on some configurations
> + * accesses flags). We must explicitly zero those struct pages.
> + */
> +void __paginginit zero_resv_unavail(void)
> +{
> + phys_addr_t start, end;
> + unsigned long pfn;
> + u64 i, pgcnt;
> +
> + /* Loop through ranges that are reserved, but do not have reported
> + * physical memory backing.
> + */
> + pgcnt = 0;
> + for_each_resv_unavail_range(i, &start, &end) {
> + for (pfn = PFN_DOWN(start); pfn < PFN_UP(end); pfn++) {
> + mm_zero_struct_page(pfn_to_page(pfn));
> + pgcnt++;
> + }
> + }
> +
> + /*
> + * Struct pages that do not have backing memory. This could be because
> + * firmware is using some of this memory, or for some other reason.
> + * Once memblock is changed so that such behaviour is not allowed,
> + * i.e. the list of "reserved" memory must be a subset of the list of
> + * "memory", this code can be removed.
> + */
> + pr_info("Reserved but unavailable: %llu pages\n", pgcnt);
> +}
> +#endif /* CONFIG_HAVE_MEMBLOCK */
> +
> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>
> #if MAX_NUMNODES > 1
> @@ -6632,6 +6668,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
> node_set_state(nid, N_MEMORY);
> check_for_memory(pgdat, nid);
> }
> + zero_resv_unavail();
> }
>
> static int __init cmdline_parse_core(char *p, unsigned long *core)
> @@ -6795,6 +6832,7 @@ void __init free_area_init(unsigned long *zones_size)
> {
> free_area_init_node(0, zones_size,
> __pa(PAGE_OFFSET) >> PAGE_SHIFT, NULL);
> + zero_resv_unavail();
> }
>
> static int page_alloc_cpu_dead(unsigned int cpu)
> --
> 2.14.2
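
To make the new iterator's semantics concrete, below is a minimal
userspace sketch of a "reserved && !memory" walk, i.e. emitting the
parts of the reserved ranges that no memory range covers. The range
type, helper name and sample addresses are illustrative only; the real
walk is done by memblock's __next_mem_range() underneath the
for_each_mem_range() macro used above:

	#include <stdio.h>

	/* Illustrative range type; memblock uses struct memblock_region. */
	struct range {
		unsigned long long start, end;	/* [start, end), physical addresses */
	};

	/*
	 * Print the pieces of 'resv' ranges that are NOT covered by any 'mem'
	 * range. Both arrays are assumed sorted and non-overlapping, which is
	 * how memblock keeps its region lists.
	 */
	static void walk_resv_unavail(const struct range *resv, int nr_resv,
				      const struct range *mem, int nr_mem)
	{
		for (int i = 0; i < nr_resv; i++) {
			unsigned long long pos = resv[i].start;

			for (int j = 0; j < nr_mem && pos < resv[i].end; j++) {
				if (mem[j].end <= pos)
					continue;	/* memory range below us */
				if (mem[j].start >= resv[i].end)
					break;		/* memory range above us */
				if (mem[j].start > pos)	/* uncovered gap */
					printf("reserved && !memory: [%#llx, %#llx)\n",
					       pos, mem[j].start);
				pos = mem[j].end;	/* skip the covered part */
			}
			if (pos < resv[i].end)		/* uncovered tail */
				printf("reserved && !memory: [%#llx, %#llx)\n",
				       pos, resv[i].end);
		}
	}

	int main(void)
	{
		/* pfn 0 is reserved but the first usable memory starts at pfn 1 */
		struct range reserved[] = { { 0x0, 0x1000 }, { 0x9f000, 0xa0000 } };
		struct range memory[]   = { { 0x1000, 0x9f000 } };

		walk_resv_unavail(reserved, 2, memory, 1);
		return 0;
	}

In the kernel, this is what for_each_resv_unavail_range() yields, and
zero_resv_unavail() then calls mm_zero_struct_page() on every struct page
whose pfn falls inside one of the emitted ranges.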
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Thread overview: 29+ messages
2017-10-09 22:19 [PATCH v11 0/9] complete deferred page initialization Pavel Tatashin
2017-10-09 22:19 ` [PATCH v11 1/9] x86/mm: setting fields in deferred pages Pavel Tatashin
2017-10-09 22:19 ` [PATCH v11 2/9] sparc64/mm: " Pavel Tatashin
2017-10-09 22:19 ` [PATCH v11 3/9] sparc64: simplify vmemmap_populate Pavel Tatashin
2017-10-09 22:19 ` [PATCH v11 4/9] mm: defining memblock_virt_alloc_try_nid_raw Pavel Tatashin
2017-10-09 22:19 ` [PATCH v11 5/9] mm: zero reserved and unavailable struct pages Pavel Tatashin
2017-10-10 13:44 ` Michal Hocko [this message]
2017-10-10 14:09 ` Michal Hocko
2017-10-10 14:30 ` Pavel Tatashin
2017-10-09 22:19 ` [PATCH v11 6/9] x86/kasan: add and use kasan_map_populate() Pavel Tatashin
2017-10-09 22:19 ` [PATCH v11 7/9] arm64/kasan: " Pavel Tatashin
2017-10-10 15:56 ` Will Deacon
2017-10-10 17:07 ` Pavel Tatashin
2017-10-10 17:10 ` Will Deacon
2017-10-10 17:41 ` Pavel Tatashin
2017-10-13 14:10 ` Pavel Tatashin
2017-10-13 14:43 ` Will Deacon
2017-10-13 14:56 ` Mark Rutland
2017-10-13 15:02 ` Pavel Tatashin
2017-10-13 15:09 ` Pavel Tatashin
2017-10-13 15:34 ` Pavel Tatashin
2017-10-13 15:44 ` Will Deacon
2017-10-13 15:54 ` Pavel Tatashin
2017-10-13 16:00 ` Pavel Tatashin
2017-10-13 16:18 ` Will Deacon
2017-10-09 22:19 ` [PATCH v11 8/9] mm: stop zeroing memory during allocation in vmemmap Pavel Tatashin
2017-10-09 22:19 ` [PATCH v11 9/9] sparc64: optimized struct page zeroing Pavel Tatashin
2017-10-10 14:15 ` [PATCH v11 0/9] complete deferred page initialization Michal Hocko
2017-10-10 17:19 ` Pavel Tatashin