From: David Hildenbrand <david@redhat.com>
To: Pasha Tatashin <pasha.tatashin@soleen.com>,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-cxl@vger.kernel.org,
	cerasuolodomenico@gmail.com, hannes@cmpxchg.org,
	j.granados@samsung.com, lizhijian@fujitsu.com,
	muchun.song@linux.dev, nphamcs@gmail.com, rientjes@google.com,
	rppt@kernel.org, souravpanda@google.com, vbabka@suse.cz,
	willy@infradead.org, dan.j.williams@intel.com,
	yi.zhang@redhat.com, alison.schofield@intel.com,
	yosryahmed@google.com
Subject: Re: [PATCH v4 3/3] mm: don't account memmap per-node
Date: Fri, 9 Aug 2024 09:31:06 +0200
Message-ID: <b13cf452-4573-4202-8178-26a33e9f2185@redhat.com>
In-Reply-To: <20240808213437.682006-4-pasha.tatashin@soleen.com>

On 08.08.24 23:34, Pasha Tatashin wrote:
> Fix invalid access to pgdat during hot-remove operation:
> ndctl users reported a GPF when trying to destroy a namespace:
> $ ndctl destroy-namespace all -r all -f
>   Segmentation fault
>   dmesg:
>   Oops: general protection fault, probably for
>   non-canonical address 0xdffffc0000005650: 0000 [#1] PREEMPT SMP KASAN
>   PTI
>   KASAN: probably user-memory-access in range
>   [0x000000000002b280-0x000000000002b287]
>   CPU: 26 UID: 0 PID: 1868 Comm: ndctl Not tainted 6.11.0-rc1 #1
>   Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS
>   2.20.1 09/13/2023
>   RIP: 0010:mod_node_page_state+0x2a/0x110
> 
> cxl-test users report a GPF when trying to unload the test module:
> $ modprobe -r cxl-test
>   dmesg
>   BUG: unable to handle page fault for address: 0000000000004200
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   PGD 0 P4D 0
>   Oops: Oops: 0000 [#1] PREEMPT SMP PTI
>   CPU: 0 UID: 0 PID: 1076 Comm: modprobe Tainted: G O N 6.11.0-rc1 #197
>   Tainted: [O]=OOT_MODULE, [N]=TEST
>   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/15
>   RIP: 0010:mod_node_page_state+0x6/0x90
> 
> Currently, when memory is hot-plugged or hot-removed the accounting is
> done based on the assumption that memmap is allocated from the same node
> as the hot-plugged/hot-removed memory, which is not always the case.
> 
> In addition, there are challenges with keeping the node id of the memory
> that is being removed until the time when memmap accounting is actually
> performed, since this is done after remove_pfn_range_from_zone() and
> also after remove_memory_block_devices(). This means that we cannot use
> pgdat or walk through memblocks to get the nid.
> 
> Given all of that, account the memmap overhead system wide instead.
> 
> For this we are going to use global atomic counters. Given that
> memmap size is rarely modified, and is normally only modified either
> during early boot when there is only one CPU or under the global hotplug
> mutex, there is no need for per-cpu optimizations.
> 
> Also, while we are here, rename nr_memmap to nr_memmap_pages, and
> nr_memmap_boot to nr_memmap_boot_pages to make it self-explanatory that
> the units are page counts.
> 
> Reported-by: Yi Zhang <yi.zhang@redhat.com>
> Closes: https://lore.kernel.org/linux-cxl/CAHj4cs9Ax1=CoJkgBGP_+sNu6-6=6v=_L-ZBZY0bVLD3wUWZQg@mail.gmail.com
> Reported-by: Alison Schofield <alison.schofield@intel.com>
> Closes: https://lore.kernel.org/linux-mm/Zq0tPd2h6alFz8XF@aschofie-mobl2/#t
> 
> Fixes: 15995a352474 ("mm: report per-page metadata information")
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Tested-by: Dan Williams <dan.j.williams@intel.com>
> ---

[...]

In general

Acked-by: David Hildenbrand <david@redhat.com>

Two nits below:


>   static void free_map_bootmem(struct page *memmap)
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 6f8aa4766f16..ad82c1bf0e63 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1033,6 +1033,23 @@ unsigned long node_page_state(struct pglist_data *pgdat,
>   }
>   #endif
>   
> +/*
> + * Count number of pages "struct page" and "struct page_ext" consume.
> + * nr_memmap_boot: # of pages allocated by boot allocator & not part of MemTotal
> + * nr_memmap: # of pages that were allocated by buddy allocator
> + */
> +static atomic_long_t nr_memmap_boot, nr_memmap;

I *think* the clean and portable way to do it is to use ATOMIC_LONG_INIT(0)
(the atomic_long_t counterpart of ATOMIC_INIT(0)) for both. [even though
what you have likely works on all archs]

> +
> +void mod_memmap_boot(long delta)
> +{
> +	atomic_long_add(delta, &nr_memmap_boot);
> +}
> +
> +void mod_memmap(long delta)
> +{
> +	atomic_long_add(delta, &nr_memmap);
> +}
> +

Nit picking: (up to you)

I'd do it similar to totalram_pages_add():

memmap_pages_add()
memmap_boot_pages_add()

And call the variables something like

static atomic_long_t memmap_pages_boot, memmap_pages;
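
To make both nits concrete, here is a rough userspace sketch of what I
have in mind. It is not the kernel code: atomic_long_t,
ATOMIC_LONG_INIT(), atomic_long_add() and atomic_long_read() below are
C11 stand-ins that I define only so the example builds outside the
kernel, and the variable and helper names are just the ones suggested
above, not something the patch already contains.

```c
/* Userspace sketch only: C11 stand-ins for the kernel's atomic_long_t API. */
#include <stdatomic.h>

/* Stand-in for the kernel type; the real one comes from <linux/atomic.h>. */
typedef struct { atomic_long counter; } atomic_long_t;

/* Stand-in for the kernel's ATOMIC_LONG_INIT() static initializer. */
#define ATOMIC_LONG_INIT(i) { (i) }

/* Stand-in for the kernel helper: atomically add delta to *v. */
static inline void atomic_long_add(long delta, atomic_long_t *v)
{
	atomic_fetch_add(&v->counter, delta);
}

/* Stand-in for the kernel helper: atomically read *v. */
static inline long atomic_long_read(atomic_long_t *v)
{
	return atomic_load(&v->counter);
}

/* Explicitly initialized counters, using the suggested names. */
static atomic_long_t memmap_pages_boot = ATOMIC_LONG_INIT(0);
static atomic_long_t memmap_pages = ATOMIC_LONG_INIT(0);

/* Helpers named after the totalram_pages_add() pattern. */
void memmap_boot_pages_add(long delta)
{
	atomic_long_add(delta, &memmap_pages_boot);
}

void memmap_pages_add(long delta)
{
	atomic_long_add(delta, &memmap_pages);
}
```

Callers would pass a positive delta when memmap pages are allocated and
a negative one when they are freed, the same way mod_memmap() and
mod_memmap_boot() are used in the patch.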


-- 
Cheers,

David / dhildenb



