From: John Hubbard <jhubbard@nvidia.com>
To: Jens Axboe <axboe@kernel.dk>,
David Hildenbrand <david@redhat.com>,
David Howells <dhowells@redhat.com>
Cc: Linux-MM <linux-mm@kvack.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [PATCH] mm: move FOLL_PIN debug accounting under CONFIG_DEBUG_VM
Date: Tue, 31 Jan 2023 09:47:58 -0800 [thread overview]
Message-ID: <b7ac83d6-bcea-c942-1502-fc876957c78e@nvidia.com> (raw)
In-Reply-To: <54b0b07a-c178-9ffe-b5af-088f3c21696c@kernel.dk>
On 1/31/23 07:36, Jens Axboe wrote:
> Using FOLL_PIN for mapping user pages caused a performance regression of
> about 2.7%. Looking at profiles, we see:
>
> +2.71% [kernel.vmlinux] [k] mod_node_page_state
>
> which wasn't there before. The node page state counters are percpu, but
> with a very low threshold. On my setup, every 108th update ends up
> needing to punt to two atomic_lond_add()'s, which is causing this above
> regression.
>
> As these counters are purely for debug purposes, move them under
> CONFIG_DEBUG_VM rather than do them unconditionally.
>
> Fixes: fd20d0c1852e ("block: convert bio_map_user_iov to use iov_iter_extract_pages")
> Fixes: 920756a3306a ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages")
> Link: https://lore.kernel.org/linux-block/f57ee72f-38e9-6afa-182f-2794638eadcb@kernel.dk/
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
Yes, that's the exact right fix (in the absence of some magical
high-volume, high performance per-cpu counters anyway). In fact,
originally these counter operations were behind CONFIG_DEBUG_VM in an
early patchset, but the FOLL_PIN feature was new and potentially flaky,
and also not yet in the Direct IO fast path. So during code review we
decided to enable the debugging information unconditionally.
But now we can no longer afford the debug counters in release
builds--but we also no longer need them, because the FOLL_PIN feature
has settled in and had enough soak time to be more confident about it.
Over time, I'd kind of started assuming that these counters were
necessary for release builds, and it required someone else to wake me up
and point out the obvious, so thanks! :)
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
--
John Hubbard
NVIDIA
> ---
>
> I added fixes tags, even though it's not a strict fix for this commits.
> But it does fix a performance regression introduced by those commits.
> It's a useful hint for backporting.
>
> I'd prefer sticking this at the end of the iov-extract series that
> is already pulled in, so it can go with those patches.
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index cd28a100d9e4..0153ec8a54ae 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -195,8 +195,10 @@ enum node_stat_item {
> NR_WRITTEN, /* page writings since bootup */
> NR_THROTTLED_WRITTEN, /* NR_WRITTEN while reclaim throttled */
> NR_KERNEL_MISC_RECLAIMABLE, /* reclaimable non-slab kernel pages */
> +#ifdef CONFIG_DEBUG_VM
> NR_FOLL_PIN_ACQUIRED, /* via: pin_user_page(), gup flag: FOLL_PIN */
> NR_FOLL_PIN_RELEASED, /* pages returned via unpin_user_page() */
> +#endif
> NR_KERNEL_STACK_KB, /* measured in KiB */
> #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
> NR_KERNEL_SCS_KB, /* measured in KiB */
> diff --git a/mm/gup.c b/mm/gup.c
> index f45a3a5be53a..41abb16286ec 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -168,7 +168,9 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
> */
> smp_mb__after_atomic();
>
> +#ifdef CONFIG_DEBUG_VM
> node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
> +#endif
>
> return folio;
> }
> @@ -180,7 +182,9 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
> static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
> {
> if (flags & FOLL_PIN) {
> +#ifdef CONFIG_DEBUG_VM
> node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
> +#endif
> if (folio_test_large(folio))
> atomic_sub(refs, folio_pincount_ptr(folio));
> else
> @@ -236,8 +240,9 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
> } else {
> folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
> }
> -
> +#ifdef CONFIG_DEBUG_VM
> node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
> +#endif
> }
>
> return 0;
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 1ea6a5ce1c41..5cbd9a1924bf 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1227,8 +1227,10 @@ const char * const vmstat_text[] = {
> "nr_written",
> "nr_throttled_written",
> "nr_kernel_misc_reclaimable",
> +#ifdef CONFIG_DEBUG_VM
> "nr_foll_pin_acquired",
> "nr_foll_pin_released",
> +#endif
> "nr_kernel_stack",
> #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
> "nr_shadow_call_stack",
>
prev parent reply other threads:[~2023-01-31 17:48 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-31 15:36 Jens Axboe
2023-01-31 15:48 ` David Hildenbrand
2023-01-31 15:50 ` Jens Axboe
2023-01-31 17:47 ` John Hubbard [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b7ac83d6-bcea-c942-1502-fc876957c78e@nvidia.com \
--to=jhubbard@nvidia.com \
--cc=axboe@kernel.dk \
--cc=david@redhat.com \
--cc=dhowells@redhat.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox