From: Yunsheng Lin <linyunsheng@huawei.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: <davem@davemloft.net>, <kuba@kernel.org>, <pabeni@redhat.com>,
	<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH net-next v2 00/10] Replace page_frag with page_frag_cache (Part-2)
Date: Wed, 11 Dec 2024 20:52:15 +0800
Message-ID: <389876b8-e565-4dc9-bc87-d97a639ff585@huawei.com>
In-Reply-To: <CAKgT0Uf7V+wMa7zz+9j9gwHC+hia3OwL_bo_O-yhn4=Xh0WadA@mail.gmail.com>

On 2024/12/10 23:58, Alexander Duyck wrote:

> 
> I'm not sure perf stat will tell us much as it is really too high
> level to give us much in the way of details. I would be more
> interested in the output from perf record -g followed by a perf
> report, or maybe even just a snapshot from perf top while the test is
> running. That should show us where the CPU is spending most of its
> time and what areas are hot in the before and after graphs.

It seems the bottleneck is on the freeing side: 'perf top' shows the
page_frag_free() function taking up to about 50% of the CPU for the non-aligned
API and 16% for the aligned API on the push CPU.

Using the below patch causes page_frag_free() to disappear from the push CPU
in 'perf top'; the new performance data is below:
Without patch 1:
 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=1 test_alloc_len=12 nr_test=51200000' (20 runs):

         21.084113      task-clock (msec)         #    0.008 CPUs utilized            ( +-  1.59% )
                 7      context-switches          #    0.334 K/sec                    ( +-  1.25% )
                 1      cpu-migrations            #    0.031 K/sec                    ( +- 20.20% )
                78      page-faults               #    0.004 M/sec                    ( +-  0.26% )
          54748233      cycles                    #    2.597 GHz                      ( +-  1.59% )
          61637051      instructions              #    1.13  insn per cycle           ( +-  0.13% )
          14727268      branches                  #  698.501 M/sec                    ( +-  0.11% )
             20178      branch-misses             #    0.14% of all branches          ( +-  0.94% )

       2.637345524 seconds time elapsed                                          ( +-  0.19% )

 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=1 test_alloc_len=12 nr_test=51200000 test_align=1' (20 runs):

         19.669259      task-clock (msec)         #    0.009 CPUs utilized            ( +-  2.91% )
                 7      context-switches          #    0.356 K/sec                    ( +-  1.04% )
                 0      cpu-migrations            #    0.005 K/sec                    ( +- 68.82% )
                77      page-faults               #    0.004 M/sec                    ( +-  0.27% )
          51077447      cycles                    #    2.597 GHz                      ( +-  2.91% )
          58875368      instructions              #    1.15  insn per cycle           ( +-  4.47% )
          14040015      branches                  #  713.805 M/sec                    ( +-  4.68% )
             20150      branch-misses             #    0.14% of all branches          ( +-  0.64% )

       2.226539190 seconds time elapsed                                          ( +-  0.12% )

With patch 1:
 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=1 test_alloc_len=12 nr_test=51200000' (20 runs):

         20.782788      task-clock (msec)         #    0.008 CPUs utilized            ( +-  0.09% )
                 7      context-switches          #    0.342 K/sec                    ( +-  0.97% )
                 1      cpu-migrations            #    0.031 K/sec                    ( +- 16.83% )
                78      page-faults               #    0.004 M/sec                    ( +-  0.31% )
          53967333      cycles                    #    2.597 GHz                      ( +-  0.08% )
          61577257      instructions              #    1.14  insn per cycle           ( +-  0.02% )
          14712140      branches                  #  707.900 M/sec                    ( +-  0.02% )
             20234      branch-misses             #    0.14% of all branches          ( +-  0.55% )

       2.677974457 seconds time elapsed                                          ( +-  0.15% )

root@(none):/home# perf stat -r 20 insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=1 test_alloc_len=12 nr_test=51200000 test_align=1

insmod: can't insert './page_frag_test.ko': Resource temporarily unavailable

 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=1 test_alloc_len=12 nr_test=51200000 test_align=1' (20 runs):

         20.420537      task-clock (msec)         #    0.009 CPUs utilized            ( +-  0.05% )
                 7      context-switches          #    0.345 K/sec                    ( +-  0.71% )
                 0      cpu-migrations            #    0.005 K/sec                    ( +-100.00% )
                77      page-faults               #    0.004 M/sec                    ( +-  0.23% )
          53038942      cycles                    #    2.597 GHz                      ( +-  0.05% )
          59965712      instructions              #    1.13  insn per cycle           ( +-  0.03% )
          14372507      branches                  #  703.826 M/sec                    ( +-  0.03% )
             20580      branch-misses             #    0.14% of all branches          ( +-  0.56% )

       2.287783171 seconds time elapsed                                          ( +-  0.12% )

It seems the bottleneck is still on the freeing side, so the above result
might not be as meaningful as it should be.

As we can't use more than one CPU on the freeing side without some locking
when using a single ptr_ring, it seems something more complicated might need
to be done in order to support more than one CPU on the freeing side?

Before patch 1, __page_frag_alloc_align() took up to 3.62% of the CPU according
to 'perf top'.
After patch 1, __page_frag_cache_prepare() and __page_frag_cache_commit_noref()
took up to 4.67% + 1.01% = 5.68%.
With the results being that close, I am not sure the CPU usage can really
explain the performance degradation here, as the degradation seems to be
quite large?
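
For reference, the patch mentioned above against the push thread of the test module: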

@@ -100,13 +100,20 @@ static int page_frag_push_thread(void *arg)
                if (!va)
                        continue;

-               ret = __ptr_ring_produce(ring, va);
-               if (ret) {
+               do {
+                       ret = __ptr_ring_produce(ring, va);
+                       if (!ret) {
+                               va = NULL;
+                               break;
+                       } else {
+                               cond_resched();
+                       }
+               } while (!force_exit);
+
+               if (va)
                        page_frag_free(va);
-                       cond_resched();
-               } else {
+               else
                        test_pushed++;
-               }
        }

        pr_info("page_frag push test thread exits on cpu %d\n",


