From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id C121EC4167B
	for <linux-mm@archiver.kernel.org>; Mon, 11 Dec 2023 03:38:41 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id 38AAE6B0092; Sun, 10 Dec 2023 22:38:41 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 33B3D6B0093; Sun, 10 Dec 2023 22:38:41 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 201E66B0095; Sun, 10 Dec 2023 22:38:41 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15])
	by kanga.kvack.org (Postfix) with ESMTP id 111566B0092
	for <linux-mm@kvack.org>; Sun, 10 Dec 2023 22:38:41 -0500 (EST)
Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay07.hostedemail.com (Postfix) with ESMTP id CC7761605FC
	for <linux-mm@kvack.org>; Mon, 11 Dec 2023 03:38:40 +0000 (UTC)
X-FDA: 81553130400.16.10D3B4D
Received: from mail-ej1-f53.google.com (mail-ej1-f53.google.com [209.85.218.53])
	by imf01.hostedemail.com (Postfix) with ESMTP id 036F140006
	for <linux-mm@kvack.org>; Mon, 11 Dec 2023 03:38:38 +0000 (UTC)
Authentication-Results: imf01.hostedemail.com;
	dkim=pass header.d=gmail.com header.s=20230601 header.b=lhDkszmH;
	dmarc=pass (policy=none) header.from=gmail.com;
	spf=pass (imf01.hostedemail.com: domain of liangchen.linux@gmail.com designates 209.85.218.53 as permitted sender) smtp.mailfrom=liangchen.linux@gmail.com
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1702265919;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=X7O1esqpdTStbI1MKxaKI+ohkzTfVXxZH1J+nUtT3YM=;
	b=faTOtbHMGhhxx84RPyB7NryIefM+JWptLNxZq28MKRtjBYI7nh2fpQUjvt02vQwOyphqZe
	VJjRUxzy5UyRW6jkHgYz8pjvB9Ul/6HI2dcK/GDB3S0yVKPOstHrshmaJZ7TtgZbJ2Y6IC
	voXtjSV8K9J1jSYuu4lquFuQW+GQX0E=
ARC-Authentication-Results: i=1;
	imf01.hostedemail.com;
	dkim=pass header.d=gmail.com header.s=20230601 header.b=lhDkszmH;
	dmarc=pass (policy=none) header.from=gmail.com;
	spf=pass (imf01.hostedemail.com: domain of liangchen.linux@gmail.com designates 209.85.218.53 as permitted sender) smtp.mailfrom=liangchen.linux@gmail.com
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1702265919; a=rsa-sha256;
	cv=none;
	b=kF8QTT9eketkZg+g8xrdaba69YEOqjB6QaX67q0utpkEr316jpSAOBnHcwSNQF7hlmPFXV
	vmTtoGepBZWhLiTQ0vJKm8EwNYlLEaVsUKZZOjJGOCakizbkhNrI1uRjPHDE5qbHGcnwT0
	IwYuQYZmgwW8SqqPoo7xwycXSv3DbkM=
Received: by mail-ej1-f53.google.com with SMTP id a640c23a62f3a-a1f653e3c3dso360278566b.2
        for <linux-mm@kvack.org>; Sun, 10 Dec 2023 19:38:38 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20230601; t=1702265917; x=1702870717; darn=kvack.org;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=X7O1esqpdTStbI1MKxaKI+ohkzTfVXxZH1J+nUtT3YM=;
        b=lhDkszmHNFC1PtF6qhgVuJgwf/R94kMKvhKKzWLJniWjrjrp47npv96ploARqDsHnr
         pM8vvXTPD7uVR6gMceqU4r2vl3En72MPhuuCA6wqOjagd9tsGub6wgG4cEJPIzh3Qa71
         bLi1VcZctvLnOSPhGy+Z2W/5P1Mo5BgR/5WXWbuundHjt7uLAkW9rMnYNouRV7pmw5uM
         4S4lWgE4YmlqVVUmzOgvSuICzZ6Mf15k2JpYhU7w8RxhlsYeM+4xLe/Y7kSZQrgXvubD
         MEJg4ZpZnqPNdOBJfrcHkEJqjSAp/rLuSEdTEKxysU4NjsG0b9xsI5zU1olgsxjYEGgn
         IS4Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1702265917; x=1702870717;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=X7O1esqpdTStbI1MKxaKI+ohkzTfVXxZH1J+nUtT3YM=;
        b=vlxydSkyuaOs/5WB+gbrsKLGLowppWFRr93RiIBuDmSfK62h5fkq3z6JULjpOVhQCf
         TWKmDVyuXqUZch97b7j6pB28Ipc8Op2Jhvh2LKdDgW3xpAzIF5TOttUSvbjVGa9qU+e6
         GxoYJ17Eoft984v0JpOxXzKiPU+McBVKBV3+805+Q2Gg6Ykfy031KMW+sSZLzu3T1QME
         U0m2vHMfZg4ZUWFzpuP/Q30z+xrvV050oQ2kXT3YhZOQVpEmLe2061VMkeHNxZl0f6AK
         6XuC+KvhBCcrTltonOV2JIqgIXSCBScAAhe5NpVFGFvFwM3XBmBzR/D1pLACJq4526I5
         boeQ==
X-Gm-Message-State: AOJu0Yw+F//Nz2b/Iiw+Hwm3/yU7TkpG8d75U0TzpwqrZde9P5lCRQHf
	wUdug8XXulbrzhG/jNXBQDbH3CZf1plDsiKkJuQ=
X-Google-Smtp-Source: AGHT+IFT//uYgxvOcwEL10srNQCEJ9a/EG4Bf14+fDpxBRIKwhcfRtu58/ge/qc+usbqj5PDostnbVBH/JpPeOgrJbE=
X-Received: by 2002:a17:907:720a:b0:a19:d40a:d252 with SMTP id
 dr10-20020a170907720a00b00a19d40ad252mr1435630ejc.286.1702265917002; Sun, 10
 Dec 2023 19:38:37 -0800 (PST)
MIME-Version: 1.0
References: <20231206105419.27952-1-liangchen.linux@gmail.com>
 <20231206105419.27952-5-liangchen.linux@gmail.com> <CAHS8izNQeSwWQ9NwiDUcPoSX1WONG4JYu2rfpqF3+4xkxE=Wyw@mail.gmail.com>
In-Reply-To: <CAHS8izNQeSwWQ9NwiDUcPoSX1WONG4JYu2rfpqF3+4xkxE=Wyw@mail.gmail.com>
From: Liang Chen <liangchen.linux@gmail.com>
Date: Mon, 11 Dec 2023 11:38:24 +0800
Message-ID: <CAKhg4t+LpF=G0DBhbuRYtxKyTrMiR3pSc15sY42kc57iGQfPmw@mail.gmail.com>
Subject: Re: [PATCH net-next v7 4/4] skbuff: Optimization of SKB coalescing
 for page pool
To: Mina Almasry <almasrymina@google.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, 
	pabeni@redhat.com, hawk@kernel.org, ilias.apalodimas@linaro.org, 
	linyunsheng@huawei.com, netdev@vger.kernel.org, linux-mm@kvack.org, 
	jasowang@redhat.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Rspamd-Queue-Id: 036F140006
X-Rspam-User: 
X-Rspamd-Server: rspam02
X-Stat-Signature: 36g6mjhqau7jhprt6fxmiid9uxj97ga9
X-HE-Tag: 1702265918-842233
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

On Sat, Dec 9, 2023 at 10:18 AM Mina Almasry <almasrymina@google.com> wrote:
>
> On Wed, Dec 6, 2023 at 2:54 AM Liang Chen <liangchen.linux@gmail.com> wrote:
> >
> > In order to address the issues encountered with commit 1effe8ca4e34
> > ("skbuff: fix coalescing for page_pool fragment recycling"), the
> > combination of the following conditions was excluded from skb coalescing:
> >
> > from->pp_recycle = 1
> > from->cloned = 1
> > to->pp_recycle = 1
> >
> > However, with page pool environments, the aforementioned combination can
> > be quite common (e.g. NetworkManager may lead to an additional
> > packet_type being registered, and thus to cloning). In scenarios with a
> > higher number of small packets, it can significantly affect the success
> > rate of coalescing. For example, considering packets of 256 bytes size,
> > our comparison of coalescing success rate is as follows:
> >
> > Without page pool: 70%
> > With page pool: 13%
> >
> > Consequently, this has an impact on performance:
> >
> > Without page pool: 2.57 Gbits/sec
> > With page pool: 2.26 Gbits/sec
> >
> > Therefore, it seems worthwhile to optimize this scenario and enable
> > coalescing of this particular combination. To achieve this, we need to
> > ensure the correct increment of the "from" SKB page's page pool
> > reference count (pp_ref_count).
> >
> > Following this optimization, the success rate of coalescing measured in
> > our environment has improved as follows:
> >
> > With page pool: 60%
> >
> > This success rate is approaching the rate achieved without using page
> > pool, and the performance has also been improved:
> >
> > With page pool: 2.52 Gbits/sec
> >
> > Below is the performance comparison for small packets before and after
> > this optimization. We observe no impact to packets larger than 4K.
> >
> > packet size     before      after       improved
> > (bytes)         (Gbits/sec) (Gbits/sec)
> > 128             1.19        1.27        7.13%
> > 256             2.26        2.52        11.75%
> > 512             4.13        4.81        16.50%
> > 1024            6.17        6.73        9.05%
> > 2048            14.54       15.47       6.45%
> > 4096            25.44       27.87       9.52%
> >
> > Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
> > Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
> > Suggested-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  include/net/page_pool/helpers.h |  5 ++++
> >  net/core/skbuff.c               | 41 +++++++++++++++++++++++----------
> >  2 files changed, 34 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> > index 9dc8eaf8a959..268bc9d9ffd3 100644
> > --- a/include/net/page_pool/helpers.h
> > +++ b/include/net/page_pool/helpers.h
> > @@ -278,6 +278,11 @@ static inline long page_pool_unref_page(struct page *page, long nr)
> >         return ret;
> >  }
> >
> > +static inline void page_pool_ref_page(struct page *page)
> > +{
> > +       atomic_long_inc(&page->pp_ref_count);
> > +}
> > +
> >  static inline bool page_pool_is_last_ref(struct page *page)
> >  {
> >         /* If page_pool_unref_page() returns 0, we were the last user */
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 7e26b56cda38..3c2515a29376 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -947,6 +947,24 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe)
> >         return napi_pp_put_page(virt_to_page(data), napi_safe);
> >  }
> >
> > +/**
> > + * skb_pp_frag_ref() - Increase fragment reference count of a page
> > + * @page:      page of the fragment on which to increase a reference
> > + *
> > + * Increase fragment reference count (pp_ref_count) on a page, but if it is
> > + * not a page pool page, fallback to increase a reference(_refcount) on a
> > + * normal page.
> > + */
> > +static void skb_pp_frag_ref(struct page *page)
> > +{
> > +       struct page *head_page = compound_head(page);
> > +
> > +       if (likely(is_pp_page(head_page)))
> > +               page_pool_ref_page(head_page);
> > +       else
> > +               page_ref_inc(head_page);
> > +}
> > +
>
> I am confused by this, why add a new helper instead of modifying the
> existing helper, skb_frag_ref()?
>
> My mental model is that if the net stack wants to acquire a reference
> on a frag, it calls skb_frag_ref(), and if it wants to drop a
> reference on a frag, it should call skb_frag_unref(). Internally
> skb_frag_ref/unref() can do all sorts of checking to decide whether to
> increment page->refcount or page->pp_ref_count. I can't wrap my head
> around the introduction of skb_pp_frag_ref(), but no equivalent
> skb_pp_frag_unref().
>
> But even if skb_pp_frag_unref() was added, when should the net stack
> use skb_frag_ref/unref, and when should the stack use
> skb_pp_frag_ref/unref? The docs currently describe what the function
> does, but not when a programmer unfamiliar with the page pool should
> use it.
>
> >  static void skb_kfree_head(void *head, unsigned int end_offset)
> >  {
> >         if (end_offset == SKB_SMALL_HEAD_HEADROOM)
> > @@ -5769,17 +5787,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> >                 return false;
> >
> >         /* In general, avoid mixing page_pool and non-page_pool allocated
> > -        * pages within the same SKB. Additionally avoid dealing with clones
> > -        * with page_pool pages, in case the SKB is using page_pool fragment
> > -        * references (page_pool_alloc_frag()). Since we only take full page
> > -        * references for cloned SKBs at the moment that would result in
> > -        * inconsistent reference counts.
> > -        * In theory we could take full references if @from is cloned and
> > -        * !@to->pp_recycle but its tricky (due to potential race with
> > -        * the clone disappearing) and rare, so not worth dealing with.
> > +        * pages within the same SKB. In theory we could take full
> > +        * references if @from is cloned and !@to->pp_recycle but its
> > +        * tricky (due to potential race with the clone disappearing) and
> > +        * rare, so not worth dealing with.
> >          */
> > -       if (to->pp_recycle != from->pp_recycle ||
> > -           (from->pp_recycle && skb_cloned(from)))
> > +       if (to->pp_recycle != from->pp_recycle)
> >                 return false;
> >
> >         if (len <= skb_tailroom(to)) {
> > @@ -5836,8 +5849,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> >         /* if the skb is not cloned this does nothing
> >          * since we set nr_frags to 0.
> >          */
> > -       for (i = 0; i < from_shinfo->nr_frags; i++)
> > -               __skb_frag_ref(&from_shinfo->frags[i]);
> > +       if (from->pp_recycle)
> > +               for (i = 0; i < from_shinfo->nr_frags; i++)
> > +                       skb_pp_frag_ref(skb_frag_page(&from_shinfo->frags[i]));
> > +       else
> > +               for (i = 0; i < from_shinfo->nr_frags; i++)
> > +                       __skb_frag_ref(&from_shinfo->frags[i]);
>
> You added a check to use skb_pp_frag_ref() instead of
> skb_frag_ref() here, but it's not clear to me why other callsites of
> skb_frag_ref() don't need to be modified in the same way after your
> patch.
>
> After your patch:
>
> skb_frag_ref() will always increment page->_refcount
> skb_frag_unref() will either decrement page->_refcount or decrement
> page->pp_ref_count (depending on the value of skb->pp_recycle).
> skb_pp_frag_ref() will either increment page->_refcount or increment
> page->pp_ref_count (depending on the value of is_pp_page(), not
> skb->pp_recycle).
> skb_pp_frag_unref() doesn't exist.
>
> Is this not confusing? Can we streamline things:
>
> skb_frag_ref() increments page->pp_ref_count for skb->pp_recycle,
> page->_refcount otherwise.
> skb_frag_unref() decrements page->pp_ref_count for skb->pp_recycle,
> page->_refcount otherwise.
>
> Or am I missing something that causes us to require this asymmetric
> reference counting?
>

This idea was implemented previously, as shown here:
https://lore.kernel.org/all/20211009093724.10539-5-linyunsheng@huawei.com/.
However, doing it that way would add some unnecessary overhead, since
'skb_try_coalesce' is currently the only place where the page pool
reference count of an skb frag might be increased. I would prefer to
move the logic into '__skb_frag_ref' once such a need becomes more
common. Thanks!
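
To illustrate the accounting being discussed, here is a minimal userspace
sketch. It is not kernel code: struct mock_page and the mock_* functions are
hypothetical stand-ins for struct page, __skb_frag_ref(), skb_pp_frag_ref()
and the frag unref path, and the unref model is deliberately simplified (it
assumes a recycling skb releasing a page pool page always drops pp_ref_count,
and it does not model the page being returned to the pool).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace mock; NOT the kernel's struct page. */
struct mock_page {
	long refcount;      /* stands in for page->_refcount */
	long pp_ref_count;  /* stands in for page->pp_ref_count */
	bool is_pp;         /* stands in for the result of is_pp_page() */
};

/* Models __skb_frag_ref(): always takes a full page reference. */
static void mock_frag_ref_plain(struct mock_page *page)
{
	page->refcount++;
}

/* Models skb_pp_frag_ref() from the patch: pick the counter by page type. */
static void mock_pp_frag_ref(struct mock_page *page)
{
	if (page->is_pp)
		page->pp_ref_count++;
	else
		page->refcount++;
}

/*
 * Simplified model of the unref side: a recycling skb releasing a page
 * pool page drops pp_ref_count; every other case drops refcount.
 */
static void mock_frag_unref(struct mock_page *page, bool pp_recycle)
{
	if (pp_recycle && page->is_pp)
		page->pp_ref_count--;
	else
		page->refcount--;
}

/*
 * Coalesce a cloned, pp_recycle "from" skb into a pp_recycle "to" skb,
 * then free every holder. Returns the final pp_ref_count; 0 is balanced.
 */
static long mock_coalesce_then_free(bool use_pp_ref)
{
	/* One page pool page, held once by the frags "from" shares with its clone. */
	struct mock_page page = { .refcount = 1, .pp_ref_count = 1, .is_pp = true };

	/* "to" takes its own reference on the frag while coalescing. */
	if (use_pp_ref)
		mock_pp_frag_ref(&page);
	else
		mock_frag_ref_plain(&page);

	mock_frag_unref(&page, true);	/* "to" is freed */
	mock_frag_unref(&page, true);	/* last holder of the shared frags is freed */

	return page.pp_ref_count;
}
```

In this model mock_coalesce_then_free(true) ends with the counter at 0, while
mock_coalesce_then_free(false) drives it to -1, which is the kind of
inconsistent reference count the old skb_cloned() exclusion was avoiding.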

> >
> >         to->truesize += delta;
> >         to->len += len;
> > --
> > 2.31.1
> >
> >
>
>
> --
> Thanks,
> Mina