From: Mina Almasry <almasrymina@google.com>
Date: Sun, 10 Dec 2023 20:21:21 -0800
Subject: Re: [PATCH net-next v7 4/4] skbuff: Optimization of SKB coalescing for page pool
To: Liang Chen
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, hawk@kernel.org, ilias.apalodimas@linaro.org, linyunsheng@huawei.com, netdev@vger.kernel.org, linux-mm@kvack.org, jasowang@redhat.com
On Sun, Dec 10, 2023 at 7:38 PM Liang Chen wrote:
>
> On Sat, Dec 9, 2023 at 10:18 AM Mina Almasry wrote:
> >
> > On Wed, Dec 6, 2023 at 2:54 AM Liang Chen wrote:
> > >
> > > In order to address the issues encountered with commit 1effe8ca4e34
> > > ("skbuff: fix coalescing for page_pool fragment recycling"), the
> > > combination of the following conditions was excluded from skb
> > > coalescing:
> > >
> > > from->pp_recycle = 1
> > > from->cloned = 1
> > > to->pp_recycle = 1
> > >
> > > However, in page pool environments, the aforementioned combination can
> > > be quite common (e.g. NetworkManager may lead to an additional
> > > packet_type being registered, and thus the cloning). In scenarios with
> > > a higher number of small packets, it can significantly reduce the
> > > success rate of coalescing. For example, for packets of 256 bytes, our
> > > comparison of the coalescing success rate is as follows:
> > >
> > > Without page pool: 70%
> > > With page pool: 13%
> > >
> > > Consequently, this has an impact on performance:
> > >
> > > Without page pool: 2.57 Gbits/sec
> > > With page pool: 2.26 Gbits/sec
> > >
> > > Therefore, it seems worthwhile to optimize this scenario and enable
> > > coalescing of this particular combination. To achieve this, we need to
> > > ensure the correct increment of the "from" SKB page's page pool
> > > reference count (pp_ref_count).
> > >
> > > Following this optimization, the success rate of coalescing measured
> > > in our environment has improved as follows:
> > >
> > > With page pool: 60%
> > >
> > > This success rate is approaching the rate achieved without using page
> > > pool, and the performance has also improved:
> > >
> > > With page pool: 2.52 Gbits/sec
> > >
> > > Below is the performance comparison for small packets before and
> > > after this optimization. We observe no impact on packets larger than
> > > 4K.
> > >
> > > packet size    before       after        improved
> > > (bytes)        (Gbits/sec)  (Gbits/sec)
> > >  128           1.19         1.27         7.13%
> > >  256           2.26         2.52         11.75%
> > >  512           4.13         4.81         16.50%
> > > 1024           6.17         6.73         9.05%
> > > 2048           14.54        15.47        6.45%
> > > 4096           25.44        27.87        9.52%
> > >
> > > Signed-off-by: Liang Chen
> > > Reviewed-by: Yunsheng Lin
> > > Suggested-by: Jason Wang
> > > ---
> > >  include/net/page_pool/helpers.h |  5 ++++
> > >  net/core/skbuff.c               | 41 +++++++++++++++++++++++---------
> > >  2 files changed, 34 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> > > index 9dc8eaf8a959..268bc9d9ffd3 100644
> > > --- a/include/net/page_pool/helpers.h
> > > +++ b/include/net/page_pool/helpers.h
> > > @@ -278,6 +278,11 @@ static inline long page_pool_unref_page(struct page *page, long nr)
> > >         return ret;
> > >  }
> > >
> > > +static inline void page_pool_ref_page(struct page *page)
> > > +{
> > > +       atomic_long_inc(&page->pp_ref_count);
> > > +}
> > > +
> > >  static inline bool page_pool_is_last_ref(struct page *page)
> > >  {
> > >         /* If page_pool_unref_page() returns 0, we were the last user */
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index 7e26b56cda38..3c2515a29376 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -947,6 +947,24 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe)
> > >         return napi_pp_put_page(virt_to_page(data), napi_safe);
> > >  }
> > >
> > > +/**
> > > + * skb_pp_frag_ref() - Increase fragment reference count of a page
> > > + * @page: page of the fragment on which to increase a reference
> > > + *
> > > + * Increase the fragment reference count (pp_ref_count) of a page, but
> > > + * if it is not a page pool page, fall back to increasing a reference
> > > + * (_refcount) on a normal page.
> > > + */
> > > +static void skb_pp_frag_ref(struct page *page)
> > > +{
> > > +       struct page *head_page = compound_head(page);
> > > +
> > > +       if (likely(is_pp_page(head_page)))
> > > +               page_pool_ref_page(head_page);
> > > +       else
> > > +               page_ref_inc(head_page);
> > > +}
> > > +
> >
> > I am confused by this: why add a new helper instead of modifying the
> > existing helper, skb_frag_ref()?
> >
> > My mental model is that if the net stack wants to acquire a reference
> > on a frag, it calls skb_frag_ref(), and if it wants to drop a
> > reference on a frag, it should call skb_frag_unref(). Internally,
> > skb_frag_ref/unref() can do all sorts of checking to decide whether to
> > increment page->_refcount or page->pp_ref_count. I can't wrap my head
> > around the introduction of skb_pp_frag_ref(), but no equivalent
> > skb_pp_frag_unref().
> >
> > But even if skb_pp_frag_unref() were added, when should the net stack
> > use skb_frag_ref/unref(), and when should it use
> > skb_pp_frag_ref/unref()? The docs currently describe what the function
> > does, but not when a program unfamiliar with the page pool should use
> > it.
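
To make that asymmetry concrete, here is roughly what I would expect the
missing counterpart to look like, just mirroring skb_pp_frag_ref() from
this patch. It is completely untested and deliberately incomplete: a
real unref would also have to handle the last-reference case, since
page_pool_unref_page() returning 0 means the caller must recycle or free
the page, which is exactly the kind of subtlety I would rather keep
inside one pair of helpers:

static void skb_pp_frag_unref(struct page *page)
{
        struct page *head_page = compound_head(page);

        /* Incomplete sketch: when the last pp_ref_count reference is
         * dropped, the page would still have to be recycled or freed,
         * e.g. via napi_pp_put_page().
         */
        if (likely(is_pp_page(head_page)))
                page_pool_unref_page(head_page, 1);
        else
                page_ref_dec(head_page);
}
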
> > >  static void skb_kfree_head(void *head, unsigned int end_offset)
> > >  {
> > >         if (end_offset == SKB_SMALL_HEAD_HEADROOM)
> > > @@ -5769,17 +5787,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > >                 return false;
> > >
> > >         /* In general, avoid mixing page_pool and non-page_pool allocated
> > > -        * pages within the same SKB. Additionally avoid dealing with clones
> > > -        * with page_pool pages, in case the SKB is using page_pool fragment
> > > -        * references (page_pool_alloc_frag()). Since we only take full page
> > > -        * references for cloned SKBs at the moment that would result in
> > > -        * inconsistent reference counts.
> > > -        * In theory we could take full references if @from is cloned and
> > > -        * !@to->pp_recycle but its tricky (due to potential race with
> > > -        * the clone disappearing) and rare, so not worth dealing with.
> > > +        * pages within the same SKB. In theory we could take full
> > > +        * references if @from is cloned and !@to->pp_recycle but its
> > > +        * tricky (due to potential race with the clone disappearing) and
> > > +        * rare, so not worth dealing with.
> > >          */
> > > -       if (to->pp_recycle != from->pp_recycle ||
> > > -           (from->pp_recycle && skb_cloned(from)))
> > > +       if (to->pp_recycle != from->pp_recycle)
> > >                 return false;
> > >
> > >         if (len <= skb_tailroom(to)) {
> > > @@ -5836,8 +5849,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > >         /* if the skb is not cloned this does nothing
> > >          * since we set nr_frags to 0.
> > >          */
> > > -       for (i = 0; i < from_shinfo->nr_frags; i++)
> > > -               __skb_frag_ref(&from_shinfo->frags[i]);
> > > +       if (from->pp_recycle)
> > > +               for (i = 0; i < from_shinfo->nr_frags; i++)
> > > +                       skb_pp_frag_ref(skb_frag_page(&from_shinfo->frags[i]));
> > > +       else
> > > +               for (i = 0; i < from_shinfo->nr_frags; i++)
> > > +                       __skb_frag_ref(&from_shinfo->frags[i]);
> >
> > You added a check here to use skb_pp_frag_ref() instead of
> > __skb_frag_ref(), but it's not clear to me why other call sites of
> > skb_frag_ref() don't need to be modified in the same way after your
> > patch.
> >
> > After your patch:
> >
> > skb_frag_ref() will always increment page->_refcount.
> > skb_frag_unref() will either decrement page->_refcount or decrement
> > page->pp_ref_count (depending on the value of skb->pp_recycle).
> > skb_pp_frag_ref() will either increment page->_refcount or increment
> > page->pp_ref_count (depending on the value of is_pp_page(), not
> > skb->pp_recycle).
> > skb_pp_frag_unref() doesn't exist.
> >
> > Is this not confusing? Can we streamline things:
> >
> > skb_frag_ref() increments page->pp_ref_count for skb->pp_recycle,
> > page->_refcount otherwise.
> > skb_frag_unref() decrements page->pp_ref_count for skb->pp_recycle,
> > page->_refcount otherwise.
> >
> > Or am I missing something that causes us to require this asymmetric
> > reference counting?
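
To spell out the symmetric scheme I am proposing, the sketch below is
what I have in mind. It is untested, and skb_frag_ref_symmetric() is
just an illustrative name; the idea is that this logic would live in
skb_frag_ref()/__skb_frag_ref() itself:

static void skb_frag_ref_symmetric(struct sk_buff *skb, int f)
{
        struct page *page = skb_frag_page(&skb_shinfo(skb)->frags[f]);

        if (skb->pp_recycle)
                /* page pool frag: take a pp_ref_count reference */
                page_pool_ref_page(compound_head(page));
        else
                /* normal page: take a full _refcount reference */
                page_ref_inc(page);
}

skb_frag_unref() would mirror this, keyed on the same skb->pp_recycle
bit, and skb_try_coalesce() could then take its references in a single
loop without the from->pp_recycle special case.
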
>
> This idea was previously implemented, as shown here:
> https://lore.kernel.org/all/20211009093724.10539-5-linyunsheng@huawei.com/.
> But implementing it that way would introduce some unnecessary overhead,
> since 'skb_try_coalesce' is currently the only place where the page
> pool reference count for an skb frag might be increased. I would prefer
> to move the logic to '__skb_frag_ref' when such a need becomes more
> common. Thanks!
>

Is it possible/desirable to add a comment to skb_frag_ref() saying that
it should not be used with skb->pp_recycle? At least I was tripped up by
this, but maybe it's considered obvious somehow.

But I feel like this may eventually need to be fixed. Why does the
page_pool need a separate page->pp_ref_count? Why not use
page->_refcount like the rest of the code? Is there some history behind
this decision that you can point me to? It seems to me that
incrementing/decrementing page->pp_ref_count may be equivalent to doing
the same on page->_refcount.

> >
> > >
> > >         to->truesize += delta;
> > >         to->len += len;
> > > --
> > > 2.31.1
> > >
> >
> > --
> > Thanks,
> > Mina

--
Thanks,
Mina