From: Mina Almasry <almasrymina@google.com>
Date: Tue, 12 Dec 2023 18:49:40 -0800
Subject: Re: [PATCH net-next v9 4/4] skbuff: Optimization of SKB coalescing for page pool
To: Liang Chen
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
 pabeni@redhat.com, hawk@kernel.org, ilias.apalodimas@linaro.org,
 linyunsheng@huawei.com, netdev@vger.kernel.org, linux-mm@kvack.org,
 jasowang@redhat.com
References: <20231212044614.42733-1-liangchen.linux@gmail.com>
 <20231212044614.42733-5-liangchen.linux@gmail.com>
On Tue, Dec 12, 2023 at 6:37 PM Liang Chen wrote:
>
> On Wed, Dec 13, 2023 at 9:49 AM Mina Almasry wrote:
> >
> > On Mon, Dec 11, 2023 at 8:47 PM Liang Chen wrote:
> > >
> > > In order to address the issues encountered with commit 1effe8ca4e34
> > > ("skbuff: fix coalescing for page_pool fragment recycling"), the
> > > combination of the following conditions was excluded from skb coalescing:
> > >
> > > from->pp_recycle = 1
> > > from->cloned = 1
> > > to->pp_recycle = 1
> > >
> > > However, in page pool environments the aforementioned combination can
> > > be quite common (e.g. NetworkManager may cause an additional
> > > packet_type to be registered, and thus the cloning). In scenarios with
> > > a high number of small packets, it can significantly reduce the
> > > success rate of coalescing. For example, for packets of 256 bytes, our
> > > comparison of the coalescing success rate is as follows:
> > >
> > > Without page pool: 70%
> > > With page pool: 13%
> > >
> > > Consequently, this has an impact on performance:
> > >
> > > Without page pool: 2.57 Gbits/sec
> > > With page pool: 2.26 Gbits/sec
> > >
> > > Therefore, it seems worthwhile to optimize this scenario and enable
> > > coalescing of this particular combination. To achieve this, we need to
> > > ensure the correct increment of the "from" SKB page's page pool
> > > reference count (pp_ref_count).
> > >
> > > Following this optimization, the success rate of coalescing measured in
> > > our environment has improved as follows:
> > >
> > > With page pool: 60%
> > >
> > > This success rate is approaching the rate achieved without using page
> > > pool, and the performance has also been improved:
> > >
> > > With page pool: 2.52 Gbits/sec
> > >
> > > Below is the performance comparison for small packets before and after
> > > this optimization. We observe no impact to packets larger than 4K.
> > >
> > > packet size    before       after        improved
> > > (bytes)        (Gbits/sec)  (Gbits/sec)
> > > 128            1.19         1.27         7.13%
> > > 256            2.26         2.52         11.75%
> > > 512            4.13         4.81         16.50%
> > > 1024           6.17         6.73         9.05%
> > > 2048           14.54        15.47        6.45%
> > > 4096           25.44        27.87        9.52%
> > >
> > > Signed-off-by: Liang Chen
> > > Reviewed-by: Yunsheng Lin
> > > Suggested-by: Jason Wang
> > > ---
> > >  include/net/page_pool/helpers.h |  5 ++++
> > >  net/core/skbuff.c               | 43 ++++++++++++++++++++++++---------
> > >  2 files changed, 36 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> > > index d0c5e7e6857a..0dc8fab43bef 100644
> > > --- a/include/net/page_pool/helpers.h
> > > +++ b/include/net/page_pool/helpers.h
> > > @@ -281,6 +281,11 @@ static inline long page_pool_unref_page(struct page *page, long nr)
> > >          return ret;
> > >  }
> > >
> > > +static inline void page_pool_ref_page(struct page *page)
> > > +{
> > > +        atomic_long_inc(&page->pp_ref_count);
> > > +}
> > > +
> > >  static inline bool page_pool_is_last_ref(struct page *page)
> > >  {
> > >          /* If page_pool_unref_page() returns 0, we were the last user */
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index 7e26b56cda38..783a04733109 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -947,6 +947,26 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe)
> > >          return napi_pp_put_page(virt_to_page(data), napi_safe);
> > >  }
> > >
> > > +/**
> > > + * skb_pp_frag_ref() - Increase fragment reference count of a page
> > > + * @page: page of the fragment on which to increase a reference
> > > + *
> > > + * Increase the fragment reference count (pp_ref_count) of a page. This is
> > > + * intended to gain a fragment reference only for page pool aware skbs,
> > > + * i.e. when skb->pp_recycle is true, and not for fragments in a
> > > + * non-pp-recycling skb. It has a fallback to increase a reference on a
> > > + * normal page, as page pool aware skbs may also have normal page fragments.
> > > + */
> > > +static void skb_pp_frag_ref(struct page *page)
> > > +{
> > > +        struct page *head_page = compound_head(page);
> > > +
> >
> > Feel free to not delay this patch series further based on this
> > comment/question, but...
> >
> > I'm a bit confused about the need for compound_head() here, given that
> > skb_frag_ref() doesn't first obtain the compound_head(). Is there a
> > page_pool specific reason why skb_frag_ref() can get_page() directly
> > but this helper needs to grab the compound_head() first?
> >
>
> get_page includes the call to compound_head, so skb_frag_ref
> indirectly calls compound_head as well.
>
> > > +        if (likely(is_pp_page(head_page)))
> > > +                page_pool_ref_page(head_page);
> > > +        else
> > > +                page_ref_inc(head_page);
> >
> > Any reason why not get_page() here?
> >
>
> head_page is a head page because of the compound_head call above. This
> was actually a comment received from a previous iteration :)
>

I see, thanks.

Reviewed-by: Mina Almasry
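To spell out the asymmetry for anyone following the thread, below is a
simplified sketch. It is illustrative rather than verbatim kernel
source: get_page_sketch() is a made-up name, and in current kernels the
real get_page() reaches compound_head() indirectly, through
page_folio()/folio_get().

/* Sketch of what get_page() effectively does: resolve the head page
 * first, then take the reference. This is why skb_frag_ref() callers
 * may pass a tail page without calling compound_head() themselves.
 */
static inline void get_page_sketch(struct page *page)
{
        page = compound_head(page);     /* no-op for non-compound pages */
        page_ref_inc(page);
}

page_pool_ref_page(), by contrast, increments pp_ref_count on exactly
the page it is handed and does no head-page resolution of its own,
which is why skb_pp_frag_ref() has to call compound_head() before
checking is_pp_page() or taking either kind of reference.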
Noob question: do we actually support someone passing a compound page
to skb_frag_fill_page_desc()? Anyone know of any driver that does this?

I kinda like the direction this patch was going instead:

https://patchwork.kernel.org/project/netdevbpf/patch/20231113130041.58124-5-linyunsheng@huawei.com/

where we explicitly exclude compound pages from skbs... This is for
convenience for devmem TCP, where I don't support compound pages, but
that is more my problem than yours. This patch is fine.

> > > +}
> > > +
> > >  static void skb_kfree_head(void *head, unsigned int end_offset)
> > >  {
> > >          if (end_offset == SKB_SMALL_HEAD_HEADROOM)
> > > @@ -5769,17 +5789,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > >                  return false;
> > >
> > >          /* In general, avoid mixing page_pool and non-page_pool allocated
> > > -         * pages within the same SKB. Additionally avoid dealing with clones
> > > -         * with page_pool pages, in case the SKB is using page_pool fragment
> > > -         * references (page_pool_alloc_frag()). Since we only take full page
> > > -         * references for cloned SKBs at the moment that would result in
> > > -         * inconsistent reference counts.
> > > -         * In theory we could take full references if @from is cloned and
> > > -         * !@to->pp_recycle but its tricky (due to potential race with
> > > -         * the clone disappearing) and rare, so not worth dealing with.
> > > +         * pages within the same SKB. In theory we could take full
> > > +         * references if @from is cloned and !@to->pp_recycle but its
> > > +         * tricky (due to potential race with the clone disappearing) and
> > > +         * rare, so not worth dealing with.
> > >          */
> > > -        if (to->pp_recycle != from->pp_recycle ||
> > > -            (from->pp_recycle && skb_cloned(from)))
> > > +        if (to->pp_recycle != from->pp_recycle)
> > >                  return false;
> > >
> > >          if (len <= skb_tailroom(to)) {
> > > @@ -5836,8 +5851,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > >          /* if the skb is not cloned this does nothing
> > >           * since we set nr_frags to 0.
> > >           */
> > > -        for (i = 0; i < from_shinfo->nr_frags; i++)
> > > -                __skb_frag_ref(&from_shinfo->frags[i]);
> > > +        if (from->pp_recycle)
> > > +                for (i = 0; i < from_shinfo->nr_frags; i++)
> > > +                        skb_pp_frag_ref(skb_frag_page(&from_shinfo->frags[i]));
> > > +        else
> > > +                for (i = 0; i < from_shinfo->nr_frags; i++)
> > > +                        __skb_frag_ref(&from_shinfo->frags[i]);
> > >
> > >          to->truesize += delta;
> > >          to->len += len;
> > > --
> > > 2.31.1
> > >
> >
> >
> > --
> > Thanks,
> > Mina

--
Thanks,
Mina