From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Sun, 20 Oct 2024 08:45:29 -0700
Subject: Re: [PATCH net-next v22 07/14] mm: page_frag: some minor refactoring before adding new API
To: Yunsheng Lin
Cc: Yunsheng Lin, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Morton, linux-mm@kvack.org
References: <20241018105351.1960345-1-linyunsheng@huawei.com> <20241018105351.1960345-8-linyunsheng@huawei.com>
On Sat, Oct 19, 2024 at 1:30 AM Yunsheng Lin wrote:
>
> On 10/19/2024 1:26 AM, Alexander Duyck wrote:
>
> ...
>
> >> +static inline void *__page_frag_alloc_align(struct page_frag_cache *nc,
> >> +                                            unsigned int fragsz, gfp_t gfp_mask,
> >> +                                            unsigned int align_mask)
> >> +{
> >> +       struct page_frag page_frag;
> >> +       void *va;
> >> +
> >> +       va = __page_frag_cache_prepare(nc, fragsz, &page_frag, gfp_mask,
> >> +                                      align_mask);
> >> +       if (unlikely(!va))
> >> +               return NULL;
> >> +
> >> +       __page_frag_cache_commit(nc, &page_frag, fragsz);
> >
> > Minor nit here. Rather than if (!va) return I think it might be better
> > to just go with if (likely(va)) __page_frag_cache_commit.
>
> Ack.
>
> >
> >> +
> >> +       return va;
> >> +}
> >>
> >>  static inline void *page_frag_alloc_align(struct page_frag_cache *nc,
> >>                                            unsigned int fragsz, gfp_t gfp_mask,
> >> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> >> index a36fd09bf275..a852523bc8ca 100644
> >> --- a/mm/page_frag_cache.c
> >> +++ b/mm/page_frag_cache.c
> >> @@ -90,9 +90,31 @@ void __page_frag_cache_drain(struct page *page, unsigned int count)
> >>  }
> >>  EXPORT_SYMBOL(__page_frag_cache_drain);
> >>
> >> -void *__page_frag_alloc_align(struct page_frag_cache *nc,
> >> -                             unsigned int fragsz, gfp_t gfp_mask,
> >> -                             unsigned int align_mask)
> >> +unsigned int __page_frag_cache_commit_noref(struct page_frag_cache *nc,
> >> +                                           struct page_frag *pfrag,
> >> +                                           unsigned int used_sz)
> >> +{
> >> +       unsigned int orig_offset;
> >> +
> >> +       VM_BUG_ON(used_sz > pfrag->size);
> >> +       VM_BUG_ON(pfrag->page != encoded_page_decode_page(nc->encoded_page));
> >> +       VM_BUG_ON(pfrag->offset + pfrag->size >
> >> +                 (PAGE_SIZE << encoded_page_decode_order(nc->encoded_page)));
> >> +
> >> +       /* pfrag->offset might be bigger than the nc->offset due to alignment */
> >> +       VM_BUG_ON(nc->offset > pfrag->offset);
> >> +
> >> +       orig_offset = nc->offset;
> >> +       nc->offset = pfrag->offset + used_sz;
> >> +
> >> +       /* Return true size back to caller considering the offset alignment */
> >> +       return nc->offset - orig_offset;
> >> +}
> >> +EXPORT_SYMBOL(__page_frag_cache_commit_noref);
> >> +
> >
> > I have a question. How often is it that we are committing versus just
> > dropping the fragment? It seems like this approach is designed around
> > optimizing for not committing the page, as we are having to take an
> > extra function call to commit the change every time. Would it make
> > more sense to have an abort versus a commit?
>
> Before this patch, the page_frag_alloc() related API seemed to be mostly
> used for skb data or frags on the rx side, see napi_alloc_skb() or some
> drivers like e1000, but with more drivers using page_pool for skb rx
> frags, it seems skb data for tx is now the main use case.
>
> And the prepare and commit API added in this patchset seems to be mainly
> used for skb frags on the tx side, except for af_packet.
>
> It is not very clear which will be the most used one; most likely the
> prepare and commit API, if I have to guess, as there may be more memory
> needed for skb frags than for skb data.

Well, one of the things I am noticing is that you essentially have two
API setups in the later patches. In one you are calling
page_frag_alloc_align and then later calling an abort function that is
added later. In the other you have the probe/commit approach. In my mind
it might make sense to think about breaking those up so they are handled
as two separate APIs rather than trying to replace everything all at
once.
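Roughly the split I am picturing, sketched as a userspace toy (all of
these names are made up for illustration, not the actual kernel API; a
single fixed buffer stands in for the page frag cache):

```c
#include <stddef.h>

/* Toy stand-in for struct page_frag_cache: one 4K buffer with a
 * running offset. Purely illustrative userspace scaffolding.
 */
struct toy_frag_cache {
	char buf[4096];
	unsigned int offset;
};

/* Flavor 1: alloc/abort. The offset is committed at alloc time; the
 * caller only talks to the cache again if it wants to undo.
 */
static void *toy_frag_alloc(struct toy_frag_cache *nc, unsigned int fragsz)
{
	void *va;

	if (nc->offset + fragsz > sizeof(nc->buf))
		return NULL;

	va = nc->buf + nc->offset;
	nc->offset += fragsz;		/* committed up front */
	return va;
}

static void toy_frag_alloc_abort(struct toy_frag_cache *nc,
				 unsigned int fragsz)
{
	nc->offset -= fragsz;		/* undo the most recent alloc */
}

/* Flavor 2: prepare/commit. The caller is shown all remaining space,
 * fills in however much it needs, and only then advances the offset.
 */
static void *toy_frag_prepare(struct toy_frag_cache *nc,
			      unsigned int *avail)
{
	if (nc->offset >= sizeof(nc->buf))
		return NULL;

	*avail = sizeof(nc->buf) - nc->offset;
	return nc->buf + nc->offset;
}

static void toy_frag_commit(struct toy_frag_cache *nc, unsigned int used_sz)
{
	nc->offset += used_sz;
}
```

The point being that the two flavors have different call discipline:
alloc/abort pays nothing in the common case and only touches the cache
again on failure, while prepare/commit always pays the second call but
never over-commits.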
> >
> >> +void *__page_frag_cache_prepare(struct page_frag_cache *nc, unsigned int fragsz,
> >> +                               struct page_frag *pfrag, gfp_t gfp_mask,
> >> +                               unsigned int align_mask)
> >>  {
> >>         unsigned long encoded_page = nc->encoded_page;
> >>         unsigned int size, offset;
> >> @@ -114,6 +136,8 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
> >>                 /* reset page count bias and offset to start of new frag */
> >>                 nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> >>                 nc->offset = 0;
> >> +       } else {
> >> +               page = encoded_page_decode_page(encoded_page);
> >>         }
> >>
> >>         size = PAGE_SIZE << encoded_page_decode_order(encoded_page);
> >
> > This makes no sense to me. Seems like there are scenarios where you
> > are grabbing the page even if you aren't going to use it? Why?
> >
> > I think you would be better off just waiting until the end and then
> > fetching it, instead of trying to grab it and potentially throw it away
> > if there is no space left in the page. Otherwise what you might do is
> > something along the lines of:
> >   pfrag->page = page ? : encoded_page_decode_page(encoded_page);
>
> But doesn't that mean an additional check is needed to decide if we
> need to grab the page?
>
> But './scripts/bloat-o-meter' does show some binary size shrinkage
> using the above.

You are probably correct on this one. I think your approach may be
better. The only case my approach would be optimizing for would probably
be the size > 4K case, which isn't appropriate anyway.
> >
> >> @@ -132,8 +156,6 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
> >>                         return NULL;
> >>                 }
> >>
> >> -       page = encoded_page_decode_page(encoded_page);
> >> -
> >>         if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
> >>                 goto refill;
> >>
> >> @@ -148,15 +170,17 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
> >>
> >>                 /* reset page count bias and offset to start of new frag */
> >>                 nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> >> +               nc->offset = 0;
> >>                 offset = 0;
> >>         }
> >>
> >> -       nc->pagecnt_bias--;
> >> -       nc->offset = offset + fragsz;
> >> +       pfrag->page = page;
> >> +       pfrag->offset = offset;
> >> +       pfrag->size = size - offset;
> >
> > I really think we should still be moving the nc->offset forward at
> > least with each allocation. It seems like you end up doing two flavors
> > of commit, one with and one without the decrement of the bias. So I
> > would be okay with that being pulled out into some separate logic to
> > avoid the extra increment in the case of merging the pages. However, in
> > both cases you need to move the offset, so I would recommend keeping
> > that bit there, as it would allow us to essentially call this multiple
> > times without having to do a commit in between to keep the offset
> > correct. With that, your commit logic only has to verify nothing
> > changed out from underneath us and then update the pagecnt_bias if
> > needed.
>
> The problem is that we don't really know how far nc->offset needs to
> be moved forward, and the caller needs the original offset for the
> skb_fill_page_desc() related calls when the prepare API is used, as in
> the example in the 'Preparation & committing API' section of patch 13:

The thing is, you really have two different APIs. You have the one you
were doing, which was an alloc/abort approach, and another that is a
probe/commit approach.
I think for the probe/commit you could probably get away with using an
"alloc" type approach with a size of 0, which would correctly set the
start of your offset, and then you would need to update it later once
you know the total size for your commit. For the probe/commit we could
use nc->offset as a kind of cookie to verify we are working with the
expected page and offset.

For the alloc/abort it would be something similar, but more the reverse.
With that one we would need to have the size + offset and then verify
the current offset is equal to that before we allow reverting the
previous nc->offset update. The current patch set is a bit too
permissive on the abort, in my opinion, and should be verifying that we
are updating the correct offset.
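Something like the following userspace toy shows the checks I have in
mind for both directions (again, every name here is invented for
illustration; the real code would be working against nc->offset and the
encoded page rather than a plain buffer):

```c
#include <stddef.h>

/* Toy cache: the offset doubles as the "cookie" identifying the most
 * recent probe/alloc, so commit/abort can verify nothing moved
 * underneath them. Illustrative only, not the proposed kernel API.
 */
struct toy_cache {
	char buf[4096];
	unsigned int offset;
};

/* Probe: report the remaining space and return the current offset as a
 * cookie identifying this prepare operation.
 */
static unsigned int toy_probe(struct toy_cache *nc, void **va,
			      unsigned int *avail)
{
	*va = nc->buf + nc->offset;
	*avail = sizeof(nc->buf) - nc->offset;
	return nc->offset;
}

/* Commit: only advance if the cookie still matches, i.e. nothing else
 * consumed space from the cache since the probe.
 */
static int toy_commit(struct toy_cache *nc, unsigned int cookie,
		      unsigned int used_sz)
{
	if (cookie != nc->offset)
		return -1;	/* cache moved underneath us */
	nc->offset += used_sz;
	return 0;
}

/* Abort: only rewind if offset + fragsz lands exactly on the current
 * offset, i.e. we really are undoing the most recent allocation.
 */
static int toy_abort(struct toy_cache *nc, unsigned int offset,
		     unsigned int fragsz)
{
	if (offset + fragsz != nc->offset)
		return -1;	/* not the latest alloc, refuse to revert */
	nc->offset = offset;
	return 0;
}
```

With checks like these, a stale commit or an out-of-order abort fails
loudly instead of silently corrupting the cache state, which is the
verification the current abort is missing.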