Date: Thu, 6 May 2021 15:58:14 +0300
From: Ilias Apalodimas
To: Yunsheng Lin
Cc: Matteo Croce, netdev@vger.kernel.org, linux-mm@kvack.org,
 Ayush Sawal, Vinay Kumar Yadav, Rohit Maheshwari, "David S. Miller",
 Jakub Kicinski, Thomas Petazzoni, Marcin Wojtas, Russell King,
 Mirko Lindner, Stephen Hemminger, Tariq Toukan, Jesper Dangaard Brouer,
 Alexei Starovoitov, Daniel Borkmann, John Fastabend, Boris Pismenny,
 Arnd Bergmann, Andrew Morton, "Peter Zijlstra (Intel)", Vlastimil Babka,
 Yu Zhao, Will Deacon, Michel Lespinasse, Fenghua Yu, Roman Gushchin,
 Hugh Dickins, Peter Xu, Jason Gunthorpe, Guoqing Jiang, Jonathan Lemon,
 Alexander Lobakin, Cong Wang, wenxu, Kevin Hao, Aleksandr Nogikh,
 Jakub Sitnicki, Marco Elver, Willem de Bruijn, Miaohe Lin,
 Guillaume Nault, linux-kernel@vger.kernel.org,
 linux-rdma@vger.kernel.org, bpf@vger.kernel.org, Matthew Wilcox,
 Eric Dumazet, David Ahern, Lorenzo Bianconi, Saeed Mahameed,
 Andrew Lunn, Paolo Abeni
Subject: Re: [PATCH net-next v3 0/5] page_pool: recycle buffers
References: <20210409223801.104657-1-mcroce@linux.microsoft.com>
 <9bf7c5b3-c3cf-e669-051f-247aa8df5c5a@huawei.com>
 <33b02220-cc50-f6b2-c436-f4ec041d6bc4@huawei.com>
In-Reply-To: <33b02220-cc50-f6b2-c436-f4ec041d6bc4@huawei.com>

On Thu, May 06, 2021 at 08:34:48PM +0800, Yunsheng Lin wrote:
> On 2021/5/1 0:24, Ilias Apalodimas wrote:
> > [...]
> >>>>
> >>>> 1. skb frag page recycling does not need "struct xdp_rxq_info" or
> >>>> "struct xdp_mem_info" to bond the relation between "struct page" and
> >>>> "struct page_pool", which seems unnecessary at this point if bonding
> >>>> a "struct page_pool" pointer directly in "struct page" does not cause
> >>>> space increasing.
> >>>
> >>> We can't do that. The reason we need those structs is that we rely on the
> >>> existing XDP code, which already recycles its buffers, to enable
> >>> recycling. Since we allocate a page per packet when using page_pool for a
> >>> driver, the same ideas apply to an SKB and XDP frame. We just recycle the
> >>
> >> I am not really familiar with XDP here, but a packet from hw is either a
> >> "struct xdp_frame/xdp_buff" for XDP or a "struct sk_buff" for TCP/IP stack,
> >> a packet can not be both "struct xdp_frame/xdp_buff" and "struct sk_buff" at
> >> the same time, right?
> >>
> >
> > Yes, but the payload is irrelevant in both cases and that's what we use
> > page_pool for. You can't use this patchset unless your driver uses
> > build_skb(). So in both cases you just allocate memory for the payload and
>
> I am not sure I understood why build_skb() matters here. If the head data of
> a skb is a page frag and is from page pool, then its page->signature should be
> PP_SIGNATURE, otherwise its page->signature is zero, so a recyclable skb does
> not require its head data being from a page pool, right?
>

Correct, and that's the big improvement compared to the original RFC.
The wording was a bit off in my initial response. I was trying to point out
that you can recycle *any* buffer coming from page_pool, and one of the ways
you can do that in your driver is to use build_skb() while the payload is
allocated by page_pool.

> > decide what to wrap the buffer with (XDP or SKB) later.
>
> [...]
>
> >>
> >> I am not sure I understand what you meant by "free the skb", does it mean
> >> that kfree_skb() is called to free the skb.
> >
> > Yes
> >
> >>
> >> As my understanding, if the skb completely owns the page (which means page_count()
> >> == 1) when kfree_skb() is called, __page_pool_put_page() is called, otherwise
> >> page_ref_dec() is called, which is exactly what page_pool_atomic_sub_if_positive()
> >> tries to handle atomically.
> >>
> >
> > Not really, the opposite is happening here. If the pp_recycle bit is set we
> > will always call page_pool_return_skb_page(). If the page signature matches
> > the 'magic' set by page pool we will always call xdp_return_skb_frame(), which
> > will end up calling __page_pool_put_page(). If the refcnt is 1 we'll try
> > to recycle the page. If it's not we'll release it from page_pool (releasing
> > some internal references we keep), unmap the buffer and decrement the refcnt.
>
> Yes, I understood the above is what the page pool does now.
>
> But the question is who is still holding an extra reference to the page when
> kfree_skb()? Perhaps a cloned and pskb_expand_head()'ed skb is holding an extra
> reference to the same page? So why not just do a page_ref_dec() if the original skb
> is freed first, and call __page_pool_put_page() when the cloned skb is freed later?
> So that we can always reuse the recyclable page from a recyclable skb. This may
> make the page_pool_destroy() process delay longer than before, I suppose the
> page_pool_destroy() delaying for the cloned skb case does not really matter here.
>
> If the above works, I think similar handling can be added to RX zerocopy if
> the RX zerocopy also holds extra references to the recyclable page from a recyclable
> skb too?
>

Right, this sounds doable, but I'll have to go back, code it and see if it
really makes sense. However, I'd still prefer the support to go in as-is
(including the struct xdp_mem_info in struct page, instead of a page_pool
pointer).
There are a couple of reasons for that. If we keep the struct xdp_mem_info we
can in the future recycle different kinds of buffers using __xdp_return().
And this is a non-intrusive change if we choose to store the page pool address
directly in the future. It just affects the internal contract between the
page_pool code and struct page. So it won't affect any drivers that already
use the feature.

Regarding the page_ref_dec(), which as I said sounds doable, I'd prefer
playing it safe for now and getting rid of the buffers that somehow ended up
holding an extra reference. Once this gets approved we can go back and try to
save the extra space. I hope I am not wrong, but the changes required to
support a few extra refcounts should not change the current patches much.

Thanks for taking the time on this!

/Ilias

> >
> > [1] https://lore.kernel.org/netdev/154413868810.21735.572808840657728172.stgit@firesoul/
> >
> > Cheers
> > /Ilias
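
For readers following the thread, the free path described above (pp_recycle
bit, then signature check, then refcount check) could be sketched in userspace
roughly as below. This is only a model under stated assumptions, not the
patchset's code: struct mock_page, MOCK_PP_SIGNATURE (with a made-up value),
enum free_action and mock_skb_free_page() are all hypothetical names, and the
"normal" path is reduced to a plain refcount decrement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PP_SIGNATURE; the value is made up. */
#define MOCK_PP_SIGNATURE 0x50505050u

/* Hypothetical userspace model of the few struct page fields discussed. */
struct mock_page {
	unsigned int signature; /* MOCK_PP_SIGNATURE if owned by a page pool */
	int refcnt;             /* models page_count() */
	bool dma_mapped;        /* models the DMA mapping kept by the pool */
};

enum free_action {
	FREE_NORMAL,   /* not a pool page: ordinary reference drop */
	FREE_RECYCLED, /* sole owner: the page goes back into the pool */
	FREE_RELEASED, /* extra reference held: the pool lets go of the page */
};

/*
 * Models the decision taken when an skb's page is freed:
 *  - no pp_recycle bit or no matching signature: normal free path;
 *  - refcnt == 1: recycle the page, mapping and refcount stay as they are;
 *  - otherwise: release from the pool, unmap, and drop our reference.
 */
static enum free_action mock_skb_free_page(bool pp_recycle,
					   struct mock_page *page)
{
	assert(page != NULL);

	if (!pp_recycle || page->signature != MOCK_PP_SIGNATURE) {
		page->refcnt--;
		return FREE_NORMAL;
	}

	if (page->refcnt == 1)
		return FREE_RECYCLED;

	page->dma_mapped = false;
	page->refcnt--;
	return FREE_RELEASED;
}
```

In this model a pool page with refcnt 2 (for example one also referenced by a
cloned skb) is released rather than recycled, which is the "playing it safe"
behaviour mentioned above; the page_ref_dec() alternative from the thread
would instead keep the page in the pool until the last holder frees it.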