Re: [PATCH net-next v13 11/14] mm: page_frag: introduce prepare/probe/commit API

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Yunsheng Lin <linyunsheng@huawei.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: <davem@davemloft.net>, <kuba@kernel.org>, <pabeni@redhat.com>,
	<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>
Subject: Re: [PATCH net-next v13 11/14] mm: page_frag: introduce prepare/probe/commit API
Date: Tue, 20 Aug 2024 21:08:04 +0800	[thread overview]
Message-ID: <37fc4b01-43f1-4a2c-af35-96cf3f7fe3d5@huawei.com> (raw)
In-Reply-To: <CAKgT0Ucz4R=xOCWgauDO_i6PX7=hgiohXngo2Mea5R8GC_s2qQ@mail.gmail.com>

On 2024/8/19 23:52, Alexander Duyck wrote:

>>
>> Yes, the expectation is that somebody else didn't take an access to the
>> page/data to send it off somewhere else between page_frag_alloc_va()
>> and page_frag_alloc_abort(), did you see expectation was broken in that
>> patch? If yes, we should fix that by using page_frag_free_va() related
>> API instead of using page_frag_alloc_abort().
> 
> The problem is when you expose it to XDP there are a number of
> different paths it can take. As such you shouldn't be expecting XDP to
> not do something like that. Basically you have to check the reference

Even if XDP operations like xdp_do_redirect() or tun_xdp_xmit() return
failure, we still can not do that? It seems odd that happens.
If not, can we use page_frag_alloc_abort() with fragsz being zero to avoid
atomic operation?

> count before you can rewind the page.
> 
>>>
>>>
>>>>
>>>>>> +static struct page *__page_frag_cache_reload(struct page_frag_cache *nc,
>>>>>> +                                         gfp_t gfp_mask)
>>>>>>  {
>>>>>> +    struct page *page;
>>>>>> +
>>>>>>      if (likely(nc->encoded_va)) {
>>>>>> -            if (__page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias))
>>>>>> +            page = __page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias);
>>>>>> +            if (page)
>>>>>>                      goto out;
>>>>>>      }
>>>>>>
>>>>>> -    if (unlikely(!__page_frag_cache_refill(nc, gfp_mask)))
>>>>>> -            return false;
>>>>>> +    page = __page_frag_cache_refill(nc, gfp_mask);
>>>>>> +    if (unlikely(!page))
>>>>>> +            return NULL;
>>>>>>
>>>>>>  out:
>>>>>>      /* reset page count bias and remaining to start of new frag */
>>>>>>      nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>>>>>>      nc->remaining = page_frag_cache_page_size(nc->encoded_va);
>>>>>> -    return true;
>>>>>> +    return page;
>>>>>> +}
>>>>>> +
>>>>>
>>>>> None of the functions above need to be returning page.
>>>>
>>>> Are you still suggesting to always use virt_to_page() even when it is
>>>> not really necessary? why not return the page here to avoid the
>>>> virt_to_page()?
>>>
>>> Yes. The likelihood of you needing to pass this out as a page should
>>> be low as most cases will just be you using the virtual address
>>> anyway. You are essentially trading off branching for not having to
>>> use virt_to_page. It is unnecessary optimization.
>>
>> As my understanding, I am not trading off branching for not having to
>> use virt_to_page, the branching is already needed no matter we utilize
>> it to avoid calling virt_to_page() or not, please be more specific about
>> which branching is traded off for not having to use virt_to_page() here.
> 
> The virt_to_page overhead isn't that high. It would be better to just
> use a consistent path rather than try to optimize for an unlikely
> branch in your datapath.

I am not sure if I understand what do you mean by 'consistent path' here.
If I understand your comment correctly, the path is already not consistent
to avoid having to fetch size multiple times multiple ways as mentioned in
[1]. As below, doesn't it seems nature to avoid virt_to_page() calling while
avoiding page_frag_cache_page_size() calling, even if it is an unlikely case
as you mentioned:

struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
                                unsigned int *offset, unsigned int fragsz,
                                gfp_t gfp)
{
        unsigned int remaining = nc->remaining;
        struct page *page;

        VM_BUG_ON(!fragsz);
        if (likely(remaining >= fragsz)) {
                unsigned long encoded_va = nc->encoded_va;

                *offset = page_frag_cache_page_size(encoded_va) -
                                remaining;

                return virt_to_page((void *)encoded_va);
        }

        if (unlikely(fragsz > PAGE_SIZE))
                return NULL;

        page = __page_frag_cache_reload(nc, gfp);
        if (unlikely(!page))
                return NULL;

        *offset = 0;
        nc->remaining -= fragsz;
        nc->pagecnt_bias--;

        return page;
}

1. https://lore.kernel.org/all/CAKgT0UeQ9gwYo7qttak0UgXC9+kunO2gedm_yjtPiMk4VJp9yQ@mail.gmail.com/

> 
>>>
>>>
>>>>
>>>>>> +struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
>>>>>> +                            unsigned int *offset, unsigned int fragsz,
>>>>>> +                            gfp_t gfp)
>>>>>> +{
>>>>>> +    unsigned int remaining = nc->remaining;
>>>>>> +    struct page *page;
>>>>>> +
>>>>>> +    VM_BUG_ON(!fragsz);
>>>>>> +    if (likely(remaining >= fragsz)) {
>>>>>> +            unsigned long encoded_va = nc->encoded_va;
>>>>>> +
>>>>>> +            *offset = page_frag_cache_page_size(encoded_va) -
>>>>>> +                            remaining;
>>>>>> +
>>>>>> +            return virt_to_page((void *)encoded_va);
>>>>>> +    }
>>>>>> +
>>>>>> +    if (unlikely(fragsz > PAGE_SIZE))
>>>>>> +            return NULL;
>>>>>> +
>>>>>> +    page = __page_frag_cache_reload(nc, gfp);
>>>>>> +    if (unlikely(!page))
>>>>>> +            return NULL;
>>>>>> +
>>>>>> +    *offset = 0;
>>>>>> +    nc->remaining = remaining - fragsz;
>>>>>> +    nc->pagecnt_bias--;
>>>>>> +
>>>>>> +    return page;
>>>>>>  }
>>>>>> +EXPORT_SYMBOL(page_frag_alloc_pg);
>>>>>
>>>>> Again, this isn't returning a page. It is essentially returning a
>>>>> bio_vec without calling it as such. You might as well pass the bio_vec
>>>>> pointer as an argument and just have it populate it directly.
>>>>
>>>> I really don't think your bio_vec suggestion make much sense  for now as
>>>> the reason mentioned in below:
>>>>
>>>> "Through a quick look, there seems to be at least three structs which have
>>>> similar values: struct bio_vec & struct skb_frag & struct page_frag.
>>>>
>>>> As your above agrument about using bio_vec, it seems it is ok to use any
>>>> one of them as each one of them seems to have almost all the values we
>>>> are using?
>>>>
>>>> Personally, my preference over them: 'struct page_frag' > 'struct skb_frag'
>>>>> 'struct bio_vec', as the naming of 'struct page_frag' seems to best match
>>>> the page_frag API, 'struct skb_frag' is the second preference because we
>>>> mostly need to fill skb frag anyway, and 'struct bio_vec' is the last
>>>> preference because it just happen to have almost all the values needed.
>>>
>>> That is why I said I would be okay with us passing page_frag in patch
>>> 12 after looking closer at the code. The fact is it should make the
>>> review of that patch set much easier if you essentially just pass the
>>> page_frag back out of the call. Then it could be used in exactly the
>>> same way it was before and should reduce the total number of lines of
>>> code that need to be changed.
>>
>> So the your suggestion changed to something like below?
>>
>> int page_frag_alloc_pfrag(struct page_frag_cache *nc, struct page_frag *pfrag);
>>
>> The API naming of 'page_frag_alloc_pfrag' seems a little odd to me, any better
>> one in your mind?
> 
> Well at this point we are populating/getting/pulling a page frag from
> the page frag cache. Maybe look for a word other than alloc such as
> populate. Essentially what you are doing is populating the pfrag from
> the frag cache, although I thought there was a size value you passed
> for that isn't there?

'struct page_frag' does have a size field, but I am not sure if I
understand what do you mean by  'although I thought there was a size
value you passed for that isn't there?‘ yet.

     prev parent reply	other threads:[~2024-08-20 13:08 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20240808123714.462740-1-linyunsheng@huawei.com>
     [not found] ` <20240808123714.462740-2-linyunsheng@huawei.com>
2024-08-09 11:08   ` [PATCH net-next v13 01/14] mm: page_frag: add a test module for page_frag Muhammad Usama Anjum
2024-08-09 12:29     ` Yunsheng Lin
     [not found] ` <20240808123714.462740-3-linyunsheng@huawei.com>
2024-08-14 15:33   ` [PATCH net-next v13 02/14] mm: move the page fragment allocator from page_alloc into its own file Alexander H Duyck
2024-08-14 20:22   ` Andrew Morton
     [not found] ` <20240808123714.462740-5-linyunsheng@huawei.com>
2024-08-14 15:49   ` [PATCH net-next v13 04/14] mm: page_frag: add '_va' suffix to page_frag API Alexander H Duyck
2024-08-15  2:59     ` Yunsheng Lin
2024-08-15 15:00       ` Alexander Duyck
2024-08-16 11:55         ` Yunsheng Lin
2024-08-19 15:54           ` Alexander Duyck
2024-08-20 13:07             ` Yunsheng Lin
2024-08-20 16:02               ` Alexander Duyck
2024-08-21 12:30                 ` Yunsheng Lin
     [not found] ` <20240808123714.462740-8-linyunsheng@huawei.com>
2024-08-14 16:13   ` [PATCH net-next v13 07/14] mm: page_frag: reuse existing space for 'size' and 'pfmemalloc' Alexander H Duyck
2024-08-15  3:10     ` Yunsheng Lin
2024-08-15 15:03       ` Alexander Duyck
2024-08-16 11:55         ` Yunsheng Lin
2024-08-19 16:00           ` Alexander Duyck
     [not found] ` <20240808123714.462740-9-linyunsheng@huawei.com>
2024-08-14 17:54   ` [PATCH net-next v13 08/14] mm: page_frag: some minor refactoring before adding new API Alexander H Duyck
2024-08-15  3:04     ` Yunsheng Lin
2024-08-15 15:09       ` Alexander Duyck
2024-08-16 11:58         ` Yunsheng Lin
     [not found] ` <20240808123714.462740-12-linyunsheng@huawei.com>
2024-08-14 21:00   ` [PATCH net-next v13 11/14] mm: page_frag: introduce prepare/probe/commit API Alexander H Duyck
2024-08-15  3:05     ` Yunsheng Lin
2024-08-15 15:25       ` Alexander Duyck
2024-08-16 12:01         ` Yunsheng Lin
2024-08-19 15:52           ` Alexander Duyck
2024-08-20 13:08             ` Yunsheng Lin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=37fc4b01-43f1-4a2c-af35-96cf3f7fe3d5@huawei.com \
    --to=linyunsheng@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.duyck@gmail.com \
    --cc=davem@davemloft.net \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox