From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Mon, 29 Apr 2024 07:49:08 -0700
Subject: Re: [PATCH net-next v2 09/15] mm: page_frag: reuse MSB of 'size' field for pfmemalloc
To: Yunsheng Lin
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Morton, linux-mm@kvack.org

On Fri, Apr 26, 2024 at 2:38 AM Yunsheng Lin wrote:
>
> On 2024/4/18 17:39, Yunsheng Lin wrote:
>
> ...
>
> >
> >> combining the pagecnt_bias with the va. I'm wondering if it wouldn't
> >> make more sense to look at putting together the structure something
> >> like:
> >>
> >> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> >> typedef u16 page_frag_bias_t;
> >> #else
> >> typedef u32 page_frag_bias_t;
> >> #endif
> >>
> >> struct page_frag_cache {
> >>         /* page address and offset */
> >>         void *va;
> >
> > Generally I agree with combining the virtual address with the
> > offset for the reason you mentioned below.
> >
> >>         page_frag_bias_t pagecnt_bias;
> >>         u8 pfmemalloc;
> >>         u8 page_frag_order;
> >> }
> >
> > The issue I see with 'page_frag_order' is that we might need to do
> > a 'PAGE << page_frag_order' to get the actual size, and we might also
> > need to do 'size - 1' to get the size_mask if we want to mask out the
> > offset from the 'va'.
> >
> > For page_frag_order, we need to:
> > size = PAGE << page_frag_order
> > size_mask = size - 1
> >
> > For size_mask, it seems we only need to do:
> > size = size_mask + 1
> >
> > And as PAGE_FRAG_CACHE_MAX_SIZE = 32K, it can be fitted into 15 bits
> > if we use size_mask instead of size.
> >
> > Does it make sense to use the below, so that we only need bitfields
> > for PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE on a 32-bit system? 'struct
> > page_frag' is using a similar '(BITS_PER_LONG > 32)' checking trick.
> >
> > struct page_frag_cache {
> >         /* page address and offset */
> >         void *va;
> >
> > #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
> >         u16 pagecnt_bias;
> >         u16 size_mask:15;
> >         u16 pfmemalloc:1;
> > #else
> >         u32 pagecnt_bias;
> >         u16 size_mask;
> >         u16 pfmemalloc;
> > #endif
> > };
> >
>
> After considering a few different layouts for 'struct page_frag_cache',
> it seems the below is more optimized:
>
> struct page_frag_cache {
>         /* page address & pfmemalloc & order */
>         void *va;

I see. So basically just pack the much smaller bitfields in here.

>
> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
>         u16 pagecnt_bias;
>         u16 size;
> #else
>         u32 pagecnt_bias;
>         u32 size;
> #endif
> }
>
> The lower bits of 'va' are OR'ed with the page order & pfmemalloc instead
> of the offset or pagecnt_bias, so that we don't have to add more checking
> to handle not having enough room for the offset or pagecnt_bias when
> PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE on a 32-bit system.
> And as the page address & pfmemalloc & order are unchanged for the same
> page in the same 'page_frag_cache' instance, it makes sense to fit them
> together.
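As a rough illustration of the low-bit tagging idea above, here is a minimal
userspace-style sketch. The names (frag_encode, FRAG_ORDER_MASK, and so on)
are made up for illustration only and are not the proposed kernel API; the
point is just that a page-aligned address has zero low bits, so a small order
field and a pfmemalloc flag can live there and be masked back out:

#include <stdbool.h>
#include <stdint.h>

#define FRAG_ORDER_MASK         0x3UL   /* bits 0-1: page order (0..3) */
#define FRAG_PFMEMALLOC_BIT     0x4UL   /* bit 2: pfmemalloc flag */

/* page_addr must be at least page aligned, so its low bits are free */
static inline void *frag_encode(void *page_addr, unsigned long order,
                                bool pfmemalloc)
{
        return (void *)((uintptr_t)page_addr | order |
                        (pfmemalloc ? FRAG_PFMEMALLOC_BIT : 0));
}

static inline void *frag_page_addr(void *va)
{
        return (void *)((uintptr_t)va &
                        ~(FRAG_ORDER_MASK | FRAG_PFMEMALLOC_BIT));
}

static inline unsigned long frag_order(void *va)
{
        return (uintptr_t)va & FRAG_ORDER_MASK;
}

static inline bool frag_pfmemalloc(void *va)
{
        return !!((uintptr_t)va & FRAG_PFMEMALLOC_BIT);
}

With 4K page alignment there are at least 12 low bits available, so widening
the order field to a full byte, as suggested further down, would still fit
comfortably alongside the flag bit.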
>
> Also, it seems it is better to replace 'offset' with 'size', which indicates
> the remaining size of the cache in a 'page_frag_cache' instance, and we
> might be able to do a single 'size >= fragsz' check for the case of the
> cache being large enough, which should be the fast path if we ensure size
> is zero when 'va' == NULL.

I'm not sure the rename to size is called for as it is going to be
confusing. Maybe something like "remaining"?

> Something like below:
>
> #define PAGE_FRAG_CACHE_ORDER_MASK              GENMASK(1, 0)
> #define PAGE_FRAG_CACHE_PFMEMALLOC_BIT          BIT(2)

The only downside is that it is ossifying things so that we can only
ever do order 3 as the maximum cache size. It might be better to use a
full 8 bits, as on x86 it would just mean accessing the low end of a
16b register. Then you can have pfmemalloc at bit 8.

> struct page_frag_cache {
>         /* page address & pfmemalloc & order */
>         void *va;
>

When you start combining things like this we normally would convert
va to an unsigned long.

> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
>         u16 pagecnt_bias;
>         u16 size;
> #else
>         u32 pagecnt_bias;
>         u32 size;
> #endif
> };
>
>
> static void *__page_frag_cache_refill(struct page_frag_cache *nc,
>                                       unsigned int fragsz, gfp_t gfp_mask,
>                                       unsigned int align_mask)
> {
>         gfp_t gfp = gfp_mask;
>         struct page *page;
>         void *va;
>
> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>         /* Ensure free_unref_page() can be used to free the page fragment */
>         BUILD_BUG_ON(PAGE_FRAG_CACHE_MAX_ORDER > PAGE_ALLOC_COSTLY_ORDER);
>
>         gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
>                    __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
>         page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
>                                 PAGE_FRAG_CACHE_MAX_ORDER);
>         if (likely(page)) {
>                 nc->size = PAGE_FRAG_CACHE_MAX_SIZE - fragsz;

I wouldn't pass fragsz here. Ideally we keep this from having to get
pulled directly into the allocator and can instead treat this as a
pristine page. We can do the subtraction further down in the actual
frag alloc call.

>                 va = page_address(page);
>                 nc->va = (void *)((unsigned long)va |
>                                   PAGE_FRAG_CACHE_MAX_ORDER |
>                                   (page_is_pfmemalloc(page) ?
>                                    PAGE_FRAG_CACHE_PFMEMALLOC_BIT : 0));
>                 page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
>                 nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE;
>                 return va;
>         }
> #endif
>         page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
>         if (likely(page)) {
>                 nc->size = PAGE_SIZE - fragsz;
>                 va = page_address(page);
>                 nc->va = (void *)((unsigned long)va |
>                                   (page_is_pfmemalloc(page) ?
>                                    PAGE_FRAG_CACHE_PFMEMALLOC_BIT : 0));
>                 page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
>                 nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE;
>                 return va;
>         }
>
>         nc->va = NULL;
>         nc->size = 0;
>         return NULL;
> }
>
> void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
>                                  unsigned int fragsz, gfp_t gfp_mask,
>                                  unsigned int align_mask)
> {
> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>         unsigned long page_order;
> #endif
>         unsigned long page_size;
>         unsigned long size;
>         struct page *page;
>         void *va;
>
>         size = nc->size & align_mask;
>         va = nc->va;
> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>         page_order = (unsigned long)va & PAGE_FRAG_CACHE_ORDER_MASK;
>         page_size = PAGE_SIZE << page_order;
> #else
>         page_size = PAGE_SIZE;
> #endif

So I notice you got rid of the loops within the function. One of the
reasons for structuring it the way it was is to enable better code
caching.
By unfolding the loops you are increasing the number of instructions
that have to be fetched and processed in order to allocate the buffers.

>
>         if (unlikely(fragsz > size)) {
>                 if (unlikely(!va))
>                         return __page_frag_cache_refill(nc, fragsz, gfp_mask,
>                                                         align_mask);
>
>                 /* fragsz is not supposed to be bigger than PAGE_SIZE as we are
>                  * allowing order 3 page allocation to fail easily under low
>                  * memory condition.
>                  */
>                 if (WARN_ON_ONCE(fragsz > PAGE_SIZE))
>                         return NULL;
>
>                 page = virt_to_page(va);
>                 if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
>                         return __page_frag_cache_refill(nc, fragsz, gfp_mask,
>                                                         align_mask);
>
>                 if (unlikely((unsigned long)va &
>                              PAGE_FRAG_CACHE_PFMEMALLOC_BIT)) {
>                         free_unref_page(page, compound_order(page));
>                         return __page_frag_cache_refill(nc, fragsz, gfp_mask,
>                                                         align_mask);
>                 }
>
>                 /* OK, page count is 0, we can safely set it */
>                 set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
>
>                 /* reset page count bias and offset to start of new frag */
>                 nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>                 size = page_size;
>         }
>
>         va = (void *)((unsigned long)va & PAGE_MASK);
>         va = va + (page_size - size);
>         nc->size = size - fragsz;
>         nc->pagecnt_bias--;
>
>         return va;
> }
> EXPORT_SYMBOL(__page_frag_alloc_va_align);
>
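For what it's worth, the single-check fast path being discussed can be
sketched roughly as below. This is a userspace-style sketch with hypothetical
names, not the proposed kernel code: the refill slow path and the va tag-bit
handling are omitted, and it only shows how tracking the bytes remaining in
the current page reduces the common case to one compare plus a subtract, with
the fragment address derived from page_size - remaining.

#include <stddef.h>
#include <stdint.h>

struct frag_cache_sketch {
        char *page_addr;        /* start of the current page run */
        size_t page_size;       /* PAGE_SIZE << order of that run */
        size_t remaining;       /* bytes not yet handed out */
};

/* align_mask is ~(align - 1) for a power-of-two alignment */
static void *frag_alloc_fast(struct frag_cache_sketch *nc, size_t fragsz,
                             size_t align_mask)
{
        /* Rounding 'remaining' down to the alignment also aligns the
         * fragment start, since the fragment begins at
         * page_size - remaining and page_size is itself aligned.
         */
        size_t remaining = nc->remaining & align_mask;

        if (fragsz > remaining)
                return NULL;    /* would fall back to the refill slow path */

        nc->remaining = remaining - fragsz;
        return nc->page_addr + (nc->page_size - remaining);
}

Keeping a "remaining" count instead of an offset is what allows the
'fragsz > remaining' test above to cover both the empty-cache and the
out-of-space cases in one branch, which is the motivation for the rename
discussed earlier in the thread.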