From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Wed, 4 Sep 2024 09:14:55 -0700
Subject: Re: [PATCH net-next v17 06/14] mm: page_frag: reuse existing space for 'size' and 'pfmemalloc'
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Morton, linux-mm@kvack.org
References: <20240902120314.508180-1-linyunsheng@huawei.com> <20240902120314.508180-7-linyunsheng@huawei.com>
In-Reply-To: <20240902120314.508180-7-linyunsheng@huawei.com>
On Mon, Sep 2, 2024 at 5:09 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> Currently there is one 'struct page_frag' in every 'struct
> sock' and 'struct task_struct'; we are about to replace the
> 'struct page_frag' with 'struct page_frag_cache' for them.
> Before beginning the replacement, we need to ensure the size
> of 'struct page_frag_cache' is not bigger than the size of
> 'struct page_frag', as there may be tens of thousands of
> 'struct sock' and 'struct task_struct' instances in the
> system.
>
> By or'ing the page order & pfmemalloc with the lower bits of
> 'va', instead of using 'u16' or 'u32' for the page size and
> 'u8' for pfmemalloc, we avoid wasting 3 or 5 bytes. And as the
> page address, pfmemalloc flag and order are unchanged for the
> same page in the same 'page_frag_cache' instance, it makes
> sense to pack them together.
>
> After this patch, the size of 'struct page_frag_cache' should be
> the same as the size of 'struct page_frag'.
>
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>  include/linux/mm_types_task.h   | 19 +++++-----
>  include/linux/page_frag_cache.h | 47 ++++++++++++++++++++++--
>  mm/page_frag_cache.c            | 63 +++++++++++++++++++++------------
>  3 files changed, 96 insertions(+), 33 deletions(-)
>
> diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
> index cdc1e3696439..73a574a0e8f9 100644
> --- a/include/linux/mm_types_task.h
> +++ b/include/linux/mm_types_task.h
> @@ -50,18 +50,21 @@ struct page_frag {
>  #define PAGE_FRAG_CACHE_MAX_SIZE       __ALIGN_MASK(32768, ~PAGE_MASK)
>  #define PAGE_FRAG_CACHE_MAX_ORDER      get_order(PAGE_FRAG_CACHE_MAX_SIZE)
>  struct page_frag_cache {
> -       void *va;
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> +       /* encoded_page consists of the virtual address, pfmemalloc bit and
> +        * order of a page.
> +        */
> +       unsigned long encoded_page;
> +
> +       /* we maintain a pagecount bias, so that we dont dirty cache line
> +        * containing page->_refcount every time we allocate a fragment.
> +        */
> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
>         __u16 offset;
> -       __u16 size;
> +       __u16 pagecnt_bias;
>  #else
>         __u32 offset;
> +       __u32 pagecnt_bias;
>  #endif
> -       /* we maintain a pagecount bias, so that we dont dirty cache line
> -        * containing page->_refcount every time we allocate a fragment.
> -        */
> -       unsigned int pagecnt_bias;
> -       bool pfmemalloc;
>  };
>
>  /* Track pages that require TLB flushes */
> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
> index 0a52f7a179c8..cb89cd792fcc 100644
> --- a/include/linux/page_frag_cache.h
> +++ b/include/linux/page_frag_cache.h
> @@ -3,18 +3,61 @@
>  #ifndef _LINUX_PAGE_FRAG_CACHE_H
>  #define _LINUX_PAGE_FRAG_CACHE_H
>
> +#include <linux/bits.h>
>  #include <linux/log2.h>
> +#include <linux/mm.h>
>  #include <linux/mm_types_task.h>
>  #include <linux/types.h>
>
> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> +/* Use a full byte here to enable assembler optimization as the shift
> + * operation is usually expecting a byte.
> + */
> +#define PAGE_FRAG_CACHE_ORDER_MASK             GENMASK(7, 0)
> +#define PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT       8
> +#define PAGE_FRAG_CACHE_PFMEMALLOC_BIT         BIT(PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT)
> +#else
> +/* Compiler should be able to figure out we don't read things as any value
> + * ANDed with 0 is 0.
> + */
> +#define PAGE_FRAG_CACHE_ORDER_MASK             0
> +#define PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT       0
> +#define PAGE_FRAG_CACHE_PFMEMALLOC_BIT         BIT(PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT)
> +#endif
> +
> +static inline unsigned long page_frag_encoded_page_order(unsigned long encoded_page)
> +{
> +       return encoded_page & PAGE_FRAG_CACHE_ORDER_MASK;
> +}
> +
> +static inline bool page_frag_encoded_page_pfmemalloc(unsigned long encoded_page)
> +{
> +       return !!(encoded_page & PAGE_FRAG_CACHE_PFMEMALLOC_BIT);
> +}
> +
> +static inline void *page_frag_encoded_page_address(unsigned long encoded_page)
> +{
> +       return (void *)(encoded_page & PAGE_MASK);
> +}
> +
> +static inline struct page *page_frag_encoded_page_ptr(unsigned long encoded_page)
> +{
> +       return virt_to_page((void *)encoded_page);
> +}
> +
>  static inline void page_frag_cache_init(struct page_frag_cache *nc)
>  {
> -       nc->va = NULL;
> +       nc->encoded_page = 0;
>  }
>
>  static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
>  {
> -       return !!nc->pfmemalloc;
> +       return page_frag_encoded_page_pfmemalloc(nc->encoded_page);
> +}
> +
> +static inline unsigned int page_frag_cache_page_size(unsigned long encoded_page)
> +{
> +       return PAGE_SIZE << page_frag_encoded_page_order(encoded_page);
>  }
>
>  void page_frag_cache_drain(struct page_frag_cache *nc);

Still not a huge fan of adding all these functions that expose the
internals. It might be better to just place them in page_frag_cache.c
and pull them out to the .h file as needed.
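Roughly what I have in mind -- just a sketch, with an illustrative
split and assuming only the pfmemalloc test is actually needed by
header users:

/* mm/page_frag_cache.c: helpers stay private to the implementation */
static unsigned long page_frag_encoded_page_order(unsigned long encoded_page)
{
	return encoded_page & PAGE_FRAG_CACHE_ORDER_MASK;
}

static void *page_frag_encoded_page_address(unsigned long encoded_page)
{
	return (void *)(encoded_page & PAGE_MASK);
}

/* include/linux/page_frag_cache.h: only what callers actually consume */
static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
{
	return !!(nc->encoded_page & PAGE_FRAG_CACHE_PFMEMALLOC_BIT);
}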
> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> index 4c8e04379cb3..a5c5373cb70e 100644
> --- a/mm/page_frag_cache.c
> +++ b/mm/page_frag_cache.c
> @@ -12,16 +12,28 @@
>   * be used in the "frags" portion of skb_shared_info.
>   */
>
> +#include <linux/build_bug.h>
>  #include <linux/export.h>
>  #include <linux/gfp_types.h>
>  #include <linux/init.h>
> -#include <linux/mm.h>
>  #include <linux/page_frag_cache.h>
>  #include "internal.h"
>
> +static unsigned long page_frag_encode_page(struct page *page, unsigned int order,
> +                                          bool pfmemalloc)
> +{
> +       BUILD_BUG_ON(PAGE_FRAG_CACHE_MAX_ORDER > PAGE_FRAG_CACHE_ORDER_MASK);
> +       BUILD_BUG_ON(PAGE_FRAG_CACHE_PFMEMALLOC_BIT >= PAGE_SIZE);
> +
> +       return (unsigned long)page_address(page) |
> +               (order & PAGE_FRAG_CACHE_ORDER_MASK) |
> +               ((unsigned long)pfmemalloc << PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT);
> +}
> +
>  static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>                                              gfp_t gfp_mask)
>  {
> +       unsigned long order = PAGE_FRAG_CACHE_MAX_ORDER;
>         struct page *page = NULL;
>         gfp_t gfp = gfp_mask;
>
> @@ -30,23 +42,31 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>                 __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
>         page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
>                                 PAGE_FRAG_CACHE_MAX_ORDER);
> -       nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
>  #endif
> -       if (unlikely(!page))
> +       if (unlikely(!page)) {
>                 page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
> +               if (unlikely(!page)) {
> +                       nc->encoded_page = 0;
> +                       return NULL;
> +               }

I would recommend just skipping the conditional here; there is no need
for it. You can simply not encode the page below if you failed to
allocate it.

> +
> +               order = 0;
> +       }
>
> -       nc->va = page ? page_address(page) : NULL;
> +       nc->encoded_page = page_frag_encode_page(page, order,
> +                                                page_is_pfmemalloc(page));

I would just follow the same logic with a ternary operator here: if
page is set, encode the page, else just set the field to 0.
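Putting those two suggestions together, the refill path could end up
looking something like the following -- an untested sketch of the
idea, not a drop-in replacement:

static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
					     gfp_t gfp_mask)
{
	unsigned long order = PAGE_FRAG_CACHE_MAX_ORDER;
	struct page *page = NULL;
	gfp_t gfp = gfp_mask;

#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
	gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
		   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
				PAGE_FRAG_CACHE_MAX_ORDER);
#endif
	/* fall back to order-0; no early return needed on failure */
	if (unlikely(!page)) {
		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
		order = 0;
	}

	/* encode the page if we got one, record the failure as 0 otherwise */
	nc->encoded_page = page ?
		page_frag_encode_page(page, order, page_is_pfmemalloc(page)) : 0;

	return page;
}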
>
>         return page;
>  }
>
>  void page_frag_cache_drain(struct page_frag_cache *nc)
>  {
> -       if (!nc->va)
> +       if (!nc->encoded_page)
>                 return;
>
> -       __page_frag_cache_drain(virt_to_head_page(nc->va), nc->pagecnt_bias);
> -       nc->va = NULL;
> +       __page_frag_cache_drain(page_frag_encoded_page_ptr(nc->encoded_page),
> +                               nc->pagecnt_bias);
> +       nc->encoded_page = 0;
>  }
>  EXPORT_SYMBOL(page_frag_cache_drain);
>
> @@ -63,31 +83,27 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>                               unsigned int fragsz, gfp_t gfp_mask,
>                               unsigned int align_mask)
>  {
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> -       unsigned int size = nc->size;
> -#else
> -       unsigned int size = PAGE_SIZE;
> -#endif
> -       unsigned int offset;
> +       unsigned long encoded_page = nc->encoded_page;
> +       unsigned int size, offset;
>         struct page *page;
>
> -       if (unlikely(!nc->va)) {
> +       size = page_frag_cache_page_size(encoded_page);
> +
> +       if (unlikely(!encoded_page)) {
> refill:
>                 page = __page_frag_cache_refill(nc, gfp_mask);
>                 if (!page)
>                         return NULL;
>
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> -               /* if size can vary use size else just use PAGE_SIZE */
> -               size = nc->size;
> -#endif
> +               encoded_page = nc->encoded_page;
> +               size = page_frag_cache_page_size(encoded_page);
> +
>                 /* Even if we own the page, we do not use atomic_set().
>                  * This would break get_page_unless_zero() users.
>                  */
>                 page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
>
>                 /* reset page count bias and offset to start of new frag */
> -               nc->pfmemalloc = page_is_pfmemalloc(page);
>                 nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>                 nc->offset = 0;
>         }

It would probably make sense to move getting the size to just after
this if statement, since you are computing it in two different paths
and I don't think size is used at all inside the
"if (unlikely(!encoded_page))" path otherwise. A rough sketch of that
follows at the end of this reply.

> @@ -107,13 +123,14 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>                         return NULL;
>                 }
>
> -               page = virt_to_page(nc->va);
> +               page = page_frag_encoded_page_ptr(encoded_page);
>
>                 if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
>                         goto refill;
>
> -               if (unlikely(nc->pfmemalloc)) {
> -                       free_unref_page(page, compound_order(page));
> +               if (unlikely(page_frag_encoded_page_pfmemalloc(encoded_page))) {
> +                       free_unref_page(page,
> +                                       page_frag_encoded_page_order(encoded_page));
>                         goto refill;
>                 }
>
> @@ -128,7 +145,7 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>         nc->pagecnt_bias--;
>         nc->offset = offset + fragsz;
>
> -       return nc->va + offset;
> +       return page_frag_encoded_page_address(encoded_page) + offset;
>  }
>  EXPORT_SYMBOL(__page_frag_alloc_align);
>
> --
> 2.33.0
>
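To illustrate the size comment above, the top of
__page_frag_alloc_align() could be reorganized roughly like this
(sketch only, untested):

	unsigned long encoded_page = nc->encoded_page;
	unsigned int size, offset;
	struct page *page;

	if (unlikely(!encoded_page)) {
refill:
		page = __page_frag_cache_refill(nc, gfp_mask);
		if (!page)
			return NULL;

		/* refresh the local copy after the refill */
		encoded_page = nc->encoded_page;

		/* Even if we own the page, we do not use atomic_set().
		 * This would break get_page_unless_zero() users.
		 */
		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);

		/* reset page count bias and offset to start of new frag */
		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
		nc->offset = 0;
	}

	/* one size lookup covers both the cached and the refilled page */
	size = page_frag_cache_page_size(encoded_page);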