From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Thu, 5 Sep 2024 09:55:20 -0700
Subject: Re: [PATCH net-next v17 06/14] mm: page_frag: reuse existing space
 for 'size' and 'pfmemalloc'
To: Yunsheng Lin
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Morton,
 linux-mm@kvack.org
In-Reply-To: <2cf46afa-9e80-4f34-a734-22009e277cc2@huawei.com>
References: <20240902120314.508180-1-linyunsheng@huawei.com>
 <20240902120314.508180-7-linyunsheng@huawei.com>
 <2cf46afa-9e80-4f34-a734-22009e277cc2@huawei.com>

On Thu, Sep 5, 2024 at 5:13 AM Yunsheng Lin wrote:
>
> On 2024/9/5 0:14, Alexander Duyck wrote:
> > On Mon, Sep 2, 2024 at 5:09 AM Yunsheng Lin wrote:
> >>
> >> Currently there is one 'struct page_frag' for every 'struct sock'
> >> and 'struct task_struct'; we are about to replace 'struct
> >> page_frag' with 'struct page_frag_cache' for them. Before
> >> beginning the replacement, we need to ensure that 'struct
> >> page_frag_cache' is not bigger than 'struct page_frag', as there
> >> may be tens of thousands of 'struct sock' and 'struct task_struct'
> >> instances in the system.
> >>
> >> By OR'ing the page order and the pfmemalloc bit into the lower
> >> bits of 'va', instead of using a 'u16' or 'u32' for the page size
> >> and a 'u8' for pfmemalloc, we avoid wasting 3 or 5 bytes. Since
> >> the page address, pfmemalloc bit and order are unchanged for the
> >> same page within the same 'page_frag_cache' instance, it makes
> >> sense to pack them together.
> >>
> >> After this patch, the size of 'struct page_frag_cache' should be
> >> the same as the size of 'struct page_frag'.
> >>
> >> CC: Alexander Duyck
> >> Signed-off-by: Yunsheng Lin
> >> ---
> >>  include/linux/mm_types_task.h   | 19 +++++-----
> >>  include/linux/page_frag_cache.h | 47 ++++++++++++++++++++++--
> >>  mm/page_frag_cache.c            | 63 +++++++++++++++++++++------------
> >>  3 files changed, 96 insertions(+), 33 deletions(-)
> >>
> >> diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
> >> index cdc1e3696439..73a574a0e8f9 100644
> >> --- a/include/linux/mm_types_task.h
> >> +++ b/include/linux/mm_types_task.h
> >> @@ -50,18 +50,21 @@ struct page_frag {
> >>  #define PAGE_FRAG_CACHE_MAX_SIZE       __ALIGN_MASK(32768, ~PAGE_MASK)
> >>  #define PAGE_FRAG_CACHE_MAX_ORDER      get_order(PAGE_FRAG_CACHE_MAX_SIZE)
> >>  struct page_frag_cache {
> >> -       void *va;
> >> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> >> +       /* encoded_page consists of the virtual address, pfmemalloc bit and
> >> +        * order of a page.
> >> +        */
> >> +       unsigned long encoded_page;
> >> +
> >> +       /* we maintain a pagecount bias, so that we dont dirty cache line
> >> +        * containing page->_refcount every time we allocate a fragment.
> >> +        */
> >> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
> >>         __u16 offset;
> >> -       __u16 size;
> >> +       __u16 pagecnt_bias;
> >>  #else
> >>         __u32 offset;
> >> +       __u32 pagecnt_bias;
> >>  #endif
> >> -       /* we maintain a pagecount bias, so that we dont dirty cache line
> >> -        * containing page->_refcount every time we allocate a fragment.
> >> -        */
> >> -       unsigned int pagecnt_bias;
> >> -       bool pfmemalloc;
> >>  };
> >>
> >>  /* Track pages that require TLB flushes */
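As a quick sanity check of the size claim above, the two layouts can be compared in a standalone userspace sketch; the struct bodies below are simplified stand-ins for the kernel definitions (the 64-bit variants, with pointer fields stubbed), not the real ones:

```c
#include <assert.h>
#include <stdint.h>

/* stand-in for the existing 'struct page_frag' (64-bit variant) */
struct page_frag {
	void *page;		/* really a struct page * in the kernel */
	uint32_t offset;
	uint32_t size;
};

/* stand-in for the proposed 'struct page_frag_cache' (64-bit variant) */
struct page_frag_cache {
	unsigned long encoded_page;	/* address | order | pfmemalloc */
	uint32_t offset;
	uint32_t pagecnt_bias;
};

int main(void)
{
	/* the commit's goal: the cache must not outgrow 'struct page_frag' */
	static_assert(sizeof(struct page_frag_cache) ==
		      sizeof(struct page_frag), "sizes should match");
	return 0;
}
```

Both stand-ins come out to 16 bytes on a 64-bit target, which is the point of folding order and pfmemalloc into the pointer word.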
> >>
> >> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
> >> index 0a52f7a179c8..cb89cd792fcc 100644
> >> --- a/include/linux/page_frag_cache.h
> >> +++ b/include/linux/page_frag_cache.h
> >> @@ -3,18 +3,61 @@
> >>  #ifndef _LINUX_PAGE_FRAG_CACHE_H
> >>  #define _LINUX_PAGE_FRAG_CACHE_H
> >>
> >> +#include
> >>  #include
> >> +#include
> >>  #include
> >>  #include
> >>
> >> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> >> +/* Use a full byte here to enable assembler optimization as the shift
> >> + * operation is usually expecting a byte.
> >> + */
> >> +#define PAGE_FRAG_CACHE_ORDER_MASK             GENMASK(7, 0)
> >> +#define PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT       8
> >> +#define PAGE_FRAG_CACHE_PFMEMALLOC_BIT         BIT(PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT)
> >> +#else
> >> +/* Compiler should be able to figure out we don't read things as any value
> >> + * ANDed with 0 is 0.
> >> + */
> >> +#define PAGE_FRAG_CACHE_ORDER_MASK             0
> >> +#define PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT       0
> >> +#define PAGE_FRAG_CACHE_PFMEMALLOC_BIT         BIT(PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT)
> >> +#endif
> >> +
> >> +static inline unsigned long page_frag_encoded_page_order(unsigned long encoded_page)
> >> +{
> >> +       return encoded_page & PAGE_FRAG_CACHE_ORDER_MASK;
> >> +}
> >> +
> >> +static inline bool page_frag_encoded_page_pfmemalloc(unsigned long encoded_page)
> >> +{
> >> +       return !!(encoded_page & PAGE_FRAG_CACHE_PFMEMALLOC_BIT);
> >> +}
> >> +
> >> +static inline void *page_frag_encoded_page_address(unsigned long encoded_page)
> >> +{
> >> +       return (void *)(encoded_page & PAGE_MASK);
> >> +}
> >> +
> >> +static inline struct page *page_frag_encoded_page_ptr(unsigned long encoded_page)
> >> +{
> >> +       return virt_to_page((void *)encoded_page);
> >> +}
> >> +
> >>  static inline void page_frag_cache_init(struct page_frag_cache *nc)
> >>  {
> >> -       nc->va = NULL;
> >> +       nc->encoded_page = 0;
> >>  }
> >>
> >>  static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
> >>  {
> >> -       return !!nc->pfmemalloc;
> >> +       return page_frag_encoded_page_pfmemalloc(nc->encoded_page);
> >> +}
> >> +
> >> +static inline unsigned int page_frag_cache_page_size(unsigned long encoded_page)
> >> +{
> >> +       return PAGE_SIZE << page_frag_encoded_page_order(encoded_page);
> >>  }
> >>
> >>  void page_frag_cache_drain(struct page_frag_cache *nc);
> >
> > Still not a huge fan of adding all these functions that expose the
> > internals. It might be better to just place them in page_frag_cache.c
> > and pull them out to the .h file as needed.
>
> Are you suggesting moving the above to page_frag_cache.c, and moving
> them back to page_frag_cache.h if needed in a following patch of the
> same patchset?

My thought for now is to look at moving it back and forth if we need to.

> Or are you really preferring not to expose any internals over the
> performance here, thinking that moving them back to the .h file is
> unneeded?

I'm debating it. My main concern is that if we expose too much of the
internals, it makes it likely for people to start abusing it, like what
happened when people tried to allocate higher-order pages from the page
frag just because we were already using them as part of the internal
implementation, when the intention was for fragment sizes of 2K or less.
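To make the bit layout behind these helpers concrete, here is a minimal userspace sketch of the same packing scheme; it assumes 4 KiB pages (a PAGE_SHIFT of 12), so bits 0-7 can hold the order and bit 8 the pfmemalloc flag without disturbing the page-aligned address bits:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SHIFT	12			/* assumed: 4 KiB pages */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))

#define ORDER_MASK	0xffUL			/* order in bits 0-7, a full byte */
#define PFMEMALLOC_BIT	(1UL << 8)		/* pfmemalloc flag in bit 8 */

/* pack a page-aligned address, its order and the pfmemalloc flag together */
static unsigned long encode_page(void *va, unsigned int order, bool pfmemalloc)
{
	return (unsigned long)va | (order & ORDER_MASK) |
	       ((unsigned long)pfmemalloc << 8);
}

int main(void)
{
	void *va = (void *)0x7f0000042000UL;	/* pretend page address */
	unsigned long ep = encode_page(va, 3, true);

	assert((void *)(ep & PAGE_MASK) == va);	/* address round-trips */
	assert((ep & ORDER_MASK) == 3);		/* order round-trips */
	assert(ep & PFMEMALLOC_BIT);		/* flag round-trips */

	/* equivalent of page_frag_cache_page_size(): 4096 << 3 == 32768 */
	printf("page size: %lu\n", PAGE_SIZE << (ep & ORDER_MASK));
	return 0;
}
```

The trick works because the maximum order (3 for a 32 KB cache) and the pfmemalloc bit together need only 9 bits, while page alignment guarantees at least 12 zero low bits in the address.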
> >
> >> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> >> index 4c8e04379cb3..a5c5373cb70e 100644
> >> --- a/mm/page_frag_cache.c
> >> +++ b/mm/page_frag_cache.c
> >> @@ -12,16 +12,28 @@
> >>   * be used in the "frags" portion of skb_shared_info.
> >>   */
> >>
> >> +#include
> >>  #include
> >>  #include
> >>  #include
> >> -#include
> >>  #include
> >>  #include "internal.h"
> >>
> >> +static unsigned long page_frag_encode_page(struct page *page, unsigned int order,
> >> +                                          bool pfmemalloc)
> >> +{
> >> +       BUILD_BUG_ON(PAGE_FRAG_CACHE_MAX_ORDER > PAGE_FRAG_CACHE_ORDER_MASK);
> >> +       BUILD_BUG_ON(PAGE_FRAG_CACHE_PFMEMALLOC_BIT >= PAGE_SIZE);
> >> +
> >> +       return (unsigned long)page_address(page) |
> >> +              (order & PAGE_FRAG_CACHE_ORDER_MASK) |
> >> +              ((unsigned long)pfmemalloc << PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT);
> >> +}
> >> +
> >>  static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
> >>                                              gfp_t gfp_mask)
> >>  {
> >> +       unsigned long order = PAGE_FRAG_CACHE_MAX_ORDER;
> >>         struct page *page = NULL;
> >>         gfp_t gfp = gfp_mask;
> >>
> >> @@ -30,23 +42,31 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
> >>                 __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
> >>         page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
> >>                                 PAGE_FRAG_CACHE_MAX_ORDER);
> >> -       nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
> >>  #endif
> >> -       if (unlikely(!page))
> >> +       if (unlikely(!page)) {
> >>                 page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
> >> +               if (unlikely(!page)) {
> >> +                       nc->encoded_page = 0;
> >> +                       return NULL;
> >> +               }
> >
> > I would recommend just skipping the conditional here. No need to do
> > that. You can basically just not encode the page below if you failed
> > to allocate it.
> >
> >> +
> >> +               order = 0;
> >> +       }
> >>
> >> -       nc->va = page ? page_address(page) : NULL;
> >> +       nc->encoded_page = page_frag_encode_page(page, order,
> >> +                                                page_is_pfmemalloc(page));
> >
> > I would just follow the same logic with the ternary operator here. If
> > page is set then encode the page, else just set it to 0.
>
> I am assuming you meant something like below, right?

Yes, basically what we did in the function below.

> static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>                                              gfp_t gfp_mask)
> {
>         unsigned long order = PAGE_FRAG_CACHE_MAX_ORDER;
>         struct page *page = NULL;
>         gfp_t gfp = gfp_mask;
>
> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>         gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
>                    __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
>         page = __alloc_pages(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER,
>                              numa_mem_id(), NULL);
> #endif
>         if (unlikely(!page)) {
>                 page = __alloc_pages(gfp, 0, numa_mem_id(), NULL);
>                 order = 0;
>         }
>
>         nc->encoded_page = page ?
>                 page_frag_encode_page(page, order, page_is_pfmemalloc(page)) : 0;
>
>         return page;
> }
>
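For completeness, the fallback-then-encode shape being agreed on above can be exercised outside the kernel with a tiny mock; alloc_compound() and alloc_single() below are hypothetical stand-ins for the two alloc_pages calls, not kernel APIs, and the pfmemalloc bit is omitted for brevity:

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_ORDER	3	/* stand-in for PAGE_FRAG_CACHE_MAX_ORDER */

/* hypothetical stand-ins for the high-order and order-0 allocations */
static void *alloc_compound(void) { return NULL; }  /* simulate failure */
static void *alloc_single(void)   { return aligned_alloc(4096, 4096); }

int main(void)
{
	unsigned long order = MAX_ORDER;
	void *page = alloc_compound();	/* try the large compound page first */

	if (!page) {			/* fall back to a single page */
		page = alloc_single();
		order = 0;
	}

	/* ternary encode as suggested: 0 doubles as the "no page" marker,
	 * so no separate error branch is needed for the allocation failure
	 */
	unsigned long encoded = page ? ((unsigned long)page | order) : 0;

	printf("encoded=%#lx order=%lu\n", encoded, encoded & 0xff);
	free(page);
	return 0;
}
```

The appeal of the ternary form is that both allocation outcomes funnel through a single assignment, so the "no page" state and the encoded state cannot drift apart.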