From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Mon, 26 Aug 2024 09:46:49 -0700
Subject: Re: [PATCH net-next v15 06/13] mm: page_frag: reuse existing space for 'size' and 'pfmemalloc'
To: Yunsheng Lin
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 Andrew Morton, linux-mm@kvack.org
In-Reply-To: <20240826124021.2635705-7-linyunsheng@huawei.com>
References: <20240826124021.2635705-1-linyunsheng@huawei.com>
 <20240826124021.2635705-7-linyunsheng@huawei.com>
Content-Type: text/plain; charset="UTF-8"
On Mon, Aug 26, 2024 at 5:46 AM Yunsheng Lin wrote:
>
> Currently there is one 'struct page_frag' for every 'struct
> sock' and 'struct task_struct'; we are about to replace
> 'struct page_frag' with 'struct page_frag_cache' for them.
> Before beginning the replacement, we need to ensure the size of
> 'struct page_frag_cache' is not bigger than the size of
> 'struct page_frag', as there may be tens of thousands of
> 'struct sock' and 'struct task_struct' instances in the
> system.
>
> By or'ing the page order & pfmemalloc with the lower bits of
> 'va', instead of using 'u16' or 'u32' for the page size and 'u8'
> for pfmemalloc, we avoid wasting 3 or 5 bytes of space. And as
> the page address, pfmemalloc flag and order are unchanged for the
> same page in the same 'page_frag_cache' instance, it makes
> sense to pack them together.
>
> After this patch, the size of 'struct page_frag_cache' should be
> the same as the size of 'struct page_frag'.
>
> CC: Alexander Duyck
> Signed-off-by: Yunsheng Lin
> ---
>  include/linux/mm_types_task.h   | 19 ++++++-----
>  include/linux/page_frag_cache.h | 60 +++++++++++++++++++++++++++++++--
>  mm/page_frag_cache.c            | 51 +++++++++++++++-------------
>  3 files changed, 97 insertions(+), 33 deletions(-)
>
> diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
> index cdc1e3696439..a8635460e027 100644
> --- a/include/linux/mm_types_task.h
> +++ b/include/linux/mm_types_task.h
> @@ -50,18 +50,21 @@ struct page_frag {
>  #define PAGE_FRAG_CACHE_MAX_SIZE       __ALIGN_MASK(32768, ~PAGE_MASK)
>  #define PAGE_FRAG_CACHE_MAX_ORDER      get_order(PAGE_FRAG_CACHE_MAX_SIZE)
>  struct page_frag_cache {
> -       void *va;
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> +       /* encoded_page consists of the virtual address, pfmemalloc bit and order
> +        * of a page.
> +        */
> +       unsigned long encoded_page;
> +
> +       /* we maintain a pagecount bias, so that we dont dirty cache line
> +        * containing page->_refcount every time we allocate a fragment.
> +        */
> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
>         __u16 offset;
> -       __u16 size;
> +       __u16 pagecnt_bias;
>  #else
>         __u32 offset;
> +       __u32 pagecnt_bias;
>  #endif
> -       /* we maintain a pagecount bias, so that we dont dirty cache line
> -        * containing page->_refcount every time we allocate a fragment.
> -        */
> -       unsigned int pagecnt_bias;
> -       bool pfmemalloc;
>  };
>
>  /* Track pages that require TLB flushes */
> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
> index 0a52f7a179c8..372d6ed7e20a 100644
> --- a/include/linux/page_frag_cache.h
> +++ b/include/linux/page_frag_cache.h
> @@ -3,18 +3,74 @@
>  #ifndef _LINUX_PAGE_FRAG_CACHE_H
>  #define _LINUX_PAGE_FRAG_CACHE_H
>
> +#include <linux/bits.h>
> +#include <linux/build_bug.h>
>  #include <linux/log2.h>
> +#include <linux/mm.h>
>  #include <linux/mm_types_task.h>
>  #include <linux/types.h>
>
> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> +/* Use a full byte here to enable assembler optimization as the shift
> + * operation is usually expecting a byte.
> + */
> +#define PAGE_FRAG_CACHE_ORDER_MASK             GENMASK(7, 0)
> +#define PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT       8
> +#define PAGE_FRAG_CACHE_PFMEMALLOC_BIT         BIT(PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT)
> +#else
> +/* Compiler should be able to figure out we don't read things as any value
> + * ANDed with 0 is 0.
> + */
> +#define PAGE_FRAG_CACHE_ORDER_MASK             0
> +#define PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT       0
> +#define PAGE_FRAG_CACHE_PFMEMALLOC_BIT         BIT(PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT)
> +#endif
> +
> +static inline unsigned long page_frag_encode_page(struct page *page,
> +                                                  unsigned int order,
> +                                                  bool pfmemalloc)
> +{
> +       BUILD_BUG_ON(PAGE_FRAG_CACHE_MAX_ORDER > PAGE_FRAG_CACHE_ORDER_MASK);
> +       BUILD_BUG_ON(PAGE_FRAG_CACHE_PFMEMALLOC_BIT >= PAGE_SIZE);
> +
> +       return (unsigned long)page_address(page) |
> +              (order & PAGE_FRAG_CACHE_ORDER_MASK) |
> +              ((unsigned long)pfmemalloc << PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT);
> +}
> +
> +static inline unsigned long page_frag_encoded_page_order(unsigned long encoded_page)
> +{
> +       return encoded_page & PAGE_FRAG_CACHE_ORDER_MASK;
> +}
> +
> +static inline bool page_frag_encoded_page_pfmemalloc(unsigned long encoded_page)
> +{
> +       return !!(encoded_page & PAGE_FRAG_CACHE_PFMEMALLOC_BIT);
> +}
> +
> +static inline void *page_frag_encoded_page_address(unsigned long encoded_page)
> +{
> +       return (void *)(encoded_page & PAGE_MASK);
> +}
> +
> +static inline struct page *page_frag_encoded_page_ptr(unsigned long encoded_page)
> +{
> +       return virt_to_page((void *)encoded_page);
> +}
> +
>  static inline void page_frag_cache_init(struct page_frag_cache *nc)
>  {
> -       nc->va = NULL;
> +       nc->encoded_page = 0;
>  }
>
>  static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
>  {
> -       return !!nc->pfmemalloc;
> +       return page_frag_encoded_page_pfmemalloc(nc->encoded_page);
> +}
> +
> +static inline unsigned int page_frag_cache_page_size(unsigned long encoded_page)
> +{
> +       return PAGE_SIZE << page_frag_encoded_page_order(encoded_page);
>  }
>
>  void page_frag_cache_drain(struct page_frag_cache *nc);

So how many of these additions are actually needed outside of the
page_frag_cache.c file?
I'm just wondering if we could move most of these into the .c file
itself instead of making them accessible to all callers. I don't
believe anyone currently needs to look at the size of the frag cache
or anything like that, and I would prefer to avoid exposing such
functionality if possible. As the non-order0 allocation problem has
shown, people will exploit any exposed interface, even
unintentionally. I would want to move the size/order logic, as well as
the splitting out of the virtual address, since we shouldn't allow the
user to look at that without going through an allocation function.