Subject: Re: [PATCH net-next v2 09/15] mm: page_frag: reuse MSB of 'size' field for pfmemalloc
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Alexander H Duyck, ...
CC: ..., Andrew Morton, ...
References: <20240415131941.51153-1-linyunsheng@huawei.com>
 <20240415131941.51153-10-linyunsheng@huawei.com>
 <37d012438d4850c3d7090e784e09088d02a2780c.camel@gmail.com>
 <8b7361c2-6f45-72e8-5aca-92e8a41a7e5e@huawei.com>
 <17066b6a4f941eea3ef567767450b311096da22b.camel@gmail.com>
Date: Fri, 26 Apr 2024 17:38:04 +0800

On 2024/4/18 17:39, Yunsheng Lin wrote:

...

>
>> combining the pagecnt_bias with the va. I'm wondering if it wouldn't
>> make more sense to look at putting together the structure something
>> like:
>>
>> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>> typedef u16 page_frag_bias_t;
>> #else
>> typedef u32 page_frag_bias_t;
>> #endif
>>
>> struct page_frag_cache {
>>         /* page address and offset */
>>         void *va;
>
> Generally I am agreed with combining the virtual address with the
> offset for the reason you mentioned below.
>
>>         page_frag_bias_t pagecnt_bias;
>>         u8 pfmemalloc;
>>         u8 page_frag_order;
>> }
>
> The issue with the 'page_frag_order' I see is that we might need to do
> a 'PAGE << page_frag_order' to get actual size, and we might also need
> to do 'size - 1' to get the size_mask if we want to mask out the offset
> from the 'va'.
>
> For page_frag_order, we need to:
> size = PAGE << page_frag_order
> size_mask = size - 1
>
> For size_mask, it seem we only need to do:
> size = size_mask + 1
>
> And as PAGE_FRAG_CACHE_MAX_SIZE = 32K, which can be fitted into 15 bits
> if we use size_mask instead of size.
>
> Does it make sense to use below, so that we only need to use bitfield
> for SIZE < PAGE_FRAG_CACHE_MAX_SIZE in 32 bits system? And 'struct
> page_frag' is using a similar '(BITS_PER_LONG > 32)' checking trick.
>
> struct page_frag_cache {
>         /* page address and offset */
>         void *va;
>
> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
>         u16 pagecnt_bias;
>         u16 size_mask:15;
>         u16 pfmemalloc:1;
> #else
>         u32 pagecnt_bias;
>         u16 size_mask;
>         u16 pfmemalloc;
> #endif
> };
>
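
(Just to spell out the size_mask variant above: recovering the page address
and the offset from 'va' would presumably boil down to something like the
below, relying on pages from the buddy allocator being naturally aligned to
their order; the helper names are made up here purely for illustration.)

static inline void *page_frag_cache_page_addr(const struct page_frag_cache *nc)
{
        /* 'va' holds page address | offset, and size_mask == size - 1 */
        return (void *)((unsigned long)nc->va & ~(unsigned long)nc->size_mask);
}

static inline unsigned int page_frag_cache_offset(const struct page_frag_cache *nc)
{
        return (unsigned long)nc->va & nc->size_mask;
}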

After considering a few different layouts for 'struct page_frag_cache', it
seems the one below is more optimized:

struct page_frag_cache {
        /* page address & pfmemalloc & order */
        void *va;

#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
        u16 pagecnt_bias;
        u16 size;
#else
        u32 pagecnt_bias;
        u32 size;
#endif
};

The lower bits of 'va' are OR'ed with the page order & pfmemalloc instead of
with the offset or pagecnt_bias, so that we don't have to add more checking
to handle the problem of not having enough space for the offset or
pagecnt_bias when PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE on a 32-bit system.
And as the page address & pfmemalloc & order stay unchanged for the same page
in the same 'page_frag_cache' instance, it makes sense to fit them together.

Also, it seems better to replace 'offset' with 'size', which indicates the
remaining size of the cache in a 'page_frag_cache' instance, so that we might
be able to do a single 'size >= fragsz' check for the case of the cache being
big enough, which should be the fast path if we ensure that size is zero when
'va' == NULL.

Something like below:

#define PAGE_FRAG_CACHE_ORDER_MASK              GENMASK(1, 0)
#define PAGE_FRAG_CACHE_PFMEMALLOC_BIT          BIT(2)

struct page_frag_cache {
        /* page address & pfmemalloc & order */
        void *va;

#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
        u16 pagecnt_bias;
        u16 size;
#else
        u32 pagecnt_bias;
        u32 size;
#endif
};

static void *__page_frag_cache_refill(struct page_frag_cache *nc,
                                      unsigned int fragsz, gfp_t gfp_mask,
                                      unsigned int align_mask)
{
        gfp_t gfp = gfp_mask;
        struct page *page;
        void *va;

#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
        /* Ensure free_unref_page() can be used to free the page fragment */
        BUILD_BUG_ON(PAGE_FRAG_CACHE_MAX_ORDER > PAGE_ALLOC_COSTLY_ORDER);

        gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
                   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
        page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
                                PAGE_FRAG_CACHE_MAX_ORDER);
        if (likely(page)) {
                nc->size = PAGE_FRAG_CACHE_MAX_SIZE - fragsz;
                va = page_address(page);
                nc->va = (void *)((unsigned long)va |
                                  PAGE_FRAG_CACHE_MAX_ORDER |
                                  (page_is_pfmemalloc(page) ?
                                   PAGE_FRAG_CACHE_PFMEMALLOC_BIT : 0));
                page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
                nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE;
                return va;
        }
#endif
        page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
        if (likely(page)) {
                nc->size = PAGE_SIZE - fragsz;
                va = page_address(page);
                nc->va = (void *)((unsigned long)va |
                                  (page_is_pfmemalloc(page) ?
                                   PAGE_FRAG_CACHE_PFMEMALLOC_BIT : 0));
                page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
                nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE;
                return va;
        }

        nc->va = NULL;
        nc->size = 0;
        return NULL;
}
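
/*
 * (Illustration only, not part of the proposal above: the order and
 * pfmemalloc bits packed into the low bits of 'va' could presumably be
 * pulled back out with small helpers along the lines of the ones below;
 * the helper names are made up here.)
 */
static inline unsigned long page_frag_cache_page_order(void *va)
{
#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
        return (unsigned long)va & PAGE_FRAG_CACHE_ORDER_MASK;
#else
        return 0;
#endif
}

static inline bool page_frag_cache_is_pfmemalloc(void *va)
{
        return !!((unsigned long)va & PAGE_FRAG_CACHE_PFMEMALLOC_BIT);
}

static inline void *page_frag_cache_page_va(void *va)
{
        /* all of the packed bits live below PAGE_SIZE alignment */
        return (void *)((unsigned long)va & PAGE_MASK);
}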

void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
                                 unsigned int fragsz, gfp_t gfp_mask,
                                 unsigned int align_mask)
{
#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
        unsigned long page_order;
#endif
        unsigned long page_size;
        unsigned long size;
        struct page *page;
        void *va;

        size = nc->size & align_mask;
        va = nc->va;

#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
        page_order = (unsigned long)va & PAGE_FRAG_CACHE_ORDER_MASK;
        page_size = PAGE_SIZE << page_order;
#else
        page_size = PAGE_SIZE;
#endif

        if (unlikely(fragsz > size)) {
                if (unlikely(!va))
                        return __page_frag_cache_refill(nc, fragsz, gfp_mask,
                                                        align_mask);

                /* fragsz is not supposed to be bigger than PAGE_SIZE as we are
                 * allowing order 3 page allocation to fail easily under low
                 * memory condition.
                 */
                if (WARN_ON_ONCE(fragsz > PAGE_SIZE))
                        return NULL;

                page = virt_to_page(va);
                if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
                        return __page_frag_cache_refill(nc, fragsz, gfp_mask,
                                                        align_mask);

                if (unlikely((unsigned long)va & PAGE_FRAG_CACHE_PFMEMALLOC_BIT)) {
                        free_unref_page(page, compound_order(page));
                        return __page_frag_cache_refill(nc, fragsz, gfp_mask,
                                                        align_mask);
                }

                /* OK, page count is 0, we can safely set it */
                set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);

                /* reset page count bias and offset to start of new frag */
                nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
                size = page_size;
        }

        va = (void *)((unsigned long)va & PAGE_MASK);
        va = va + (page_size - size);
        nc->size = size - fragsz;
        nc->pagecnt_bias--;

        return va;
}
EXPORT_SYMBOL(__page_frag_alloc_va_align);
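
For what it's worth, a caller of the above would presumably look something
like the below; the fragment size and alignment are made-up values here, and
the align_mask follows what I believe is the existing page_frag_alloc_align()
style of passing the negated alignment:

/* hypothetical caller, only to show the intended calling convention */
static void *example_frag_alloc(struct page_frag_cache *nc)
{
        return __page_frag_alloc_va_align(nc, 256, GFP_ATOMIC,
                                          -SMP_CACHE_BYTES);
}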