From: Yunsheng Lin
CC: Yunsheng Lin, Alexander Duyck, Andrew Morton
Subject: [RFC v4 05/13] mm: page_frag: use initial zero offset for page_frag_alloc_align()
Date: Wed, 15 May 2024 21:09:24 +0800
Message-ID: <20240515130932.18842-6-linyunsheng@huawei.com>
In-Reply-To: <20240515130932.18842-1-linyunsheng@huawei.com>
References: <20240515130932.18842-1-linyunsheng@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain
We are about to use the page_frag_alloc_*() API not just to allocate
memory for skb->data, but also to do the memory allocation for skb
frags.
Currently the page_frag implementation in the mm subsystem runs the
offset as a countdown rather than a count-up value. That may have
several advantages, as mentioned in [1], but it also has some
disadvantages: for example, it may prevent skb frag coalescing and
more effective cache prefetching.

There is a trade-off to make in order to have a unified implementation
and API for page_frag, so use an initial zero offset in this patch; a
following patch will try to optimize away the disadvantages as much as
possible.

As the offset is advanced to satisfy the alignment requirement before
actually checking whether the cache is big enough, this might be
exploitable if a caller mistakenly passes an align value bigger than
32K. As we allow order-3 page allocation to fail easily under low
memory conditions, an align value bigger than PAGE_SIZE is not really
allowed, so add an 'align > PAGE_SIZE' check in
page_frag_alloc_align() to catch that.

1. https://lore.kernel.org/all/f4abe71b3439b39d17a6fb2d410180f367cadf5c.camel@gmail.com/

CC: Alexander Duyck
Signed-off-by: Yunsheng Lin
---
 include/linux/page_frag_cache.h |  2 +-
 mm/page_frag_cache.c            | 48 ++++++++++++++-------------------
 2 files changed, 21 insertions(+), 29 deletions(-)

diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
index 635b67ceb939..9da7cbd0ee47 100644
--- a/include/linux/page_frag_cache.h
+++ b/include/linux/page_frag_cache.h
@@ -32,7 +32,7 @@ static inline void *page_frag_alloc_align(struct page_frag_cache *nc,
 					   unsigned int fragsz, gfp_t gfp_mask,
 					   unsigned int align)
 {
-	WARN_ON_ONCE(!is_power_of_2(align));
+	WARN_ON_ONCE(!is_power_of_2(align) || align > PAGE_SIZE);
 	return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align);
 }
 
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index 64993b5d1243..152ae5dec58a 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -65,9 +65,8 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz,
 			      gfp_t gfp_mask, unsigned int align_mask)
 {
-	unsigned int size = PAGE_SIZE;
+	unsigned int size, offset;
 	struct page *page;
-	int offset;
 
 	if (unlikely(!nc->va)) {
 refill:
@@ -75,10 +74,6 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 		if (!page)
 			return NULL;
 
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
 		/* Even if we own the page, we do not use atomic_set().
 		 * This would break get_page_unless_zero() users.
 		 */
@@ -87,11 +82,25 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 		/* reset page count bias and offset to start of new frag */
 		nc->pfmemalloc = page_is_pfmemalloc(page);
 		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		nc->offset = size;
+		nc->offset = 0;
 	}
 
-	offset = nc->offset - fragsz;
-	if (unlikely(offset < 0)) {
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+	/* if size can vary use size else just use PAGE_SIZE */
+	size = nc->size;
+#else
+	size = PAGE_SIZE;
+#endif
+
+	offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
+	if (unlikely(offset + fragsz > size)) {
+		/* fragsz is not supposed to be bigger than PAGE_SIZE as we are
+		 * allowing order 3 page allocation to fail easily under low
+		 * memory condition.
+		 */
+		if (WARN_ON_ONCE(fragsz > PAGE_SIZE))
+			return NULL;
+
 		page = virt_to_page(nc->va);
 
 		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
@@ -102,33 +111,16 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 			goto refill;
 		}
 
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
 		/* OK, page count is 0, we can safely set it */
 		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
 
 		/* reset page count bias and offset to start of new frag */
 		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		offset = size - fragsz;
-		if (unlikely(offset < 0)) {
-			/*
-			 * The caller is trying to allocate a fragment
-			 * with fragsz > PAGE_SIZE but the cache isn't big
-			 * enough to satisfy the request, this may
-			 * happen in low memory conditions.
-			 * We don't release the cache page because
-			 * it could make memory pressure worse
-			 * so we simply return NULL here.
-			 */
-			return NULL;
-		}
+		offset = 0;
 	}
 
 	nc->pagecnt_bias--;
-	offset &= align_mask;
-	nc->offset = offset;
+	nc->offset = offset + fragsz;
 
 	return nc->va + offset;
 }
-- 
2.33.0