From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Tue, 23 Jul 2024 21:19:02 +0800
Subject: Re: [RFC v11 01/14] mm: page_frag: add a test module for page_frag
To: Alexander Duyck
CC: Andrew Morton, linux-mm@kvack.org
References: <20240719093338.55117-1-linyunsheng@huawei.com>
 <20240719093338.55117-2-linyunsheng@huawei.com>

On 2024/7/22 1:34, Alexander Duyck wrote:
> On Fri, Jul 19, 2024 at 2:36 AM Yunsheng Lin wrote:
>>
>> Based on lib/objpool.c, change it to something like a
>> ptrpool, so that we can utilize it to test the correctness
>> and performance of page_frag.
>>
>> The testing is done by ensuring that a fragment allocated
>> from a page_frag_cache instance is pushed into a ptrpool
>> instance by a kthread bound to a specified cpu, while a
>> kthread bound to another specified cpu pops the fragment
>> from the ptrpool and frees it.
>>
>> We may refactor out the common part between objpool and
>> ptrpool if this ptrpool thing turns out to be helpful for
>> other places.
>>
>> CC: Alexander Duyck
>> Signed-off-by: Yunsheng Lin
>> ---
>>  mm/Kconfig.debug    |   8 +
>>  mm/Makefile         |   1 +
>>  mm/page_frag_test.c | 393 ++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 402 insertions(+)
>>  create mode 100644 mm/page_frag_test.c
>
> I might have missed it somewhere. Is there any reason why this isn't
> in the selftests/mm/ directory? Seems like that would be a better fit
> for this.
>
>> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
>> index afc72fde0f03..1ebcd45f47d4 100644
>> --- a/mm/Kconfig.debug
>> +++ b/mm/Kconfig.debug
>> @@ -142,6 +142,14 @@ config DEBUG_PAGE_REF
>>           kernel code. However the runtime performance overhead is virtually
>>           nil until the tracepoints are actually enabled.
>>
>> +config DEBUG_PAGE_FRAG_TEST
>
> This isn't a "DEBUG" feature. This is a test feature.
>
>> +	tristate "Test module for page_frag"
>> +	default n
>> +	depends on m && DEBUG_KERNEL
>
> I am not sure it is valid to have a tristate depend on being built as
> a module.

Perhaps I was copying the wrong pattern from TEST_OBJPOOL in
lib/Kconfig.debug. Would mm/dmapool_test.c and DMAPOOL_TEST be a more
appropriate pattern for a page_frag test module? (See the Kconfig
sketch further below.)

> I think if you can set it up as a selftest it will have broader use as
> you could compile it against any target kernel going forward and add
> it as a module rather than having to build it as a part of a debug
> kernel.

It seems the entries under tools/testing/selftests/mm/ are all
userspace testing tools, while test kernel modules seem to live in the
same directory as the code being tested?

>
>> +	help
>> +	  This builds the "page_frag_test" module that is used to test the
>> +	  correctness and performance of page_frag's implementation.
>> +
>>  config DEBUG_RODATA_TEST
>>  	bool "Testcase for the marking rodata read-only"

...

>> +
>> +		/*
>> +		 * Here we allocate the percpu slot and objs together in a
>> +		 * single allocation to make it more compact, taking
>> +		 * advantage of warm caches and TLB hits. By default vmalloc
>> +		 * is used to reduce the pressure on the kernel slab system,
>> +		 * as the minimal size of a vmalloc allocation is one page,
>> +		 * since vmalloc always aligns the requested size to page
>> +		 * size.
>> +		 */
>> +		if (gfp & GFP_ATOMIC)
>> +			slot = kmalloc_node(size, gfp, cpu_to_node(i));
>> +		else
>> +			slot = __vmalloc_node(size, sizeof(void *), gfp,
>> +					      cpu_to_node(i),
>> +					      __builtin_return_address(0));
>
> When would anyone ever call this with atomic? This is just for your
> test isn't it?
>
>> +		if (!slot)
>> +			return -ENOMEM;
>> +
>> +		memset(slot, 0, size);
>> +		pool->cpu_slots[i] = slot;
>> +
>> +		objpool_init_percpu_slot(pool, slot);
>> +	}
>> +
>> +	return 0;
>> +}

...

>> +/* release the whole objpool forcibly */
>> +static void objpool_free(struct objpool_head *pool)
>> +{
>> +	if (!pool->cpu_slots)
>> +		return;
>> +
>> +	/* release percpu slots */
>> +	objpool_fini_percpu_slots(pool);
>> +}
>> +
>
> Why add all this extra objpool overhead? This seems like overkill for
> what should be a simple test. Seems like you should just need a simple
> array located on one of your CPUs. I'm not sure what is with all the
> extra overhead being added here.

As mentioned in the commit log:
"We may refactor out the common part between objpool and ptrpool if
this ptrpool thing turns out to be helpful for other places."

The next thing I am trying to do is to use ptrpool to optimize the pcp
for the mm subsystem, so I would rather not tailor the ptrpool to
page_frag_test; the extra overhead doesn't seem to affect the testing
that much.
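
Just to make the comparison concrete, my understanding of the "simple
array" you are suggesting is something like the untested sketch below,
i.e. a fixed-size single-producer/single-consumer ring shared by the
push and pop kthreads. Note that simple_ptr_ring and its helpers are
made-up names for illustration (allocation and initialization of
'objs' are omitted), not anything in this patch:

/*
 * Hypothetical SPSC ring: 'size' must be a power of 2 so the indices
 * can wrap with a simple mask. Only the push kthread writes 'tail'
 * and only the pop kthread writes 'head'.
 */
struct simple_ptr_ring {
	void **objs;
	unsigned int size;
	unsigned int head;	/* next slot to pop */
	unsigned int tail;	/* next slot to push */
};

static int simple_ring_push(struct simple_ptr_ring *r, void *obj)
{
	/* pairs with the release of 'head' in simple_ring_pop() */
	if (r->tail - smp_load_acquire(&r->head) >= r->size)
		return -ENOSPC;	/* ring full */

	r->objs[r->tail & (r->size - 1)] = obj;
	/* publish the slot before making it visible via 'tail' */
	smp_store_release(&r->tail, r->tail + 1);
	return 0;
}

static void *simple_ring_pop(struct simple_ptr_ring *r)
{
	void *obj;

	/* pairs with the release of 'tail' in simple_ring_push() */
	if (r->head == smp_load_acquire(&r->tail))
		return NULL;	/* ring empty */

	obj = r->objs[r->head & (r->size - 1)];
	/* free the slot for the push side to reuse */
	smp_store_release(&r->head, r->head + 1);
	return obj;
}

Something like this would indeed be enough for a two-kthread test; the
per-CPU slots in the ptrpool are kept mainly so that the same code can
later be reused for the pcp case mentioned above.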
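
Going back to the Kconfig question above, the DMAPOOL_TEST-like entry
I had in mind is roughly the below; this is an untested sketch, the
PAGE_FRAG_TEST symbol name is assumed, and "default n" is dropped as n
is already the default:

config PAGE_FRAG_TEST
	tristate "Test module for page_frag"
	depends on DEBUG_KERNEL
	help
	  This builds the "page_frag_test" module that is used to test the
	  correctness and performance of page_frag's implementation.

Without the "depends on m" the test could then be built either into
the kernel or as a module, which seems closer to what you are asking
for.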
>
>> +static struct objpool_head ptr_pool;
>> +static int nr_objs = 512;
>> +static atomic_t nthreads;
>> +static struct completion wait;
>> +static struct page_frag_cache test_frag;
>> +
>> +static int nr_test = 5120000;
>> +module_param(nr_test, int, 0);
>> +MODULE_PARM_DESC(nr_test, "number of iterations to test");
>> +
>> +static bool test_align;
>> +module_param(test_align, bool, 0);
>> +MODULE_PARM_DESC(test_align, "use align API for testing");
>> +
>> +static int test_alloc_len = 2048;
>> +module_param(test_alloc_len, int, 0);
>> +MODULE_PARM_DESC(test_alloc_len, "alloc len for testing");
>> +
>> +static int test_push_cpu;
>> +module_param(test_push_cpu, int, 0);
>> +MODULE_PARM_DESC(test_push_cpu, "test cpu for pushing fragment");
>> +
>> +static int test_pop_cpu;
>> +module_param(test_pop_cpu, int, 0);
>> +MODULE_PARM_DESC(test_pop_cpu, "test cpu for popping fragment");
>> +
>> +static int page_frag_pop_thread(void *arg)
>> +{
>> +	struct objpool_head *pool = arg;
>> +	int nr = nr_test;
>> +
>> +	pr_info("page_frag pop test thread begins on cpu %d\n",
>> +		smp_processor_id());
>> +
>> +	while (nr > 0) {
>> +		void *obj = objpool_pop(pool);
>> +
>> +		if (obj) {
>> +			nr--;
>> +			page_frag_free(obj);
>> +		} else {
>> +			cond_resched();
>> +		}
>> +	}
>> +
>> +	if (atomic_dec_and_test(&nthreads))
>> +		complete(&wait);
>> +
>> +	pr_info("page_frag pop test thread exits on cpu %d\n",
>> +		smp_processor_id());
>> +
>> +	return 0;
>> +}
>> +
>> +static int page_frag_push_thread(void *arg)
>> +{
>> +	struct objpool_head *pool = arg;
>> +	int nr = nr_test;
>> +
>> +	pr_info("page_frag push test thread begins on cpu %d\n",
>> +		smp_processor_id());
>> +
>> +	while (nr > 0) {
>> +		void *va;
>> +		int ret;
>> +
>> +		if (test_align) {
>> +			va = page_frag_alloc_align(&test_frag, test_alloc_len,
>> +						   GFP_KERNEL, SMP_CACHE_BYTES);
>> +
>> +			WARN_ONCE((unsigned long)va & (SMP_CACHE_BYTES - 1),
>> +				  "unaligned va returned\n");
>> +		} else {
>> +			va = page_frag_alloc(&test_frag, test_alloc_len, GFP_KERNEL);
>> +		}
>> +
>> +		if (!va)
>> +			continue;
>> +
>> +		ret = objpool_push(va, pool);
>> +		if (ret) {
>> +			page_frag_free(va);
>> +			cond_resched();
>> +		} else {
>> +			nr--;
>> +		}
>> +	}
>> +
>> +	pr_info("page_frag push test thread exits on cpu %d\n",
>> +		smp_processor_id());
>> +
>> +	if (atomic_dec_and_test(&nthreads))
>> +		complete(&wait);
>> +
>> +	return 0;
>> +}
>> +
>
> So looking over these functions they seem to overlook how the network
> stack works in many cases. One of the main motivations for the page
> frags approach is page recycling. For example, with GRO enabled the
> headers allocated to record the frags might be freed for all but the
> first. As such you can end up with 17 fragments being allocated, and
> 16 freed within the same thread as NAPI will just be recycling the
> buffers.
>
> With this setup it doesn't seem very likely to be triggered since you
> are operating in two threads. One test you might want to look at
> adding is a test where you are allocating and freeing in the same
> thread at a fairly constant rate to test against the "ideal" scenario.

I am not sure the above is still the "ideal" scenario. As you
mentioned, most drivers are turning to page_pool for rx, so page_frag
is really mostly for tx, or for skb->data on rx.
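
If a single-thread test for that pattern is still wanted, I suppose it
would be something along the lines of the untested sketch below,
reusing test_frag, test_alloc_len and nr_test from this module
(page_frag_recycle_thread is a hypothetical name). It allocates a
batch of 17 fragments, frees all but the first in the same thread, and
holds the first one until the next batch, roughly mimicking the GRO
case you describe:

static int page_frag_recycle_thread(void *arg)
{
	void *held = NULL;	/* frag held across batches */
	int nr = nr_test;

	while (nr > 0) {
		void *va[17];
		int i, allocated = 0;

		/* allocate up to 17 frags, as for a full GRO skb */
		for (i = 0; i < ARRAY_SIZE(va) && nr > 0; i++) {
			va[i] = page_frag_alloc(&test_frag, test_alloc_len,
						GFP_KERNEL);
			if (!va[i])
				break;
			allocated++;
			nr--;
		}

		/* free all but the first frag in the same thread, as
		 * NAPI recycling would
		 */
		for (i = 1; i < allocated; i++)
			page_frag_free(va[i]);

		/* retain the first frag until the next batch */
		if (held)
			page_frag_free(held);
		held = allocated ? va[0] : NULL;

		cond_resched();
	}

	if (held)
		page_frag_free(held);

	return 0;
}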