From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Tue, 23 Jul 2024 21:19:02 +0800
Subject: Re: [RFC v11 01/14] mm: page_frag: add a test module for page_frag
To: Alexander Duyck
CC: Andrew Morton, linux-mm@kvack.org
References: <20240719093338.55117-1-linyunsheng@huawei.com>
 <20240719093338.55117-2-linyunsheng@huawei.com>

On 2024/7/22 1:34, Alexander Duyck wrote:
> On Fri, Jul 19, 2024 at 2:36 AM Yunsheng Lin wrote:
>>
>> Based on lib/objpool.c, change it to something like a
>> ptrpool, so that we can utilize it to test the correctness
>> and performance of page_frag.
>>
>> The testing is done by ensuring that a fragment allocated
>> from a page_frag_cache instance is pushed into a ptrpool
>> instance by a kthread bound to a specified cpu, while a
>> kthread bound to another specified cpu pops the fragment
>> from the ptrpool and frees it.
>>
>> We may refactor out the common part between objpool and
>> ptrpool if this ptrpool thing turns out to be helpful for
>> other places.
>>
>> CC: Alexander Duyck
>> Signed-off-by: Yunsheng Lin
>> ---
>>  mm/Kconfig.debug    |   8 +
>>  mm/Makefile         |   1 +
>>  mm/page_frag_test.c | 393 ++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 402 insertions(+)
>>  create mode 100644 mm/page_frag_test.c
>
> I might have missed it somewhere. Is there any reason why this isn't
> in the selftests/mm/ directory? Seems like that would be a better fit
> for this.
>
>> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
>> index afc72fde0f03..1ebcd45f47d4 100644
>> --- a/mm/Kconfig.debug
>> +++ b/mm/Kconfig.debug
>> @@ -142,6 +142,14 @@ config DEBUG_PAGE_REF
>>           kernel code. However the runtime performance overhead is virtually
>>           nil until the tracepoints are actually enabled.
>>
>> +config DEBUG_PAGE_FRAG_TEST
>
> This isn't a "DEBUG" feature. This is a test feature.
>
>> +	tristate "Test module for page_frag"
>> +	default n
>> +	depends on m && DEBUG_KERNEL
>
> I am not sure it is valid to have a tristate depend on being built as
> a module.

Perhaps I was copying the wrong pattern from TEST_OBJPOOL in
lib/Kconfig.debug. Would mm/dmapool_test.c and DMAPOOL_TEST be a more
appropriate pattern for a page_frag test module? (See the Kconfig
sketch further below.)

> I think if you can set it up as a selftest it will have broader use as
> you could compile it against any target kernel going forward and add
> it as a module rather than having to build it as a part of a debug
> kernel.

It seems the entries under tools/testing/selftests/mm/ are all
userspace testing tools, while test kernel modules seem to live in the
same directory as the code being tested?

>
>> +	help
>> +	  This builds the "page_frag_test" module that is used to test the
>> +	  correctness and performance of page_frag's implementation.
>> +
>>  config DEBUG_RODATA_TEST
>>  	bool "Testcase for the marking rodata read-only"

...

>> +
>> +		/*
>> +		 * Here we allocate the percpu slot and objs together in a
>> +		 * single allocation to make it more compact, taking
>> +		 * advantage of warm caches and TLB hits. By default vmalloc
>> +		 * is used to reduce the pressure on the kernel slab system,
>> +		 * as the minimal size of a vmalloc allocation is one page,
>> +		 * since vmalloc always aligns the requested size to page
>> +		 * size.
>> +		 */
>> +		if (gfp & GFP_ATOMIC)
>> +			slot = kmalloc_node(size, gfp, cpu_to_node(i));
>> +		else
>> +			slot = __vmalloc_node(size, sizeof(void *), gfp,
>> +					      cpu_to_node(i),
>> +					      __builtin_return_address(0));
>
> When would anyone ever call this with atomic? This is just for your
> test isn't it?
>
>> +		if (!slot)
>> +			return -ENOMEM;
>> +
>> +		memset(slot, 0, size);
>> +		pool->cpu_slots[i] = slot;
>> +
>> +		objpool_init_percpu_slot(pool, slot);
>> +	}
>> +
>> +	return 0;
>> +}

...

>> +/* release the whole objpool forcibly */
>> +static void objpool_free(struct objpool_head *pool)
>> +{
>> +	if (!pool->cpu_slots)
>> +		return;
>> +
>> +	/* release percpu slots */
>> +	objpool_fini_percpu_slots(pool);
>> +}
>> +
>
> Why add all this extra objpool overhead? This seems like overkill for
> what should be a simple test. Seems like you should just need a simple
> array located on one of your CPUs. I'm not sure what is with all the
> extra overhead being added here.

As mentioned in the commit log:
"We may refactor out the common part between objpool and ptrpool if
this ptrpool thing turns out to be helpful for other places."

The next thing I am trying to do is to use ptrpool to optimize the pcp
for the mm subsystem, so I would rather not tailor the ptrpool to
page_frag_test; the extra overhead doesn't seem to affect the testing
that much.
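
Just to make the comparison concrete, my understanding of the "simple
array" you are suggesting is something like the untested sketch below,
i.e. a fixed-size single-producer/single-consumer ring shared by the
push and pop kthreads. Note that simple_ptr_ring and its helpers are
made-up names for illustration (allocation and initialization of
'objs' are omitted), not anything in this patch:

/*
 * Hypothetical SPSC ring: 'size' must be a power of 2 so the indices
 * can wrap with a simple mask. Only the push kthread writes 'tail'
 * and only the pop kthread writes 'head'.
 */
struct simple_ptr_ring {
	void **objs;
	unsigned int size;
	unsigned int head;	/* next slot to pop */
	unsigned int tail;	/* next slot to push */
};

static int simple_ring_push(struct simple_ptr_ring *r, void *obj)
{
	/* pairs with the release of 'head' in simple_ring_pop() */
	if (r->tail - smp_load_acquire(&r->head) >= r->size)
		return -ENOSPC;	/* ring full */

	r->objs[r->tail & (r->size - 1)] = obj;
	/* publish the slot before making it visible via 'tail' */
	smp_store_release(&r->tail, r->tail + 1);
	return 0;
}

static void *simple_ring_pop(struct simple_ptr_ring *r)
{
	void *obj;

	/* pairs with the release of 'tail' in simple_ring_push() */
	if (r->head == smp_load_acquire(&r->tail))
		return NULL;	/* ring empty */

	obj = r->objs[r->head & (r->size - 1)];
	/* free the slot for the push side to reuse */
	smp_store_release(&r->head, r->head + 1);
	return obj;
}

Something like this would indeed be enough for a two-kthread test; the
per-CPU slots in the ptrpool are kept mainly so that the same code can
later be reused for the pcp case mentioned above.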
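
Going back to the Kconfig question above, the DMAPOOL_TEST-like entry
I had in mind is roughly the below; this is an untested sketch, the
PAGE_FRAG_TEST symbol name is assumed, and "default n" is dropped as n
is already the default:

config PAGE_FRAG_TEST
	tristate "Test module for page_frag"
	depends on DEBUG_KERNEL
	help
	  This builds the "page_frag_test" module that is used to test the
	  correctness and performance of page_frag's implementation.

Without the "depends on m" the test could then be built either into
the kernel or as a module, which seems closer to what you are asking
for.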
>
>> +static struct objpool_head ptr_pool;
>> +static int nr_objs = 512;
>> +static atomic_t nthreads;
>> +static struct completion wait;
>> +static struct page_frag_cache test_frag;
>> +
>> +static int nr_test = 5120000;
>> +module_param(nr_test, int, 0);
>> +MODULE_PARM_DESC(nr_test, "number of iterations to test");
>> +
>> +static bool test_align;
>> +module_param(test_align, bool, 0);
>> +MODULE_PARM_DESC(test_align, "use align API for testing");
>> +
>> +static int test_alloc_len = 2048;
>> +module_param(test_alloc_len, int, 0);
>> +MODULE_PARM_DESC(test_alloc_len, "alloc len for testing");
>> +
>> +static int test_push_cpu;
>> +module_param(test_push_cpu, int, 0);
>> +MODULE_PARM_DESC(test_push_cpu, "test cpu for pushing fragment");
>> +
>> +static int test_pop_cpu;
>> +module_param(test_pop_cpu, int, 0);
>> +MODULE_PARM_DESC(test_pop_cpu, "test cpu for popping fragment");
>> +
>> +static int page_frag_pop_thread(void *arg)
>> +{
>> +	struct objpool_head *pool = arg;
>> +	int nr = nr_test;
>> +
>> +	pr_info("page_frag pop test thread begins on cpu %d\n",
>> +		smp_processor_id());
>> +
>> +	while (nr > 0) {
>> +		void *obj = objpool_pop(pool);
>> +
>> +		if (obj) {
>> +			nr--;
>> +			page_frag_free(obj);
>> +		} else {
>> +			cond_resched();
>> +		}
>> +	}
>> +
>> +	if (atomic_dec_and_test(&nthreads))
>> +		complete(&wait);
>> +
>> +	pr_info("page_frag pop test thread exits on cpu %d\n",
>> +		smp_processor_id());
>> +
>> +	return 0;
>> +}
>> +
>> +static int page_frag_push_thread(void *arg)
>> +{
>> +	struct objpool_head *pool = arg;
>> +	int nr = nr_test;
>> +
>> +	pr_info("page_frag push test thread begins on cpu %d\n",
>> +		smp_processor_id());
>> +
>> +	while (nr > 0) {
>> +		void *va;
>> +		int ret;
>> +
>> +		if (test_align) {
>> +			va = page_frag_alloc_align(&test_frag, test_alloc_len,
>> +						   GFP_KERNEL, SMP_CACHE_BYTES);
>> +
>> +			WARN_ONCE((unsigned long)va & (SMP_CACHE_BYTES - 1),
>> +				  "unaligned va returned\n");
>> +		} else {
>> +			va = page_frag_alloc(&test_frag, test_alloc_len, GFP_KERNEL);
>> +		}
>> +
>> +		if (!va)
>> +			continue;
>> +
>> +		ret = objpool_push(va, pool);
>> +		if (ret) {
>> +			page_frag_free(va);
>> +			cond_resched();
>> +		} else {
>> +			nr--;
>> +		}
>> +	}
>> +
>> +	pr_info("page_frag push test thread exits on cpu %d\n",
>> +		smp_processor_id());
>> +
>> +	if (atomic_dec_and_test(&nthreads))
>> +		complete(&wait);
>> +
>> +	return 0;
>> +}
>> +
>
> So looking over these functions they seem to overlook how the network
> stack works in many cases. One of the main motivations for the page
> frags approach is page recycling. For example, with GRO enabled the
> headers allocated to record the frags might be freed for all but the
> first. As such you can end up with 17 fragments being allocated, and
> 16 freed within the same thread as NAPI will just be recycling the
> buffers.
>
> With this setup it doesn't seem very likely to be triggered since you
> are operating in two threads. One test you might want to look at
> adding is a test where you are allocating and freeing in the same
> thread at a fairly constant rate to test against the "ideal" scenario.

I am not sure the above is still the "ideal" scenario. As you
mentioned, most drivers are turning to page_pool for rx, so page_frag
is really mostly for tx, or for skb->data on rx.
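
If a single-thread test for that pattern is still wanted, I suppose it
would be something along the lines of the untested sketch below,
reusing test_frag, test_alloc_len and nr_test from this module
(page_frag_recycle_thread is a hypothetical name). It allocates a
batch of 17 fragments, frees all but the first in the same thread, and
holds the first one until the next batch, roughly mimicking the GRO
case you describe:

static int page_frag_recycle_thread(void *arg)
{
	void *held = NULL;	/* frag held across batches */
	int nr = nr_test;

	while (nr > 0) {
		void *va[17];
		int i, allocated = 0;

		/* allocate up to 17 frags, as for a full GRO skb */
		for (i = 0; i < ARRAY_SIZE(va) && nr > 0; i++) {
			va[i] = page_frag_alloc(&test_frag, test_alloc_len,
						GFP_KERNEL);
			if (!va[i])
				break;
			allocated++;
			nr--;
		}

		/* free all but the first frag in the same thread, as
		 * NAPI recycling would
		 */
		for (i = 1; i < allocated; i++)
			page_frag_free(va[i]);

		/* retain the first frag until the next batch */
		if (held)
			page_frag_free(held);
		held = allocated ? va[0] : NULL;

		cond_resched();
	}

	if (held)
		page_frag_free(held);

	return 0;
}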