From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2DD79C19F32 for ; Wed, 5 Mar 2025 19:23:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DE958280016; Wed, 5 Mar 2025 14:23:40 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D7208280004; Wed, 5 Mar 2025 14:23:40 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C3BC2280016; Wed, 5 Mar 2025 14:23:40 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id A66AA280004 for ; Wed, 5 Mar 2025 14:23:40 -0500 (EST) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id DD779A4F58 for ; Wed, 5 Mar 2025 12:18:11 +0000 (UTC) X-FDA: 83187399582.18.C635C4C Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187]) by imf11.hostedemail.com (Postfix) with ESMTP id E60EB4000F for ; Wed, 5 Mar 2025 12:18:08 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=none; spf=pass (imf11.hostedemail.com: domain of linyunsheng@huawei.com designates 45.249.212.187 as permitted sender) smtp.mailfrom=linyunsheng@huawei.com; dmarc=pass (policy=quarantine) header.from=huawei.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1741177089; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5axWVH4vxcdmXj0pRTUTkNgAW9xJ7Bi1RbvV/2S5I8E=; b=LcRQBG55RoDtHqafDgKwfiiixi+e/FFSLvRYedevwpxpmW8b8Bq1/BjDuTwASnMsn0Bk/Q hCruJe0iOUZxzCvmsP6rK3nPB4cdwaTqRJqUi81FOkA/p3HmaA5Rr/5t1RQEqQ6Yjrv5v6 jg7xlkgSQxDwnVA/h4jsJvE7Hcxi2z0= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=none; spf=pass (imf11.hostedemail.com: domain of linyunsheng@huawei.com designates 45.249.212.187 as permitted sender) smtp.mailfrom=linyunsheng@huawei.com; dmarc=pass (policy=quarantine) header.from=huawei.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1741177089; a=rsa-sha256; cv=none; b=FwWC5Vq3eY28HqfaOoAMB3yBbt+yqzOmEOF2vWwFs/nduWVByMF40wbwxUD+IZrV5u7P1K fd7eXCxJJqRx87uRVs35ITNVDN41WiCdb3FmZbmNJToHW2kKiww0YWfi74HQaz+hjkbPSb fy1EjDIzURMX49TihsPfZa5RbDSQttQ= Received: from mail.maildlp.com (unknown [172.19.163.48]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4Z7BJz4R0yzwWy8; Wed, 5 Mar 2025 20:13:07 +0800 (CST) Received: from dggpemf200006.china.huawei.com (unknown [7.185.36.61]) by mail.maildlp.com (Postfix) with ESMTPS id 2AA2318009E; Wed, 5 Mar 2025 20:18:02 +0800 (CST) Received: from [10.67.120.129] (10.67.120.129) by dggpemf200006.china.huawei.com (7.185.36.61) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Wed, 5 Mar 2025 20:17:59 +0800 Message-ID: <18c68e7a-88c9-49d1-8ff8-17c63bcc44f4@huawei.com> Date: Wed, 5 Mar 2025 20:17:59 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2] mm: alloc_pages_bulk: remove assumption of populating only NULL elements To: Qu Wenruo , Yishai Hadas , Jason Gunthorpe , Shameer Kolothum , Kevin Tian , Alex Williamson , Chris Mason , Josef Bacik , David Sterba , Gao Xiang , Chao Yu , Yue Hu , Jeffle Xu , Sandeep Dhavale , Carlos Maiolino , "Darrick J. Wong" , Andrew Morton , Jesper Dangaard Brouer , Ilias Apalodimas , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Simon Horman , Trond Myklebust , Anna Schumaker , Chuck Lever , Jeff Layton , Neil Brown , Olga Kornievskaia , Dai Ngo , Tom Talpey CC: Luiz Capitulino , Mel Gorman , Dave Chinner , , , , , , , , , References: <20250228094424.757465-1-linyunsheng@huawei.com> <6f4017dc-2b3d-4b1a-b819-423acb42d999@suse.com> Content-Language: en-US From: Yunsheng Lin In-Reply-To: <6f4017dc-2b3d-4b1a-b819-423acb42d999@suse.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Originating-IP: [10.67.120.129] X-ClientProxiedBy: dggems702-chm.china.huawei.com (10.3.19.179) To dggpemf200006.china.huawei.com (7.185.36.61) X-Rspam-User: X-Rspamd-Queue-Id: E60EB4000F X-Rspamd-Server: rspam09 X-Stat-Signature: muwbn86w7hz5619ed9fpkrxqewu9njqu X-HE-Tag: 1741177088-476949 X-HE-Meta: U2FsdGVkX1+toPazOQrgl1u/FkQjm4E3ullXBe+FgiZGDV5aw+mmyvJoAQ5J6WYNPmVx6XR0G2/gYirDKHmWOQ9/Akj8ofUJFzap5/cmOfUg+XMhbIALPCuwiAM9C3qDlE9RV4xsFK6TWN8a+v/lCUb6G5r08jtBC1Qrxd+IDzFaba3qNZYBn0GXrXt6q1WhNeBJ4PSkmzCduD7nVdWxDDvteRPHGt5LRVM6pkBJ6AjTVW3So1+xosTctM8FcN+P4QQ+SoOmbrO2epyGgeGMbIwYqCIzjl3uLVn58R40XZiI9V5ZX12tUVHzIzbhjmo0wv/NybJemrEdzOy3aaBWJ+x0qvAaqmXpG2ipYqcNzZNDfYCfRsZqtB0lGFbWPVPLW9sDDI3ZvJoDVEaa3DrujNDOuc1ntpy6fVighBFjQtNFNIgiHp4r9X+Wrtf8YZJ0qCItrn6CccU00HCHXfemOE55ZByfk9y+tcpkJO960aAcWF9O/gkedhxFSMNmM/gnXasBEMGG9Oaq7s0tw35cRszCW48yNnFZaN6zR6MlqXdtQ8unRjaWpWieBMzBWjS9+JHJFFmXGjflc6dEpsb1mz1unyQ+9/h3AkxLqkiRFdZ1wNJph2a9PLBQ9+qg4syQN/9KSeB8DiLykQ0boaUJ5NxTE4033fN/ZlNIeaG5M9ZL02x3dJchOtJ/O7sGGpb8xquUQLwMkYQVEpuagPViduLZHVSkOYSzus1PgZb+C7mkzjQibu4BZ6Th5tfDORjiCvIM3IikoDZq7a0PwIVP4Zj08vidscYRQ2VxaVoyS3ioj1fef1kWg5guRcPfPAIYbPNLQ1WBIi/plazzszsLyQkoz3Pu/Bwg1aaNb2h3vKH3vdkYjtLBJzwM8t7q5ukpgdxLpRzeCNAJtofLL9a1N5UYbOiIpu9nUV3jBlCrua84p343VjXO+bAK1VtyOfJcU2jSEatOfwD3jQJmf1A ROfgzQYn Qiy2E9GkQrt7rTHmOiDlPlm21o9YITJyPfnbFHy8J1DRWIdM2C1eFtcbzm07afqOQoX+PYW9rOoUrRTk4kxxfgaoQQF0Xw2BGnf1MAnpxf4W/1BcqVzVrEOKdCpAj7FcyvZjkggLvz2+nP5WNM5RnqxyOpbG2dkrsBM0qQSgGAyL2rtaeULodi092KYG+CIc80MQ6yfKvsqoAsoHgYe3SI8spe56Z49Hn915wYNlrA85D2Vc8lapesSTwQhcBrweelFNMWaoimCZo7GKpFHNojchfyoY7JgEpy2Q1eAywWgU5kM4= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2025/3/4 17:17, Qu Wenruo wrote: > > > 在 2025/2/28 20:14, Yunsheng Lin 写道: >> As mentioned in [1], it seems odd to check NULL elements in >> the middle of page bulk allocating, and it seems caller can >> do a better job of bulk allocating pages into a whole array >> sequentially without checking NULL elements first before >> doing the page bulk allocation for most of existing users. >> >> Through analyzing of bulk allocation API used in fs, it >> seems that the callers are depending on the assumption of >> populating only NULL elements in fs/btrfs/extent_io.c and >> net/sunrpc/svc_xprt.c while erofs and btrfs don't, see: I should have said 'while erofs and xfs don't depend on the assumption of populating only NULL elements'. >> commit 91d6ac1d62c3 ("btrfs: allocate page arrays using bulk page allocator") > > If you want to change the btrfs part, please run full fstests with SCRATCH_DEV_POOL populated at least. The above is a helpful suggestion/comment to someone like me, who is not very familiar with fs yet, thanks for the suggestion. But I am not sure about some of the other comments below. > > [...] >> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c >> index f0a1da40d641..ef52cedd9873 100644 >> --- a/fs/btrfs/extent_io.c >> +++ b/fs/btrfs/extent_io.c >> @@ -623,13 +623,26 @@ int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array, >>                  bool nofail) >>   { >>       const gfp_t gfp = nofail ? (GFP_NOFS | __GFP_NOFAIL) : GFP_NOFS; >> -    unsigned int allocated; >> +    unsigned int allocated, ret; >>   -    for (allocated = 0; allocated < nr_pages;) { >> -        unsigned int last = allocated; >> +    /* Defragment page_array so pages can be bulk allocated into remaining >> +     * NULL elements sequentially. >> +     */ >> +    for (allocated = 0, ret = 0; ret < nr_pages; ret++) { >> +        if (page_array[ret]) { > > You just prove how bad the design is. > Below is the reason you think the design is bad? If not, it would be good to be more specific about why it is a bad design. > All the callers have their page array members to initialized to NULL, or do not care and just want alloc_pages_bulk() to overwrite the uninitialized values. Actually there are two use cases here as mentioned in the commit log: 1. Allocating an array of pages sequentially by providing an array as output parameter. 2. Refilling pages to NULL elements in an array by providing an array as both input and output parameter. Most of users calling the bulk alloc API is allocating an array of pages sequentially except btrfs and sunrpc, the current page bulk alloc API implementation is not only doing the unnecessay NULL checking for most users, but also require most of its callers to pass all NULL array via memset, kzalloc, etc, which is also unnecessary overhead. That means there is some space for improvement from performance and easy-to-use perspective for most existing use cases here, this patch just change alloc_pages_bulk() API to treat the page_array as only the output parameter by mirroring kmem_cache_alloc_bulk() API. For the existing btrfs and sunrpc case, I am agreed that there might be valid use cases too, we just need to discuss how to meet the requirements of different use cases using simpler, more unified and effective APIs. > > The best example here is btrfs_encoded_read_regular(). > Now your code will just crash encoded read. It would be good to be more specific about the 'crash' here, as simple testing mentioned in below seems fine for btrfs fs too, but I will do more testing by running full fstests with SCRATCH_DEV_POOL populated after I learn how to use the fstests. https://lore.kernel.org/all/91fcdfca-3e7b-417c-ab26-7d5e37853431@huawei.com/ > > Read the context before doing stupid things. > > I find it unacceptable that you just change the code, without any testing, nor even just check all the involved callers. What exactly is the above 'context' is referring to? If it is a good advice, I think I will take it seriously. May I suggest that it might be better to be more humble and discuss more before jumpping to conclusion here as it seems hard for one person to be familiar with all the subsystem in the kernel? > >> +            page_array[allocated] = page_array[ret]; >> +            if (ret != allocated) >> +                page_array[ret] = NULL; >> + >> +            allocated++; >> +        } >> +    } >>   -        allocated = alloc_pages_bulk(gfp, nr_pages, page_array); >> -        if (unlikely(allocated == last)) { >> +    while (allocated < nr_pages) { >> +        ret = alloc_pages_bulk(gfp, nr_pages - allocated, >> +                       page_array + allocated); > > I see the new interface way worse than the existing one. > > All btrfs usage only wants a simple retry-until-all-fulfilled behavior. As above, I am agreed that the above might be what btrfs usage want, so let's discuss how to meet the requirements of different use cases using simpler, more unified and effective API, like introducing a function like alloc_pages_refill_array() to meet the above requirement as mentioned in below? https://lore.kernel.org/all/74827bc7-ec6e-4e3a-9d19-61c4a9ba6b2c@huawei.com/ > > NACK for btrfs part, and I find you very unresponsible not even bother running any testsuit and just submit such a mess. > > Just stop this, no one will ever take you serious anymore. > > Thanks, > Qu > >