From: Yunsheng Lin <linyunsheng@huawei.com>
To: Chuck Lever <chuck.lever@oracle.com>,
Yishai Hadas <yishaih@nvidia.com>, Jason Gunthorpe <jgg@ziepe.ca>,
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>,
Kevin Tian <kevin.tian@intel.com>,
Alex Williamson <alex.williamson@redhat.com>,
Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>, Gao Xiang <xiang@kernel.org>,
Chao Yu <chao@kernel.org>, Yue Hu <zbestahu@gmail.com>,
Jeffle Xu <jefflexu@linux.alibaba.com>,
Sandeep Dhavale <dhavale@google.com>,
Carlos Maiolino <cem@kernel.org>,
"Darrick J. Wong" <djwong@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Trond Myklebust <trondmy@kernel.org>,
Anna Schumaker <anna@kernel.org>,
Jeff Layton <jlayton@kernel.org>, Neil Brown <neilb@suse.de>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Cc: Luiz Capitulino <luizcap@redhat.com>,
Mel Gorman <mgorman@techsingularity.net>,
Dave Chinner <david@fromorbit.com>, <kvm@vger.kernel.org>,
<virtualization@lists.linux.dev>, <linux-kernel@vger.kernel.org>,
<linux-btrfs@vger.kernel.org>, <linux-erofs@lists.ozlabs.org>,
<linux-xfs@vger.kernel.org>, <linux-mm@kvack.org>,
<netdev@vger.kernel.org>, <linux-nfs@vger.kernel.org>,
Suren Baghdasaryan <surenb@google.com>
Subject: Re: [PATCH v2] mm: alloc_pages_bulk: remove assumption of populating only NULL elements
Date: Tue, 4 Mar 2025 20:04:03 +0800 [thread overview]
Message-ID: <74827bc7-ec6e-4e3a-9d19-61c4a9ba6b2c@huawei.com> (raw)
In-Reply-To: <a81f9270-8fa8-4a05-a33a-901dd777a71f@oracle.com>
On 2025/3/4 6:13, Chuck Lever wrote:
> On 2/28/25 4:44 AM, Yunsheng Lin wrote:
>> As mentioned in [1], it seems odd to check for NULL elements in
>> the middle of bulk page allocation, and it seems most existing
>> callers can do a better job by bulk allocating pages into a whole
>> array sequentially, without first checking for NULL elements
>> before doing the bulk allocation.
>
> Sorry, but this still makes a claim without providing any data
> to back it up. Why can callers "do a better job"?
What I meant by "do a better job" is that callers already keep
track of how many pages have been allocated, so it is convenient
to just pass 'page_array + nr_allocated' and 'nr_pages - nr_allocated'
to subsequent bulk page alloc calls, avoiding the NULL checking
entirely; this is the pattern I see in
alloc_pages_bulk_interleave().
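To illustrate the caller-side pattern described above, here is a
minimal userspace sketch; fake_alloc_pages_bulk() and refill() are
hypothetical stand-ins, not existing kernel API, and the dummy
pointers model a partially successful allocation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for alloc_pages_bulk(): populates the first
 * few slots of page_array and returns how many it filled.  It
 * deliberately satisfies only part of the request, modeling an
 * allocator that fails partway through. */
static unsigned long fake_alloc_pages_bulk(unsigned long nr_pages,
					   void **page_array)
{
	unsigned long done = (nr_pages + 1) / 2;	/* partial success */
	unsigned long i;

	for (i = 0; i < done; i++)
		page_array[i] = (void *)(i + 1);	/* dummy non-NULL "pages" */
	return done;
}

/* Caller-side refill pattern: track nr_allocated and pass the array
 * tail, so the allocator never needs to scan for NULL elements. */
static void refill(void **pages, unsigned long nr_pages)
{
	unsigned long nr_allocated = 0;

	while (nr_allocated < nr_pages)
		nr_allocated += fake_alloc_pages_bulk(nr_pages - nr_allocated,
						      pages + nr_allocated);
}
```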
>
>
>> From an analysis of the bulk allocation API users in fs, it
>> seems that the callers in fs/btrfs/extent_io.c and
>> net/sunrpc/svc_xprt.c depend on the assumption of populating
>> only NULL elements, while erofs and xfs don't, see:
>> commit 91d6ac1d62c3 ("btrfs: allocate page arrays using bulk page allocator")
>> commit d6db47e571dc ("erofs: do not use pagepool in z_erofs_gbuf_growsize()")
>> commit c9fa563072e1 ("xfs: use alloc_pages_bulk_array() for buffers")
>> commit f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
>>
>> Change SUNRPC and btrfs to not depend on the assumption.
>> Other existing callers seem to pass all-NULL arrays
>> via memset, kzalloc, etc.
>>
>> Remove the assumption of populating only NULL elements and treat
>> page_array as an output parameter like kmem_cache_alloc_bulk().
>> Removing the assumption also enables callers to skip zeroing the
>> array before calling the bulk page allocation API, which gives
>> about a 1~2 ns improvement in the time_bench_page_pool03_slow()
>> test case for page_pool on an x86 VM system (from 87.886 ns to
>> 86.429 ns). This offsets some of the performance impact of fixing
>> the DMA API misuse problem in [2].
>>
>> 1. https://lore.kernel.org/all/bd8c2f5c-464d-44ab-b607-390a87ea4cd5@huawei.com/
>> 2. https://lore.kernel.org/all/20250212092552.1779679-1-linyunsheng@huawei.com/
>> CC: Jesper Dangaard Brouer <hawk@kernel.org>
>> CC: Luiz Capitulino <luizcap@redhat.com>
>> CC: Mel Gorman <mgorman@techsingularity.net>
>> CC: Dave Chinner <david@fromorbit.com>
>> CC: Chuck Lever <chuck.lever@oracle.com>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> Acked-by: Jeff Layton <jlayton@kernel.org>
>> ---
>> V2:
>> 1. Drop RFC tag and rebased on latest linux-next.
>> 2. Fix a compile error for xfs.
>> 3. Defragment the page_array for SUNRPC and btrfs.
>> ---
>> drivers/vfio/pci/virtio/migrate.c | 2 --
>> fs/btrfs/extent_io.c | 23 +++++++++++++++++-----
>> fs/erofs/zutil.c | 12 ++++++------
>> fs/xfs/xfs_buf.c | 9 +++++----
>> mm/page_alloc.c | 32 +++++--------------------------
>> net/core/page_pool.c | 3 ---
>> net/sunrpc/svc_xprt.c | 22 +++++++++++++++++----
>> 7 files changed, 52 insertions(+), 51 deletions(-)
>
> 52:51 is not an improvement. 1-2 ns is barely worth mentioning. The
> sunrpc and btrfs callers are more complex and carry duplicated code.
Yes, the hard part is finding a common file to place a common
helper such as the below:

/* Compact non-NULL entries to the front of the array. */
void defragment_pointer_array(void **array, int size)
{
	int slow = 0;

	for (int fast = 0; fast < size; fast++) {
		if (array[fast]) {
			swap(array[fast], array[slow]);
			slow++;
		}
	}
}
Or introduce a function like alloc_pages_refill_array() for the
sunrpc and xfs use cases, doing the array defragmentation inside
alloc_pages_refill_array() before calling alloc_pages_bulk()?
Any suggestion?
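To make that second option concrete, a minimal userspace sketch of
the idea; alloc_pages_refill_array() is only a proposed name and
stub_alloc_pages_bulk() is a hypothetical stand-in for the real
allocator:

```c
#include <assert.h>
#include <stddef.h>

/* Stub for alloc_pages_bulk(): fills all requested slots with dummy
 * non-NULL "pages" and returns the count. */
static unsigned long stub_alloc_pages_bulk(unsigned long nr_pages,
					   void **array)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++)
		array[i] = (void *)(0x1000 + i);
	return nr_pages;
}

/* Proposed helper: compact surviving pages to the front of the
 * array, then bulk-allocate only the tail, as the sunrpc/xfs refill
 * pattern would need.  Returns the number of populated entries. */
static unsigned long alloc_pages_refill_array(unsigned long nr_pages,
					      void **array)
{
	unsigned long slow = 0, fast;

	for (fast = 0; fast < nr_pages; fast++) {
		if (array[fast]) {
			void *tmp = array[fast];

			array[fast] = array[slow];
			array[slow++] = tmp;
		}
	}
	return slow + stub_alloc_pages_bulk(nr_pages - slow, array + slow);
}
```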
>
> Not an outright objection from me, but it's hard to justify this change.
The above should reduce the diffstat to something like 40:51.

Also, looking more closely at the other alloc_pages_bulk() callers,
it seems vm_module_tags_populate() doesn't reset next_page[i] to
NULL after __free_page() in its 'Clean up and error out' path. I am
not sure whether vm_module_tags_populate() can be called multiple
times, as vm_module_tags->pages seems to be set to all NULL only
once, see alloc_tag_init() -> alloc_mod_tags_mem().
Cc Suren to see if he can clarify the above.