From: jane.chu@oracle.com
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: logane@deltatee.com, hch@lst.de, gregkh@linuxfoundation.org,
willy@infradead.org, kch@nvidia.com, axboe@kernel.dk,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-block@vger.kernel.org
Subject: Re: Report: Performance regression from ib_umem_get on zone device pages
Date: Mon, 28 Apr 2025 12:11:40 -0700
Message-ID: <bab1c156-ed5a-4c1d-8f0a-dd1e39e17c99@oracle.com>
In-Reply-To: <20250424120143.GX1213339@ziepe.ca>
On 4/24/2025 5:01 AM, Jason Gunthorpe wrote:
> On Wed, Apr 23, 2025 at 10:35:06PM -0700, jane.chu@oracle.com wrote:
>>
>> On 4/23/2025 4:28 PM, Jason Gunthorpe wrote:
>>>> The flow of a single test run:
>>>> 1. reserve virtual address space for (61440 * 2MB) via mmap with PROT_NONE
>>>> and MAP_ANONYMOUS | MAP_NORESERVE| MAP_PRIVATE
>>>> 2. mmap ((61440 * 2MB) / 12) from each of the 12 device-dax to the
>>>> reserved virtual address space sequentially to form a continual VA
>>>> space
>>> Like is there any chance that each of these 61440 VMA's is a single
>>> 2MB folio from device-dax, or could it be?
>>>
>>> IIRC device-dax could not use folios until 6.15 so I'm assuming
>>> it is not folios even if it is a pmd mapping?
>>
>> I just ran the mr registration stress test in 6.15-rc3, much better!
>>
>> What's changed? is it folio for device-dax? none of the code in
>> ib_umem_get() has changed though, it still loops through 'npages' doing
>
> I don't know, it is kind of strange that it changed. If device-dax is
> now using folios then it does change the access pattern to the struct
> page array somewhat, especially it moves all the writes to the head
> page of the 2MB section which maybe impacts the caching?
6.15-rc3 is orders of magnitude better.
Agreed that device-dax's use of folios is likely the hero. I've yet to
check the code and bisect; maybe pin_user_pages_fast() adds folios to
page_list[] instead of 4K pages? If so, with a 511/512 size reduction in
page_list[], that could drastically improve the downstream call
performance in spite of the thrashing, that is, if the thrashing is still
there.
I'll report my findings.
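For scale, here is a rough sketch of the arithmetic behind that guess. It
assumes 4K base pages and one page_list[] entry per 2MB folio, which is
only the hypothesis above and not something confirmed from the code. The
constant and function names are made up for illustration.

```python
from fractions import Fraction

PAGE_SIZE = 4 * 1024            # base page size (4K), assumed
FOLIO_SIZE = 2 * 1024 * 1024    # one PMD-sized device-dax folio (2MB)
NR_CHUNKS = 61440               # number of 2MB chunks in the test's VA range


def page_list_entries(total_bytes, entry_bytes):
    """Entries needed if each entry covers entry_bytes of the range."""
    return total_bytes // entry_bytes


total = NR_CHUNKS * FOLIO_SIZE
per_page = page_list_entries(total, PAGE_SIZE)     # one entry per 4K page
per_folio = page_list_entries(total, FOLIO_SIZE)   # one entry per 2MB folio (hypothetical)
reduction = 1 - Fraction(per_folio, per_page)      # fraction of entries saved

print(per_page, per_folio, reduction)
```

With these numbers the 4K case needs 31457280 entries and the per-folio
case 61440, which is where the 511/512 figure comes from.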
Thanks,
-jane
>
> Jason