Re: Report: Performance regression from ib_umem_get on zone device pages

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: jane.chu@oracle.com
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: logane@deltatee.com, hch@lst.de, gregkh@linuxfoundation.org,
	willy@infradead.org, kch@nvidia.com, axboe@kernel.dk,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org
Subject: Re: Report: Performance regression from ib_umem_get on zone device pages
Date: Wed, 23 Apr 2025 18:49:30 -0700	[thread overview]
Message-ID: <914dfa88-d36c-44c2-a7d6-22f6fbd2b86f@oracle.com> (raw)
In-Reply-To: <20250423232828.GV1213339@ziepe.ca>


On 4/23/2025 4:28 PM, Jason Gunthorpe wrote:
> On Wed, Apr 23, 2025 at 12:21:15PM -0700, jane.chu@oracle.com wrote:
> 
>> So this looks like a case of CPU cache thrashing, but I don't know to fix
>> it. Could someone help address the issue?  I'd be happy to help verifying.
> 
> I don't know that we can even really fix it if that is the cause.. But
> it seems suspect, if you are only doing 2M at a time per CPU core then
> that is only 512 struct pages or 32k of data. The GUP process will
> have touched all of that if device-dax is not creating folios. So why
> did it fall out of the cache?
> 
> If it is creating folios then maybe we can improve things by
> recovering the folios before adding the pages.
> 
> Or is something weird going on like the device-dax is using 1G folios
> and all of these pins and checks are sharing and bouncing the same
> struct page cache lines?

I used ndctl to create 12 device-dax instances in 2M alignment by 
default, and mmap the device-dax memory in 2M alignment and 2M-multiple 
size, that should lead to the default 2MB hugepage mapping.

> 
> Can the device-dax implement memfd_pin_folios()?

Could you elaborate? or perhaps Dan Williams could comment?

> 
>> The flow of a single test run:
>>    1. reserve virtual address space for (61440 * 2MB) via mmap with PROT_NONE
>> and MAP_ANONYMOUS | MAP_NORESERVE| MAP_PRIVATE
>>    2. mmap ((61440 * 2MB) / 12) from each of the 12 device-dax to the
>> reserved virtual address space sequentially to form a continual VA
>> space
> 
> Like is there any chance that each of these 61440 VMA's is a single
> 2MB folio from device-dax, or could it be?

That's 61440 mrs of 2MB each, they came from 12 device-dax.
The test process mmap them into its pre-reserved VMA, so the entire VMA 
range is 61440 * 2M = 122880MB, or about 31million 4K-pages.

When it comes to mr registration via ibv_reg_mr(), there'll be about 
31million of ->pgmap dereferences from "a->pgmap == b->pgmap", give the 
small L1 Dcache, that is how I see the cache thrashing happening.

> 
> IIRC device-dax does could not use folios until 6.15 so I'm assuming
> it is not folios even if it is a pmd mapping?

Probably not, there are very little change to device-dax, but Dan can 
correct me.

In theory, the problem could be observed by using any kind of zone 
device pages for the mrs, have you seen anything like this?

thanks,
-jane

> 
> Jason
>

next prev parent reply	other threads:[~2025-04-24  1:49 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-23 19:21 jane.chu
2025-04-23 19:34 ` Resend: " jane.chu
2025-04-23 23:28 ` Jason Gunthorpe
2025-04-24  1:49   ` jane.chu [this message]
2025-04-24  2:55   ` jane.chu
2025-04-24  3:00     ` jane.chu
2025-04-24  5:35   ` jane.chu
2025-04-24 12:01     ` Jason Gunthorpe
2025-04-28 19:11       ` jane.chu
2025-04-29 12:29         ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=914dfa88-d36c-44c2-a7d6-22f6fbd2b86f@oracle.com \
    --to=jane.chu@oracle.com \
    --cc=axboe@kernel.dk \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@lst.de \
    --cc=jgg@ziepe.ca \
    --cc=kch@nvidia.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=logane@deltatee.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox