From: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
To: David Hildenbrand <david@redhat.com>, Peter Xu <peterx@redhat.com>
Cc: "dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	"Kim, Dongwon" <dongwon.kim@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"James Houghton" <jthoughton@google.com>,
	Jerome Marchand <jmarchan@redhat.com>,
	"Chang, Junxiao" <junxiao.chang@intel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	"Hocko, Michal" <mhocko@suse.com>,
	"Muchun Song" <muchun.song@linux.dev>,
	Jason Gunthorpe <jgg@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>
Subject: RE: [PATCH v1 0/2] udmabuf: Add back support for mapping hugetlb pages
Date: Wed, 28 Jun 2023 08:04:10 +0000
Message-ID: <IA0PR11MB7185F61FFD9A0A05459DF759F824A@IA0PR11MB7185.namprd11.prod.outlook.com>
In-Reply-To: <4a98a381-f184-1857-a134-efd606a3b807@redhat.com>

Hi David,

> 
> On 27.06.23 08:37, Kasireddy, Vivek wrote:
> > Hi David,
> >
> 
> Hi!
> 
> sorry for taking a bit longer to reply lately.
No problem.

> 
> [...]
> 
> >>> Sounds right, maybe it needs to go back to the old GUP solution,
> >>> though, as mmu notifiers are also mm-based not fd-based. Or to be
> >>> explicit, I think it'll be pin_user_pages(FOLL_LONGTERM) with the new
> >>> API. It'll also solve the movable pages issue on pinning.
> >>
> >> It really should be pin_user_pages(FOLL_LONGTERM). But I'm afraid we
> >> cannot achieve that without breaking the existing kernel interface ...
> > Yeah, as you suggest, we unfortunately cannot go back to using GUP
> > without breaking udmabuf_create UAPI that expects memfds and file
> > offsets.
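
(As an aside, a rough sketch of what the GUP-based path would look like,
assuming a hypothetical UAPI that took user addresses instead of
memfd + offset; pin_user_pages_fast() and unpin_user_pages() are the
real APIs, the surrounding names are illustrative:)

	/* hypothetical: pin the backing pages for long-term DMA use */
	ret = pin_user_pages_fast(uaddr, nr_pages,
				  FOLL_LONGTERM | FOLL_WRITE, pages);
	if (ret < 0)
		return ret;
	...
	/* and on teardown */
	unpin_user_pages(pages, nr_pages);
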
> >
> >>
> >> So we might have to implement the same page migration as gup does on
> >> FOLL_LONGTERM here ... maybe there are more such cases/drivers that
> >> actually require that handling when simply taking pages out of the
> >> memfd, believing they can hold on to them forever.
> > IIUC, I don't think just handling the page migration in udmabuf is going to
> > cut it. It might require active cooperation of the Guest GPU driver as well
> > if this is even feasible.
> 
> The idea is, that once you extract the page from the memfd and it
> resides somewhere bad (MIGRATE_CMA, ZONE_MOVABLE), you trigger page
> migration. Essentially what migrate_longterm_unpinnable_pages() does:
So, IIUC, it looks like calling check_and_migrate_movable_pages() at the time
of creation (udmabuf_create) and when we get notified about something like
FALLOC_FL_PUNCH_HOLE will be all that needs to be done in udmabuf?
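
If so, a minimal sketch of what I have in mind (assuming
check_and_migrate_movable_pages(), which is currently static in
mm/gup.c, were exported for drivers to use):

	/* hypothetical: after collecting the pages from the memfd */
	ret = check_and_migrate_movable_pages(ubuf->pagecount,
					      ubuf->pages);
	if (ret == -EAGAIN) {
		/* unpinnable pages were migrated and released;
		 * re-lookup the pages from the memfd and retry,
		 * the way __gup_longterm_locked() does */
	}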

> 
> Why would the guest driver have to be involved? It shouldn't care about
> page migration in the hypervisor.
Yeah, it appears that the page migration would be transparent to the Guest
driver.

> 
> [...]
> 
> >> balloon, and then using that memory for communicating with the device]
> >>
> >> Maybe it's all fine with udmabuf because of the way it is setup/torn
> >> down by the guest driver. Unfortunately I can't tell.
> > Here are the functions used by virtio-gpu (Guest GPU driver) to allocate
> > pages for its resources:
> > __drm_gem_shmem_create:
> > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gem_shmem_helper.c#L97
> > Interestingly, the comment in the above function says that the pages
> > should not be allocated from the MOVABLE zone.
> 
> It doesn't add __GFP_MOVABLE, so pages don't end up in
> ZONE_MOVABLE/MIGRATE_CMA *in the guest*. But we care about
> ZONE_MOVABLE/MIGRATE_CMA *in the host*. (What the guest does is
> right, though.)
> 
> IOW, what udmabuf does with guest memory on the hypervisor side, not the
> guest driver on the guest side.
Ok, got it.
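
(For reference, the gfp setup in __drm_gem_shmem_create() is roughly:

	/*
	 * Our buffers are kept pinned, so allocating them
	 * from the MOVABLE zone is a really bad idea, and
	 * conflicts with CMA.
	 */
	mapping_set_gfp_mask(obj->filp->f_mapping, GFP_HIGHUSER |
			     __GFP_RETRY_MAYFAIL | __GFP_NOWARN);

so, as you say, the guest-side allocations already avoid __GFP_MOVABLE.)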

> 
> > The pages along with their dma addresses are then extracted and shared
> > with Qemu using these two functions:
> > drm_gem_get_pages:
> > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gem.c#L534
> > virtio_gpu_object_shmem_init:
> > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/virtio/virtgpu_object.c#L135
> 
> ^ so these two target the guest driver as well, right? IOW, there is a
> memfd (shmem) in the guest that the guest driver uses to allocate pages
> from and there is the memfd in the hypervisor to back guest RAM.
> 
> The latter gets registered with udmabuf.
Yes, that's exactly what happens.

> 
> > Qemu then translates the dma addresses into file offsets and creates
> > udmabufs -- but only if blob is set to true, as an optimization to
> > avoid data copies.
> 
> If the guest OS doesn't end up freeing/reallocating that memory while
> it's registered with udmabuf in the hypervisor, then we should be fine.
IIUC, udmabuf does get notified when something like that happens.
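
(For completeness, the UAPI Qemu uses to create these looks like this;
the memfd/offset/size values are of course illustrative:

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <linux/udmabuf.h>

	struct udmabuf_create create = {
		.memfd  = memfd,	/* memfd backing guest RAM */
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = offset,	/* page-aligned offset into the memfd */
		.size   = size,		/* page-aligned length */
	};
	int devfd = open("/dev/udmabuf", O_RDWR);
	int dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &create);

The returned fd is the dma-buf that lets us avoid the extra copy.)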

Thanks,
Vivek

> 
> Because that way, the guest won't end up triggering MADV_DONTNEED by
> "accident".
> 
> --
> Cheers,
> 
> David / dhildenb


