From: Daniel Jordan <daniel.m.jordan@oracle.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alexander Duyck <alexanderduyck@fb.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ben Segall <bsegall@google.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Ingo Molnar <mingo@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Michal Hocko <mhocko@suse.com>, Nico Pache <npache@redhat.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Steffen Klassert <steffen.klassert@secunet.com>,
	Steve Sistare <steven.sistare@oracle.com>,
	Tejun Heo <tj@kernel.org>, Tim Chen <tim.c.chen@linux.intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-mm@kvack.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org
Subject: Re: [RFC 00/16] padata, vfio, sched: Multithreaded VFIO page pinning
Date: Thu, 6 Jan 2022 22:03:30 -0500
Message-ID: <20220107030330.2kcpekbtxn7xmsth@oracle.com>
In-Reply-To: <20220106011306.GY2328285@nvidia.com>

On Wed, Jan 05, 2022 at 09:13:06PM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 05, 2022 at 07:46:40PM -0500, Daniel Jordan wrote:
> 
> > Get ready to parallelize.  In particular, pinning can fail, so make jobs
> > undo-able.
> > 
> >      5  vfio/type1: Pass mm to vfio_pin_pages_remote()
> >      6  vfio/type1: Refactor dma map removal
> >      7  vfio/type1: Parallelize vfio_pin_map_dma()
> >      8  vfio/type1: Cache locked_vm to ease mmap_lock contention
> 
> In some ways this kind of seems like overkill, why not just have
> userspace break the guest VA into chunks and call map in parallel?
> Similar to how it already does the prealloc in parallel?
>
> This is a simpler kernel job of optimizing locking to allow
> concurrency.

I didn't consider doing it that way, and am not seeing a fundamental
reason it wouldn't work right off the bat.

At a glance, I think pinning would need to be moved out from under
vfio's iommu->lock.  I haven't checked to see how hard it would be, but
that plus the locking optimizations might end up being about the same
amount of complexity as the multithreading in the vfio driver, and doing
this in the kernel would speed things up for all vfio users without
having to duplicate the parallelism in userspace.

But yes, agreed, the lock optimization could definitely be split out and
used in a different approach.
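
Just so we're picturing the same thing, here's a rough sketch of that
userspace approach, hypothetical and untested, written against the
existing type1 uapi.  The container fd, region, thread count, and the
1G cut alignment are all assumptions for illustration, and today each
of these ioctls would still serialize on iommu->lock while pinning,
which is the locking work mentioned above:

/*
 * Hypothetical userspace sketch: split one large region into per-thread
 * chunks and issue VFIO_IOMMU_MAP_DMA concurrently.  Assumes
 * 'container_fd' is an already-configured type1 container and that
 * vaddr/iova/size describe the region; error handling is minimal.
 */
#include <pthread.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/vfio.h>

struct map_chunk {
	int fd;
	__u64 vaddr, iova, size;
	int ret;
};

static void *map_worker(void *arg)
{
	struct map_chunk *c = arg;
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = c->vaddr,
		.iova  = c->iova,
		.size  = c->size,
	};

	c->ret = ioctl(c->fd, VFIO_IOMMU_MAP_DMA, &map);
	return NULL;
}

static int map_region_parallel(int container_fd, __u64 vaddr, __u64 iova,
			       __u64 size, int nthreads)
{
	struct map_chunk chunks[nthreads];
	pthread_t tids[nthreads];
	__u64 align = 1ULL << 30;	/* cut only on 1G boundaries */
	__u64 step = ((size / nthreads) + align - 1) & ~(align - 1);
	__u64 off = 0;
	int i, n, ret = 0;

	if (!step)
		step = size;

	for (n = 0; n < nthreads && off < size; n++) {
		__u64 len = size - off < step ? size - off : step;

		chunks[n] = (struct map_chunk) {
			.fd = container_fd, .vaddr = vaddr + off,
			.iova = iova + off, .size = len,
		};
		pthread_create(&tids[n], NULL, map_worker, &chunks[n]);
		off += len;
	}

	for (i = 0; i < n; i++) {
		pthread_join(tids[i], NULL);
		if (chunks[i].ret)
			ret = chunks[i].ret;
	}
	return ret;
}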

> It is also not good that this inserts arbitary cuts in the IOVA
> address space, that will cause iommu_map() to be called with smaller
> npages, and could result in a long term inefficiency in the iommu.
> 
> I don't know how the kernel can combat this without prior knowledge of
> the likely physical memory layout (eg is the VM using 1G huge pages or
> something)..

The cuts aren't arbitrary; padata controls where they happen.  This is
optimizing for big memory ranges, so why isn't it enough for padata to
break up the work into large enough page-aligned chunks?  The vfio
driver does one iommu mapping per physically contiguous range, and I
don't think those ranges will be large enough to be affected by cuts at
that granularity.  And if cuts in per-thread ranges are an issue, I
*think* userspace has the same problem?
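
To be concrete about where the cuts land, here's an illustrative sketch
of how the pinning job could be set up on top of the padata
multithreading interface.  This is not the actual code from patch 7;
the function names and the 1G chunk values are assumptions picked to
show the knobs:

/*
 * Illustrative only.  The point is that padata places the cuts: each
 * helper thread gets a range of at least 'min_chunk' bytes, rounded up
 * to 'align', so per-thread ranges begin and end on boundaries the
 * caller chooses.  (Upstream padata_do_multithreaded() is __init and
 * returns void; the series lifts those restrictions.)
 */
#include <linux/padata.h>
#include <linux/sizes.h>

static void vfio_pin_map_dma_chunk(unsigned long start_vaddr,
				   unsigned long end_vaddr, void *arg)
{
	/* pin and iommu_map [start_vaddr, end_vaddr) for the vfio_dma in arg */
}

static void vfio_pin_map_dma_mt(struct vfio_dma *dma)
{
	struct padata_mt_job job = {
		.thread_fn	= vfio_pin_map_dma_chunk,
		.fn_arg		= dma,
		.start		= dma->vaddr,
		.size		= dma->size,
		.align		= SZ_1G,	/* cut only on 1G-aligned offsets */
		.min_chunk	= SZ_1G,	/* don't split work below 1G */
		.max_threads	= 16,		/* bounded later in the series */
	};

	padata_do_multithreaded(&job);
}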

> Personally I'd rather see the results from Matthew's work to allow GUP
> to work on folios efficiently before reaching to this extreme.
> 
> The results you got of only 1.2x improvement don't seem so
> compelling.

I know you understand, but just to be clear for everyone, that 1.2x is
the overall improvement to qemu init from multithreaded pinning alone
when prefaulting is done in both base and test.

Pinning itself, the only thing being optimized, improves 8.5x in that
experiment, bringing the time from 1.8 seconds to 0.2 seconds.  That's
a significant savings, IMHO.

> Based on the unpin work I fully expect that folio
> optimized GUP will do much better than that with single threaded..

Yeah, I'm curious to see how folio will do as well.  And there are some
very nice, efficiently gained speedups in the unpin work.  Changes like
that benefit all gup users, too, as you've pointed out before.

But I'm skeptical that single-threaded optimization alone will remove
the bottleneck at the enormous memory sizes we use.  For instance,
scaling up the times from the unpin results with both optimizations
(the IB-specific one too, which would need to be done for vfio), a 1T
guest would still take almost 2 seconds to pin/unpin.

If people feel strongly that we should try optimizing other ways first,
ok, but I think these are complementary approaches.  I'm coming at this
problem this way because this is fundamentally a memory-intensive
operation where more bandwidth can help, and there are other kernel
paths we and others want this infrastructure for.

In any case, thanks a lot for the super quick feedback!

