From: Arnd Bergmann <arnd@arndb.de>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mingo@elte.hu, linux-mm@kvack.org,
	akpm@linux-foundation.org, hpa@zytor.com,
	gregory.haskins@gmail.com, Or Gerlitz <ogerlitz@voltaire.com>
Subject: Re: [PATCHv3 2/2] vhost_net: a kernel-level virtio server
Date: Wed, 19 Aug 2009 15:46:44 +0200
Message-ID: <200908191546.44193.arnd@arndb.de>
In-Reply-To: <20090819130417.GB3080@redhat.com>

On Wednesday 19 August 2009, Michael S. Tsirkin wrote:
> > Maybe we could instead extend the 'splice' system call to work on a
> > vhost_net file descriptor.  If we do that, we can put the access back
> > into a user thread (or two) that stays in splice indefinitely
> 
> An issue with exposing internal threading model to userspace
> in this way is that we lose control of e.g. CPU locality -
> and it is very hard for userspace to get it right.

Good point, I hadn't thought about that in this context.

For macvtap, my idea was to open the same tap char device multiple
times and use each fd exclusively on one *guest* CPU. I'm not sure
if virtio-net can already handle SMP guests efficiently. We might
actually need to extend it to have more pairs of virtqueues, one
for each guest CPU, which can then be bound to a host queue (or queue
pair) in the physical nic.

Leaving that aside for now, you could replace VHOST_NET_SET_SOCKET,
VHOST_SET_OWNER, VHOST_RESET_OWNER and your kernel thread with a new
VHOST_NET_SPLICE blocking ioctl that does all the transfers in the
context of the calling thread.

This would improve the driver on various fronts:

- no need for playing tricks with use_mm/unuse_mm
- possibly fewer global TLB flushes from switch_mm, which
  may improve performance.
- ability to pass down error codes from socket or guest to
  user space by returning from ioctl
- based on that, the ability to use any kind of file
  descriptor that can do writev/readv or sendmsg/recvmsg
  without the nastiness you mentioned.

The disadvantage of course is that you need to add a user
thread for each guest device to make up for the workqueue
that you save.
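
To make the thread-per-device model concrete, here is a minimal user-space
sketch. VHOST_NET_SPLICE does not exist, so fake_vhost_net_splice() is a
hypothetical stand-in that returns immediately; struct guest_dev and
run_devices() are likewise made up for illustration. The only point is that
each device gets its own thread, and whatever the blocking call returns
ends up in user space where it can be inspected.

```c
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical stand-in for ioctl(fd, VHOST_NET_SPLICE). The real call
 * would block while moving packets; this stub returns at once.
 * Assumption for the demo: fd 1 fails with -EIO, any other fd ends cleanly.
 */
static int fake_vhost_net_splice(int fd)
{
	return fd == 1 ? -EIO : 0;
}

struct guest_dev {
	int fd;			/* would be the vhost_net file descriptor */
	int result;		/* 0 or negative errno from the blocking call */
	int started;		/* nonzero once the thread was created */
	pthread_t thread;
};

/* Thread body: all transfers happen in the context of this thread. */
static void *dev_thread(void *arg)
{
	struct guest_dev *dev = arg;

	dev->result = fake_vhost_net_splice(dev->fd);
	return NULL;
}

/*
 * Start one thread per guest device, wait for all of them, and return
 * the number of devices whose call ended with an error.
 */
static int run_devices(struct guest_dev *devs, int n)
{
	int i, failed = 0;

	for (i = 0; i < n; i++) {
		int err = pthread_create(&devs[i].thread, NULL,
					 dev_thread, &devs[i]);

		devs[i].started = (err == 0);
		if (err)
			devs[i].result = -err;	/* pthread_create returns a positive errno */
	}
	for (i = 0; i < n; i++) {
		if (devs[i].started)
			pthread_join(devs[i].thread, NULL);
		if (devs[i].result < 0)
			failed++;
	}
	return failed;
}
```

With two devices on fds 0 and 1, run_devices() reports one failure and
leaves -EIO in the second device's result field, which is the kind of
information a kernel thread has no natural way to hand back.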

> > to
> > avoid some of the implications of kernel threads like the missing
> > ability to handle transfer errors in user space.
> 
> Are you talking about TCP here?
> Transfer errors are typically asynchronous - possibly eventfd
> as I expose for vhost net is sufficient there.

I mean errors in general if we allow random file descriptors to be used.
E.g. tun_chr_aio_read could return EBADFD, EINVAL, EFAULT, ERESTARTSYS,
EIO, EAGAIN and possibly others. We can handle some in kernel, others
should never happen with vhost_net, but if something unexpected happens
it would be nice to just bail out to user space.
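
Roughly, the transfer loop would look like the sketch below. This is a
user-space model, not vhost_net code: transfer_loop(), handled_in_kernel()
and the scripted stub are hypothetical names, and treating only EAGAIN as
"handled internally" is my assumption for the example. Anything else is
returned to the caller, the way the blocking ioctl would return it.

```c
#include <errno.h>
#include <stddef.h>

/* One transfer step: bytes moved (> 0), end of stream (0), or -errno. */
typedef long (*transfer_fn)(void *ctx);

/* Assumption: only -EAGAIN is worth retrying without involving user space. */
static int handled_in_kernel(long err)
{
	return err == -EAGAIN;
}

/*
 * Keep transferring until the stream ends or an error shows up that is
 * not handled internally (or was retried max_retries times in a row).
 * Returns 0 on a clean end, or the negative errno to pass to user space.
 */
static long transfer_loop(transfer_fn xfer, void *ctx, int max_retries)
{
	int retries = 0;

	for (;;) {
		long ret = xfer(ctx);

		if (ret == 0)
			return 0;
		if (ret > 0) {
			retries = 0;	/* progress was made, reset the retry budget */
			continue;
		}
		if (handled_in_kernel(ret) && retries++ < max_retries)
			continue;
		return ret;		/* bail out to user space */
	}
}

/* Stub for trying the loop: replays a fixed list of return values. */
struct script {
	const long *rets;
	size_t len;
	size_t pos;
};

static long scripted_transfer(void *ctx)
{
	struct script *s = ctx;

	return s->pos < s->len ? s->rets[s->pos++] : 0;
}
```

Feeding it the sequence 64, -EAGAIN, 128, -EIO retries past the EAGAIN and
stops with -EIO, which the caller then sees as the result of the call.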

> > > I wonder - can we expose the underlying socket used by tap, or will that
> > > create complex lifetime issues?
> > 
> > I think this could get more messy in the long run than calling vfs_readv
> > on a random fd. It would mean deep internal knowledge of the tap driver
> > in vhost_net, which I really would prefer to avoid.
> 
> No, what I had in mind is adding a GET_SOCKET ioctl to tap.
> vhost would then just use the socket.

Right, that would work with tun/tap at least. It sounds a bit fishy
but I can't see a reason why it would be hard to do.
I'd have to think about how to get it working with macvtap, or if
there is much value left in macvtap after that anyway.

> > So how about making the qemu command line interface an extension to
> > what Or Gerlitz has done for the raw packet sockets?
>
> Not sure I see the connection, but I have not thought about qemu
> side of things too much yet - trying to get kernel bits in place
> first so that there's a stable ABI to work with.

Ok, fair enough. The kernel bits are obviously more time critical
right now, since they should get into 2.6.32.

	Arnd <><
