linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mingo@elte.hu, linux-mm@kvack.org,
	akpm@linux-foundation.org, hpa@zytor.com,
	gregory.haskins@gmail.com
Subject: Re: [PATCHv3 2/2] vhost_net: a kernel-level virtio server
Date: Sun, 16 Aug 2009 09:51:10 +0300	[thread overview]
Message-ID: <20090816065110.GA3008@redhat.com> (raw)
In-Reply-To: <200908141340.36176.arnd@arndb.de>

On Fri, Aug 14, 2009 at 01:40:36PM +0200, Arnd Bergmann wrote:
> On Thursday 13 August 2009, Michael S. Tsirkin wrote:
> > What it is: vhost net is a character device that can be used to reduce
> > the number of system calls involved in virtio networking.
> > Existing virtio net code is used in the guest without modification.
> 
> AFAICT, you have addressed all my comments, mostly by convincing me
> that you got it right anyway ;-).
> 
> I hope this gets into 2.6.32, good work!
> 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> 
> One idea though:
> 
> > +	/* Parameter checking */
> > +	if (sock->sk->sk_type != SOCK_RAW) {
> > +		r = -ESOCKTNOSUPPORT;
> > +		goto done;
> > +	}
> > +
> > +	r = sock->ops->getname(sock, (struct sockaddr *)&uaddr.sa,
> > +			       &uaddr_len, 0);
> > +	if (r)
> > +		goto done;
> > +
> > +	if (uaddr.sa.sll_family != AF_PACKET) {
> > +		r = -EPFNOSUPPORT;
> > +		goto done;
> > +	}
> 
> You currently limit the scope of the driver by only allowing raw packet
> sockets to be passed into the network driver. In qemu, we currently support
> some very similar transports:
> 
> * raw packet (not in a release yet)
> * tcp connection
> * UDP multicast
> * tap character device
> * VDE with Unix local sockets
> 
> My primary interest right now is the tap support, but I think it would
> be interesting in general to allow different file descriptor types
> in vhost_net_set_socket. AFAICT, there are two major differences
> that we need to handle for this:
> 
> * most of the transports are sockets, tap uses a character device.
>   This could be dealt with by having both a struct socket * in
>   struct vhost_net *and* a struct file *, or by always keeping the
>   struct file and calling vfs_readv/vfs_writev for the data transport
>   in both cases.

I am concerned that character devices might have weird side effects with
read/write operations and that calling them from kernel thread the way I
do might have security implications. Can't point at anything specific
though at the moment.
I wonder - can we expose the underlying socket used by tap, or will that
create complex lifetime issues?

> * Each transport has a slightly different header, we have
>   - raw ethernet frames (raw, udp multicast, tap)
>   - 32-bit length + raw frames, possibly fragmented (tcp)
>   - 80-bit header + raw frames, possibly fragmented (tap with vnet_hdr)
>   To handle these three cases, we need either different ioctl numbers
>   so that vhost_net can choose the right one, or a flags field in
>   VHOST_NET_SET_SOCKET, like
> 
>   #define VHOST_NET_RAW		1
>   #define VHOST_NET_LEN_HDR	2
>   #define VHOST_NET_VNET_HDR	4
> 
>   struct vhost_net_socket {
> 	unsigned int flags;
> 	int fd;
>   };
>   #define VHOST_NET_SET_SOCKET _IOW(VHOST_VIRTIO, 0x30, struct vhost_net_socket)

It seems we can query the socket to find out the type, or use the
features ioctl.

> If both of those are addressed, we can treat vhost_net as a generic
> way to do network handling in the kernel independent of the qemu
> model (raw, tap, ...) for it. 
> 
> Your qemu patch would have to work differently, so instead of 
> 
> qemu -net nic,vhost=eth0
> 
> you would do the same as today with the raw packet socket extension
> 
> qemu -net nic -net raw,ifname=eth0 
> 
> Qemu could then automatically try to use vhost_net, if it's available
> in the kernel, or just fall back on software vlan otherwise.
> Does that make sense?
> 
> 	Arnd <>

I agree, long term it should be enabled automatically when possible.

-- 
MST

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2009-08-16  6:52 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1250187913.git.mst@redhat.com>
2009-08-13 18:29 ` [PATCHv3 1/2] mm: export use_mm/unuse_mm to modules Michael S. Tsirkin
2009-08-13 18:29 ` [PATCHv3 2/2] vhost_net: a kernel-level virtio server Michael S. Tsirkin
2009-08-14 11:40   ` Arnd Bergmann
2009-08-16  6:51     ` Michael S. Tsirkin [this message]
2009-08-19  9:04       ` Arnd Bergmann
2009-08-19 13:04         ` Michael S. Tsirkin
2009-08-19 13:46           ` Arnd Bergmann
2009-08-19 14:20             ` Michael S. Tsirkin
2009-08-19 15:27               ` Arnd Bergmann
2009-08-20  8:31                 ` Michael S. Tsirkin
2009-08-20 13:10                   ` Arnd Bergmann
2009-08-20 13:38                     ` Michael S. Tsirkin
2009-08-20 14:31                       ` Arnd Bergmann
2009-08-20 14:42                         ` Michael S. Tsirkin
2009-08-20 15:10                           ` Arnd Bergmann
2009-08-21 13:20                       ` Gleb Natapov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090816065110.GA3008@redhat.com \
    --to=mst@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=gregory.haskins@gmail.com \
    --cc=hpa@zytor.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=netdev@vger.kernel.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox