linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Arnd Bergmann <arnd@arndb.de>
To: virtualization@lists.linux-foundation.org
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mingo@elte.hu, linux-mm@kvack.org,
	akpm@linux-foundation.org, hpa@zytor.com,
	gregory.haskins@gmail.com
Subject: Re: [PATCHv3 2/2] vhost_net: a kernel-level virtio server
Date: Fri, 14 Aug 2009 13:40:36 +0200	[thread overview]
Message-ID: <200908141340.36176.arnd@arndb.de> (raw)
In-Reply-To: <20090813182931.GC6585@redhat.com>

On Thursday 13 August 2009, Michael S. Tsirkin wrote:
> What it is: vhost net is a character device that can be used to reduce
> the number of system calls involved in virtio networking.
> Existing virtio net code is used in the guest without modification.

AFAICT, you have addressed all my comments, mostly by convincing me
that you got it right anyway ;-).

I hope this gets into 2.6.32, good work!

> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Acked-by: Arnd Bergmann <arnd@arndb.de>

One idea though:

> +	/* Parameter checking */
> +	if (sock->sk->sk_type != SOCK_RAW) {
> +		r = -ESOCKTNOSUPPORT;
> +		goto done;
> +	}
> +
> +	r = sock->ops->getname(sock, (struct sockaddr *)&uaddr.sa,
> +			       &uaddr_len, 0);
> +	if (r)
> +		goto done;
> +
> +	if (uaddr.sa.sll_family != AF_PACKET) {
> +		r = -EPFNOSUPPORT;
> +		goto done;
> +	}

You currently limit the scope of the driver by only allowing raw packet
sockets to be passed into the network driver. In qemu, we currently support
some very similar transports:

* raw packet (not in a release yet)
* tcp connection
* UDP multicast
* tap character device
* VDE with Unix local sockets

My primary interest right now is the tap support, but I think it would
be interesting in general to allow different file descriptor types
in vhost_net_set_socket. AFAICT, there are two major differences
that we need to handle for this:

* most of the transports are sockets, tap uses a character device.
  This could be dealt with by having both a struct socket * in
  struct vhost_net *and* a struct file *, or by always keeping the
  struct file and calling vfs_readv/vfs_writev for the data transport
  in both cases.

* Each transport has a slightly different header, we have
  - raw ethernet frames (raw, udp multicast, tap)
  - 32-bit length + raw frames, possibly fragmented (tcp)
  - 80-bit header + raw frames, possibly fragmented (tap with vnet_hdr)
  To handle these three cases, we need either different ioctl numbers
  so that vhost_net can choose the right one, or a flags field in
  VHOST_NET_SET_SOCKET, like

  #define VHOST_NET_RAW		1
  #define VHOST_NET_LEN_HDR	2
  #define VHOST_NET_VNET_HDR	4

  struct vhost_net_socket {
	unsigned int flags;
	int fd;
  };
  #define VHOST_NET_SET_SOCKET _IOW(VHOST_VIRTIO, 0x30, struct vhost_net_socket)

If both of those are addressed, we can treat vhost_net as a generic
way to do network handling in the kernel independent of the qemu
model (raw, tap, ...) for it. 

Your qemu patch would have to work differently, so instead of 

qemu -net nic,vhost=eth0

you would do the same as today with the raw packet socket extension

qemu -net nic -net raw,ifname=eth0 

Qemu could then automatically try to use vhost_net, if it's available
in the kernel, or just fall back on software vlan otherwise.
Does that make sense?

	Arnd <>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2009-08-14 11:41 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1250187913.git.mst@redhat.com>
2009-08-13 18:29 ` [PATCHv3 1/2] mm: export use_mm/unuse_mm to modules Michael S. Tsirkin
2009-08-13 18:29 ` [PATCHv3 2/2] vhost_net: a kernel-level virtio server Michael S. Tsirkin
2009-08-14 11:40   ` Arnd Bergmann [this message]
2009-08-16  6:51     ` Michael S. Tsirkin
2009-08-19  9:04       ` Arnd Bergmann
2009-08-19 13:04         ` Michael S. Tsirkin
2009-08-19 13:46           ` Arnd Bergmann
2009-08-19 14:20             ` Michael S. Tsirkin
2009-08-19 15:27               ` Arnd Bergmann
2009-08-20  8:31                 ` Michael S. Tsirkin
2009-08-20 13:10                   ` Arnd Bergmann
2009-08-20 13:38                     ` Michael S. Tsirkin
2009-08-20 14:31                       ` Arnd Bergmann
2009-08-20 14:42                         ` Michael S. Tsirkin
2009-08-20 15:10                           ` Arnd Bergmann
2009-08-21 13:20                       ` Gleb Natapov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200908141340.36176.arnd@arndb.de \
    --to=arnd@arndb.de \
    --cc=akpm@linux-foundation.org \
    --cc=gregory.haskins@gmail.com \
    --cc=hpa@zytor.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox