From: "Michael S. Tsirkin" <mst@redhat.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, mingo@elte.hu, linux-mm@kvack.org,
akpm@linux-foundation.org, hpa@zytor.com,
gregory.haskins@gmail.com
Subject: Re: [PATCHv3 2/2] vhost_net: a kernel-level virtio server
Date: Sun, 16 Aug 2009 09:51:10 +0300
Message-ID: <20090816065110.GA3008@redhat.com>
In-Reply-To: <200908141340.36176.arnd@arndb.de>
On Fri, Aug 14, 2009 at 01:40:36PM +0200, Arnd Bergmann wrote:
> On Thursday 13 August 2009, Michael S. Tsirkin wrote:
> > What it is: vhost net is a character device that can be used to reduce
> > the number of system calls involved in virtio networking.
> > Existing virtio net code is used in the guest without modification.
>
> AFAICT, you have addressed all my comments, mostly by convincing me
> that you got it right anyway ;-).
>
> I hope this gets into 2.6.32, good work!
>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
>
> One idea though:
>
> > + /* Parameter checking */
> > + if (sock->sk->sk_type != SOCK_RAW) {
> > + r = -ESOCKTNOSUPPORT;
> > + goto done;
> > + }
> > +
> > + r = sock->ops->getname(sock, (struct sockaddr *)&uaddr.sa,
> > + &uaddr_len, 0);
> > + if (r)
> > + goto done;
> > +
> > + if (uaddr.sa.sll_family != AF_PACKET) {
> > + r = -EPFNOSUPPORT;
> > + goto done;
> > + }
>
> You currently limit the scope of the driver by only allowing raw packet
> sockets to be passed into the network driver. In qemu, we currently support
> some very similar transports:
>
> * raw packet (not in a release yet)
> * tcp connection
> * UDP multicast
> * tap character device
> * VDE with Unix local sockets
>
> My primary interest right now is the tap support, but I think it would
> be interesting in general to allow different file descriptor types
> in vhost_net_set_socket. AFAICT, there are two major differences
> that we need to handle for this:
>
> * most of the transports are sockets, tap uses a character device.
> This could be dealt with by having both a struct socket * in
> struct vhost_net *and* a struct file *, or by always keeping the
> struct file and calling vfs_readv/vfs_writev for the data transport
> in both cases.
I am concerned that character devices might have weird side effects on
read/write operations, and that calling them from a kernel thread the way
I do might have security implications. I can't point at anything specific
at the moment, though.
I wonder - can we expose the underlying socket used by tap, or will that
create complex lifetime issues?
> * Each transport has a slightly different header, we have
> - raw ethernet frames (raw, udp multicast, tap)
> - 32-bit length + raw frames, possibly fragmented (tcp)
> - 80-bit header + raw frames, possibly fragmented (tap with vnet_hdr)
> To handle these three cases, we need either different ioctl numbers
> so that vhost_net can choose the right one, or a flags field in
> VHOST_NET_SET_SOCKET, like
>
> #define VHOST_NET_RAW 1
> #define VHOST_NET_LEN_HDR 2
> #define VHOST_NET_VNET_HDR 4
>
> struct vhost_net_socket {
> unsigned int flags;
> int fd;
> };
> #define VHOST_NET_SET_SOCKET _IOW(VHOST_VIRTIO, 0x30, struct vhost_net_socket)
It seems we can query the socket to find out the type, or use the
features ioctl.
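
As a rough userspace sketch of what I mean by querying the socket: the
type can be read back with getsockopt(SO_TYPE) and mapped to a framing.
The names framing_for_socket() and enum framing are invented for this
example, and the mapping simply follows the transport list quoted above;
it is an assumption, not something the patch does.

```c
#include <sys/socket.h>

/* Hypothetical framing kinds, following the quoted list:
 * raw ethernet frames, or a 32-bit length followed by the frame. */
enum framing {
	FRAMING_UNKNOWN = -1,
	FRAMING_RAW,
	FRAMING_LEN_HDR,
};

/* Guess the framing from the socket type alone. */
enum framing framing_for_socket(int fd)
{
	int type;
	socklen_t len = sizeof(type);

	if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
		return FRAMING_UNKNOWN;	/* not a socket, or bad fd */

	switch (type) {
	case SOCK_RAW:		/* packet socket: raw ethernet frames */
	case SOCK_DGRAM:	/* udp multicast: raw ethernet frames */
		return FRAMING_RAW;
	case SOCK_STREAM:	/* tcp: 32-bit length + frame */
		return FRAMING_LEN_HDR;
	default:
		return FRAMING_UNKNOWN;
	}
}
```

Tap with vnet_hdr is not covered here, since a tap descriptor is not a
socket; that case would still need the features ioctl or an explicit flag.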
> If both of those are addressed, we can treat vhost_net as a generic
> way to do network handling in the kernel independent of the qemu
> model (raw, tap, ...) for it.
>
> Your qemu patch would have to work differently, so instead of
>
> qemu -net nic,vhost=eth0
>
> you would do the same as today with the raw packet socket extension
>
> qemu -net nic -net raw,ifname=eth0
>
> Qemu could then automatically try to use vhost_net, if it's available
> in the kernel, or just fall back on software vlan otherwise.
> Does that make sense?
>
> 	Arnd <><
I agree, long term it should be enabled automatically when possible.
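
On the userspace side, the automatic selection could be as simple as
probing for the device and falling back when it is missing. A minimal
sketch, assuming the device node path is handed in by the caller (for
example "/dev/vhost-net", which is an assumption here); pick_backend() and
enum net_backend are hypothetical names and not qemu code.

```c
#include <fcntl.h>

/* Hypothetical backend choice; not taken from qemu. */
enum net_backend {
	BACKEND_USERSPACE,	/* keep the existing userspace path */
	BACKEND_VHOST,		/* kernel-level virtio server is available */
};

/* Try to open the vhost device node at vhost_path. On success store the
 * descriptor in *vhost_fd and select vhost; otherwise store -1 and fall
 * back to the userspace backend. */
enum net_backend pick_backend(const char *vhost_path, int *vhost_fd)
{
	int fd = open(vhost_path, O_RDWR);

	if (fd < 0) {
		*vhost_fd = -1;
		return BACKEND_USERSPACE;
	}
	*vhost_fd = fd;
	return BACKEND_VHOST;
}
```

A real implementation would also have to check that the chosen transport
is one vhost_net accepts before committing to it.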
--
MST