From: Arnd Bergmann <arnd@arndb.de>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, mingo@elte.hu, linux-mm@kvack.org,
akpm@linux-foundation.org, hpa@zytor.com,
gregory.haskins@gmail.com, Or Gerlitz <ogerlitz@voltaire.com>
Subject: Re: [PATCHv3 2/2] vhost_net: a kernel-level virtio server
Date: Wed, 19 Aug 2009 11:04:50 +0200 [thread overview]
Message-ID: <200908191104.50672.arnd@arndb.de> (raw)
In-Reply-To: <20090816065110.GA3008@redhat.com>
On Sunday 16 August 2009, Michael S. Tsirkin wrote:
> On Fri, Aug 14, 2009 at 01:40:36PM +0200, Arnd Bergmann wrote:
> >
> > * most of the transports are sockets, tap uses a character device.
> > This could be dealt with by having both a struct socket * in
> > struct vhost_net *and* a struct file *, or by always keeping the
> > struct file and calling vfs_readv/vfs_writev for the data transport
> > in both cases.
>
> I am concerned that character devices might have weird side effects with
> read/write operations and that calling them from kernel thread the way I
> do might have security implications. Can't point at anything specific
> though at the moment.
I understand your feelings about passing a chardev fd into your driver
and I agree that we need to be very careful if we want to allow it.
Maybe we could instead extend the 'splice' system call to work on a
vhost_net file descriptor. If we do that, we can put the access back
into a user thread (or two) that stays in splice indefinetely to
avoid some of the implications of kernel threads like the missing
ability to handle transfer errors in user space.
> I wonder - can we expose the underlying socket used by tap, or will that
> create complex lifetime issues?
I think this could get more messy in the long run than calling vfs_readv
on a random fd. It would mean deep internal knowledge of the tap driver
in vhost_net, which I really would prefer to avoid.
> > * Each transport has a slightly different header, we have
> > - raw ethernet frames (raw, udp multicast, tap)
> > - 32-bit length + raw frames, possibly fragmented (tcp)
> > - 80-bit header + raw frames, possibly fragmented (tap with vnet_hdr)
> > To handle these three cases, we need either different ioctl numbers
> > so that vhost_net can choose the right one, or a flags field in
> > VHOST_NET_SET_SOCKET, like
> >
> > #define VHOST_NET_RAW 1
> > #define VHOST_NET_LEN_HDR 2
> > #define VHOST_NET_VNET_HDR 4
> >
> > struct vhost_net_socket {
> > unsigned int flags;
> > int fd;
> > };
> > #define VHOST_NET_SET_SOCKET _IOW(VHOST_VIRTIO, 0x30, struct vhost_net_socket)
>
> It seems we can query the socket to find out the type,
yes, I understand that you can do that, but I still think that decision
should be left to user space. Adding a length header for TCP streams but
not for UDP is something that we would normally want to do, but IMHO
vhost_net should not need to know about this.
> or use the features ioctl.
Right, I had forgotten about that one. It's probably equivalent
to the flags I suggested, except that one allows you to set features
after starting the communication, while the other one prevents
you from doing that.
> > Qemu could then automatically try to use vhost_net, if it's available
> > in the kernel, or just fall back on software vlan otherwise.
> > Does that make sense?
>
> I agree, long term it should be enabled automatically when possible.
So how about making the qemu command line interface an extension to
what Or Gerlitz has done for the raw packet sockets?
Arnd <><
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-08-19 9:06 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <cover.1250187913.git.mst@redhat.com>
2009-08-13 18:29 ` [PATCHv3 1/2] mm: export use_mm/unuse_mm to modules Michael S. Tsirkin
2009-08-13 18:29 ` [PATCHv3 2/2] vhost_net: a kernel-level virtio server Michael S. Tsirkin
2009-08-14 11:40 ` Arnd Bergmann
2009-08-16 6:51 ` Michael S. Tsirkin
2009-08-19 9:04 ` Arnd Bergmann [this message]
2009-08-19 13:04 ` Michael S. Tsirkin
2009-08-19 13:46 ` Arnd Bergmann
2009-08-19 14:20 ` Michael S. Tsirkin
2009-08-19 15:27 ` Arnd Bergmann
2009-08-20 8:31 ` Michael S. Tsirkin
2009-08-20 13:10 ` Arnd Bergmann
2009-08-20 13:38 ` Michael S. Tsirkin
2009-08-20 14:31 ` Arnd Bergmann
2009-08-20 14:42 ` Michael S. Tsirkin
2009-08-20 15:10 ` Arnd Bergmann
2009-08-21 13:20 ` Gleb Natapov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200908191104.50672.arnd@arndb.de \
--to=arnd@arndb.de \
--cc=akpm@linux-foundation.org \
--cc=gregory.haskins@gmail.com \
--cc=hpa@zytor.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@elte.hu \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=ogerlitz@voltaire.com \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox