From: "Michael S. Tsirkin" <mst@redhat.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, mingo@elte.hu, linux-mm@kvack.org,
akpm@linux-foundation.org, hpa@zytor.com,
gregory.haskins@gmail.com, Or Gerlitz <ogerlitz@voltaire.com>
Subject: Re: [PATCHv3 2/2] vhost_net: a kernel-level virtio server
Date: Wed, 19 Aug 2009 16:04:17 +0300
Message-ID: <20090819130417.GB3080@redhat.com>
In-Reply-To: <200908191104.50672.arnd@arndb.de>

On Wed, Aug 19, 2009 at 11:04:50AM +0200, Arnd Bergmann wrote:
> On Sunday 16 August 2009, Michael S. Tsirkin wrote:
> > On Fri, Aug 14, 2009 at 01:40:36PM +0200, Arnd Bergmann wrote:
> > >
> > > * most of the transports are sockets, tap uses a character device.
> > > This could be dealt with by having both a struct socket * in
> > > struct vhost_net *and* a struct file *, or by always keeping the
> > > struct file and calling vfs_readv/vfs_writev for the data transport
> > > in both cases.
> >
> > I am concerned that character devices might have weird side effects with
> > read/write operations and that calling them from kernel thread the way I
> > do might have security implications. Can't point at anything specific
> > though at the moment.
>
> I understand your feelings about passing a chardev fd into your driver
> and I agree that we need to be very careful if we want to allow it.
>
> Maybe we could instead extend the 'splice' system call to work on a
> vhost_net file descriptor. If we do that, we can put the access back
> into a user thread (or two) that stays in splice indefinitely

One issue with exposing the internal threading model to userspace
in this way is that we lose control of things like CPU locality,
and it is very hard for userspace to get that right.

> to
> avoid some of the implications of kernel threads like the missing
> ability to handle transfer errors in user space.

Are you talking about TCP here?
Transfer errors are typically asynchronous, so an eventfd like the
one I expose for vhost net is possibly sufficient there.
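To illustrate the kind of asynchronous notification I mean, here is a
minimal userspace sketch of the generic eventfd pattern. It is not the
vhost ABI: error_notify() and error_drain() are hypothetical helper names
made up for this example, and in a real setup the kernel side, not
userspace, would do the signalling.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: record one error event on the eventfd.
 * In a real design the kernel would signal this; userspace does it
 * here only to keep the sketch self-contained.
 */
int error_notify(int efd)
{
	uint64_t one = 1;

	assert(efd >= 0);
	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Hypothetical helper: return the number of error events signalled
 * since the last call, 0 if there were none, or -1 on failure.
 * Assumes efd was created with EFD_NONBLOCK and without EFD_SEMAPHORE,
 * so a read returns the whole counter and resets it to zero.
 */
int64_t error_drain(int efd)
{
	uint64_t count;
	ssize_t n = read(efd, &count, sizeof(count));

	if (n == (ssize_t)sizeof(count))
		return (int64_t)count;
	if (n < 0 && errno == EAGAIN)
		return 0;	/* counter is zero: nothing pending */
	return -1;
}
```

A caller would create the descriptor with eventfd(0, EFD_NONBLOCK), hand
it to whoever reports errors, and poll it or call error_drain() from its
own event loop. Note that the counter only says that something happened,
not what, so the details of an error would still need another channel.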
> > I wonder - can we expose the underlying socket used by tap, or will that
> > create complex lifetime issues?
>
> I think this could get more messy in the long run than calling vfs_readv
> on a random fd. It would mean deep internal knowledge of the tap driver
> in vhost_net, which I really would prefer to avoid.

No, what I had in mind is adding a GET_SOCKET ioctl to tap;
vhost would then just use that socket.

> > > * Each transport has a slightly different header, we have
> > > - raw ethernet frames (raw, udp multicast, tap)
> > > - 32-bit length + raw frames, possibly fragmented (tcp)
> > > - 80-bit header + raw frames, possibly fragmented (tap with vnet_hdr)
> > > To handle these three cases, we need either different ioctl numbers
> > > so that vhost_net can choose the right one, or a flags field in
> > > VHOST_NET_SET_SOCKET, like
> > >
> > > #define VHOST_NET_RAW 1
> > > #define VHOST_NET_LEN_HDR 2
> > > #define VHOST_NET_VNET_HDR 4
> > >
> > > struct vhost_net_socket {
> > > 	unsigned int flags;
> > > 	int fd;
> > > };
> > > #define VHOST_NET_SET_SOCKET _IOW(VHOST_VIRTIO, 0x30, struct vhost_net_socket)
> >
> > It seems we can query the socket to find out the type,
>
> yes, I understand that you can do that, but I still think that decision
> should be left to user space. Adding a length header for TCP streams but
> not for UDP is something that we would normally want to do, but IMHO
> vhost_net should not need to know about this.
>
> > or use the features ioctl.
>
> Right, I had forgotten about that one. It's probably equivalent
> to the flags I suggested, except that one allows you to set features
> after starting the communication, while the other one prevents
> you from doing that.
>
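To make the three framing variants discussed above concrete, here is a
minimal userspace sketch. The flag names and values are the ones from the
quoted proposal; everything else is an assumption for illustration only:
vhost_net_flags_valid() and frame_with_len_hdr() are hypothetical helpers,
the rule that exactly one framing flag must be set is my reading of "three
cases", and network byte order for the 32-bit length is assumed since the
proposal does not specify it.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag values as given in the quoted proposal. */
#define VHOST_NET_RAW		1
#define VHOST_NET_LEN_HDR	2
#define VHOST_NET_VNET_HDR	4

/*
 * Hypothetical check: accept only known flags, and (by assumption)
 * require exactly one framing mode to be selected.
 */
int vhost_net_flags_valid(unsigned int flags)
{
	unsigned int known = VHOST_NET_RAW | VHOST_NET_LEN_HDR |
			     VHOST_NET_VNET_HDR;

	if (flags & ~known)
		return 0;
	/* Non-zero with a single bit set. */
	return flags != 0 && (flags & (flags - 1)) == 0;
}

/*
 * Hypothetical helper for the "32-bit length + raw frame" case:
 * write the length (assumed big-endian) followed by the frame into out.
 * Returns the number of bytes written, or 0 if out is too small.
 */
size_t frame_with_len_hdr(const void *frame, uint32_t len,
			  unsigned char *out, size_t out_size)
{
	uint32_t be_len = htonl(len);

	assert(frame != NULL && out != NULL);
	if (out_size < sizeof(be_len) || out_size - sizeof(be_len) < len)
		return 0;
	memcpy(out, &be_len, sizeof(be_len));
	memcpy(out + sizeof(be_len), frame, len);
	return sizeof(be_len) + len;
}
```

Whether this choice is made through flags like these or through the
features ioctl, the data path only has to switch on one value per
connection, which is the property both options share.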
> > > Qemu could then automatically try to use vhost_net, if it's available
> > > in the kernel, or just fall back on software vlan otherwise.
> > > Does that make sense?
> >
> > I agree, long term it should be enabled automatically when possible.
>
> So how about making the qemu command line interface an extension to
> what Or Gerlitz has done for the raw packet sockets?
>
> Arnd <><

I'm not sure I see the connection, but I have not thought about the
qemu side of things much yet. I am trying to get the kernel bits in
place first so that there's a stable ABI to work with.

--
MST