From: "Wang, Wei W"
Subject: RE: [PATCH kernel v8 2/4] virtio-balloon: VIRTIO_BALLOON_F_CHUNK_TRANSFER
Date: Wed, 5 Apr 2017 07:47:33 +0000
Message-ID: <286AC319A985734F985F78AFA26841F7391E1AFC@shsmsx102.ccr.corp.intel.com>
In-Reply-To: <286AC319A985734F985F78AFA26841F7391E19B9@shsmsx102.ccr.corp.intel.com>
To: "Wang, Wei W", "Michael S. Tsirkin"
Cc: aarcange@redhat.com, virtio-dev@lists.oasis-open.org, kvm@vger.kernel.org,
 qemu-devel@nongnu.org, amit.shah@redhat.com, liliang.opensource@gmail.com,
 "Hansen, Dave", linux-kernel@vger.kernel.org,
 virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
 cornelia.huck@de.ibm.com, pbonzini@redhat.com, akpm@linux-foundation.org,
 mgorman@techsingularity.net

On Wednesday, April 5, 2017 12:31 PM, Wei Wang wrote:
> On Wednesday, April 5, 2017 11:54 AM, Michael S. Tsirkin wrote:
> > On Wed, Apr 05, 2017 at 03:31:36AM +0000, Wang, Wei W wrote:
> > > On Thursday, March 16, 2017 3:09 PM Wei Wang wrote:
> > > > The implementation of the current virtio-balloon is not very
> > > > efficient, because the ballooned pages are transferred to the host
> > > > one by one. Here is the breakdown of the time, in percentage, spent
> > > > on each step of the balloon inflating process (inflating 7GB of an
> > > > 8GB idle guest):
> > > >
> > > > 1) allocating pages (6.5%)
> > > > 2) sending PFNs to host (68.3%)
> > > > 3) address translation (6.1%)
> > > > 4) madvise (19%)
> > > >
> > > > It takes about 4126ms for the inflating process to complete.
> > > > The above profiling shows that the bottlenecks are stage 2) and
> > > > stage 4).
> > > >
> > > > This patch optimizes step 2) by transferring pages to the host in
> > > > chunks. A chunk consists of guest physically contiguous pages, and
> > > > it is offered to the host via a base PFN (i.e. the start PFN of
> > > > those physically contiguous pages) and the size (i.e. the total
> > > > number of the pages).
> > > > A chunk is formatted as below:
> > > >
> > > > --------------------------------------------------------
> > > > |             Base (52 bit)            | Rsvd (12 bit) |
> > > > --------------------------------------------------------
> > > > --------------------------------------------------------
> > > > |             Size (52 bit)            | Rsvd (12 bit) |
> > > > --------------------------------------------------------
> > > >
> > > > By doing so, step 4) can also be optimized by doing address
> > > > translation and madvise() in chunks rather than page by page.
> > > >
> > > > This optimization requires the negotiation of a new feature bit,
> > > > VIRTIO_BALLOON_F_CHUNK_TRANSFER.
> > > >
> > > > With this new feature, the above ballooning process takes ~590ms,
> > > > resulting in an improvement of ~85%.
> > > >
> > > > TODO: optimize stage 1) by allocating/freeing a chunk of pages
> > > > instead of a single page each time.
> > > >
> > > > Signed-off-by: Liang Li
> > > > Signed-off-by: Wei Wang
> > > > Suggested-by: Michael S. Tsirkin
> > > > ---
> > > >  drivers/virtio/virtio_balloon.c     | 371 ++++++++++++++++++++++++++++++---
> > > >  include/uapi/linux/virtio_balloon.h |   9 +
> > > >  2 files changed, 353 insertions(+), 27 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> > > > index f59cb4f..3f4a161 100644
> > > > --- a/drivers/virtio/virtio_balloon.c
> > > > +++ b/drivers/virtio/virtio_balloon.c
> > > > @@ -42,6 +42,10 @@
> > > >  #define OOM_VBALLOON_DEFAULT_PAGES 256
> > > >  #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
> > > >
> > > > +#define PAGE_BMAP_SIZE		(8 * PAGE_SIZE)
> > > > +#define PFNS_PER_PAGE_BMAP	(PAGE_BMAP_SIZE * BITS_PER_BYTE)
> > > > +#define PAGE_BMAP_COUNT_MAX	32
> > > > +
> > > >  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> > > >  module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> > > >  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> > > > @@ -50,6 +54,14 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> > > >  static struct vfsmount *balloon_mnt;
> > > >  #endif
> > > >
> > > > +#define BALLOON_CHUNK_BASE_SHIFT 12
> > > > +#define BALLOON_CHUNK_SIZE_SHIFT 12
> > > > +struct balloon_page_chunk {
> > > > +	__le64 base;
> > > > +	__le64 size;
> > > > +};
> > > > +
> > > > +typedef __le64 resp_data_t;
> > > >  struct virtio_balloon {
> > > >  	struct virtio_device *vdev;
> > > >  	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> > > > @@ -67,6 +79,31 @@ struct virtio_balloon {
> > > >
> > > >  	/* Number of balloon pages we've told the Host we're not using. */
> > > >  	unsigned int num_pages;
> > > > +	/* Pointer to the response header. */
> > > > +	struct virtio_balloon_resp_hdr *resp_hdr;
> > > > +	/* Pointer to the start address of the response data. */
> > > > +	resp_data_t *resp_data;
> > >
> > > I think the implementation has an issue here - both the balloon pages
> > > and the unused pages use the same buffer ("resp_data" above) to store
> > > chunks. That would cause a race in this case: live migration starts
> > > while ballooning is also in progress. I plan to use separate buffers
> > > for CHUNKS_OF_BALLOON_PAGES and CHUNKS_OF_UNUSED_PAGES. Please let me
> > > know if you have a different suggestion. Thanks.
> > >
> > > Best,
> > > Wei
> >
> > Is only one resp data ever in flight for each kind?
> > If not you want as many buffers as the vq allows.
> >
>
> No, all the kinds were using only one resp_data. I will make it one
> resp_data for each kind.
>
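
To make the quoted chunk layout concrete, here is a standalone userspace
sketch (illustration only, not the patch code; it leaves out the
cpu_to_le64() conversion the driver would do) of how a run of contiguous
pages is packed into one chunk:

#include <stdint.h>
#include <stdio.h>

/* The low 12 bits of each field are reserved, so the base PFN and the
 * page count live in the upper 52 bits, matching the layout above. */
#define BALLOON_CHUNK_BASE_SHIFT 12
#define BALLOON_CHUNK_SIZE_SHIFT 12

struct balloon_page_chunk {
	uint64_t base;	/* __le64 on the wire; plain uint64_t here */
	uint64_t size;
};

/* Pack one run of physically contiguous pages into a chunk. */
static void chunk_pack(struct balloon_page_chunk *c,
		       uint64_t base_pfn, uint64_t npages)
{
	c->base = base_pfn << BALLOON_CHUNK_BASE_SHIFT;
	c->size = npages << BALLOON_CHUNK_SIZE_SHIFT;
}

int main(void)
{
	struct balloon_page_chunk c;

	/* 512 contiguous pages starting at PFN 0x12345. */
	chunk_pack(&c, 0x12345, 512);
	printf("base pfn=0x%llx npages=%llu\n",
	       (unsigned long long)(c.base >> BALLOON_CHUNK_BASE_SHIFT),
	       (unsigned long long)(c.size >> BALLOON_CHUNK_SIZE_SHIFT));
	return 0;
}

One chunk can describe an arbitrarily long run of contiguous pages, so a
single base/size pair replaces a long run of single-PFN transfers - that
is where the 68.3% "sending PFNs" cost above goes away.
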
On the resp_data question, just in case it wasn't well explained: there is
one resp data in flight for each kind, but the two kinds currently share
the same resp data under the balloon_lock. I'm wondering whether it would
be worthwhile to give them separate resp data, so that when live migration
begins it doesn't have to wait to get the unused pages (in the case where
"inflating" holds the resp data, there is no telling how long it will take
to finish). The unused pages are useful for the first round of RAM
transfer, so I think it is important to get them as soon as possible.
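
Roughly, the separate-buffer idea looks like this (a hypothetical
userspace sketch with made-up names - resp_buffer, RESP_BALLOON_PAGES,
RESP_UNUSED_PAGES; the real change would live in the driver and use its
own locking):

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* One response buffer per request kind, so an unused-page request from
 * live migration never waits for an in-flight inflate/deflate that is
 * holding the other kind's buffer. */
enum resp_kind {
	RESP_BALLOON_PAGES,	/* inflate/deflate chunks */
	RESP_UNUSED_PAGES,	/* unused-page chunks for live migration */
	RESP_KIND_MAX
};

struct resp_buffer {
	uint64_t *data;		/* chunk array: one base/size pair per chunk */
	size_t capacity;	/* number of chunks the buffer can hold */
	pthread_mutex_t lock;	/* serializes users of this kind only */
};

static struct resp_buffer resp[RESP_KIND_MAX];

static int resp_init(size_t chunks_per_buf)
{
	for (int i = 0; i < RESP_KIND_MAX; i++) {
		resp[i].data = calloc(chunks_per_buf, 2 * sizeof(uint64_t));
		if (!resp[i].data)
			return -1;
		resp[i].capacity = chunks_per_buf;
		pthread_mutex_init(&resp[i].lock, NULL);
	}
	return 0;
}

int main(void)
{
	/* Each kind gets its own buffer and its own lock, so neither
	 * path blocks the other. */
	return resp_init(4096) ? 1 : 0;
}

Best,
Wei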