From: "Li, Liang Z" <liang.z.li@intel.com>
To: "Hansen, Dave" <dave.hansen@intel.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Cc: "virtualization@lists.linux-foundation.org"
<virtualization@lists.linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"virtio-dev@lists.oasis-open.org"
<virtio-dev@lists.oasis-open.org>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"quintela@redhat.com" <quintela@redhat.com>,
"dgilbert@redhat.com" <dgilbert@redhat.com>,
"mst@redhat.com" <mst@redhat.com>,
"jasowang@redhat.com" <jasowang@redhat.com>,
"kirill.shutemov@linux.intel.com"
<kirill.shutemov@linux.intel.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"mhocko@suse.com" <mhocko@suse.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
Mel Gorman <mgorman@techsingularity.net>,
Cornelia Huck <cornelia.huck@de.ibm.com>,
Amit Shah <amit.shah@redhat.com>
Subject: RE: [PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info
Date: Tue, 6 Dec 2016 04:47:27 +0000 [thread overview]
Message-ID: <F2CBF3009FA73547804AE4C663CAB28E3A12F831@shsmsx102.ccr.corp.intel.com> (raw)
In-Reply-To: <70ece7a5-348b-2eb9-c40a-f21b08df042c@intel.com>
> >>> + mutex_lock(&vb->balloon_lock);
> >>> +
> >>> + for (order = MAX_ORDER - 1; order >= 0; order--) {
> >>
> >> I scratched my head for a bit on this one. Why are you walking over
> >> orders,
> >> *then* zones. I *think* you're doing it because you can efficiently
> >> fill the bitmaps at a given order for all zones, then move to a new
> >> bitmap. But, it would be interesting to document this.
> >
> > Yes, use the order is somewhat strange, but it's helpful to keep the API simple.
> > Do you think it's acceptable?
>
> Yeah, it's fine. Just comment it, please.
>
Good!
> >>> + if (ret == -ENOSPC) {
> >>> + void *new_resp_data;
> >>> +
> >>> + new_resp_data = kmalloc(2 * vb->resp_buf_size,
> >>> + GFP_KERNEL);
> >>> + if (new_resp_data) {
> >>> + kfree(vb->resp_data);
> >>> + vb->resp_data = new_resp_data;
> >>> + vb->resp_buf_size *= 2;
> >>
> >> What happens to the data in ->resp_data at this point? Doesn't this
> >> just throw it away?
> >
> > Yes, so we should make sure the data in resp_data is not inuse.
>
> But doesn't it have valid data that we just collected and haven't told the
> hypervisor about yet? Aren't we throwing away good data that cost us
> something to collect?
Indeed. Some filled data may exist for the previous zone. Should we
change the API to
'int get_unused_pages(unsigned long *unused_pages, unsigned long size,
int order, unsigned long *pos, struct zone *zone)' ?
then we can use the 'zone' to record the zone to retry and not discard the
filled data.
> >> ...
> >>> +struct page_info_item {
> >>> + __le64 start_pfn : 52; /* start pfn for the bitmap */
> >>> + __le64 page_shift : 6; /* page shift width, in bytes */
>
> What does a page_shift "in bytes" mean? :)
Obviously, you know. :o
I will try to make it clear.
>
> >>> + __le64 bmap_len : 6; /* bitmap length, in bytes */ };
> >>
> >> Is 'bmap_len' too short? a 64-byte buffer is a bit tiny. Right?
> >
> > Currently, we just use the 8 bytes and 0 bytes bitmap, should we support
> more than 64 bytes?
>
> It just means that with this format, you end up wasting at least ~1/8th of the
> space with metadata. That's a bit unfortunate, but I guess it's not fatal.
>
> I'd definitely call it out in the patch description and make sure other folks take
> a look at it.
OK.
>
> There's a somewhat easy fix, but that would make the qemu implementation
> more complicated: You could just have bmap_len==0x3f imply that there's
> another field that contains an extended bitmap length for when you need long
> bitmaps.
>
> But, as you note, there's no need for it, so it's a matter of trading the extra
> complexity versus the desire to not habing to change the ABI again for longer
> (hopefully).
>
Your suggestion still works without changing the current code, just reserve
' bmap_len==0x3f' for future extension, and it's not used by the current code.
> >>> +static int mark_unused_pages(struct zone *zone,
> >>> + unsigned long *unused_pages, unsigned long size,
> >>> + int order, unsigned long *pos)
> >>> +{
> >>> + unsigned long pfn, flags;
> >>> + unsigned int t;
> >>> + struct list_head *curr;
> >>> + struct page_info_item *info;
> >>> +
> >>> + if (zone_is_empty(zone))
> >>> + return 0;
> >>> +
> >>> + spin_lock_irqsave(&zone->lock, flags);
> >>> +
> >>> + if (*pos + zone->free_area[order].nr_free > size)
> >>> + return -ENOSPC;
> >>
> >> Urg, so this won't partially fill? So, what the nr_free pages limit
> >> where we no longer fit in the kmalloc()'d buffer where this simply won't
> work?
> >
> > Yes. My initial implementation is partially fill, it's better for the worst case.
> > I thought the above code is more efficient for most case ...
> > Do you think partially fill the bitmap is better?
>
> Could you please answer the question I asked?
>
For your question:
-------------------------------------------------------------------------------------------------------
>So, what the nr_free pages limit where we no longer fit in the kmalloc()'d buffer
> where this simply won't work?
------------------------------------------------------------------------------------------------------
No, if the buffer is not big enough to save 'nr_free' pages, get_unused_pages() will return
'-ENOSPC', and the following code will try to allocate a 2x times size buffer for retrying,
until the proper size buffer is allocated. The current order will not be skipped unless the
buffer allocation failed.
> Because if you don't get this right, it could mean that there are system that
> simply *fail* here.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-12-06 4:47 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-30 8:43 [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration Liang Li
2016-11-30 8:43 ` [PATCH kernel v5 1/5] virtio-balloon: rework deflate to add page to a list Liang Li
2016-11-30 8:43 ` [PATCH kernel v5 2/5] virtio-balloon: define new feature bit and head struct Liang Li
2016-11-30 8:43 ` [PATCH kernel v5 3/5] virtio-balloon: speed up inflate/deflate process Liang Li
2016-11-30 8:43 ` [PATCH kernel v5 4/5] virtio-balloon: define flags and head for host request vq Liang Li
2016-11-30 8:43 ` [PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info Liang Li
2016-11-30 19:15 ` Dave Hansen
2016-12-04 13:13 ` Li, Liang Z
2016-12-05 17:22 ` Dave Hansen
2016-12-06 4:47 ` Li, Liang Z [this message]
2016-12-06 8:40 ` [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration David Hildenbrand
2016-12-07 13:35 ` Li, Liang Z
2016-12-07 15:34 ` Dave Hansen
2016-12-09 3:09 ` Li, Liang Z
2016-12-07 15:42 ` David Hildenbrand
2016-12-07 15:45 ` Dave Hansen
2016-12-07 16:21 ` David Hildenbrand
2016-12-07 16:57 ` Dave Hansen
2016-12-07 18:38 ` [Qemu-devel] " Andrea Arcangeli
2016-12-07 18:44 ` Dave Hansen
2016-12-07 18:58 ` Andrea Arcangeli
2016-12-07 19:54 ` Dave Hansen
2016-12-07 20:28 ` Andrea Arcangeli
2016-12-09 4:45 ` Li, Liang Z
2016-12-09 4:53 ` Dave Hansen
2016-12-09 5:35 ` Li, Liang Z
2016-12-09 16:42 ` Andrea Arcangeli
2016-12-14 8:20 ` Li, Liang Z
2016-12-14 8:59 ` Li, Liang Z
2016-12-15 15:34 ` Dave Hansen
2016-12-15 15:54 ` Michael S. Tsirkin
2016-12-16 1:12 ` Li, Liang Z
2016-12-16 15:40 ` Andrea Arcangeli
2016-12-17 11:56 ` Li, Liang Z
2016-12-16 0:48 ` Li, Liang Z
2016-12-16 1:09 ` Dave Hansen
2016-12-16 1:38 ` Li, Liang Z
2016-12-16 1:40 ` Dave Hansen
2016-12-16 1:43 ` Li, Liang Z
2016-12-16 16:01 ` Andrea Arcangeli
2016-12-17 12:39 ` Li, Liang Z
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=F2CBF3009FA73547804AE4C663CAB28E3A12F831@shsmsx102.ccr.corp.intel.com \
--to=liang.z.li@intel.com \
--cc=akpm@linux-foundation.org \
--cc=amit.shah@redhat.com \
--cc=cornelia.huck@de.ibm.com \
--cc=dave.hansen@intel.com \
--cc=dgilbert@redhat.com \
--cc=jasowang@redhat.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=mhocko@suse.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=virtio-dev@lists.oasis-open.org \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox