From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: wei.w.wang@intel.com, willy@infradead.org
Cc: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org,
kvm@vger.kernel.org, linux-mm@kvack.org, mst@redhat.com,
mhocko@kernel.org, akpm@linux-foundation.org,
mawilcox@microsoft.com, david@redhat.com,
cornelia.huck@de.ibm.com, mgorman@techsingularity.net,
aarcange@redhat.com, amit.shah@redhat.com, pbonzini@redhat.com,
liliang.opensource@gmail.com, yang.zhang.wz@gmail.com,
quan.xu0@gmail.com, nilal@redhat.com, riel@redhat.com
Subject: Re: [PATCH v20 4/7] virtio-balloon: VIRTIO_BALLOON_F_SG
Date: Tue, 26 Dec 2017 19:38:13 +0900
Message-ID: <201712261938.IFF64061.LtFMOVJFHOSFQO@I-love.SAKURA.ne.jp>
In-Reply-To: <5A41BCC1.5010004@intel.com>

Wei Wang wrote:
> On 12/25/2017 10:51 PM, Tetsuo Handa wrote:
> > Wei Wang wrote:
> >>>>>> @@ -173,8 +292,15 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
> >>>>>>   while ((page = balloon_page_pop(&pages))) {
> >>>>>>           balloon_page_enqueue(&vb->vb_dev_info, page);
> >>>>>> +         if (use_sg) {
> >>>>>> +                 if (xb_set_page(vb, page, &pfn_min, &pfn_max) < 0) {
> >>>>>> +                         __free_page(page);
> >>>>>> +                         continue;
> >>>>>> +                 }
> >>>>>> +         } else {
> >>>>>> +                 set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> >>>>>> +         }
> >>>>> Is this the right behaviour?
> >>>> I don't think so. In the worst case, we can set no bit using
> >>>> xb_set_page().
> >>>>> If we can't record the page in the xb,
> >>>>> wouldn't we rather send it across as a single page?
> >>>>>
> >>>> I think that we need to be able to fallback to !use_sg path when OOM.
> >>> I also have different thoughts:
> >>>
> >>> 1) For OOM, we have leak_balloon_sg_oom (oom has nothing to do with
> >>> fill_balloon), which does not use xbitmap to record pages, thus no
> >>> memory allocation.
> >>>
> >>> 2) If the memory is already under pressure, it is pointless to
> >>> continue inflating memory to the host. We need to give thanks to the
> >>> memory allocation failure reported by xbitmap, which gets us a chance
> >>> to release the inflated pages that have been demonstrated to cause the
> >>> memory pressure of the guest.
> >>>
> >> Forgot to add my conclusion: I think the above behavior is correct.
> >>
> > What is the desired behavior when hitting OOM path during inflate/deflate?
> > Once inflation started, the inflation logic is called again and again
> > until the balloon inflates to the requested size.
>
> The above is true, but I can't agree with the following. Please see below.
>
> > Such situation will
> > continue wasting CPU resource between inflate-due-to-host's-request versus
> > deflate-due-to-guest's-OOM. It is pointless but cannot stop doing pointless
> > thing.
>
> What we are doing here is to free the pages that were just allocated in
> this round of inflating. Next round will be sometime later when the
> balloon work item gets its turn to run. Yes, it will then continue to
> inflate.
> Here are the two cases that will happen then:
> 1) the guest is still under memory pressure, the inflate will fail at
> memory allocation, which results in a msleep(200), and then it exits
> for another time to run.
> 2) the guest isn't under memory pressure any more (e.g. the task which
> consumes the huge amount of memory is gone), it will continue to inflate
> as normal till the requested size.
>
How likely is case 2)? Not very. A 200ms sleep is short enough to spam the
guest with "Out of puff!" messages, because the next round starts too quickly.
> I think what we are doing is a quite sensible behavior, except a small
> change I plan to make:
>
>   while ((page = balloon_page_pop(&pages))) {
> -         balloon_page_enqueue(&vb->vb_dev_info, page);
>           if (use_sg) {
>                   if (xb_set_page(vb, page, &pfn_min, &pfn_max) < 0) {
>                           __free_page(page);
>                           continue;
>                   }
>           } else {
>                   set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
>           }
> +         balloon_page_enqueue(&vb->vb_dev_info, page);
>
> >
> > Also, as of Linux 4.15, only up to VIRTIO_BALLOON_ARRAY_PFNS_MAX pages (i.e.
> > 1MB) are invisible from deflate request. That amount would be an acceptable
> > error. But your patch makes more pages being invisible, for pages allocated
> > by balloon_page_alloc() without holding balloon_lock are stored into a local
> > variable "LIST_HEAD(pages)" (which means that balloon_page_dequeue() with
> > balloon_lock held won't be able to find pages not yet queued by
> > balloon_page_enqueue()), doesn't it? What if all memory pages were held in
> > "LIST_HEAD(pages)" and balloon_page_dequeue() was called before
> > balloon_page_enqueue() is called?
> >
>
> If we think of the balloon driver just as a regular driver or
> application, that will be a pretty natural thing. A regular driver can
> eat a huge amount of memory for its own usages, would this amount of
> memory be treated as an error as they are invisible to the
> balloon_page_enqueue?
>
No. Memory consumed by applications through their mm_struct is reclaimed
by the OOM killer/reaper. Drivers try to avoid allocating more memory than
they need, and if they do allocate more, they have a hook for releasing
unused memory (i.e. register_shrinker() or an OOM notifier). What I'm saying
here is that the hook for releasing unused memory does not work unless the
memory held in LIST_HEAD(pages) becomes visible to balloon_page_dequeue().

Suppose a system has 128GB of memory, and 127GB of it was stored into
LIST_HEAD(pages) by the first fill_balloon() request. If somebody takes
balloon_lock from the OOM notifier path of out_of_memory() before
fill_balloon() takes it, leak_balloon_sg_oom() finds that no memory can be
freed, because balloon_page_enqueue() was never called. The caller of
out_of_memory() is then allowed to invoke the OOM killer even though there
is 127GB of memory which could have been freed if fill_balloon() had taken
balloon_lock before leak_balloon_sg_oom() did.

I don't think that that amount is an acceptable error.