From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mst@redhat.com
Cc: mhocko@kernel.org, wei.w.wang@intel.com,
virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
rmaksudova@parallels.com, den@openvz.org
Subject: Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
Date: Wed, 18 Oct 2017 19:59:23 +0900
Message-ID: <201710181959.ACI05296.JLMVQOOFtHSOFF@I-love.SAKURA.ne.jp>
In-Reply-To: <20171016195317-mutt-send-email-mst@kernel.org>
Tetsuo Handa wrote:
> 20171016-deflate.log.xz continued printing "puff" messages without any OOM
> killer messages, for fill_balloon() always inflates faster than leak_balloon()
> deflates.
>
> Since the OOM killer cannot be invoked unless leak_balloon() deflates the
> balloon completely, faster than fill_balloon() re-inflates it, the guest
> remained unusable (e.g. unable to log in via ssh) and did nothing but print
> "puff" messages.
> This result was worse than 20171016-default.log.xz, for the system was
> unable to make any forward progress (i.e. a complete OOM lockup).
I tested further and found that it is not a complete OOM lockup.
It turned out that the reason I could not log in via ssh was that fork() was
failing: __vm_enough_memory() was rejecting the request because
/proc/sys/vm/overcommit_memory was set to 0. Although the virtio_balloon
driver was ready to release pages if asked via virtballoon_oom_notify() from
out_of_memory(), __vm_enough_memory() was not able to take such pages into
account. As a result, operations which need fork() were failing without ever
calling out_of_memory().
( http://lkml.kernel.org/r/201710181954.FHH51594.MtFOFLOQFSOHVJ@I-love.SAKURA.ne.jp )
Do you see anything wrong with the patch I used for emulating the
VIRTIO_BALLOON_F_DEFLATE_ON_OOM path (shown below)?
----------------------------------------
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f0b3a0b..a679ac2 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -164,7 +164,7 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
}
set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
- if (!virtio_has_feature(vb->vdev,
+ if (virtio_has_feature(vb->vdev,
VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
adjust_managed_page_count(page, -1);
}
@@ -184,7 +184,7 @@ static void release_pages_balloon(struct virtio_balloon *vb,
struct page *page, *next;
list_for_each_entry_safe(page, next, pages, lru) {
- if (!virtio_has_feature(vb->vdev,
+ if (virtio_has_feature(vb->vdev,
VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
adjust_managed_page_count(page, 1);
list_del(&page->lru);
@@ -363,7 +363,7 @@ static int virtballoon_oom_notify(struct notifier_block *self,
unsigned num_freed_pages;
vb = container_of(self, struct virtio_balloon, nb);
- if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
return NOTIFY_OK;
freed = parm;
----------------------------------------
> As I demonstrated above, VIRTIO_BALLOON_F_DEFLATE_ON_OOM can lead to a
> complete OOM lockup because the out_of_memory() => fill_balloon() =>
> out_of_memory() => fill_balloon() sequence can effectively disable the OOM
> killer when the host wrongly assumes that it is safe to inflate the balloon
> to a large portion of guest memory without causing an OOM situation.
The other problem is that, although it is not a complete OOM lockup, the
system becomes too slow to be usable once we hit the out_of_memory() =>
fill_balloon() => out_of_memory() => fill_balloon() sequence (spelled out
below).
> If leak_balloon() from out_of_memory() should be stronger than
> fill_balloon() from update_balloon_size_func(), we need to make
> sure that update_balloon_size_func() stops calling fill_balloon()
> when leak_balloon() was called from out_of_memory().
I tried the patch below to reduce the possibility of hitting the
out_of_memory() => fill_balloon() => out_of_memory() => fill_balloon()
sequence.
----------------------------------------
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index a679ac2..9037fee 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -57,7 +57,7 @@ struct virtio_balloon {
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
- struct work_struct update_balloon_size_work;
+ struct delayed_work update_balloon_size_work;
/* Prevent updating balloon when it is being canceled. */
spinlock_t stop_update_lock;
@@ -88,6 +88,7 @@ struct virtio_balloon {
/* To register callback in oom notifier call chain */
struct notifier_block nb;
+ struct timer_list deflate_on_oom_timer;
};
static struct virtio_device_id id_table[] = {
@@ -141,7 +142,8 @@ static void set_page_pfns(struct virtio_balloon *vb,
page_to_balloon_pfn(page) + i);
}
-static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned fill_balloon(struct virtio_balloon *vb, size_t num,
+ unsigned long *delay)
{
struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
unsigned num_allocated_pages;
@@ -152,14 +154,21 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
mutex_lock(&vb->balloon_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
- struct page *page = balloon_page_enqueue(vb_dev_info);
+ struct page *page;
+
+ if (timer_pending(&vb->deflate_on_oom_timer)) {
+ /* Wait for the hold-off timer to expire. */
+ *delay = HZ;
+ break;
+ }
+ page = balloon_page_enqueue(vb_dev_info);
if (!page) {
dev_info_ratelimited(&vb->vdev->dev,
"Out of puff! Can't get %u pages\n",
VIRTIO_BALLOON_PAGES_PER_PAGE);
- /* Sleep for at least 1/5 of a second before retry. */
- msleep(200);
+ /* Retry after at least 1/5 of a second. */
+ *delay = HZ / 5;
break;
}
set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
@@ -310,7 +319,8 @@ static void virtballoon_changed(struct virtio_device *vdev)
spin_lock_irqsave(&vb->stop_update_lock, flags);
if (!vb->stop_update)
- queue_work(system_freezable_wq, &vb->update_balloon_size_work);
+ queue_delayed_work(system_freezable_wq,
+ &vb->update_balloon_size_work, 0);
spin_unlock_irqrestore(&vb->stop_update_lock, flags);
}
@@ -366,9 +376,13 @@ static int virtballoon_oom_notify(struct notifier_block *self,
if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
return NOTIFY_OK;
+ /* Hold off fill_balloon() for 60 seconds. */
+ mod_timer(&vb->deflate_on_oom_timer, jiffies + 60 * HZ);
freed = parm;
num_freed_pages = leak_balloon(vb, oom_pages);
update_balloon_size(vb);
+ dev_info_ratelimited(&vb->vdev->dev, "Released %u pages. Remains %u pages.\n",
+ num_freed_pages, vb->num_pages);
*freed += num_freed_pages;
return NOTIFY_OK;
@@ -387,19 +401,21 @@ static void update_balloon_size_func(struct work_struct *work)
{
struct virtio_balloon *vb;
s64 diff;
+ unsigned long delay = 0;
- vb = container_of(work, struct virtio_balloon,
+ vb = container_of(to_delayed_work(work), struct virtio_balloon,
update_balloon_size_work);
diff = towards_target(vb);
if (diff > 0)
- diff -= fill_balloon(vb, diff);
+ diff -= fill_balloon(vb, diff, &delay);
else if (diff < 0)
diff += leak_balloon(vb, -diff);
update_balloon_size(vb);
if (diff)
- queue_work(system_freezable_wq, work);
+ queue_delayed_work(system_freezable_wq, to_delayed_work(work),
+ delay);
}
static int init_vqs(struct virtio_balloon *vb)
@@ -521,6 +537,10 @@ static struct dentry *balloon_mount(struct file_system_type *fs_type,
#endif /* CONFIG_BALLOON_COMPACTION */
+static void timer_expired(unsigned long unused)
+{
+}
+
static int virtballoon_probe(struct virtio_device *vdev)
{
struct virtio_balloon *vb;
@@ -539,7 +559,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
}
INIT_WORK(&vb->update_balloon_stats_work, update_balloon_stats_func);
- INIT_WORK(&vb->update_balloon_size_work, update_balloon_size_func);
+ INIT_DELAYED_WORK(&vb->update_balloon_size_work,
+ update_balloon_size_func);
spin_lock_init(&vb->stop_update_lock);
vb->stop_update = false;
vb->num_pages = 0;
@@ -553,6 +574,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
if (err)
goto out_free_vb;
+ setup_timer(&vb->deflate_on_oom_timer, timer_expired, 0);
vb->nb.notifier_call = virtballoon_oom_notify;
vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
err = register_oom_notifier(&vb->nb);
@@ -564,6 +586,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
if (IS_ERR(balloon_mnt)) {
err = PTR_ERR(balloon_mnt);
unregister_oom_notifier(&vb->nb);
+ del_timer_sync(&vb->deflate_on_oom_timer);
goto out_del_vqs;
}
@@ -573,6 +596,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
err = PTR_ERR(vb->vb_dev_info.inode);
kern_unmount(balloon_mnt);
unregister_oom_notifier(&vb->nb);
+ del_timer_sync(&vb->deflate_on_oom_timer);
vb->vb_dev_info.inode = NULL;
goto out_del_vqs;
}
@@ -611,11 +635,12 @@ static void virtballoon_remove(struct virtio_device *vdev)
struct virtio_balloon *vb = vdev->priv;
unregister_oom_notifier(&vb->nb);
+ del_timer_sync(&vb->deflate_on_oom_timer);
spin_lock_irq(&vb->stop_update_lock);
vb->stop_update = true;
spin_unlock_irq(&vb->stop_update_lock);
- cancel_work_sync(&vb->update_balloon_size_work);
+ cancel_delayed_work_sync(&vb->update_balloon_size_work);
cancel_work_sync(&vb->update_balloon_stats_work);
remove_common(vb);
----------------------------------------
While responsiveness was better than it currently is, inflating again spoiled
the effort. Retrying inflation until the allocation fails is already too
painful.
The complete log is at http://I-love.SAKURA.ne.jp/tmp/20171018-deflate.log.xz .
----------------------------------------
[ 19.529096] kworker/0:2: page allocation failure: order:0, mode:0x14310ca(GFP_HIGHUSER_MOVABLE|__GFP_NORETRY|__GFP_NOMEMALLOC), nodemask=(null)
[ 19.530721] kworker/0:2 cpuset=/ mems_allowed=0
[ 19.531581] CPU: 0 PID: 111 Comm: kworker/0:2 Not tainted 4.14.0-rc5+ #302
[ 19.532397] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 19.533285] Workqueue: events_freezable update_balloon_size_func [virtio_balloon]
[ 19.534143] Call Trace:
[ 19.535015] dump_stack+0x63/0x87
[ 19.535844] warn_alloc+0x114/0x1c0
[ 19.536667] __alloc_pages_slowpath+0x9a6/0xba7
[ 19.537491] ? sched_clock_cpu+0x11/0xb0
[ 19.538311] __alloc_pages_nodemask+0x26a/0x290
[ 19.539188] alloc_pages_current+0x6a/0xb0
[ 19.540004] balloon_page_enqueue+0x25/0xf0
[ 19.540818] update_balloon_size_func+0xe1/0x260 [virtio_balloon]
[ 19.541626] process_one_work+0x149/0x360
[ 19.542417] worker_thread+0x4d/0x3c0
[ 19.543186] kthread+0x109/0x140
[ 19.543930] ? rescuer_thread+0x380/0x380
[ 19.544716] ? kthread_park+0x60/0x60
[ 19.545426] ret_from_fork+0x25/0x30
[ 19.546141] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[ 19.547903] virtio_balloon virtio3: Released 256 pages. Remains 1984834 pages.
[ 19.659660] virtio_balloon virtio3: Released 256 pages. Remains 1984578 pages.
[ 21.891392] virtio_balloon virtio3: Released 256 pages. Remains 1984322 pages.
[ 21.894719] virtio_balloon virtio3: Released 256 pages. Remains 1984066 pages.
[ 22.490131] virtio_balloon virtio3: Released 256 pages. Remains 1983810 pages.
[ 31.939666] virtio_balloon virtio3: Released 256 pages. Remains 1983554 pages.
[ 95.524753] kworker/0:2: page allocation failure: order:0, mode:0x14310ca(GFP_HIGHUSER_MOVABLE|__GFP_NORETRY|__GFP_NOMEMALLOC), nodemask=(null)
[ 95.525641] kworker/0:2 cpuset=/ mems_allowed=0
[ 95.526110] CPU: 0 PID: 111 Comm: kworker/0:2 Not tainted 4.14.0-rc5+ #302
[ 95.526552] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 95.527018] Workqueue: events_freezable update_balloon_size_func [virtio_balloon]
[ 95.527492] Call Trace:
[ 95.527969] dump_stack+0x63/0x87
[ 95.528469] warn_alloc+0x114/0x1c0
[ 95.528922] __alloc_pages_slowpath+0x9a6/0xba7
[ 95.529388] ? qxl_image_free_objects+0x56/0x60 [qxl]
[ 95.529849] ? qxl_draw_opaque_fb+0x102/0x3a0 [qxl]
[ 95.530315] __alloc_pages_nodemask+0x26a/0x290
[ 95.530777] alloc_pages_current+0x6a/0xb0
[ 95.531243] balloon_page_enqueue+0x25/0xf0
[ 95.531703] update_balloon_size_func+0xe1/0x260 [virtio_balloon]
[ 95.532180] process_one_work+0x149/0x360
[ 95.532645] worker_thread+0x4d/0x3c0
[ 95.533143] kthread+0x109/0x140
[ 95.533622] ? rescuer_thread+0x380/0x380
[ 95.534100] ? kthread_park+0x60/0x60
[ 95.534568] ret_from_fork+0x25/0x30
[ 95.535093] warn_alloc_show_mem: 1 callbacks suppressed
[ 95.535093] Mem-Info:
[ 95.536072] active_anon:11171 inactive_anon:2084 isolated_anon:0
[ 95.536072] active_file:8 inactive_file:70 isolated_file:0
[ 95.536072] unevictable:0 dirty:0 writeback:0 unstable:0
[ 95.536072] slab_reclaimable:3554 slab_unreclaimable:6848
[ 95.536072] mapped:588 shmem:2144 pagetables:749 bounce:0
[ 95.536072] free:25859 free_pcp:72 free_cma:0
[ 95.538922] Node 0 active_anon:44684kB inactive_anon:8336kB active_file:32kB inactive_file:280kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2352kB dirty:0kB writeback:0kB shmem:8576kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 10240kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 95.540516] Node 0 DMA free:15900kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 95.542325] lowmem_reserve[]: 0 2954 7925 7925 7925
[ 95.543020] Node 0 DMA32 free:44748kB min:25144kB low:31428kB high:37712kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129308kB managed:3063740kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 95.544949] lowmem_reserve[]: 0 0 4970 4970 4970
[ 95.545624] Node 0 Normal free:42788kB min:42304kB low:52880kB high:63456kB active_anon:44684kB inactive_anon:8336kB active_file:32kB inactive_file:108kB unevictable:0kB writepending:0kB present:5242880kB managed:5093540kB mlocked:0kB kernel_stack:1984kB pagetables:2996kB bounce:0kB free_pcp:372kB local_pcp:220kB free_cma:0kB
[ 95.547739] lowmem_reserve[]: 0 0 0 0 0
[ 95.548464] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
[ 95.549988] Node 0 DMA32: 3*4kB (UM) 4*8kB (UM) 2*16kB (U) 2*32kB (U) 1*64kB (U) 0*128kB 2*256kB (UM) 2*512kB (UM) 2*1024kB (UM) 2*2048kB (UM) 9*4096kB (M) = 44748kB
[ 95.551551] Node 0 Normal: 925*4kB (UME) 455*8kB (UME) 349*16kB (UME) 137*32kB (UME) 37*64kB (UME) 23*128kB (UME) 8*256kB (UME) 5*512kB (UM) 3*1024kB (UM) 6*2048kB (U) 0*4096kB = 42588kB
[ 95.553147] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 95.554030] 2224 total pagecache pages
[ 95.554874] 0 pages in swap cache
[ 95.555667] Swap cache stats: add 0, delete 0, find 0/0
[ 95.556469] Free swap = 0kB
[ 95.557280] Total swap = 0kB
[ 95.558079] 2097045 pages RAM
[ 95.558856] 0 pages HighMem/MovableOnly
[ 95.559652] 53748 pages reserved
[ 95.560444] 0 pages cma reserved
[ 95.561262] 0 pages hwpoisoned
[ 95.562086] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[ 95.565779] virtio_balloon virtio3: Released 256 pages. Remains 1984947 pages.
[ 96.265255] virtio_balloon virtio3: Released 256 pages. Remains 1984691 pages.
[ 105.498910] virtio_balloon virtio3: Released 256 pages. Remains 1984435 pages.
[ 105.500518] virtio_balloon virtio3: Released 256 pages. Remains 1984179 pages.
[ 105.520034] virtio_balloon virtio3: Released 256 pages. Remains 1983923 pages.
----------------------------------------
Michael S. Tsirkin wrote:
> I think that's the case. Question is, when can we inflate again?
I think it is when the host explicitly asks again, because the
VIRTIO_BALLOON_F_DEFLATE_ON_OOM path itself does not schedule a later
inflation.