* [PATCH] mm: Throttle shrinkers harder
From: Chris Wilson @ 2014-04-10 7:05 UTC
To: linux-mm
Cc: Rik van Riel, intel-gfx, Dave Chinner, Hugh Dickins,
Michal Hocko, Mel Gorman, Johannes Weiner, Andrew Morton,
Glauber Costa
During testing of i915.ko with working texture sets larger than RAM, we
encounter OOM with plenty of memory still trapped within writeback, e.g.:
[ 42.386039] active_anon:10134 inactive_anon:1900781 isolated_anon:32
active_file:33 inactive_file:39 isolated_file:0
unevictable:0 dirty:0 writeback:337627 unstable:0
free:11985 slab_reclaimable:9458 slab_unreclaimable:23614
mapped:41 shmem:1560769 pagetables:1276 bounce:0
If we throttle for writeback following shrink_slab(), this gives us time
to wait upon the writeback generated by the i915.ko shrinker:
[ 4756.750808] active_anon:24386 inactive_anon:900793 isolated_anon:0
active_file:23 inactive_file:20 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
free:5550 slab_reclaimable:5184 slab_unreclaimable:4888
mapped:3 shmem:472393 pagetables:1249 bounce:0
(Sadly though the test is still failing.)
Testcase: igt/gem_tiled_swapping
References: https://bugs.freedesktop.org/show_bug.cgi?id=72742
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
---
mm/vmscan.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a9c74b409681..8c2cb1150d17 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -135,6 +135,10 @@ unsigned long vm_total_pages;	/* The total number of pages which the VM controls
 static LIST_HEAD(shrinker_list);
 static DECLARE_RWSEM(shrinker_rwsem);
 
+static bool throttle_direct_reclaim(gfp_t gfp_mask,
+				    struct zonelist *zonelist,
+				    nodemask_t *nodemask);
+
 #ifdef CONFIG_MEMCG
 static bool global_reclaim(struct scan_control *sc)
 {
@@ -1521,7 +1525,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	 * of pages under pages flagged for immediate reclaim and stall if any
 	 * are encountered in the nr_immediate check below.
 	 */
-	if (nr_writeback && nr_writeback == nr_taken)
+	if (nr_writeback > nr_taken / 2)
 		zone_set_flag(zone, ZONE_WRITEBACK);
 
 	/*
@@ -2465,6 +2469,12 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 					       WB_REASON_TRY_TO_FREE_PAGES);
 			sc->may_writepage = 1;
 		}
+
+		if (global_reclaim(sc) &&
+		    throttle_direct_reclaim(sc->gfp_mask,
+					    zonelist,
+					    sc->nodemask))
+			aborted_reclaim = true;
 	} while (--sc->priority >= 0 && !aborted_reclaim);
 
 out:
--
1.9.1
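
For context, throttle_direct_reclaim() already exists in 3.14's
mm/vmscan.c; the first hunk above only adds a forward declaration so that
do_try_to_free_pages() can reuse it. Condensed, the behaviour being reused
is roughly this (a sketch, not the verbatim function; the zonelist walk
and corner cases are elided):

	/* Condensed sketch of 3.14 mm/vmscan.c:throttle_direct_reclaim() */
	if (current->flags & PF_KTHREAD || fatal_signal_pending(current))
		return false;		/* never throttle kswapd et al. */

	if (!(gfp_mask & __GFP_FS))
		/* bounded stall for callers that may not enter the FS */
		wait_event_interruptible_timeout(pgdat->pfmemalloc_wait,
				pfmemalloc_watermark_ok(pgdat), HZ);
	else
		/* sleep until kswapd restores the pfmemalloc watermark */
		wait_event_killable(pgdat->pfmemalloc_wait,
				pfmemalloc_watermark_ok(pgdat));

	/* true asks the caller to abort reclaim: a fatal signal
	 * arrived while we were throttled */
	return fatal_signal_pending(current);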
* Re: [PATCH] mm: Throttle shrinkers harder
From: Andrew Morton @ 2014-04-18 19:14 UTC
To: Chris Wilson
Cc: linux-mm, intel-gfx, Mel Gorman, Michal Hocko, Rik van Riel,
Johannes Weiner, Dave Chinner, Glauber Costa, Hugh Dickins,
David Rientjes
On Thu, 10 Apr 2014 08:05:06 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
> During testing of i915.ko with working texture sets larger than RAM, we
> encounter OOM with plenty of memory still trapped within writeback, e.g.:
>
> [ 42.386039] active_anon:10134 inactive_anon:1900781 isolated_anon:32
> active_file:33 inactive_file:39 isolated_file:0
> unevictable:0 dirty:0 writeback:337627 unstable:0
> free:11985 slab_reclaimable:9458 slab_unreclaimable:23614
> mapped:41 shmem:1560769 pagetables:1276 bounce:0
>
> If we throttle for writeback following shrink_slab(), this gives us time
> to wait upon the writeback generated by the i915.ko shrinker:
>
> [ 4756.750808] active_anon:24386 inactive_anon:900793 isolated_anon:0
> active_file:23 inactive_file:20 isolated_file:0
> unevictable:0 dirty:0 writeback:0 unstable:0
> free:5550 slab_reclaimable:5184 slab_unreclaimable:4888
> mapped:3 shmem:472393 pagetables:1249 bounce:0
>
> (Sadly though the test is still failing.)
>
> Testcase: igt/gem_tiled_swapping
> References: https://bugs.freedesktop.org/show_bug.cgi?id=72742
i915_gem_object_get_pages_gtt() makes my head spin, but
https://bugs.freedesktop.org/attachment.cgi?id=90818 says
"gfp_mask=0x201da" which is
___GFP_HARDWALL|___GFP_COLD|___GFP_FS|___GFP_IO|___GFP_WAIT|___GFP_MOVABLE|___GFP_HIGHMEM
so this allocation should work, and it is very bad if the page allocator
is declaring OOM while there is so much writeback in flight, assuming the
writeback is to eligible zones.
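
(The decode is easy to double-check: those bits sum to exactly 0x201da. A
stand-alone sketch, with the ___GFP_* values transcribed from 3.14-era
include/linux/gfp.h rather than taken from this mail:)

#include <stdio.h>

static const struct { unsigned long bit; const char *name; } bits[] = {
	{ 0x02,    "___GFP_HIGHMEM"  },
	{ 0x08,    "___GFP_MOVABLE"  },
	{ 0x10,    "___GFP_WAIT"     },
	{ 0x40,    "___GFP_IO"       },
	{ 0x80,    "___GFP_FS"       },
	{ 0x100,   "___GFP_COLD"     },
	{ 0x20000, "___GFP_HARDWALL" },
};

int main(void)
{
	unsigned long mask = 0x201da;
	size_t i;

	for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++)
		if (mask & bits[i].bit) {
			printf("%s\n", bits[i].name);
			mask &= ~bits[i].bit;
		}
	printf("undecoded bits: %#lx\n", mask);	/* prints 0 */
	return 0;
}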
Mel, Johannes: could you take a look please?
* Re: [PATCH] mm: Throttle shrinkers harder
From: Daniel Vetter @ 2014-04-22 19:30 UTC
To: Andrew Morton
Cc: Chris Wilson, linux-mm, intel-gfx, Mel Gorman, Michal Hocko,
Rik van Riel, Johannes Weiner, Dave Chinner, Glauber Costa,
Hugh Dickins, David Rientjes
On Fri, Apr 18, 2014 at 12:14:16PM -0700, Andrew Morton wrote:
> On Thu, 10 Apr 2014 08:05:06 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> > During testing of i915.ko with working texture sets larger than RAM, we
> > encounter OOM with plenty of memory still trapped within writeback, e.g.:
> >
> > [ 42.386039] active_anon:10134 inactive_anon:1900781 isolated_anon:32
> > active_file:33 inactive_file:39 isolated_file:0
> > unevictable:0 dirty:0 writeback:337627 unstable:0
> > free:11985 slab_reclaimable:9458 slab_unreclaimable:23614
> > mapped:41 shmem:1560769 pagetables:1276 bounce:0
> >
> > If we throttle for writeback following shrink_slab(), this gives us time
> > to wait upon the writeback generated by the i915.ko shrinker:
> >
> > [ 4756.750808] active_anon:24386 inactive_anon:900793 isolated_anon:0
> > active_file:23 inactive_file:20 isolated_file:0
> > unevictable:0 dirty:0 writeback:0 unstable:0
> > free:5550 slab_reclaimable:5184 slab_unreclaimable:4888
> > mapped:3 shmem:472393 pagetables:1249 bounce:0
> >
> > (Sadly though the test is still failing.)
> >
> > Testcase: igt/gem_tiled_swapping
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=72742
>
> i915_gem_object_get_pages_gtt() makes my head spin, but
> https://bugs.freedesktop.org/attachment.cgi?id=90818 says
> "gfp_mask=0x201da" which is
>
> ___GFP_HARDWALL|___GFP_COLD|___GFP_FS|___GFP_IO|___GFP_WAIT|___GFP_MOVABLE|___GFP_HIGHMEM
>
> so this allocation should work, and it is very bad if the page allocator
> is declaring OOM while there is so much writeback in flight, assuming the
> writeback is to eligible zones.
For more head spinning, look at the lock-stealing dance we do in our
shrinker callbacks i915_gem_inactive_scan|count(). It's not pretty at all,
but it helps to avoid the dreaded oom in a few more cases. Some review of
our mess of duct tape by -mm developers with actual clue would be really
appreciated ...
-Daniel
> Mel, Johannes: could you take a look please?
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [PATCH] mm: Throttle shrinkers harder
From: Dave Hansen @ 2014-04-23 21:14 UTC
To: Andrew Morton, Chris Wilson, linux-mm, intel-gfx, Mel Gorman,
Michal Hocko, Rik van Riel, Johannes Weiner, Dave Chinner,
Glauber Costa, Hugh Dickins, David Rientjes
On 04/22/2014 12:30 PM, Daniel Vetter wrote:
>> > > During testing of i915.ko with working texture sets larger than RAM, we
>> > > encounter OOM with plenty of memory still trapped within writeback, e.g.:
>> > >
>> > > [ 42.386039] active_anon:10134 inactive_anon:1900781 isolated_anon:32
>> > > active_file:33 inactive_file:39 isolated_file:0
>> > > unevictable:0 dirty:0 writeback:337627 unstable:0
>> > > free:11985 slab_reclaimable:9458 slab_unreclaimable:23614
>> > > mapped:41 shmem:1560769 pagetables:1276 bounce:0
>> > >
>> > > If we throttle for writeback following shrink_slab(), this gives us time
>> > > to wait upon the writeback generated by the i915.ko shrinker:
>> > >
>> > > [ 4756.750808] active_anon:24386 inactive_anon:900793 isolated_anon:0
>> > > active_file:23 inactive_file:20 isolated_file:0
>> > > unevictable:0 dirty:0 writeback:0 unstable:0
>> > > free:5550 slab_reclaimable:5184 slab_unreclaimable:4888
>> > > mapped:3 shmem:472393 pagetables:1249 bounce:0
Could you get some dumps of the entire set of OOM information? These
are only tiny snippets.
Also, the vmstat output from the bug:
> https://bugs.freedesktop.org/show_bug.cgi?id=72742
shows there being an *AWFUL* lot of swap I/O going on here. From the
looks of it, we stuck ~2GB in swap and evicted another 1.5GB of page
cache (although I guess that could be double-counting tmpfs getting
swapped out too). Hmmm, was this one of the cases where you actually
ran _out_ of swap?
> 2 0 19472 33952 296 3610324 0 19472 0 19472 1474 151 3 27 71 0
> 4 0 484964 66468 296 3175864 0 465492 0 465516 2597 1395 0 32 66 2
> 0 2 751940 23692 980 3022884 0 266976 688 266976 3681 636 0 27 66 6
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 2 1 1244580 295336 988 2606984 0 492896 0 492908 1237 311 1 9 50 41
> 0 2 2047996 28760 988 2037144 0 803160 0 803160 1221 1291 1 15 69 14
* Re: [PATCH] mm: Throttle shrinkers harder
From: Chris Wilson @ 2014-04-24 5:58 UTC
To: Dave Hansen
Cc: Andrew Morton, linux-mm, intel-gfx, Mel Gorman, Michal Hocko,
Rik van Riel, Johannes Weiner, Dave Chinner, Glauber Costa,
Hugh Dickins, David Rientjes
On Wed, Apr 23, 2014 at 02:14:36PM -0700, Dave Hansen wrote:
> On 04/22/2014 12:30 PM, Daniel Vetter wrote:
> >> > > During testing of i915.ko with working texture sets larger than RAM, we
> >> > > encounter OOM with plenty of memory still trapped within writeback, e.g.:
> >> > >
> >> > > [ 42.386039] active_anon:10134 inactive_anon:1900781 isolated_anon:32
> >> > > active_file:33 inactive_file:39 isolated_file:0
> >> > > unevictable:0 dirty:0 writeback:337627 unstable:0
> >> > > free:11985 slab_reclaimable:9458 slab_unreclaimable:23614
> >> > > mapped:41 shmem:1560769 pagetables:1276 bounce:0
> >> > >
> >> > > If we throttle for writeback following shrink_slab(), this gives us time
> >> > > to wait upon the writeback generated by the i915.ko shrinker:
> >> > >
> >> > > [ 4756.750808] active_anon:24386 inactive_anon:900793 isolated_anon:0
> >> > > active_file:23 inactive_file:20 isolated_file:0
> >> > > unevictable:0 dirty:0 writeback:0 unstable:0
> >> > > free:5550 slab_reclaimable:5184 slab_unreclaimable:4888
> >> > > mapped:3 shmem:472393 pagetables:1249 bounce:0
>
> Could you get some dumps of the entire set of OOM information? These
> are only tiny snippets.
For reference, the last oom report after flushing all the writeback:
[ 4756.749554] crond invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[ 4756.749603] crond cpuset=/ mems_allowed=0
[ 4756.749628] CPU: 0 PID: 3574 Comm: crond Tainted: G W 3.14.0_prts_de579f_20140410 #2
[ 4756.749676] Hardware name: Gigabyte Technology Co., Ltd. H55M-UD2H/H55M-UD2H, BIOS F4 12/02/2009
[ 4756.749723] 0000000000000000 00000000000201da ffffffff81717273 ffff8800d235dc40
[ 4756.749762] ffffffff81714541 0000000000000400 ffff8800cb6f3b10 ffff880117ff8000
[ 4756.749800] ffffffff81072266 0000000000000206 ffffffff812d6ebe ffff880112f25c40
[ 4756.749838] Call Trace:
[ 4756.749856] [<ffffffff81717273>] ? dump_stack+0x41/0x51
[ 4756.749881] [<ffffffff81714541>] ? dump_header.isra.8+0x69/0x191
[ 4756.749911] [<ffffffff81072266>] ? ktime_get_ts+0x49/0xab
[ 4756.749938] [<ffffffff812d6ebe>] ? ___ratelimit+0xae/0xc8
[ 4756.749965] [<ffffffff810a72a8>] ? oom_kill_process+0x76/0x32c
[ 4756.749992] [<ffffffff810a706d>] ? find_lock_task_mm+0x22/0x6e
[ 4756.750018] [<ffffffff810a7add>] ? out_of_memory+0x41c/0x44f
[ 4756.750045] [<ffffffff810ab31d>] ? __alloc_pages_nodemask+0x680/0x78d
[ 4756.750076] [<ffffffff810d4b7f>] ? alloc_pages_current+0xbf/0xdc
[ 4756.750103] [<ffffffff810a61f8>] ? filemap_fault+0x266/0x38b
[ 4756.750130] [<ffffffff810bc3f5>] ? __do_fault+0xac/0x3bf
[ 4756.750155] [<ffffffff810bfb85>] ? handle_mm_fault+0x1e7/0x7e2
[ 4756.750181] [<ffffffff810bc960>] ? tlb_flush_mmu+0x4b/0x64
[ 4756.750219] [<ffffffff812d8ed5>] ? timerqueue_add+0x79/0x98
[ 4756.750254] [<ffffffff8104d283>] ? enqueue_hrtimer+0x15/0x37
[ 4756.750287] [<ffffffff8171f63d>] ? __do_page_fault+0x42e/0x47b
[ 4756.750319] [<ffffffff8104d580>] ? hrtimer_try_to_cancel+0x67/0x70
[ 4756.750353] [<ffffffff8104d595>] ? hrtimer_cancel+0xc/0x16
[ 4756.750385] [<ffffffff81719807>] ? do_nanosleep+0xb3/0xf1
[ 4756.750415] [<ffffffff8104def9>] ? hrtimer_nanosleep+0x89/0x10b
[ 4756.750447] [<ffffffff8171cbf2>] ? page_fault+0x22/0x30
[ 4756.750476] Mem-Info:
[ 4756.750490] Node 0 DMA per-cpu:
[ 4756.750510] CPU 0: hi: 0, btch: 1 usd: 0
[ 4756.750533] CPU 1: hi: 0, btch: 1 usd: 0
[ 4756.750555] CPU 2: hi: 0, btch: 1 usd: 0
[ 4756.750576] CPU 3: hi: 0, btch: 1 usd: 0
[ 4756.750598] Node 0 DMA32 per-cpu:
[ 4756.750615] CPU 0: hi: 186, btch: 31 usd: 0
[ 4756.750637] CPU 1: hi: 186, btch: 31 usd: 0
[ 4756.750660] CPU 2: hi: 186, btch: 31 usd: 0
[ 4756.750681] CPU 3: hi: 186, btch: 31 usd: 0
[ 4756.750702] Node 0 Normal per-cpu:
[ 4756.750720] CPU 0: hi: 90, btch: 15 usd: 0
[ 4756.750742] CPU 1: hi: 90, btch: 15 usd: 0
[ 4756.750763] CPU 2: hi: 90, btch: 15 usd: 0
[ 4756.750785] CPU 3: hi: 90, btch: 15 usd: 0
[ 4756.750808] active_anon:24386 inactive_anon:900793 isolated_anon:0
active_file:23 inactive_file:20 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
free:5550 slab_reclaimable:5184 slab_unreclaimable:4888
mapped:3 shmem:472393 pagetables:1249 bounce:0
free_cma:0
[ 4756.750938] Node 0 DMA free:14664kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:1024kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:412kB slab_reclaimable:80kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:48kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:76 all_unreclaimable? yes
[ 4756.751103] lowmem_reserve[]: 0 3337 3660 3660
[ 4756.751133] Node 0 DMA32 free:7208kB min:7044kB low:8804kB high:10564kB active_anon:36172kB inactive_anon:3351408kB active_file:92kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3518336kB managed:3440548kB mlocked:0kB dirty:0kB writeback:0kB mapped:12kB shmem:1661420kB slab_reclaimable:17624kB slab_unreclaimable:14400kB kernel_stack:696kB pagetables:4324kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:327 all_unreclaimable? yes
[ 4756.751341] lowmem_reserve[]: 0 0 322 322
[ 4756.752889] Node 0 Normal free:328kB min:680kB low:848kB high:1020kB active_anon:61372kB inactive_anon:250740kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:393216kB managed:330360kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:227740kB slab_reclaimable:3032kB slab_unreclaimable:5128kB kernel_stack:400kB pagetables:624kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:6 all_unreclaimable? yes
[ 4756.757635] lowmem_reserve[]: 0 0 0 0
[ 4756.759294] Node 0 DMA: 2*4kB (UM) 2*8kB (UM) 3*16kB (UEM) 4*32kB (UEM) 2*64kB (UM) 4*128kB (UEM) 2*256kB (EM) 2*512kB (EM) 2*1024kB (UM) 3*2048kB (EMR) 1*4096kB (M) = 14664kB
[ 4756.762776] Node 0 DMA32: 424*4kB (UEM) 171*8kB (UEM) 21*16kB (UEM) 1*32kB (R) 1*64kB (R) 1*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 1*2048kB (R) 0*4096kB = 7208kB
[ 4756.766284] Node 0 Normal: 26*4kB (UER) 18*8kB (UER) 3*16kB (E) 1*32kB (R) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 328kB
[ 4756.768198] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 4756.770026] 916139 total pagecache pages
[ 4756.771857] 443703 pages in swap cache
[ 4756.773695] Swap cache stats: add 15363874, delete 14920171, find 6533699/7512215
[ 4756.775592] Free swap = 0kB
[ 4756.777505] Total swap = 2047996kB
[ 4756.779410] 981886 pages RAM
[ 4756.781307] 0 pages HighMem/MovableOnly
[ 4756.783192] 15714 pages reserved
[ 4756.785038] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[ 4756.786929] [ 2368] 0 2368 76187 93 153 0 0 systemd-journal
[ 4756.788846] [ 3204] 0 3204 10305 189 23 7 -1000 systemd-udevd
[ 4756.790789] [ 3223] 0 3223 24466 24 22 21 0 lvmetad
[ 4756.792749] [ 3297] 0 3297 12231 68 24 25 -1000 auditd
[ 4756.794715] [ 3306] 0 3306 20053 33 9 0 0 audispd
[ 4756.796680] [ 3308] 0 3308 5993 40 27 1 0 sedispatch
[ 4756.798654] [ 3315] 0 3315 9254 60 36 5 0 abrtd
[ 4756.800646] [ 3316] 0 3316 8725 60 35 1 0 abrt-watch-log
[ 4756.802627] [ 3324] 0 3324 8725 42 35 17 0 abrt-watch-log
[ 4756.804614] [ 3331] 0 3331 4778 43 21 9 0 irqbalance
[ 4756.806604] [ 3337] 0 3337 6069 131 16 20 0 smartd
[ 4756.808597] [ 3343] 0 3343 8249 80 20 0 0 systemd-logind
[ 4756.810593] [ 3344] 0 3344 65772 129 29 14 0 rsyslogd
[ 4756.812594] [ 3346] 0 3346 60608 305 50 9 0 NetworkManager
[ 4756.814602] [ 3347] 70 3347 7018 76 31 1 0 avahi-daemon
[ 4756.816619] [ 3352] 70 3352 6985 48 22 2 0 avahi-daemon
[ 4756.818629] [ 3353] 81 3353 6119 121 17 3 -900 dbus-daemon
[ 4756.820651] [ 3362] 993 3362 5647 55 15 4 0 chronyd
[ 4756.822694] [ 3363] 0 3363 1619 12 10 16 0 mcelog
[ 4756.824760] [ 3389] 999 3389 127896 746 47 65 0 polkitd
[ 4756.826827] [ 3397] 0 3397 40407 161 67 26 -900 modem-manager
[ 4756.828939] [ 3424] 0 3424 25498 2827 49 287 0 dhclient
[ 4756.831039] [ 3432] 0 3432 106838 1061 138 128 0 libvirtd
[ 4756.833154] [ 3446] 0 3446 20104 190 43 10 -1000 sshd
[ 4756.835275] [ 3453] 32 3453 9422 66 22 26 0 rpcbind
[ 4756.837308] [ 3461] 0 3461 25190 399 48 49 0 sendmail
[ 4756.839335] [ 3478] 51 3478 21452 361 41 16 0 sendmail
[ 4756.841386] [ 3573] 0 3573 5930 47 16 0 0 atd
[ 4756.843458] [ 3574] 0 3574 5126 147 14 2 0 crond
[ 4756.845578] [ 3579] 0 3579 27498 27 10 1 0 agetty
[ 4756.847706] [ 3582] 0 3582 32256 220 65 63 0 sshd
[ 4756.849844] [ 3586] 0 3586 29263 456 20 51 0 bash
[ 4756.851997] [ 3765] 0 3765 15968 93 47 0 1000 gem_tiled_swapp
[ 4756.854178] Out of memory: Kill process 3765 (gem_tiled_swapp) score 999 or sacrifice child
[ 4756.856377] Killed process 3765 (gem_tiled_swapp) total-vm:63872kB, anon-rss:368kB, file-rss:4kB
> Also, the vmstat output from the bug:
>
> > https://bugs.freedesktop.org/show_bug.cgi?id=72742
>
> shows there being an *AWFUL* lot of swap I/O going on here. From the
> looks of it, we stuck ~2GB in swap and evicted another 1.5GB of page
> cache (although I guess that could be double-counting tmpfs getting
> swapped out too). Hmmm, was this one of the cases where you actually
> ran _out_ of swap?
Yes. This bug is a little odd because they always run out of swap. We
have another category of bug (which appears to be fixed, touch wood)
where we trigger oom without even touching swap. The test case is
designed to only just swap (use at most 1/4 of the available swap space)
and checks that its working set fits into available memory + swap.
However, when QA run the test, their systems run completely out of
virtual memory. There is a discrepancy on their machines where
anon_inactive is reported as being 2x shmem, but we only expect
anon_inactive to be our own shmem allocations. I don't know how to track
what else is using anon_inactive. Suggestions?
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] mm: Throttle shrinkers harder
From: Dave Hansen @ 2014-04-24 15:21 UTC
To: Chris Wilson, Andrew Morton, linux-mm, intel-gfx, Mel Gorman,
Michal Hocko, Rik van Riel, Johannes Weiner, Dave Chinner,
Glauber Costa, Hugh Dickins, David Rientjes
On 04/23/2014 10:58 PM, Chris Wilson wrote:
> [ 4756.750938] Node 0 DMA free:14664kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:1024kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:412kB slab_reclaimable:80kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:48kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:76 all_unreclaimable? yes
> [ 4756.751103] lowmem_reserve[]: 0 3337 3660 3660
> [ 4756.751133] Node 0 DMA32 free:7208kB min:7044kB low:8804kB high:10564kB active_anon:36172kB inactive_anon:3351408kB active_file:92kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3518336kB managed:3440548kB mlocked:0kB dirty:0kB writeback:0kB mapped:12kB shmem:1661420kB slab_reclaimable:17624kB slab_unreclaimable:14400kB kernel_stack:696kB pagetables:4324kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:327 all_unreclaimable? yes
> [ 4756.751341] lowmem_reserve[]: 0 0 322 322
> [ 4756.752889] Node 0 Normal free:328kB min:680kB low:848kB high:1020kB active_anon:61372kB inactive_anon:250740kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:393216kB managed:330360kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:227740kB slab_reclaimable:3032kB slab_unreclaimable:5128kB kernel_stack:400kB pagetables:624kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:6 all_unreclaimable? yes
> [ 4756.757635] lowmem_reserve[]: 0 0 0 0
> [ 4756.759294] Node 0 DMA: 2*4kB (UM) 2*8kB (UM) 3*16kB (UEM) 4*32kB (UEM) 2*64kB (UM) 4*128kB (UEM) 2*256kB (EM) 2*512kB (EM) 2*1024kB (UM) 3*2048kB (EMR) 1*4096kB (M) = 14664kB
> [ 4756.762776] Node 0 DMA32: 424*4kB (UEM) 171*8kB (UEM) 21*16kB (UEM) 1*32kB (R) 1*64kB (R) 1*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 1*2048kB (R) 0*4096kB = 7208kB
> [ 4756.766284] Node 0 Normal: 26*4kB (UER) 18*8kB (UER) 3*16kB (E) 1*32kB (R) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 328kB
> [ 4756.768198] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [ 4756.770026] 916139 total pagecache pages
> [ 4756.771857] 443703 pages in swap cache
> [ 4756.773695] Swap cache stats: add 15363874, delete 14920171, find 6533699/7512215
> [ 4756.775592] Free swap = 0kB
> [ 4756.777505] Total swap = 2047996kB
OK, so here's my theory as to what happens:
1. The graphics pages got put on the LRU
2. System is low on memory, they get on (and *STAY* on) the inactive
LRU.
3. VM adds graphics pages to the swap cache, and writes them out, and
we see the writeout from the vmstat, and lots of adds/removes from
the swap cache.
4. But, despite all the swap writeout, we don't see much memory actually
being freed. Why?
I _suspect_ that the graphics drivers here are holding a reference to
the page. During reclaim, we're mostly concerned with the pages being
mapped. If we manage to get them unmapped, we'll go ahead and swap
them, which I _think_ is what we're seeing. But, when it comes time to
_actually_ free them, that last reference on the page keeps them from
being freed.
Is it possible that there's still a get_page() reference that's holding
those pages in place from the graphics code?
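
For the mm-side mechanics behind that suspicion: reclaim only frees a page
whose refcount is exactly what the VM expects. Condensed from 3.14's
__remove_mapping() in mm/vmscan.c (locking and error paths elided):

	/* The expected count is 2: one reference for the page/swap cache
	 * mapping, one for the isolating caller.  An extra get_page()
	 * reference -- e.g. one held by a driver -- makes the freeze fail
	 * and the page goes back on the LRU instead of being freed. */
	if (!page_freeze_refs(page, 2))
		goto cannot_free;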
>> Also, the vmstat output from the bug:
>>
>>> https://bugs.freedesktop.org/show_bug.cgi?id=72742
>>
>> shows there being an *AWFUL* lot of swap I/O going on here. From the
>> looks of it, we stuck ~2GB in swap and evicted another 1.5GB of page
>> cache (although I guess that could be double-counting tmpfs getting
>> swapped out too). Hmmm, was this one of the cases where you actually
>> ran _out_ of swap?
>
> Yes. This bug is a little odd because they always run out of swap. We
> have another category of bug (which appears to be fixed, touch wood)
> where we trigger oom without even touching swap. The test case is
> designed to only just swap (use at most 1/4 of the available swap space)
> and checks that its working set fits into available memory + swap.
> However, when QA run the test, their systems run completely out of
> virtual memory. There is a discrepancy on their machines where
> anon_inactive is reported as being 2x shmem, but we only expect
> anon_inactive to be our own shmem allocations. I don't know how to track
> what else is using anon_inactive. Suggestions?
Let's tackle one bug at a time. They might be the same thing.
* Re: [PATCH] mm: Throttle shrinkers harder
From: Chris Wilson @ 2014-04-24 15:39 UTC
To: Dave Hansen
Cc: Andrew Morton, linux-mm, intel-gfx, Mel Gorman, Michal Hocko,
Rik van Riel, Johannes Weiner, Dave Chinner, Glauber Costa,
Hugh Dickins, David Rientjes
On Thu, Apr 24, 2014 at 08:21:58AM -0700, Dave Hansen wrote:
> On 04/23/2014 10:58 PM, Chris Wilson wrote:
> > [ 4756.750938] Node 0 DMA free:14664kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:1024kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:412kB slab_reclaimable:80kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:48kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:76 all_unreclaimable? yes
> > [ 4756.751103] lowmem_reserve[]: 0 3337 3660 3660
> > [ 4756.751133] Node 0 DMA32 free:7208kB min:7044kB low:8804kB high:10564kB active_anon:36172kB inactive_anon:3351408kB active_file:92kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3518336kB managed:3440548kB mlocked:0kB dirty:0kB writeback:0kB mapped:12kB shmem:1661420kB slab_reclaimable:17624kB slab_unreclaimable:14400kB kernel_stack:696kB pagetables:4324kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:327 all_unreclaimable? yes
> > [ 4756.751341] lowmem_reserve[]: 0 0 322 322
> > [ 4756.752889] Node 0 Normal free:328kB min:680kB low:848kB high:1020kB active_anon:61372kB inactive_anon:250740kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:393216kB managed:330360kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:227740kB slab_reclaimable:3032kB slab_unreclaimable:5128kB kernel_stack:400kB pagetables:624kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:6 all_unreclaimable? yes
> > [ 4756.757635] lowmem_reserve[]: 0 0 0 0
> > [ 4756.759294] Node 0 DMA: 2*4kB (UM) 2*8kB (UM) 3*16kB (UEM) 4*32kB (UEM) 2*64kB (UM) 4*128kB (UEM) 2*256kB (EM) 2*512kB (EM) 2*1024kB (UM) 3*2048kB (EMR) 1*4096kB (M) = 14664kB
> > [ 4756.762776] Node 0 DMA32: 424*4kB (UEM) 171*8kB (UEM) 21*16kB (UEM) 1*32kB (R) 1*64kB (R) 1*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 1*2048kB (R) 0*4096kB = 7208kB
> > [ 4756.766284] Node 0 Normal: 26*4kB (UER) 18*8kB (UER) 3*16kB (E) 1*32kB (R) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 328kB
> > [ 4756.768198] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > [ 4756.770026] 916139 total pagecache pages
> > [ 4756.771857] 443703 pages in swap cache
> > [ 4756.773695] Swap cache stats: add 15363874, delete 14920171, find 6533699/7512215
> > [ 4756.775592] Free swap = 0kB
> > [ 4756.777505] Total swap = 2047996kB
>
> OK, so here's my theory as to what happens:
>
> 1. The graphics pages got put on the LRU
> 2. System is low on memory, they get on (and *STAY* on) the inactive
> LRU.
> 3. VM adds graphics pages to the swap cache, and writes them out, and
> we see the writeout from the vmstat, and lots of adds/removes from
> the swap cache.
> 4. But, despite all the swap writeout, we don't see much memory actually
> being freed. Why?
>
> I _suspect_ that the graphics drivers here are holding a reference to
> the page. During reclaim, we're mostly concerned with the pages being
> mapped. If we manage to get them unmapped, we'll go ahead and swap
> them, which I _think_ is what we're seeing. But, when it comes time to
> _actually_ free them, that last reference on the page keeps them from
> being freed.
>
> Is it possible that there's still a get_page() reference that's holding
> those pages in place from the graphics code?
Not from i915.ko. The last resort of our shrinker is to drop all page
refs held by the GPU, which is invoked if we are asked to free memory
and we have no inactive objects left.
If we could get a callback for the oom report, I could dump some details
about what the GPU is holding onto. That seems like a useful extension to
add to the shrinkers.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] mm: Throttle shrinkers harder
From: Dave Hansen @ 2014-04-24 22:35 UTC
To: Chris Wilson, Andrew Morton, linux-mm, intel-gfx, Mel Gorman,
Michal Hocko, Rik van Riel, Johannes Weiner, Dave Chinner,
Glauber Costa, Hugh Dickins, David Rientjes
On 04/24/2014 08:39 AM, Chris Wilson wrote:
> On Thu, Apr 24, 2014 at 08:21:58AM -0700, Dave Hansen wrote:
>> Is it possible that there's still a get_page() reference that's holding
>> those pages in place from the graphics code?
>
> Not from i915.ko. The last resort of our shrinker is to drop all page
> refs held by the GPU, which is invoked if we are asked to free memory
> and we have no inactive objects left.
How sure are we that this was performed before the OOM?
Also, forgive me for being an idiot wrt the way graphics work, but are
there any good candidates that you can think of that could be holding a
reference? I've honestly never seen an OOM like this.
Somewhat rhetorical question for the mm folks on cc: should we be
sticking the pages on which you're holding a reference on our
unreclaimable list?
> If we could get a callback for the oom report, I could dump some details
> about what the GPU is holding onto. That seems like a useful extension to
> add to the shrinkers.
There's a register_oom_notifier(). Is that sufficient for your use, or
is there something additional that would help?
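
Wiring that up is small; a minimal sketch (i915_oom_notify is a
placeholder name, not existing driver code):

#include <linux/notifier.h>
#include <linux/oom.h>

/* Placeholder callback: runs from out_of_memory() before a victim is
 * chosen.  Any pages reclaimed here should be added to the count behind
 * ptr; out_of_memory() rechecks that count and, if anything was freed,
 * returns without killing (see the code quoted later in the thread). */
static int i915_oom_notify(struct notifier_block *nb,
			   unsigned long event, void *ptr)
{
	unsigned long *freed = ptr;

	/* dump GPU state and/or purge objects, adding pages to *freed */
	(void)freed;
	return NOTIFY_OK;
}

static struct notifier_block i915_oom_nb = {
	.notifier_call = i915_oom_notify,
};

/* at driver load: register_oom_notifier(&i915_oom_nb); */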
* Re: [PATCH] mm: Throttle shrinkers harder
From: Chris Wilson @ 2014-04-25 7:23 UTC
To: Dave Hansen
Cc: Andrew Morton, linux-mm, intel-gfx, Mel Gorman, Michal Hocko,
Rik van Riel, Johannes Weiner, Dave Chinner, Glauber Costa,
Hugh Dickins, David Rientjes
On Thu, Apr 24, 2014 at 03:35:47PM -0700, Dave Hansen wrote:
> On 04/24/2014 08:39 AM, Chris Wilson wrote:
> > On Thu, Apr 24, 2014 at 08:21:58AM -0700, Dave Hansen wrote:
> >> Is it possible that there's still a get_page() reference that's holding
> >> those pages in place from the graphics code?
> >
> > Not from i915.ko. The last resort of our shrinker is to drop all page
> > refs held by the GPU, which is invoked if we are asked to free memory
> > and we have no inactive objects left.
>
> How sure are we that this was performed before the OOM?
Only by virtue of how shrink_slab() works. Thanks for the pointer to
register_oom_notifier(), I can use that to make sure that we do purge
everything from the GPU, and do a sanity check at the same time, before
we start killing processes.
> Also, forgive me for being an idiot wrt the way graphics work, but are
> there any good candidates that you can think of that could be holding a
> reference? I've honestly never seen an OOM like this.
Here the only place that we take a page reference is in
i915_gem_object_get_pages(). We do this when we first bind the pages
into the GPU's translation table, but we only release the pages once the
object is destroyed or the system experiences memory pressure. (Once the
GPU touches the pages, we no longer consider them to be cache coherent
with the CPU and so migrating them between the GPU and CPU requires
clflushing, which is expensive.)
Aside from CPU mmaps of the shmemfs filp, all operations on our
graphical objects should lead to i915_gem_object_get_pages(). However
not all objects are recoverable as some may be pinned due to hardware
access.
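
In outline, that path looks like this (a simplified sketch of 3.14's
i915_gem_object_get_pages_gtt(); error handling and sg-table coalescing
elided):

	/* Each backing page comes from the object's shmemfs file with a
	 * reference held by shmem_read_mapping_page_gfp(); the references
	 * are only dropped again in put_pages(), i.e. on object
	 * destruction or under shrinker pressure. */
	mapping = file_inode(obj->base.filp)->i_mapping;
	for (i = 0; i < page_count; i++) {
		page = shmem_read_mapping_page_gfp(mapping, i, gfp);
		if (IS_ERR(page))
			goto err_pages;
		sg_set_page(sg, page, PAGE_SIZE, 0);
		sg = sg_next(sg);
	}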
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] mm: Throttle shrinkers harder
From: Dave Hansen @ 2014-04-25 17:18 UTC
To: Chris Wilson, Andrew Morton, linux-mm, intel-gfx, Mel Gorman,
Michal Hocko, Rik van Riel, Johannes Weiner, Dave Chinner,
Glauber Costa, Hugh Dickins, David Rientjes
On 04/25/2014 12:23 AM, Chris Wilson wrote:
> On Thu, Apr 24, 2014 at 03:35:47PM -0700, Dave Hansen wrote:
>> On 04/24/2014 08:39 AM, Chris Wilson wrote:
>>> On Thu, Apr 24, 2014 at 08:21:58AM -0700, Dave Hansen wrote:
>>>> Is it possible that there's still a get_page() reference that's holding
>>>> those pages in place from the graphics code?
>>>
>>> Not from i915.ko. The last resort of our shrinker is to drop all page
>>> refs held by the GPU, which is invoked if we are asked to free memory
>>> and we have no inactive objects left.
>>
>> How sure are we that this was performed before the OOM?
>
> Only by virtue of how shrink_slab() works.
Could we try to raise the level of assurance there, please? :)
So this "last resort" is i915_gem_shrink_all()? It seems like we might
have some problems getting down to that part of the code if we have
problems getting the mutex.
We have tracepoints for the shrinkers in here (it says slab, but it's
all the shrinkers, I checked):
/sys/kernel/debug/tracing/events/vmscan/mm_shrink_slab_*/enable
and another for OOMs:
/sys/kernel/debug/tracing/events/oom/enable
Could you collect a trace during one of these OOM events and see what
the i915 shrinker is doing? Just enable those two and then collect a
copy of:
/sys/kernel/debug/tracing/trace
That'll give us some insight about how well the shrinker is working. If
the VM gave up on calling into it, it might reveal why we didn't get
all the way down into i915_gem_shrink_all().
> Thanks for the pointer to
> register_oom_notifier(), I can use that to make sure that we do purge
> everything from the GPU, and do a sanity check at the same time, before
> we start killing processes.
Actually, that one doesn't get called until we're *SURE* we are going to
OOM. Any action taken in there won't be taken into account.
>> Also, forgive me for being an idiot wrt the way graphics work, but are
>> there any good candidates that you can think of that could be holding a
>> reference? I've honestly never seen an OOM like this.
>
> Here the only place that we take a page reference is in
> i915_gem_object_get_pages(). We do this when we first bind the pages
> into the GPU's translation table, but we only release the pages once the
> object is destroyed or the system experiences memory pressure. (Once the
> GPU touches the pages, we no longer consider them to be cache coherent
> with the CPU and so migrating them between the GPU and CPU requires
> clflushing, which is expensive.)
>
> Aside from CPU mmaps of the shmemfs filp, all operations on our
> graphical objects should lead to i915_gem_object_get_pages(). However
> not all objects are recoverable as some may be pinned due to hardware
> access.
In that oom callback, could you dump out the aggregate number of
obj->pages_pin_count across all the objects? That would be a very
interesting piece of information to have. It would also be very
insightful for folks who see OOMs in practice with i915 in their systems.
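
Something along these lines would do it (illustrative only: bound_list,
unbound_list, global_list and pages_pin_count are the 3.14-era i915
names; locking elided):

	struct drm_i915_gem_object *obj;
	unsigned long pinned = 0;

	/* Sum the backing pages of every object whose pages are pinned. */
	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list)
		if (obj->pages_pin_count)
			pinned += obj->base.size >> PAGE_SHIFT;
	list_for_each_entry(obj, &dev_priv->mm.unbound_list, global_list)
		if (obj->pages_pin_count)
			pinned += obj->base.size >> PAGE_SHIFT;

	pr_info("i915: %lu pages pinned by the GPU at OOM\n", pinned);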
* Re: [PATCH] mm: Throttle shrinkers harder
From: Dave Hansen @ 2014-04-25 17:56 UTC
To: Chris Wilson, Andrew Morton, linux-mm, intel-gfx, Mel Gorman,
Michal Hocko, Rik van Riel, Johannes Weiner, Dave Chinner,
Glauber Costa, Hugh Dickins, David Rientjes
Poking around with those tracepoints, I don't see the i915 shrinker
getting run, only i915_gem_inactive_count() being called. It must be
returning 0 because we're never even _getting_ to the tracepoints
themselves after calling i915_gem_inactive_count().
This is on my laptop, and I haven't been able to coax i915 into
reclaiming a single page in 10 or 15 minutes. That seems fishy to me.
Surely *SOMETHING* has become reclaimable in that time.
Here's /sys/kernel/debug/dri/0/i915_gem_objects:
> 919 objects, 354914304 bytes
> 874 [333] objects, 291004416 [93614080] bytes in gtt
> 0 [0] active objects, 0 [0] bytes
> 874 [333] inactive objects, 291004416 [93614080] bytes
> 0 unbound objects, 0 bytes
> 199 purgeable objects, 92844032 bytes
> 30 pinned mappable objects, 18989056 bytes
> 139 fault mappable objects, 17371136 bytes
> 2145386496 [268435456] gtt total
>
> Xorg: 632 objects, 235450368 bytes (0 active, 180899840 inactive, 21262336 unbound)
> gnome-control-c: 11 objects, 110592 bytes (0 active, 0 inactive, 49152 unbound)
> chromium-browse: 266 objects, 101367808 bytes (0 active, 101330944 inactive, 0 unbound)
> Xorg: 0 objects, 0 bytes (0 active, 0 inactive, 0 unbound)
* Re: [PATCH] mm: Throttle shrinkers harder
From: Chris Wilson @ 2014-04-26 13:10 UTC
To: Dave Hansen
Cc: Andrew Morton, linux-mm, intel-gfx, Mel Gorman, Michal Hocko,
Rik van Riel, Johannes Weiner, Dave Chinner, Glauber Costa,
Hugh Dickins, David Rientjes
On Fri, Apr 25, 2014 at 10:18:57AM -0700, Dave Hansen wrote:
> On 04/25/2014 12:23 AM, Chris Wilson wrote:
> > On Thu, Apr 24, 2014 at 03:35:47PM -0700, Dave Hansen wrote:
> >> On 04/24/2014 08:39 AM, Chris Wilson wrote:
> >>> On Thu, Apr 24, 2014 at 08:21:58AM -0700, Dave Hansen wrote:
> >>>> Is it possible that there's still a get_page() reference that's holding
> >>>> those pages in place from the graphics code?
> >>>
> >>> Not from i915.ko. The last resort of our shrinker is to drop all page
> >>> refs held by the GPU, which is invoked if we are asked to free memory
> >>> and we have no inactive objects left.
> >>
> >> How sure are we that this was performed before the OOM?
> >
> > Only by virtue of how shrink_slab() works.
>
> Could we try to raise the level of assurance there, please? :)
>
> So this "last resort" is i915_gem_shrink_all()? It seems like we might
> have some problems getting down to that part of the code if we have
> problems getting the mutex.
In general, but not in this example where the load is tightly controlled.
> We have tracepoints for the shrinkers in here (it says slab, but it's
> all the shrinkers, I checked):
>
> /sys/kernel/debug/tracing/events/vmscan/mm_shrink_slab_*/enable
> and another for OOMs:
> /sys/kernel/debug/tracing/events/oom/enable
>
> Could you collect a trace during one of these OOM events and see what
> the i915 shrinker is doing? Just enable those two and then collect a
> copy of:
>
> /sys/kernel/debug/tracing/trace
>
> That'll give us some insight about how well the shrinker is working. If
> the VM gave up on calling in to it, it might reveal why we didn't get
> all the way down in to i915_gem_shrink_all().
I'll add it to the list for QA to try.
> > Thanks for the pointer to
> > register_oom_notifier(), I can use that to make sure that we do purge
> > everything from the GPU, and do a sanity check at the same time, before
> > we start killing processes.
>
> Actually, that one doesn't get called until we're *SURE* we are going to
> OOM. Any action taken in there won't be taken into account.
	blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
	if (freed > 0)
		/* Got some memory back in the last second. */
		return;
That looks like it should abort the oom and so repeat the allocation
attempt? Or is that too hopeful?
> >> Also, forgive me for being an idiot wrt the way graphics work, but are
> >> there any good candidates that you can think of that could be holding a
> >> reference? I've honestly never seen an OOM like this.
> >
> > Here the only place that we take a page reference is in
> > i915_gem_object_get_pages(). We do this when we first bind the pages
> > into the GPU's translation table, but we only release the pages once the
> > object is destroyed or the system experiences memory pressure. (Once the
> > GPU touches the pages, we no longer consider them to be cache coherent
> > with the CPU and so migrating them between the GPU and CPU requires
> > clflushing, which is expensive.)
> >
> > Aside from CPU mmaps of the shmemfs filp, all operations on our
> > graphical objects should lead to i915_gem_object_get_pages(). However
> > not all objects are recoverable as some may be pinned due to hardware
> > access.
>
> In that oom callback, could you dump out the aggregate number of
> obj->pages_pin_count across all the objects? That would be a very
> interesting piece of information to have. It would also be very
> insightful for folks who see OOMs in practice with i915 in their systems.
Indeed.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] mm: Throttle shrinkers harder
From: Dave Hansen @ 2014-04-28 16:38 UTC
To: Chris Wilson, Andrew Morton, linux-mm, intel-gfx, Mel Gorman,
Michal Hocko, Rik van Riel, Johannes Weiner, Dave Chinner,
Glauber Costa, Hugh Dickins, David Rientjes
On 04/26/2014 06:10 AM, Chris Wilson wrote:
>>> > > Thanks for the pointer to
>>> > > register_oom_notifier(), I can use that to make sure that we do purge
>>> > > everything from the GPU, and do a sanity check at the same time, before
>>> > > we start killing processes.
>> >
>> > Actually, that one doesn't get called until we're *SURE* we are going to
>> > OOM. Any action taken in there won't be taken into account.
> 	blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> 	if (freed > 0)
> 		/* Got some memory back in the last second. */
> 		return;
>
> That looks like it should abort the oom and so repeat the allocation
> attempt? Or is that too hopeful?
You're correct. I was reading the code utterly wrong.