* [PATCH] mm: Throttle shrinkers harder
@ 2014-04-10  7:05 Chris Wilson
  2014-04-18 19:14 ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2014-04-10  7:05 UTC (permalink / raw)
  To: linux-mm
  Cc: Rik van Riel, intel-gfx, Dave Chinner, Hugh Dickins,
	Michal Hocko, Mel Gorman, Johannes Weiner, Andrew Morton,
	Glauber Costa

During testing of i915.ko with working texture sets larger than RAM, we
encounter OOM with plenty of memory still trapped within writeback, e.g.:

[   42.386039] active_anon:10134 inactive_anon:1900781 isolated_anon:32
 active_file:33 inactive_file:39 isolated_file:0
 unevictable:0 dirty:0 writeback:337627 unstable:0
 free:11985 slab_reclaimable:9458 slab_unreclaimable:23614
 mapped:41 shmem:1560769 pagetables:1276 bounce:0

If we throttle for writeback following shrink_slab, this gives us time
to wait upon the writeback generated by the i915.ko shrinker:

[ 4756.750808] active_anon:24386 inactive_anon:900793 isolated_anon:0
 active_file:23 inactive_file:20 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 free:5550 slab_reclaimable:5184 slab_unreclaimable:4888
 mapped:3 shmem:472393 pagetables:1249 bounce:0

(Sadly though the test is still failing.)

Testcase: igt/gem_tiled_swapping
References: https://bugs.freedesktop.org/show_bug.cgi?id=72742
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
---
 mm/vmscan.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a9c74b409681..8c2cb1150d17 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -135,6 +135,10 @@ unsigned long vm_total_pages;	/* The total number of pages which the VM controls
 static LIST_HEAD(shrinker_list);
 static DECLARE_RWSEM(shrinker_rwsem);
 
+static bool throttle_direct_reclaim(gfp_t gfp_mask,
+				    struct zonelist *zonelist,
+				    nodemask_t *nodemask);
+
 #ifdef CONFIG_MEMCG
 static bool global_reclaim(struct scan_control *sc)
 {
@@ -1521,7 +1525,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	 * of pages under pages flagged for immediate reclaim and stall if any
 	 * are encountered in the nr_immediate check below.
 	 */
-	if (nr_writeback && nr_writeback == nr_taken)
+	if (nr_writeback > nr_taken / 2)
 		zone_set_flag(zone, ZONE_WRITEBACK);
 
 	/*
@@ -2465,6 +2469,12 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 						WB_REASON_TRY_TO_FREE_PAGES);
 			sc->may_writepage = 1;
 		}
+
+		if (global_reclaim(sc) &&
+		    throttle_direct_reclaim(sc->gfp_mask,
+					    zonelist,
+					    sc->nodemask))
+			aborted_reclaim = true;
 	} while (--sc->priority >= 0 && !aborted_reclaim);
 
 out:
-- 
1.9.1

* Re: [PATCH] mm: Throttle shrinkers harder
  2014-04-10  7:05 [PATCH] mm: Throttle shrinkers harder Chris Wilson
@ 2014-04-18 19:14 ` Andrew Morton
  2014-04-22 19:30   ` Daniel Vetter
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2014-04-18 19:14 UTC (permalink / raw)
  To: Chris Wilson
  Cc: linux-mm, intel-gfx, Mel Gorman, Michal Hocko, Rik van Riel,
	Johannes Weiner, Dave Chinner, Glauber Costa, Hugh Dickins,
	David Rientjes

On Thu, 10 Apr 2014 08:05:06 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:

> During testing of i915.ko with working texture sets larger than RAM, we
> encounter OOM with plenty of memory still trapped within writeback, e.g.:
> 
> [   42.386039] active_anon:10134 inactive_anon:1900781 isolated_anon:32
>  active_file:33 inactive_file:39 isolated_file:0
>  unevictable:0 dirty:0 writeback:337627 unstable:0
>  free:11985 slab_reclaimable:9458 slab_unreclaimable:23614
>  mapped:41 shmem:1560769 pagetables:1276 bounce:0
> 
> If we throttle for writeback following shrink_slab, this gives us time
> to wait upon the writeback generated by the i915.ko shrinker:
> 
> [ 4756.750808] active_anon:24386 inactive_anon:900793 isolated_anon:0
>  active_file:23 inactive_file:20 isolated_file:0
>  unevictable:0 dirty:0 writeback:0 unstable:0
>  free:5550 slab_reclaimable:5184 slab_unreclaimable:4888
>  mapped:3 shmem:472393 pagetables:1249 bounce:0
> 
> (Sadly though the test is still failing.)
> 
> Testcase: igt/gem_tiled_swapping
> References: https://bugs.freedesktop.org/show_bug.cgi?id=72742

i915_gem_object_get_pages_gtt() makes my head spin, but
https://bugs.freedesktop.org/attachment.cgi?id=90818 says
"gfp_mask=0x201da" which is 

___GFP_HARDWALL|___GFP_COLD|___GFP_FS|___GFP_IO|___GFP_WAIT|___GFP_MOVABLE|___GFP_HIGHMEM

so this allocation should work, and it is very bad if the page allocator is
declaring oom while there is so much writeback in flight, assuming the
writeback is to eligible zones.
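
(Decoding that against the 3.14-era bit values from include/linux/gfp.h,
for anyone checking along:)

	/* ___GFP_HIGHMEM   0x02
	 * ___GFP_MOVABLE   0x08
	 * ___GFP_WAIT      0x10
	 * ___GFP_IO        0x40
	 * ___GFP_FS        0x80
	 * ___GFP_COLD      0x100
	 * ___GFP_HARDWALL  0x20000
	 *
	 * OR of the above == 0x201da == GFP_HIGHUSER_MOVABLE | __GFP_COLD,
	 * i.e. an ordinary pagecache/shmem fault allocation (cf.
	 * filemap_fault in the backtrace) with __GFP_IO and __GFP_FS set,
	 * so reclaim was free to write back and swap on its behalf.
	 */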

Mel, Johannes: could you take a look please?

* Re: [PATCH] mm: Throttle shrinkers harder
  2014-04-18 19:14 ` Andrew Morton
@ 2014-04-22 19:30   ` Daniel Vetter
  2014-04-23 21:14     ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2014-04-22 19:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Chris Wilson, linux-mm, intel-gfx, Mel Gorman, Michal Hocko,
	Rik van Riel, Johannes Weiner, Dave Chinner, Glauber Costa,
	Hugh Dickins, David Rientjes

On Fri, Apr 18, 2014 at 12:14:16PM -0700, Andrew Morton wrote:
> On Thu, 10 Apr 2014 08:05:06 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > During testing of i915.ko with working texture sets larger than RAM, we
> > encounter OOM with plenty of memory still trapped within writeback, e.g.:
> > 
> > [   42.386039] active_anon:10134 inactive_anon:1900781 isolated_anon:32
> >  active_file:33 inactive_file:39 isolated_file:0
> >  unevictable:0 dirty:0 writeback:337627 unstable:0
> >  free:11985 slab_reclaimable:9458 slab_unreclaimable:23614
> >  mapped:41 shmem:1560769 pagetables:1276 bounce:0
> > 
> > If we throttle for writeback following shrink_slab, this gives us time
> > to wait upon the writeback generated by the i915.ko shrinker:
> > 
> > [ 4756.750808] active_anon:24386 inactive_anon:900793 isolated_anon:0
> >  active_file:23 inactive_file:20 isolated_file:0
> >  unevictable:0 dirty:0 writeback:0 unstable:0
> >  free:5550 slab_reclaimable:5184 slab_unreclaimable:4888
> >  mapped:3 shmem:472393 pagetables:1249 bounce:0
> > 
> > (Sadly though the test is still failing.)
> > 
> > Testcase: igt/gem_tiled_swapping
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=72742
> 
> i915_gem_object_get_pages_gtt() makes my head spin, but
> https://bugs.freedesktop.org/attachment.cgi?id=90818 says
> "gfp_mask=0x201da" which is 
> 
> ___GFP_HARDWALL|___GFP_COLD|___GFP_FS|___GFP_IO|___GFP_WAIT|___GFP_MOVABLE|___GFP_HIGHMEM
> 
> so this allocation should work, and it is very bad if the page allocator is
> declaring oom while there is so much writeback in flight, assuming the
> writeback is to eligible zones.

For more head spinning look at the lock stealing dance we do in our
shrinker callbacks i915_gem_inactive_scan|count(). It's not pretty at all,
but it helps to avoid the dreaded oom in a few more cases. Some review of
our mess of duct tape from -mm developers with actual clue would be really
appreciated ...
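
(Roughly, the dance at the top of those shrinker callbacks -- a trimmed
sketch of the 3.14-era code, not verbatim; dev and unlock are the
callback's locals:)

	static bool mutex_is_locked_by(struct mutex *mutex,
				       struct task_struct *task)
	{
		if (!mutex_is_locked(mutex))
			return false;
	#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_MUTEXES)
		/* the owner field is only maintained in these configs */
		return mutex->owner == task;
	#else
		return false;
	#endif
	}

	if (!mutex_trylock(&dev->struct_mutex)) {
		/* direct reclaim re-entered from our own allocation path
		 * already holds struct_mutex: "steal" the lock rather
		 * than deadlock against ourselves */
		if (!mutex_is_locked_by(&dev->struct_mutex, current))
			return 0;	/* held elsewhere; skip shrinking */
		unlock = false;		/* don't unlock what we stole */
	}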
-Daniel
 
> Mel, Johannes: could you take a look please?

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH] mm: Throttle shrinkers harder
  2014-04-22 19:30   ` Daniel Vetter
@ 2014-04-23 21:14     ` Dave Hansen
  2014-04-24  5:58       ` Chris Wilson
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2014-04-23 21:14 UTC (permalink / raw)
  To: Andrew Morton, Chris Wilson, linux-mm, intel-gfx, Mel Gorman,
	Michal Hocko, Rik van Riel, Johannes Weiner, Dave Chinner,
	Glauber Costa, Hugh Dickins, David Rientjes

On 04/22/2014 12:30 PM, Daniel Vetter wrote:
>> > > During testing of i915.ko with working texture sets larger than RAM, we
>> > > encounter OOM with plenty of memory still trapped within writeback, e.g.:
>> > > 
>> > > [   42.386039] active_anon:10134 inactive_anon:1900781 isolated_anon:32
>> > >  active_file:33 inactive_file:39 isolated_file:0
>> > >  unevictable:0 dirty:0 writeback:337627 unstable:0
>> > >  free:11985 slab_reclaimable:9458 slab_unreclaimable:23614
>> > >  mapped:41 shmem:1560769 pagetables:1276 bounce:0
>> > > 
>> > > If we throttle for writeback following shrink_slab, this gives us time
>> > > to wait upon the writeback generated by the i915.ko shrinker:
>> > > 
>> > > [ 4756.750808] active_anon:24386 inactive_anon:900793 isolated_anon:0
>> > >  active_file:23 inactive_file:20 isolated_file:0
>> > >  unevictable:0 dirty:0 writeback:0 unstable:0
>> > >  free:5550 slab_reclaimable:5184 slab_unreclaimable:4888
>> > >  mapped:3 shmem:472393 pagetables:1249 bounce:0

Could you get some dumps of the entire set of OOM information?  These
are only tiny snippets.

Also, the vmstat output from the bug:

> https://bugs.freedesktop.org/show_bug.cgi?id=72742

shows there being an *AWFUL* lot of swap I/O going on here.  From the
looks of it, we stuck ~2GB in swap and evicted another 1.5GB of page
cache (although I guess that could be double-counting tmpfs getting
swapped out too).  Hmmm, was this one of the cases where you actually
ran _out_ of swap?

>  2  0  19472  33952    296 3610324    0 19472     0 19472 1474  151  3 27 71  0
>  4  0 484964  66468    296 3175864    0 465492     0 465516 2597 1395  0 32 66  2
>  0  2 751940  23692    980 3022884    0 266976   688 266976 3681  636  0 27 66  6
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  2  1 1244580 295336    988 2606984    0 492896     0 492908 1237  311  1  9 50 41
>  0  2 2047996  28760    988 2037144    0 803160     0 803160 1221 1291  1 15 69 14


* Re: [PATCH] mm: Throttle shrinkers harder
  2014-04-23 21:14     ` Dave Hansen
@ 2014-04-24  5:58       ` Chris Wilson
  2014-04-24 15:21         ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2014-04-24  5:58 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, linux-mm, intel-gfx, Mel Gorman, Michal Hocko,
	Rik van Riel, Johannes Weiner, Dave Chinner, Glauber Costa,
	Hugh Dickins, David Rientjes

On Wed, Apr 23, 2014 at 02:14:36PM -0700, Dave Hansen wrote:
> On 04/22/2014 12:30 PM, Daniel Vetter wrote:
> >> > > During testing of i915.ko with working texture sets larger than RAM, we
> >> > > encounter OOM with plenty of memory still trapped within writeback, e.g.:
> >> > > 
> >> > > [   42.386039] active_anon:10134 inactive_anon:1900781 isolated_anon:32
> >> > >  active_file:33 inactive_file:39 isolated_file:0
> >> > >  unevictable:0 dirty:0 writeback:337627 unstable:0
> >> > >  free:11985 slab_reclaimable:9458 slab_unreclaimable:23614
> >> > >  mapped:41 shmem:1560769 pagetables:1276 bounce:0
> >> > > 
> >> > > If we throttle for writeback following shrink_slab, this gives us time
> >> > > to wait upon the writeback generated by the i915.ko shrinker:
> >> > > 
> >> > > [ 4756.750808] active_anon:24386 inactive_anon:900793 isolated_anon:0
> >> > >  active_file:23 inactive_file:20 isolated_file:0
> >> > >  unevictable:0 dirty:0 writeback:0 unstable:0
> >> > >  free:5550 slab_reclaimable:5184 slab_unreclaimable:4888
> >> > >  mapped:3 shmem:472393 pagetables:1249 bounce:0
> 
> Could you get some dumps of the entire set of OOM information?  These
> are only tiny snippets.

For reference the last oom report after flushing all the writeback:

[ 4756.749554] crond invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[ 4756.749603] crond cpuset=/ mems_allowed=0
[ 4756.749628] CPU: 0 PID: 3574 Comm: crond Tainted: G        W    3.14.0_prts_de579f_20140410 #2
[ 4756.749676] Hardware name: Gigabyte Technology Co., Ltd. H55M-UD2H/H55M-UD2H, BIOS F4 12/02/2009
[ 4756.749723]  0000000000000000 00000000000201da ffffffff81717273 ffff8800d235dc40
[ 4756.749762]  ffffffff81714541 0000000000000400 ffff8800cb6f3b10 ffff880117ff8000
[ 4756.749800]  ffffffff81072266 0000000000000206 ffffffff812d6ebe ffff880112f25c40
[ 4756.749838] Call Trace:
[ 4756.749856]  [<ffffffff81717273>] ? dump_stack+0x41/0x51
[ 4756.749881]  [<ffffffff81714541>] ? dump_header.isra.8+0x69/0x191
[ 4756.749911]  [<ffffffff81072266>] ? ktime_get_ts+0x49/0xab
[ 4756.749938]  [<ffffffff812d6ebe>] ? ___ratelimit+0xae/0xc8
[ 4756.749965]  [<ffffffff810a72a8>] ? oom_kill_process+0x76/0x32c
[ 4756.749992]  [<ffffffff810a706d>] ? find_lock_task_mm+0x22/0x6e
[ 4756.750018]  [<ffffffff810a7add>] ? out_of_memory+0x41c/0x44f
[ 4756.750045]  [<ffffffff810ab31d>] ? __alloc_pages_nodemask+0x680/0x78d
[ 4756.750076]  [<ffffffff810d4b7f>] ? alloc_pages_current+0xbf/0xdc
[ 4756.750103]  [<ffffffff810a61f8>] ? filemap_fault+0x266/0x38b
[ 4756.750130]  [<ffffffff810bc3f5>] ? __do_fault+0xac/0x3bf
[ 4756.750155]  [<ffffffff810bfb85>] ? handle_mm_fault+0x1e7/0x7e2
[ 4756.750181]  [<ffffffff810bc960>] ? tlb_flush_mmu+0x4b/0x64
[ 4756.750219]  [<ffffffff812d8ed5>] ? timerqueue_add+0x79/0x98
[ 4756.750254]  [<ffffffff8104d283>] ? enqueue_hrtimer+0x15/0x37
[ 4756.750287]  [<ffffffff8171f63d>] ? __do_page_fault+0x42e/0x47b
[ 4756.750319]  [<ffffffff8104d580>] ? hrtimer_try_to_cancel+0x67/0x70
[ 4756.750353]  [<ffffffff8104d595>] ? hrtimer_cancel+0xc/0x16
[ 4756.750385]  [<ffffffff81719807>] ? do_nanosleep+0xb3/0xf1
[ 4756.750415]  [<ffffffff8104def9>] ? hrtimer_nanosleep+0x89/0x10b
[ 4756.750447]  [<ffffffff8171cbf2>] ? page_fault+0x22/0x30
[ 4756.750476] Mem-Info:
[ 4756.750490] Node 0 DMA per-cpu:
[ 4756.750510] CPU    0: hi:    0, btch:   1 usd:   0
[ 4756.750533] CPU    1: hi:    0, btch:   1 usd:   0
[ 4756.750555] CPU    2: hi:    0, btch:   1 usd:   0
[ 4756.750576] CPU    3: hi:    0, btch:   1 usd:   0
[ 4756.750598] Node 0 DMA32 per-cpu:
[ 4756.750615] CPU    0: hi:  186, btch:  31 usd:   0
[ 4756.750637] CPU    1: hi:  186, btch:  31 usd:   0
[ 4756.750660] CPU    2: hi:  186, btch:  31 usd:   0
[ 4756.750681] CPU    3: hi:  186, btch:  31 usd:   0
[ 4756.750702] Node 0 Normal per-cpu:
[ 4756.750720] CPU    0: hi:   90, btch:  15 usd:   0
[ 4756.750742] CPU    1: hi:   90, btch:  15 usd:   0
[ 4756.750763] CPU    2: hi:   90, btch:  15 usd:   0
[ 4756.750785] CPU    3: hi:   90, btch:  15 usd:   0
[ 4756.750808] active_anon:24386 inactive_anon:900793 isolated_anon:0
 active_file:23 inactive_file:20 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 free:5550 slab_reclaimable:5184 slab_unreclaimable:4888
 mapped:3 shmem:472393 pagetables:1249 bounce:0
 free_cma:0
[ 4756.750938] Node 0 DMA free:14664kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:1024kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:412kB slab_reclaimable:80kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:48kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:76 all_unreclaimable? yes
[ 4756.751103] lowmem_reserve[]: 0 3337 3660 3660
[ 4756.751133] Node 0 DMA32 free:7208kB min:7044kB low:8804kB high:10564kB active_anon:36172kB inactive_anon:3351408kB active_file:92kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3518336kB managed:3440548kB mlocked:0kB dirty:0kB writeback:0kB mapped:12kB shmem:1661420kB slab_reclaimable:17624kB slab_unreclaimable:14400kB kernel_stack:696kB pagetables:4324kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:327 all_unreclaimable? yes
[ 4756.751341] lowmem_reserve[]: 0 0 322 322
[ 4756.752889] Node 0 Normal free:328kB min:680kB low:848kB high:1020kB active_anon:61372kB inactive_anon:250740kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:393216kB managed:330360kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:227740kB slab_reclaimable:3032kB slab_unreclaimable:5128kB kernel_stack:400kB pagetables:624kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:6 all_unreclaimable? yes
[ 4756.757635] lowmem_reserve[]: 0 0 0 0
[ 4756.759294] Node 0 DMA: 2*4kB (UM) 2*8kB (UM) 3*16kB (UEM) 4*32kB (UEM) 2*64kB (UM) 4*128kB (UEM) 2*256kB (EM) 2*512kB (EM) 2*1024kB (UM) 3*2048kB (EMR) 1*4096kB (M) = 14664kB
[ 4756.762776] Node 0 DMA32: 424*4kB (UEM) 171*8kB (UEM) 21*16kB (UEM) 1*32kB (R) 1*64kB (R) 1*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 1*2048kB (R) 0*4096kB = 7208kB
[ 4756.766284] Node 0 Normal: 26*4kB (UER) 18*8kB (UER) 3*16kB (E) 1*32kB (R) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 328kB
[ 4756.768198] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 4756.770026] 916139 total pagecache pages
[ 4756.771857] 443703 pages in swap cache
[ 4756.773695] Swap cache stats: add 15363874, delete 14920171, find 6533699/7512215
[ 4756.775592] Free swap  = 0kB
[ 4756.777505] Total swap = 2047996kB
[ 4756.779410] 981886 pages RAM
[ 4756.781307] 0 pages HighMem/MovableOnly
[ 4756.783192] 15714 pages reserved
[ 4756.785038] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[ 4756.786929] [ 2368]     0  2368    76187       93     153        0             0 systemd-journal
[ 4756.788846] [ 3204]     0  3204    10305      189      23        7         -1000 systemd-udevd
[ 4756.790789] [ 3223]     0  3223    24466       24      22       21             0 lvmetad
[ 4756.792749] [ 3297]     0  3297    12231       68      24       25         -1000 auditd
[ 4756.794715] [ 3306]     0  3306    20053       33       9        0             0 audispd
[ 4756.796680] [ 3308]     0  3308     5993       40      27        1             0 sedispatch
[ 4756.798654] [ 3315]     0  3315     9254       60      36        5             0 abrtd
[ 4756.800646] [ 3316]     0  3316     8725       60      35        1             0 abrt-watch-log
[ 4756.802627] [ 3324]     0  3324     8725       42      35       17             0 abrt-watch-log
[ 4756.804614] [ 3331]     0  3331     4778       43      21        9             0 irqbalance
[ 4756.806604] [ 3337]     0  3337     6069      131      16       20             0 smartd
[ 4756.808597] [ 3343]     0  3343     8249       80      20        0             0 systemd-logind
[ 4756.810593] [ 3344]     0  3344    65772      129      29       14             0 rsyslogd
[ 4756.812594] [ 3346]     0  3346    60608      305      50        9             0 NetworkManager
[ 4756.814602] [ 3347]    70  3347     7018       76      31        1             0 avahi-daemon
[ 4756.816619] [ 3352]    70  3352     6985       48      22        2             0 avahi-daemon
[ 4756.818629] [ 3353]    81  3353     6119      121      17        3          -900 dbus-daemon
[ 4756.820651] [ 3362]   993  3362     5647       55      15        4             0 chronyd
[ 4756.822694] [ 3363]     0  3363     1619       12      10       16             0 mcelog
[ 4756.824760] [ 3389]   999  3389   127896      746      47       65             0 polkitd
[ 4756.826827] [ 3397]     0  3397    40407      161      67       26          -900 modem-manager
[ 4756.828939] [ 3424]     0  3424    25498     2827      49      287             0 dhclient
[ 4756.831039] [ 3432]     0  3432   106838     1061     138      128             0 libvirtd
[ 4756.833154] [ 3446]     0  3446    20104      190      43       10         -1000 sshd
[ 4756.835275] [ 3453]    32  3453     9422       66      22       26             0 rpcbind
[ 4756.837308] [ 3461]     0  3461    25190      399      48       49             0 sendmail
[ 4756.839335] [ 3478]    51  3478    21452      361      41       16             0 sendmail
[ 4756.841386] [ 3573]     0  3573     5930       47      16        0             0 atd
[ 4756.843458] [ 3574]     0  3574     5126      147      14        2             0 crond
[ 4756.845578] [ 3579]     0  3579    27498       27      10        1             0 agetty
[ 4756.847706] [ 3582]     0  3582    32256      220      65       63             0 sshd
[ 4756.849844] [ 3586]     0  3586    29263      456      20       51             0 bash
[ 4756.851997] [ 3765]     0  3765    15968       93      47        0          1000 gem_tiled_swapp
[ 4756.854178] Out of memory: Kill process 3765 (gem_tiled_swapp) score 999 or sacrifice child
[ 4756.856377] Killed process 3765 (gem_tiled_swapp) total-vm:63872kB, anon-rss:368kB, file-rss:4kB

> Also, the vmstat output from the bug:
> 
> > https://bugs.freedesktop.org/show_bug.cgi?id=72742
> 
> shows there being an *AWFUL* lot of swap I/O going on here.  From the
> looks of it, we stuck ~2GB in swap and evicted another 1.5GB of page
> cache (although I guess that could be double-counting tmpfs getting
> swapped out too).  Hmmm, was this one of the cases where you actually
> ran _out_ of swap?

Yes. This bug is a little odd because they always run out of swap. We
have another category of bug (which appears to be fixed, touch wood)
where we trigger oom without even touching swap. The test case is
designed to only just swap (use at most 1/4 of the available swap space)
and checks that its working set fits into available memory + swap.
However, when QA run the test, their systems run completely out of
virtual memory. There is a discrepancy on their machines where
anon_inactive is reported as being 2x shmem, but we only expect
anon_inactive to be our own shmem allocations. I don't know how to track
what else is using anon_inactive. Suggestions?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] mm: Throttle shrinkers harder
  2014-04-24  5:58       ` Chris Wilson
@ 2014-04-24 15:21         ` Dave Hansen
  2014-04-24 15:39           ` Chris Wilson
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2014-04-24 15:21 UTC (permalink / raw)
  To: Chris Wilson, Andrew Morton, linux-mm, intel-gfx, Mel Gorman,
	Michal Hocko, Rik van Riel, Johannes Weiner, Dave Chinner,
	Glauber Costa, Hugh Dickins, David Rientjes

On 04/23/2014 10:58 PM, Chris Wilson wrote:
> [ 4756.750938] Node 0 DMA free:14664kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:1024kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:412kB slab_reclaimable:80kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:48kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:76 all_unreclaimable? yes
> [ 4756.751103] lowmem_reserve[]: 0 3337 3660 3660
> [ 4756.751133] Node 0 DMA32 free:7208kB min:7044kB low:8804kB high:10564kB active_anon:36172kB inactive_anon:3351408kB active_file:92kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3518336kB managed:3440548kB mlocked:0kB dirty:0kB writeback:0kB mapped:12kB shmem:1661420kB slab_reclaimable:17624kB slab_unreclaimable:14400kB kernel_stack:696kB pagetables:4324kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:327 all_unreclaimable? yes
> [ 4756.751341] lowmem_reserve[]: 0 0 322 322
> [ 4756.752889] Node 0 Normal free:328kB min:680kB low:848kB high:1020kB active_anon:61372kB inactive_anon:250740kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:393216kB managed:330360kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:227740kB slab_reclaimable:3032kB slab_unreclaimable:5128kB kernel_stack:400kB pagetables:624kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:6 all_unreclaimable? yes
> [ 4756.757635] lowmem_reserve[]: 0 0 0 0
> [ 4756.759294] Node 0 DMA: 2*4kB (UM) 2*8kB (UM) 3*16kB (UEM) 4*32kB (UEM) 2*64kB (UM) 4*128kB (UEM) 2*256kB (EM) 2*512kB (EM) 2*1024kB (UM) 3*2048kB (EMR) 1*4096kB (M) = 14664kB
> [ 4756.762776] Node 0 DMA32: 424*4kB (UEM) 171*8kB (UEM) 21*16kB (UEM) 1*32kB (R) 1*64kB (R) 1*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 1*2048kB (R) 0*4096kB = 7208kB
> [ 4756.766284] Node 0 Normal: 26*4kB (UER) 18*8kB (UER) 3*16kB (E) 1*32kB (R) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 328kB
> [ 4756.768198] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [ 4756.770026] 916139 total pagecache pages
> [ 4756.771857] 443703 pages in swap cache
> [ 4756.773695] Swap cache stats: add 15363874, delete 14920171, find 6533699/7512215
> [ 4756.775592] Free swap  = 0kB
> [ 4756.777505] Total swap = 2047996kB

OK, so here's my theory as to what happens:

1. The graphics pages got put on the LRU
2. System is low on memory, they get on (and *STAY* on) the inactive
   LRU.
3. VM adds graphics pages to the swap cache, and writes them out, and
   we see the writeout from the vmstat, and lots of adds/removes from
   the swap cache.
4. But, despite all the swap writeout, we don't get helped by seeing
   much memory get freed.  Why?

I _suspect_ that the graphics drivers here are holding a reference to
the page.  During reclaim, we're mostly concerned with the pages being
mapped.  If we manage to get them unmapped, we'll go ahead and swap
them, which I _think_ is what we're seeing.  But, when it comes time to
_actually_ free them, that last reference on the page keeps them from
being freed.

Is it possible that there's still a get_page() reference that's holding
those pages in place from the graphics code?
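
(The place it would bite is __remove_mapping() -- trimmed from the
3.14-era mm/vmscan.c:)

	spin_lock_irq(&mapping->tree_lock);
	/*
	 * Reclaim expects to hold the only remaining references here:
	 * one for the isolated LRU page and one for the page cache.
	 * Any extra get_page() reference makes the freeze fail, and
	 * the page goes back on the LRU instead of being freed.
	 */
	if (!page_freeze_refs(page, 2))
		goto cannot_free;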

>> Also, the vmstat output from the bug:
>>
>>> https://bugs.freedesktop.org/show_bug.cgi?id=72742
>>
>> shows there being an *AWFUL* lot of swap I/O going on here.  From the
>> looks of it, we stuck ~2GB in swap and evicted another 1.5GB of page
>> cache (although I guess that could be double-counting tmpfs getting
>> swapped out too).  Hmmm, was this one of the cases where you actually
>> ran _out_ of swap?
> 
> Yes. This bug is a little odd because they always run out of swap. We
> have another category of bug (which appears to be fixed, touch wood)
> where we trigger oom without even touching swap. The test case is
> designed to only just swap (use at most 1/4 of the available swap space)
> and checks that its working set should fit into available memory + swap.
> However, when QA run the test, their systems run completely out of
> virtual memory. There is a discrepancy on their machines where
> anon_inactive is reported as being 2x shmem, but we only expect
> anon_inactive to be our own shmem allocations. I don't know how to track
> what else is using anon_inactive. Suggestions?

Let's tackle one bug at a time.  They might be the same thing.

* Re: [PATCH] mm: Throttle shrinkers harder
  2014-04-24 15:21         ` Dave Hansen
@ 2014-04-24 15:39           ` Chris Wilson
  2014-04-24 22:35             ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2014-04-24 15:39 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, linux-mm, intel-gfx, Mel Gorman, Michal Hocko,
	Rik van Riel, Johannes Weiner, Dave Chinner, Glauber Costa,
	Hugh Dickins, David Rientjes

On Thu, Apr 24, 2014 at 08:21:58AM -0700, Dave Hansen wrote:
> On 04/23/2014 10:58 PM, Chris Wilson wrote:
> > [ 4756.750938] Node 0 DMA free:14664kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:1024kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:412kB slab_reclaimable:80kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:48kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:76 all_unreclaimable? yes
> > [ 4756.751103] lowmem_reserve[]: 0 3337 3660 3660
> > [ 4756.751133] Node 0 DMA32 free:7208kB min:7044kB low:8804kB high:10564kB active_anon:36172kB inactive_anon:3351408kB active_file:92kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3518336kB managed:3440548kB mlocked:0kB dirty:0kB writeback:0kB mapped:12kB shmem:1661420kB slab_reclaimable:17624kB slab_unreclaimable:14400kB kernel_stack:696kB pagetables:4324kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:327 all_unreclaimable? yes
> > [ 4756.751341] lowmem_reserve[]: 0 0 322 322
> > [ 4756.752889] Node 0 Normal free:328kB min:680kB low:848kB high:1020kB active_anon:61372kB inactive_anon:250740kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:393216kB managed:330360kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:227740kB slab_reclaimable:3032kB slab_unreclaimable:5128kB kernel_stack:400kB pagetables:624kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:6 all_unreclaimable? yes
> > [ 4756.757635] lowmem_reserve[]: 0 0 0 0
> > [ 4756.759294] Node 0 DMA: 2*4kB (UM) 2*8kB (UM) 3*16kB (UEM) 4*32kB (UEM) 2*64kB (UM) 4*128kB (UEM) 2*256kB (EM) 2*512kB (EM) 2*1024kB (UM) 3*2048kB (EMR) 1*4096kB (M) = 14664kB
> > [ 4756.762776] Node 0 DMA32: 424*4kB (UEM) 171*8kB (UEM) 21*16kB (UEM) 1*32kB (R) 1*64kB (R) 1*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 1*2048kB (R) 0*4096kB = 7208kB
> > [ 4756.766284] Node 0 Normal: 26*4kB (UER) 18*8kB (UER) 3*16kB (E) 1*32kB (R) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 328kB
> > [ 4756.768198] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > [ 4756.770026] 916139 total pagecache pages
> > [ 4756.771857] 443703 pages in swap cache
> > [ 4756.773695] Swap cache stats: add 15363874, delete 14920171, find 6533699/7512215
> > [ 4756.775592] Free swap  = 0kB
> > [ 4756.777505] Total swap = 2047996kB
> 
> OK, so here's my theory as to what happens:
> 
> 1. The graphics pages got put on the LRU
> 2. System is low on memory, they get on (and *STAY* on) the inactive
>    LRU.
> 3. VM adds graphics pages to the swap cache, and writes them out, and
>    we see the writeout from the vmstat, and lots of adds/removes from
>    the swap cache.
> 4. But, despite all the swap writeout, we don't get helped by seeing
>    much memory get freed.  Why?
> 
> I _suspect_ that the graphics drivers here are holding a reference to
> the page.  During reclaim, we're mostly concerned with the pages being
> mapped.  If we manage to get them unmapped, we'll go ahead and swap
> them, which I _think_ is what we're seeing.  But, when it comes time to
> _actually_ free them, that last reference on the page keeps them from
> being freed.
> 
> Is it possible that there's still a get_page() reference that's holding
> those pages in place from the graphics code?

Not from i915.ko. The last resort of our shrinker is to drop all page
refs held by the GPU, which is invoked if we are asked to free memory
and we have no inactive objects left.

If we could get a callback for the oom report, I could dump some details
about what the GPU is holding onto. That seems like a useful extension to
add to the shrinkers.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] mm: Throttle shrinkers harder
  2014-04-24 15:39           ` Chris Wilson
@ 2014-04-24 22:35             ` Dave Hansen
  2014-04-25  7:23               ` Chris Wilson
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2014-04-24 22:35 UTC (permalink / raw)
  To: Chris Wilson, Andrew Morton, linux-mm, intel-gfx, Mel Gorman,
	Michal Hocko, Rik van Riel, Johannes Weiner, Dave Chinner,
	Glauber Costa, Hugh Dickins, David Rientjes

On 04/24/2014 08:39 AM, Chris Wilson wrote:
> On Thu, Apr 24, 2014 at 08:21:58AM -0700, Dave Hansen wrote:
>> Is it possible that there's still a get_page() reference that's holding
>> those pages in place from the graphics code?
> 
> Not from i915.ko. The last resort of our shrinker is to drop all page
> refs held by the GPU, which is invoked if we are asked to free memory
> and we have no inactive objects left.

How sure are we that this was performed before the OOM?

Also, forgive me for being an idiot wrt the way graphics work, but are
there any good candidates that you can think of that could be holding a
reference?  I've honestly never seen an OOM like this.

Somewhat rhetorical question for the mm folks on cc: should we be
sticking the pages on which you're holding a reference on our
unreclaimable list?

> If we could get a callback for the oom report, I could dump some details
> about what the GPU is holding onto. That seems like a useful extension to
> add to the shrinkers.

There's a register_oom_notifier().  Is that sufficient for your use, or
is there something additional that would help?
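
Usage is just a notifier block; a sketch, where the dump/purge helper is
a hypothetical i915 function you'd have to write:

	#include <linux/oom.h>

	static int i915_oom_notify(struct notifier_block *nb,
				   unsigned long unused, void *ptr)
	{
		unsigned long *freed = ptr;

		/* dump GPU object state; report any pages we released */
		*freed += i915_gem_dump_and_purge();	/* hypothetical */
		return NOTIFY_OK;
	}

	static struct notifier_block i915_oom_nb = {
		.notifier_call = i915_oom_notify,
	};

	/* at driver load: register_oom_notifier(&i915_oom_nb); */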

* Re: [PATCH] mm: Throttle shrinkers harder
  2014-04-24 22:35             ` Dave Hansen
@ 2014-04-25  7:23               ` Chris Wilson
  2014-04-25 17:18                 ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2014-04-25  7:23 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, linux-mm, intel-gfx, Mel Gorman, Michal Hocko,
	Rik van Riel, Johannes Weiner, Dave Chinner, Glauber Costa,
	Hugh Dickins, David Rientjes

On Thu, Apr 24, 2014 at 03:35:47PM -0700, Dave Hansen wrote:
> On 04/24/2014 08:39 AM, Chris Wilson wrote:
> > On Thu, Apr 24, 2014 at 08:21:58AM -0700, Dave Hansen wrote:
> >> Is it possible that there's still a get_page() reference that's holding
> >> those pages in place from the graphics code?
> > 
> > Not from i915.ko. The last resort of our shrinker is to drop all page
> > refs held by the GPU, which is invoked if we are asked to free memory
> > and we have no inactive objects left.
> 
> How sure are we that this was performed before the OOM?

Only by virtue of how shrink_slab() works. Thanks for the pointer to
register_oom_notifier(), I can use that to make sure that we do purge
everything from the GPU, and do a sanity check at the same time, before
we start killing processes.
 
> Also, forgive me for being an idiot wrt the way graphics work, but are
> there any good candidates that you can think of that could be holding a
> reference?  I've honestly never seen an OOM like this.

Here the only place that we take a page reference is in
i915_gem_object_get_pages(). We do this when we first bind the pages
into the GPU's translation table, but we only release the pages once the
object is destroyed or the system experiences memory pressure. (Once the
GPU touches the pages, we no longer consider them to be cache coherent
with the CPU and so migrating them between the GPU and CPU requires
clflushing, which is expensive.)

Aside from CPU mmaps of the shmemfs filp, all operations on our
graphical objects should lead to i915_gem_object_get_pages(). However
not all objects are recoverable as some may be pinned due to hardware
access.
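
Condensed, the lifetime looks like this (a sketch of the flow described
above, not the literal driver code):

	static int i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
	{
		if (obj->pages)		/* already populated */
			return 0;
		/* the shmem pages are looked up and referenced in here;
		 * the references are held until put_pages below */
		return obj->ops->get_pages(obj);
	}

	static int i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
	{
		if (obj->pages == NULL)
			return 0;
		if (obj->pages_pin_count)	/* e.g. in use by the hw */
			return -EBUSY;		/* cannot release yet */
		obj->ops->put_pages(obj);	/* drops the page refs */
		obj->pages = NULL;
		return 0;
	}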
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] mm: Throttle shrinkers harder
  2014-04-25  7:23               ` Chris Wilson
@ 2014-04-25 17:18                 ` Dave Hansen
  2014-04-25 17:56                   ` Dave Hansen
  2014-04-26 13:10                   ` Chris Wilson
  0 siblings, 2 replies; 13+ messages in thread
From: Dave Hansen @ 2014-04-25 17:18 UTC (permalink / raw)
  To: Chris Wilson, Andrew Morton, linux-mm, intel-gfx, Mel Gorman,
	Michal Hocko, Rik van Riel, Johannes Weiner, Dave Chinner,
	Glauber Costa, Hugh Dickins, David Rientjes

On 04/25/2014 12:23 AM, Chris Wilson wrote:
> On Thu, Apr 24, 2014 at 03:35:47PM -0700, Dave Hansen wrote:
>> On 04/24/2014 08:39 AM, Chris Wilson wrote:
>>> On Thu, Apr 24, 2014 at 08:21:58AM -0700, Dave Hansen wrote:
>>>> Is it possible that there's still a get_page() reference that's holding
>>>> those pages in place from the graphics code?
>>>
>>> Not from i915.ko. The last resort of our shrinker is to drop all page
>>> refs held by the GPU, which is invoked if we are asked to free memory
>>> and we have no inactive objects left.
>>
>> How sure are we that this was performed before the OOM?
> 
> Only by virtue of how shrink_slab() works.

Could we try to raise the level of assurance there, please? :)

So this "last resort" is i915_gem_shrink_all()?  It seems like we might
have some problems getting down to that part of the code if we have
problems getting the mutex.

We have tracepoints for the shrinkers in here (it says slab, but it's
all the shrinkers, I checked):

/sys/kernel/debug/tracing/events/vmscan/mm_shrink_slab_*/enable
and another for OOMs:
/sys/kernel/debug/tracing/events/oom/enable

Could you collect a trace during one of these OOM events and see what
the i915 shrinker is doing?  Just enable those two and then collect a
copy of:

	/sys/kernel/debug/tracing/trace

That'll give us some insight about how well the shrinker is working.  If
the VM gave up on calling in to it, it might reveal why we didn't get
all the way down in to i915_gem_shrink_all().

> Thanks for the pointer to
> register_oom_notifier(), I can use that to make sure that we do purge
> everything from the GPU, and do a sanity check at the same time, before
> we start killing processes.

Actually, that one doesn't get called until we're *SURE* we are going to
OOM.  Any action taken in there won't be taken into account.

>> Also, forgive me for being an idiot wrt the way graphics work, but are
>> there any good candidates that you can think of that could be holding a
>> reference?  I've honestly never seen an OOM like this.
> 
> Here the only place that we take a page reference is in
> i915_gem_object_get_pages(). We do this when we first bind the pages
> into the GPU's translation table, but we only release the pages once the
> object is destroyed or the system experiences memory pressure. (Once the
> GPU touches the pages, we no longer consider them to be cache coherent
> with the CPU and so migrating them between the GPU and CPU requires
> clflushing, which is expensive.)
> 
> Aside from CPU mmaps of the shmemfs filp, all operations on our
> graphical objects should lead to i915_gem_object_get_pages(). However
> not all objects are recoverable as some may be pinned due to hardware
> access.

In that oom callback, could you dump out the aggregate number of
obj->pages_pin_count across all the objects?  That would be a very
interesting piece of information to have.  It would also be very
insightful for folks who see OOMs in practice with i915 in their systems.


* Re: [PATCH] mm: Throttle shrinkers harder
  2014-04-25 17:18                 ` Dave Hansen
@ 2014-04-25 17:56                   ` Dave Hansen
  2014-04-26 13:10                   ` Chris Wilson
  1 sibling, 0 replies; 13+ messages in thread
From: Dave Hansen @ 2014-04-25 17:56 UTC (permalink / raw)
  To: Chris Wilson, Andrew Morton, linux-mm, intel-gfx, Mel Gorman,
	Michal Hocko, Rik van Riel, Johannes Weiner, Dave Chinner,
	Glauber Costa, Hugh Dickins, David Rientjes

Poking around with those tracepoints, I don't see the i915 shrinker
getting run, only i915_gem_inactive_count() being called.  It must be
returning 0 because we're never even _getting_ to the tracepoints
themselves after calling i915_gem_inactive_count().

This is on my laptop, and I haven't been able to coax i915 in to
reclaiming a single page in 10 or 15 minutes.  That seems fishy to me.
Surely *SOMETHING* has become reclaimable in that time.

Here's /sys/kernel/debug/dri/0/i915_gem_objects:

> 919 objects, 354914304 bytes
> 874 [333] objects, 291004416 [93614080] bytes in gtt
>   0 [0] active objects, 0 [0] bytes
>   874 [333] inactive objects, 291004416 [93614080] bytes
> 0 unbound objects, 0 bytes
> 199 purgeable objects, 92844032 bytes
> 30 pinned mappable objects, 18989056 bytes
> 139 fault mappable objects, 17371136 bytes
> 2145386496 [268435456] gtt total
> 
> Xorg: 632 objects, 235450368 bytes (0 active, 180899840 inactive, 21262336 unbound)
> gnome-control-c: 11 objects, 110592 bytes (0 active, 0 inactive, 49152 unbound)
> chromium-browse: 266 objects, 101367808 bytes (0 active, 101330944 inactive, 0 unbound)
> Xorg: 0 objects, 0 bytes (0 active, 0 inactive, 0 unbound)

* Re: [PATCH] mm: Throttle shrinkers harder
  2014-04-25 17:18                 ` Dave Hansen
  2014-04-25 17:56                   ` Dave Hansen
@ 2014-04-26 13:10                   ` Chris Wilson
  2014-04-28 16:38                     ` Dave Hansen
  1 sibling, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2014-04-26 13:10 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, linux-mm, intel-gfx, Mel Gorman, Michal Hocko,
	Rik van Riel, Johannes Weiner, Dave Chinner, Glauber Costa,
	Hugh Dickins, David Rientjes

On Fri, Apr 25, 2014 at 10:18:57AM -0700, Dave Hansen wrote:
> On 04/25/2014 12:23 AM, Chris Wilson wrote:
> > On Thu, Apr 24, 2014 at 03:35:47PM -0700, Dave Hansen wrote:
> >> On 04/24/2014 08:39 AM, Chris Wilson wrote:
> >>> On Thu, Apr 24, 2014 at 08:21:58AM -0700, Dave Hansen wrote:
> >>>> Is it possible that there's still a get_page() reference that's holding
> >>>> those pages in place from the graphics code?
> >>>
> >>> Not from i915.ko. The last resort of our shrinker is to drop all page
> >>> refs held by the GPU, which is invoked if we are asked to free memory
> >>> and we have no inactive objects left.
> >>
> >> How sure are we that this was performed before the OOM?
> > 
> > Only by virtue of how shrink_slab() works.
> 
> Could we try to raise the level of assurance there, please? :)
> 
> So this "last resort" is i915_gem_shrink_all()?  It seems like we might
> have some problems getting down to that part of the code if we have
> problems getting the mutex.

In general, but not in this example where the load is tightly controlled.
 
> We have tracepoints for the shrinkers in here (it says slab, but it's
> all the shrinkers, I checked):
> 
> /sys/kernel/debug/tracing/events/vmscan/mm_shrink_slab_*/enable
> and another for OOMs:
> /sys/kernel/debug/tracing/events/oom/enable
> 
> Could you collect a trace during one of these OOM events and see what
> the i915 shrinker is doing?  Just enable those two and then collect a
> copy of:
> 
> 	/sys/kernel/debug/tracing/trace
> 
> That'll give us some insight about how well the shrinker is working.  If
> the VM gave up on calling in to it, it might reveal why we didn't get
> all the way down in to i915_gem_shrink_all().

I'll add it to the list for QA to try.
 
> > Thanks for the pointer to
> > register_oom_notifier(), I can use that to make sure that we do purge
> > everything from the GPU, and do a sanity check at the same time, before
> > we start killing processes.
> 
> Actually, that one doesn't get called until we're *SURE* we are going to
> OOM.  Any action taken in there won't be taken into account.

blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
if (freed > 0)
	/* Got some memory back in the last second. */
	return;

That looks like it should abort the oom and so repeat the allocation
attempt? Or is that too hopeful?

> >> Also, forgive me for being an idiot wrt the way graphics work, but are
> >> there any good candidates that you can think of that could be holding a
> >> reference?  I've honestly never seen an OOM like this.
> > 
> > Here the only place that we take a page reference is in
> > i915_gem_object_get_pages(). We do this when we first bind the pages
> > into the GPU's translation table, but we only release the pages once the
> > object is destroyed or the system experiences memory pressure. (Once the
> > GPU touches the pages, we no longer consider them to be cache coherent
> > with the CPU and so migrating them between the GPU and CPU requires
> > clflushing, which is expensive.)
> > 
> > Aside from CPU mmaps of the shmemfs filp, all operations on our
> > graphical objects should lead to i915_gem_object_get_pages(). However
> > not all objects are recoverable as some may be pinned due to hardware
> > access.
> 
> In that oom callback, could you dump out the aggregate number of
> obj->pages_pin_count across all the objects?  That would be a very
> interesting piece of information to have.  It would also be very
> insightful for folks who see OOMs in practice with i915 in their systems.

Indeed.
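
Something along these lines, perhaps (untested sketch; list and field
names as of 3.14):

	static unsigned long
	i915_gem_count_pinned(struct drm_i915_private *dev_priv)
	{
		struct drm_i915_gem_object *obj;
		unsigned long pinned = 0;

		list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list)
			if (obj->pages_pin_count)
				pinned += obj->base.size >> PAGE_SHIFT;

		return pinned;	/* pages we currently refuse to release */
	}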
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] mm: Throttle shrinkers harder
  2014-04-26 13:10                   ` Chris Wilson
@ 2014-04-28 16:38                     ` Dave Hansen
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Hansen @ 2014-04-28 16:38 UTC (permalink / raw)
  To: Chris Wilson, Andrew Morton, linux-mm, intel-gfx, Mel Gorman,
	Michal Hocko, Rik van Riel, Johannes Weiner, Dave Chinner,
	Glauber Costa, Hugh Dickins, David Rientjes

On 04/26/2014 06:10 AM, Chris Wilson wrote:
>>> > > Thanks for the pointer to
>>> > > register_oom_notifier(), I can use that to make sure that we do purge
>>> > > everything from the GPU, and do a sanity check at the same time, before
>>> > > we start killing processes.
>> > 
>> > Actually, that one doesn't get called until we're *SURE* we are going to
>> > OOM.  Any action taken in there won't be taken into account.
> blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> if (freed > 0)
> 	/* Got some memory back in the last second. */
> 	return;
> 
> That looks like it should abort the oom and so repeat the allocation
> attempt? Or is that too hopeful?

You're correct.  I was reading the code utterly wrong.
