From: Dave Hansen <dave.hansen@intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, intel-gfx@lists.freedesktop.org,
	Mel Gorman <mgorman@suse.de>, Michal Hocko <mhocko@suse.cz>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Dave Chinner <dchinner@redhat.com>,
	Glauber Costa <glommer@openvz.org>,
	Hugh Dickins <hughd@google.com>,
	David Rientjes <rientjes@google.com>
Subject: Re: [PATCH] mm: Throttle shrinkers harder
Date: Fri, 25 Apr 2014 10:18:57 -0700
Message-ID: <535A9901.6090607@intel.com>
In-Reply-To: <20140425072325.GO31221@nuc-i3427.alporthouse.com>

On 04/25/2014 12:23 AM, Chris Wilson wrote:
> On Thu, Apr 24, 2014 at 03:35:47PM -0700, Dave Hansen wrote:
>> On 04/24/2014 08:39 AM, Chris Wilson wrote:
>>> On Thu, Apr 24, 2014 at 08:21:58AM -0700, Dave Hansen wrote:
>>>> Is it possible that there's still a get_page() reference that's holding
>>>> those pages in place from the graphics code?
>>>
>>> Not from i915.ko. The last resort of our shrinker is to drop all page
>>> refs held by the GPU, which is invoked if we are asked to free memory
>>> and we have no inactive objects left.
>>
>> How sure are we that this was performed before the OOM?
> 
> Only by virtue of how shrink_slabs() works.

Could we try to raise the level of assurance there, please? :)

So this "last resort" is i915_gem_shrink_all()?  It seems like we might
have trouble getting down to that part of the code if we fail to get
the mutex.

We have tracepoints for the shrinkers in here (it says slab, but it's
all the shrinkers, I checked):

	/sys/kernel/debug/tracing/events/vmscan/mm_shrink_slab_*/enable

and another for OOMs:

	/sys/kernel/debug/tracing/events/oom/enable

Could you collect a trace during one of these OOM events and see what
the i915 shrinker is doing?  Just enable those two and then collect a
copy of:

	/sys/kernel/debug/tracing/trace

That'll give us some insight into how well the shrinker is working.  If
the VM gave up on calling into it, the trace might reveal why we didn't
get all the way down into i915_gem_shrink_all().
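
For what it's worth, here is a rough sketch of the collection steps
above as a small Python helper.  It assumes debugfs is mounted at
/sys/kernel/debug and that it runs as root.  The function names are made
up for illustration, and filtering on the substring "i915" is only a
guess at how the shrinker's function name shows up in the trace lines.

```python
import glob
import os
import shutil

# Assumption: debugfs is mounted at the usual location.
TRACING = "/sys/kernel/debug/tracing"


def enable_events(tracing=TRACING):
    """Write "1" to the shrinker and OOM tracepoint enable files."""
    patterns = [
        os.path.join(tracing, "events/vmscan/mm_shrink_slab_*/enable"),
        os.path.join(tracing, "events/oom/enable"),
    ]
    enabled = []
    for pattern in patterns:
        # Paths that do not exist on this kernel are silently skipped.
        for path in sorted(glob.glob(pattern)):
            with open(path, "w") as f:
                f.write("1")
            enabled.append(path)
    return enabled


def save_trace(dest, tracing=TRACING):
    """Copy the current contents of the trace buffer to dest."""
    with open(os.path.join(tracing, "trace")) as src, open(dest, "w") as out:
        shutil.copyfileobj(src, out)
    return dest


def i915_shrinker_lines(trace_path):
    """Return shrinker tracepoint lines that mention i915.

    Assumption: the shrinker's function name appears in the line and
    contains "i915"; adjust the filter if the trace says otherwise.
    """
    with open(trace_path) as f:
        return [
            line.rstrip("\n")
            for line in f
            if "mm_shrink_slab_" in line and "i915" in line
        ]
```

The idea would be to call enable_events() before reproducing the OOM,
then save_trace("oom-trace.txt") afterwards, and use
i915_shrinker_lines("oom-trace.txt") for a quick look at just the i915
entries.  The full trace is still the thing to send along.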

> Thanks for the pointer to
> register_oom_notifier(), I can use that to make sure that we do purge
> everything from the GPU, and do a sanity check at the same time, before
> we start killing processes.

Actually, that one doesn't get called until we're *SURE* we are going to
OOM.  Any action taken in there won't be taken into account.

>> Also, forgive me for being an idiot wrt the way graphics work, but are
>> there any good candidates that you can think of that could be holding a
>> reference?  I've honestly never seen an OOM like this.
> 
> Here the only place that we take a page reference is in
> i915_gem_object_get_pages(). We do this when we first bind the pages
> into the GPU's translation table, but we only release the pages once the
> object is destroyed or the system experiences memory pressure. (Once the
> GPU touches the pages, we no longer consider them to be cache coherent
> with the CPU and so migrating them between the GPU and CPU requires
> clflushing, which is expensive.)
> 
> Aside from CPU mmaps of the shmemfs filp, all operations on our
> graphical objects should lead to i915_gem_object_get_pages(). However
> not all objects are recoverable as some may be pinned due to hardware
> access.

In that OOM callback, could you dump out the sum of
obj->pages_pin_count across all the objects?  That would be a very
interesting piece of information to have.  It would also be very
useful for folks who see OOMs in practice with i915 in their systems.



