Re: [RFC] Per file OOM badness


From: Eric Anholt <eric@anholt.net>
To: Michal Hocko <mhocko@kernel.org>,
	Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	Christian.Koenig@amd.com
Subject: Re: [RFC] Per file OOM badness
Date: Thu, 18 Jan 2018 12:01:32 -0800
Message-ID: <87k1wfgcmb.fsf@anholt.net>
In-Reply-To: <20180118171355.GH6584@dhcp22.suse.cz>

Michal Hocko <mhocko@kernel.org> writes:

> On Thu 18-01-18 18:00:06, Michal Hocko wrote:
>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
>> > Hi, this series is a revised version of an RFC sent by Christian König
>> > a few years ago. The original RFC can be found at 
>> > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>> > 
>> > This is the same idea and I've just addressed his concern from the original RFC
>> > and switched to a callback into file_ops instead of a new member in struct file.
>> 
>> Please add the full description to the cover letter and do not make
>> people hunt links.
>> 
>> Here is the original cover letter text:
>> : I'm currently working on the issue that when device drivers allocate memory on
>> : behalf of an application the OOM killer usually doesn't know about that unless
>> : the application also gets this memory mapped into its address space.
>> : 
>> : This is especially annoying for graphics drivers where a lot of the VRAM
>> : usually isn't CPU accessible and so doesn't make sense to map into the
>> : address space of the process using it.
>> : 
>> : The problem now is that when an application starts to use a lot of VRAM those
>> : buffer objects sooner or later get swapped out to system memory, but when we
>> : now run into an out of memory situation the OOM killer obviously doesn't know
>> : anything about that memory and so usually kills the wrong process.
>
> OK, but how do you attribute that memory to a particular OOM killable
> entity? And how do you actually enforce that those resources get freed
> on the oom killer action?
>
>> : The following set of patches tries to address this problem by introducing a per
>> : file OOM badness score, which device drivers can use to give the OOM killer a
>> : hint about how many resources are bound to a file descriptor so that it can make
>> : better decisions about which process to kill.
>
> But files are not killable, they can be shared... In other words this
> doesn't help the oom killer to make an educated guess at all.
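
To make the proposal concrete, here is a minimal userspace sketch in C of the idea in the cover letter: each open file may expose a badness callback, and a process's score is its RSS plus the sum of the hints from its open files. All names here (struct file_ops, oom_file_badness, task_badness, drm_like_fops) are hypothetical stand-ins, not the kernel structures or the API from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct file / file_operations. */
struct file;

struct file_ops {
	/* Hypothetical hook: pages of driver memory bound to this file. */
	long (*oom_file_badness)(const struct file *f);
};

struct file {
	const struct file_ops *f_op;
	long driver_pages;              /* pages the driver charged to this file */
};

struct task {
	long rss_pages;                 /* pages mapped into the address space */
	struct file *files[4];          /* open files; unused slots are NULL */
};

/* A DRM-like driver reports everything charged to the fd. */
static long drm_like_badness(const struct file *f)
{
	return f->driver_pages;
}

static const struct file_ops drm_like_fops = {
	.oom_file_badness = drm_like_badness,
};

/* Process score = RSS + sum of the per-file hints of its open files. */
static long task_badness(const struct task *t)
{
	assert(t != NULL);
	long points = t->rss_pages;

	for (size_t i = 0; i < 4 && t->files[i]; i++) {
		const struct file *f = t->files[i];

		if (f->f_op && f->f_op->oom_file_badness)
			points += f->f_op->oom_file_badness(f);
	}
	return points;
}
```

Because the hint is summed for every task that holds the file, one struct file shared by two tasks (for example after fork()) is charged in full to both, which is the attribution problem raised above.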

Maybe some more context would help the discussion?

The struct file in patch 3 is the DRM fd.  That's effectively "my
process's interface to talking to the GPU" not "a single GPU resource".
Once that file is closed, all of the process's private, idle GPU buffers
will be immediately freed (this will be most of their allocations), and
some will be freed once the GPU completes some work (this will be most
of the rest of their allocations).

Some GEM BOs won't be freed just by closing the fd, if they've been
shared between processes.  Those are usually about 8-24MB total in a
process, rather than the GBs that modern apps use (or that our testcases
like to allocate and thus trigger oomkilling of the test harness instead
of the offending testcase...)

Even if we just had the private+idle buffers being accounted in OOM
badness, that would be a huge step forward in system reliability.
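
The lifetime rules described above can be sketched as a small userspace model in C. The struct bo fields and model_fd_close() are hypothetical simplifications, not DRM/GEM code: a shared buffer is simply assumed to be kept alive by another process, and "busy" stands for a buffer still referenced by queued GPU work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of a process's GEM buffers (not the DRM structures). */
struct bo {
	long pages;
	bool shared;   /* exported to / imported by another process */
	bool busy;     /* still referenced by queued GPU work */
};

struct close_result {
	long freed_now;     /* private + idle: released on close */
	long freed_later;   /* private + busy: released when GPU work retires */
	long kept;          /* shared: assumed kept alive by another holder */
};

/* Classify what closing the DRM fd would release, per the rules above. */
static struct close_result model_fd_close(const struct bo *bos, size_t n)
{
	struct close_result r = { 0, 0, 0 };

	assert(bos != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (bos[i].shared)
			r.kept += bos[i].pages;
		else if (bos[i].busy)
			r.freed_later += bos[i].pages;
		else
			r.freed_now += bos[i].pages;
	}
	return r;
}
```

In this model, freed_now is the private+idle total that would already be a useful badness hint on its own; freed_later only adds buffers that come back once the GPU finishes.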

>> : So, a question for everyone: what do you think about this approach?
>
> I think it is just wrong semantically. Non-reclaimable memory is a
> pain, especially when there is way too much of it. If you can free that
> memory somehow then you can hook into the slab shrinker API and react to
> memory pressure. If you can account such memory to a particular
> process and make sure that the consumption is bound by the process life
> time then we can think of an accounting that oom_badness can consider
> when selecting a victim.
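
For comparison, the alternative suggested above (accounting tied to the process lifetime that oom_badness can consider) might look like this in a simplified userspace model. It loosely follows the shape of the kernel's oom_badness() heuristic (resident pages plus swap entries plus page tables, biased by oom_score_adj scaled to total pages) but skips details such as unkillable tasks; struct proc_mem, model_badness() and the accounted_pages field are hypothetical.

```c
#include <assert.h>

struct proc_mem {
	long rss_pages;
	long swap_pages;
	long pgtable_pages;
	long accounted_pages;   /* hypothetical driver memory charged to the
	                           process and freed when it exits */
	int oom_score_adj;      /* -1000 .. 1000 */
};

/* Simplified badness: footprint in pages plus a scaled oom_score_adj bias. */
static long model_badness(const struct proc_mem *p, long totalpages)
{
	assert(p->oom_score_adj >= -1000 && p->oom_score_adj <= 1000);

	long points = p->rss_pages + p->swap_pages + p->pgtable_pages +
		      p->accounted_pages;

	/* Each adj unit is worth 1/1000 of total memory. */
	points += (long)p->oom_score_adj * (totalpages / 1000);

	/* Never report zero for a killable task. */
	return points > 0 ? points : 1;
}
```

With the extra term, a GPU-heavy process with little mapped memory can outrank a test harness whose footprint is mostly RSS, which is the victim-selection outcome the thread is after.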

For graphics, we can't free most of our memory without also effectively
killing the process.  i915 and vc4 have "purgeable" interfaces for
userspace (on i915 this is exposed all the way to GL applications and is
hooked into shrinker, and on vc4 this is so far just used for
userspace-internal buffer caches to be purged when a CMA allocation
fails).  However, those purgeable pools are expected to be a tiny
fraction of the GPU allocations by the process.
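
The purgeable pools mentioned above fit the shrinker model: a count callback reports how much can be dropped and a scan callback drops it. Below is a hypothetical userspace sketch of such a count/scan pair over a pool of buffers; it does not use the kernel's struct shrinker, and purging whole buffers until the target is met is an assumed policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer in a purgeable pool. */
struct pbo {
	long pages;
	bool purgeable;   /* userspace marked the contents as disposable */
	bool purged;      /* backing storage already dropped */
};

/* "count" step: pages that could be reclaimed right now. */
static long pool_count(const struct pbo *bos, size_t n)
{
	long total = 0;

	for (size_t i = 0; i < n; i++)
		if (bos[i].purgeable && !bos[i].purged)
			total += bos[i].pages;
	return total;
}

/* "scan" step: purge whole buffers until at least nr_to_scan pages are
 * freed or the pool is exhausted; returns the pages freed. */
static long pool_scan(struct pbo *bos, size_t n, long nr_to_scan)
{
	long freed = 0;

	assert(nr_to_scan >= 0);
	for (size_t i = 0; i < n && freed < nr_to_scan; i++) {
		if (bos[i].purgeable && !bos[i].purged) {
			bos[i].purged = true;
			freed += bos[i].pages;
		}
	}
	return freed;
}
```

Only buffers userspace has marked purgeable are ever reclaimed here, so a large non-purgeable allocation stays untouched no matter how much pressure the scan step sees.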


Thread overview: 62+ messages
2018-01-18 16:47 Andrey Grodzovsky
2018-01-18 16:47 ` [PATCH 1/4] fs: add OOM badness callback in file_operatrations struct Andrey Grodzovsky
2018-01-18 16:47 ` [PATCH 2/4] oom: take per file badness into account Andrey Grodzovsky
2018-01-18 16:47 ` [PATCH 3/4] drm/gem: adjust per file OOM badness on handling buffers Andrey Grodzovsky
2018-01-19  6:01   ` Chunming Zhou
2018-01-18 16:47 ` [PATCH 4/4] drm/amdgpu: Use drm_oom_badness for amdgpu Andrey Grodzovsky
2018-01-30  9:24   ` Daniel Vetter
2018-01-30 12:42     ` Andrey Grodzovsky
2018-01-18 17:00 ` [RFC] Per file OOM badness Michal Hocko
2018-01-18 17:13   ` Michal Hocko
2018-01-18 20:01     ` Eric Anholt [this message]
2018-01-19  8:20       ` Michal Hocko
2018-01-19  8:39         ` Christian König
2018-01-19  9:32           ` Michel Dänzer
2018-01-19  9:58             ` Christian König
2018-01-19 10:02               ` Michel Dänzer
2018-01-19 15:07                 ` Michel Dänzer
2018-01-21  6:50                   ` Eric Anholt
2018-01-19 10:40           ` Michal Hocko
2018-01-19 11:37             ` Christian König
2018-01-19 12:13               ` Michal Hocko
2018-01-19 12:20                 ` Michal Hocko
2018-01-19 16:54                   ` Christian König
2018-01-23 11:39                     ` Michal Hocko
2018-01-19 16:48               ` Michel Dänzer
2018-01-19  8:35       ` Christian König
2018-01-19  6:01     ` He, Roger
2018-01-19  8:25       ` Michal Hocko
2018-01-19 10:02         ` roger
2018-01-23 15:27   ` Roman Gushchin
2018-01-23 15:36     ` Michal Hocko
2018-01-23 16:39       ` Michel Dänzer
2018-01-24  9:28         ` Michal Hocko
2018-01-24 10:27           ` Michel Dänzer
2018-01-24 11:01             ` Michal Hocko
2018-01-24 11:23               ` Michel Dänzer
2018-01-24 11:50                 ` Michal Hocko
2018-01-24 12:11                   ` Christian König
2018-01-30  9:31                     ` Daniel Vetter
2018-01-30  9:43                       ` Michel Dänzer
2018-01-30 10:40                         ` Christian König
2018-01-30 11:02                           ` Michel Dänzer
2018-01-30 11:28                             ` Christian König
2018-01-30 11:34                               ` Michel Dänzer
2018-01-30 11:36                                 ` Nicolai Hähnle
2018-01-30 11:42                                   ` Michel Dänzer
2018-01-30 11:56                                     ` Christian König
2018-01-30 15:52                                       ` Michel Dänzer
2018-01-30 10:42                         ` Daniel Vetter
2018-01-30 10:48                           ` Michel Dänzer
2018-01-30 11:35                             ` Nicolai Hähnle
2018-01-24 14:31                   ` Michel Dänzer
2018-01-30  9:29                   ` Michel Dänzer
2018-01-30 10:28                     ` Michal Hocko
2018-03-26 14:36                       ` Lucas Stach
2018-04-04  9:09                         ` Michel Dänzer
2018-04-04  9:36                           ` Lucas Stach
2018-04-04  9:46                             ` Michel Dänzer
2018-01-19  5:39 ` He, Roger
2018-01-19  8:17   ` Christian König
2018-01-22 23:23 ` Andrew Morton
2018-01-23  1:59   ` Andrey Grodzovsky
