On Tue, 2019-02-19 at 13:04 +1100, Dave Chinner wrote:
> On Tue, Feb 19, 2019 at 12:31:45AM +0000, Roman Gushchin wrote:
> > Sorry, resending with the fixed to/cc list. Please, ignore the
> > first letter.
> 
> Please resend again with linux-fsdevel on the cc list, because this
> isn't a MM topic given the regressions from the shrinker patches
> have all been on the filesystem side of the shrinkers....

It looks like there are two separate things going on here.

The first is a pair of MM issues. One is potentially leaking
memory by not scanning slabs with few items on them, so
that such slabs stay around forever after the cgroup they
were created for has disappeared. The other is a set of
bugs in shrinker invocation behavior (like the nr_deferred
fixes you posted a patch for). I believe these are MM topics.

The second is the filesystem (and maybe other) shrinker
functions' behavior being somewhat fragile and depending
closely on current MM behavior, potentially up to and
including MM bugs.

The lack of a contract between the MM and the shrinker
callbacks is a recurring issue, and something we may
want to discuss in a joint session.

Some reflections on the shrinker/MM interaction:
- Since all memory (in a zone) could potentially be in
  shrinker pools, shrinkers MUST eventually free some
  memory.
- Shrinkers should not block kswapd from making progress.
  If kswapd got stuck in NFS inode writeback, and ended up
  not being able to free clean pages to receive network
  packets, that might cause a deadlock.
- The MM should be able to deal with shrinkers doing
  nothing in a given call, but having some work pending
  (e.g. waiting on IO completion), without getting a false
  OOM kill. How can we do this best?
- Related to the above: stalling in the shrinker code is
  unpredictable, and can take an arbitrarily long amount
  of time. Is there a better way we can make reclaimers
  wait for in-flight work to be completed?
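
To make the "nothing freed now, but work is pending" case
concrete, here is a rough userspace sketch of what such a
contract could look like. It is not kernel code and not a
proposal for the actual interface: struct scan_result,
struct toy_cache, toy_scan(), toy_io_complete() and
reclaim_decide() are all made-up names for illustration.
The only assumption is that a scan call could report
in-flight work alongside the freed count, so the caller
can tell "wait" apart from "nothing left".

```c
#include <assert.h>

/* Hypothetical result of one scan call; not an existing kernel type. */
struct scan_result {
	unsigned long freed;	/* objects freed synchronously in this call */
	unsigned long pending;	/* objects with IO in flight, freeable later */
};

/* Toy object pool standing in for a filesystem cache. */
struct toy_cache {
	unsigned long clean;	 /* freeable right now */
	unsigned long dirty;	 /* need writeback before they can be freed */
	unsigned long in_flight; /* writeback submitted, not yet completed */
};

/* Non-blocking scan: free clean objects, submit IO for dirty ones, never wait. */
static struct scan_result toy_scan(struct toy_cache *c, unsigned long nr_to_scan)
{
	struct scan_result r = { 0, 0 };
	unsigned long n;

	assert(nr_to_scan > 0);

	/* Free as many clean objects as the scan budget allows. */
	n = c->clean < nr_to_scan ? c->clean : nr_to_scan;
	c->clean -= n;
	r.freed = n;
	nr_to_scan -= n;

	/* Spend the rest of the budget kicking off writeback. */
	n = c->dirty < nr_to_scan ? c->dirty : nr_to_scan;
	c->dirty -= n;
	c->in_flight += n;

	/* Report everything still in flight, not just what was queued now. */
	r.pending = c->in_flight;
	return r;
}

/* Stand-in for IO completion: in-flight objects become clean. */
static void toy_io_complete(struct toy_cache *c)
{
	c->clean += c->in_flight;
	c->in_flight = 0;
}

enum reclaim_verdict { RECLAIM_PROGRESS, RECLAIM_WAIT, RECLAIM_OOM };

/* Caller side: only treat "freed nothing AND nothing pending" as exhausted. */
static enum reclaim_verdict reclaim_decide(struct scan_result r)
{
	if (r.freed)
		return RECLAIM_PROGRESS;
	if (r.pending)
		return RECLAIM_WAIT;
	return RECLAIM_OOM;
}
```

In this toy model the shrinker never sleeps, and the
decision of whether and how long to wait moves to the
caller, which is the part that would need discussion.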

-- 
All Rights Reversed.