On Tue, 2019-02-19 at 13:04 +1100, Dave Chinner wrote:
> On Tue, Feb 19, 2019 at 12:31:45AM +0000, Roman Gushchin wrote:
> > Sorry, resending with the fixed to/cc list. Please, ignore the
> > first letter.
>
> Please resend again with linux-fsdevel on the cc list, because this
> isn't a MM topic given the regressions from the shrinker patches
> have all been on the filesystem side of the shrinkers....

It looks like there are two separate things going on here.

The first is a pair of MM issues: one is potentially leaking memory
by not scanning slabs with few items on them, so that such slabs stay
around forever after the cgroup they were created for has disappeared;
the other is various bugs in shrinker invocation behavior (like the
nr_deferred fixes you posted a patch for). I believe these are MM
topics.

The second is the filesystem (and maybe other) shrinker functions'
behavior being somewhat fragile and depending closely on current MM
behavior, potentially up to and including MM bugs.

The lack of a contract between the MM and the shrinker callbacks is a
recurring issue, and something we may want to discuss in a joint
session.

Some reflections on the shrinker/MM interaction:
- Since all memory (in a zone) could potentially be in shrinker
  pools, shrinkers MUST eventually free some memory.
- Shrinkers should not block kswapd from making progress. If kswapd
  got stuck in NFS inode writeback, and ended up not being able to
  free clean pages to receive network packets, that might cause a
  deadlock.
- The MM should be able to deal with shrinkers doing nothing at this
  call, but having some work pending (e.g. waiting on IO completion),
  without getting a false OOM kill. How can we do this best?
- Related to the above: stalling in the shrinker code is
  unpredictable, and can take an arbitrarily long amount of time. Is
  there a better way we can make reclaimers wait for in-flight work
  to be completed?

-- 
All Rights Reversed.