On Mon, Feb 3, 2020 at 1:22 PM Alexander Duyck <alexander.h.duyck@linux.intel.com> wrote:

> On Mon, 2020-02-03 at 12:32 -0800, Tyler Sanderson wrote:
> > There were apparently good reasons for moving away from OOM notifier
> > callback:
> > https://lkml.org/lkml/2018/7/12/314
> > https://lkml.org/lkml/2018/8/2/322
> >
> > In particular the OOM notifier is worse than the shrinker because:
> > It is last-resort, which means the system has already gone through
> > heroics to prevent OOM. Those heroic reclaim efforts are expensive and
> > impact application performance.
> > It lacks understanding of NUMA or other OOM constraints.
> > It has a higher potential for bugs due to the subtlety of the callback
> > context.
> > Given the above, I think the shrinker API certainly makes the most sense
> > _if_ the balloon size is static. In that case memory should be reclaimed
> > from the balloon early and proportionally to balloon size, which the
> > shrinker API achieves.
>
> The problem is the shrinker doesn't have any concept of tiering or
> priority. I suspect the reason for using the OOM notification is because in
> practice it should be the last thing we are pulling memory out of with
> things like page cache and slab caches being first. Once we have pages
> that are leaked out of the balloon by the shrinker it will trigger the
> balloon wanting to reinflate.

Deciding whether to trade IO performance (page cache) for memory-usage
efficiency (balloon) seems use-case dependent. Deciding when to re-inflate
is a similar policy choice. If the balloon's shrinker priority is
hard-coded to "last-resort" then there would be no way to implement a
policy where page cache growth could shrink the balloon. The current
balloon implementation allows the host to implement this policy and tune
the tradeoff between balloon and page cache.

> Ideally if the shrinker is running we
> shouldn't be able to reinflate the balloon, and if we are reinflating the
> balloon we shouldn't need to run the shrinker. The fact that we can do
> both at the same time is problematic.

I agree that this is inefficient.

> > However, if the balloon is inflating and intentionally causing memory
> > pressure then this results in the inefficiency pointed out earlier.
> >
> > If the balloon is inflating but not causing memory pressure then there
> > is no problem with either API.
>
> The entire point of the balloon is to cause memory pressure. Otherwise
> essentially all we are really doing is hinting since the guest doesn't
> need the memory and isn't going to use it any time soon.

Causing memory pressure is just a mechanism to achieve increased reclaim.
If there was a better mechanism (like the fine-grained-cache-shrinking one
discussed below) then I think the balloon device would be perfectly
justified in using that instead (and maybe "balloon" becomes a misnomer. Oh
well).

> > This suggests another route: rather than cause memory pressure to shrink
> > the page cache, the balloon could issue the equivalent of
> > "echo 3 > /proc/sys/vm/drop_caches".
> > Of course ideally, we want to be more fine grained than "drop
> > everything". We really want an API that says "drop everything that
> > hasn't been accessed in the last 5 minutes".
> >
> > This would eliminate the need for the balloon to cause memory pressure
> > at all which avoids the inefficiency in question. Furthermore, this
> > pairs nicely with the FREE_PAGE_HINT feature.
>
> Something similar was brought up in the discussion we had about this in my
> patch set. The problem is, by trying to use a value like "5 minutes" it
> implies that we are going to need to track some extra state somewhere to
> determine that value.
>
> An alternative is to essentially just slowly shrink memory for the guest.
> We had some discussion about this in another thread, and the following
> code example was brought up as a way to go about doing that:
>
> https://github.com/Conan-Kudo/omv-kernel-rc/blob/master/0154-sysctl-vm-Fine-grained-cache-shrinking.patch
>
> The idea is you essentially just slowly bleed the memory from the guest by
> specifying some amount of MB of cache to be freed on some regular
> interval.

Makes sense. Whatever API is settled on, I'd just propose that we allow the
host to invoke it via the balloon device since the host has a host-global
view of memory and can make decisions that an individual guest cannot.

Alex, what is the status of your fine-grained-cache-shrinking patch? It
seems like a really good idea.

> Thanks.
>
> - Alex
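
To make the "slowly bleed" idea quoted above a bit more concrete, here is a
rough toy model in Python. It only does the arithmetic of freeing a fixed
number of MB per interval; it is not the interface of the linked patch and
it touches no kernel state. The names bleed_cache, step_mb and floor_mb are
hypothetical, and the floor (a cache size below which nothing more is
freed) is an assumption added for illustration, not something proposed in
this thread.

```python
from typing import List


def bleed_cache(cache_mb: int, step_mb: int, intervals: int,
                floor_mb: int = 0) -> List[int]:
    """Model the cache size (in MB) after each regular interval.

    Hypothetical sketch: at every interval at most step_mb of cache is
    freed, and the cache is never pushed below floor_mb.
    """
    if step_mb < 0 or intervals < 0 or floor_mb < 0:
        raise ValueError("step_mb, intervals and floor_mb must be >= 0")

    sizes = []
    for _ in range(intervals):
        # Free at most step_mb, and only what sits above the floor.
        freed = min(step_mb, max(cache_mb - floor_mb, 0))
        cache_mb -= freed
        sizes.append(cache_mb)
    return sizes


if __name__ == "__main__":
    # 1000 MB of cache, 300 MB freed per interval, stop at 200 MB.
    print(bleed_cache(1000, 300, 5, floor_mb=200))
```

In a model like this the host would pick step_mb and the interval length,
which is where the host-global view of memory mentioned above would come in.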