From: David Rientjes <rientjes@google.com>
To: Anton Vorontsov <cbouatmailru@gmail.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	Pekka Enberg <penberg@kernel.org>, Mel Gorman <mgorman@suse.de>,
	Leonid Moiseichuk <leonid.moiseichuk@nokia.com>,
	KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	Minchan Kim <minchan@kernel.org>,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
	John Stultz <john.stultz@linaro.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linaro-kernel@lists.linaro.org, patches@linaro.org,
	kernel-team@android.com, linux-man@vger.kernel.org
Subject: Re: [RFC v3 0/3] vmpressure_fd: Linux VM pressure notifications
Date: Wed, 14 Nov 2012 19:59:52 -0800 (PST)
Message-ID: <alpine.DEB.2.00.1211141946370.14414@chino.kir.corp.google.com>
In-Reply-To: <20121115033932.GA15546@lizard.sbx05977.paloaca.wayport.net>

On Wed, 14 Nov 2012, Anton Vorontsov wrote:

> > I agree that eventfd is the way to go, but I'll also add that this
> > feature seems to be implemented at far too coarse a level.  Memory, and
> > hence memory pressure, is constrained by several factors other than just
> > the amount of physical RAM which vmpressure_fd is addressing.  What about
> > memory pressure caused by cpusets or mempolicies?  (Memcg has its own
> > reclaim logic
> 
> Yes, sure, and my plan for per-cgroup vmpressure was to just add the same
> hooks into the cgroup reclaim logic (as far as I understand, we can use
> the same scanned/reclaimed ratio + reclaimer priority to determine the
> pressure).
> 

I don't understand how this would work with cpusets, for example, with 
vmpressure_fd as defined.  The cpuset policy is embedded in the page 
allocator and skips over zones that are not allowed when trying to find a 
page of the specified order.  Imagine a cpuset bound to a single node that 
is under severe memory pressure: the reclaim logic will get triggered and 
cause a notification on your fd even though the rest of the system's nodes 
may have tons of memory available.  So now an application that actually 
uses this interface and tries to be a good kernel citizen decides to free 
caches back to the kernel, start ratelimiting, etc., yet it doesn't have 
any memory allocated on the nearly-oom cpuset, so its memory freeing 
doesn't achieve anything.
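
To make the scenario concrete, here's a minimal sketch of confining a 
task's allocations to a single NUMA node through the cgroup v1 cpuset 
interface; the mount point and the "node0_jail" group name are assumptions 
about the local setup, not anything from the posted patches.  Reclaim 
triggered inside such a cpuset says nothing about the rest of the machine.

/*
 * Sketch: confine the current task's memory allocations to NUMA node 0
 * via the cgroup v1 cpuset interface.  Mount point and group name are
 * assumptions about the local setup.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

static void write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f || fputs(val, f) == EOF) {
		perror(path);
		exit(1);
	}
	fclose(f);
}

int main(void)
{
	char pid[32];

	mkdir("/sys/fs/cgroup/cpuset/node0_jail", 0755);
	/* cpuset.cpus and cpuset.mems must be set before adding tasks. */
	write_str("/sys/fs/cgroup/cpuset/node0_jail/cpuset.cpus", "0-7");
	write_str("/sys/fs/cgroup/cpuset/node0_jail/cpuset.mems", "0");

	snprintf(pid, sizeof(pid), "%d", getpid());
	write_str("/sys/fs/cgroup/cpuset/node0_jail/tasks", pid);

	/*
	 * From here on, allocations that exhaust node 0 force reclaim on
	 * that node (and would fire a global vmpressure notification)
	 * even if every other node has plenty of free memory.
	 */
	return 0;
}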

Rather, I think it's much better to be notified when an individual process 
invokes various levels of reclaim, up to and including the oom killer, so 
that we know the context in which memory freeing needs to happen (or, 
optionally, the set of processes that could be sacrificed so that this 
higher-priority process may allocate memory).
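
Purely as an illustration of what that could look like from userspace 
(this is hypothetical: the /proc/self/mempressure file and the 0-100 level 
it returns below are invented for this sketch and exist in no posted patch 
or kernel interface):

/*
 * Hypothetical consumer of a per-process pressure notification.
 * "/proc/self/mempressure" and its 0-100 level are invented for this
 * illustration; they are not part of vmpressure_fd or of any existing
 * kernel interface.
 */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/proc/self/mempressure", O_RDONLY);
	struct pollfd pfd = { .fd = fd, .events = POLLPRI };
	char buf[16];
	ssize_t n;

	if (fd < 0) {
		perror("open");
		return 1;
	}

	for (;;) {
		/* Wake up whenever this task's reclaim level changes. */
		if (poll(&pfd, 1, -1) < 0)
			break;

		n = pread(fd, buf, sizeof(buf) - 1, 0);
		if (n <= 0)
			break;
		buf[n] = '\0';

		if (atoi(buf) >= 90) {
			/*
			 * Near oom in this task's own allocation context:
			 * drop caches, ratelimit, or back off now.
			 */
		}
	}

	close(fd);
	return 0;
}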

> > and its own memory thresholds implemented on top of eventfd 
> > that people already use.)  Both of these can cause high levels of
> > reclaim within the page allocator even while there is an abundance of
> > free memory available on the system.
> 
> Yes, surely global-level vmpressure should be separate from the
> per-cgroup memory pressure.
> 

I disagree.  I think that if you have a per-thread memory pressure 
notification if and when it starts down the page allocator slowpath, 
through the various stages of reclaim (perhaps on a scale of 0-100 as 
described), up to and including the oom killer, then you can target 
eventual memory freeing that is actually useful.
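
For what it's worth, a 0-100 level derived from the scanned/reclaimed 
ratio and the reclaim priority quoted above could be as simple as the 
sketch below; this is one plausible formula to illustrate the idea, not 
what the posted patches compute:

/*
 * One plausible way to turn the scanned/reclaimed ratio plus the reclaim
 * priority into a 0-100 level.  Illustration only; not the formula used
 * by the posted vmpressure_fd patches.
 */
static unsigned int pressure_level(unsigned long scanned,
				   unsigned long reclaimed,
				   int priority)
{
	unsigned long level = 0;

	/* 0 when everything scanned was reclaimed, 100 when nothing was. */
	if (scanned && reclaimed < scanned)
		level = 100 - (100 * reclaimed) / scanned;

	/*
	 * DEF_PRIORITY is 12; by the time the priority has dropped this
	 * low, reclaim is scanning nearly everything, so report maximum
	 * pressure regardless of the ratio.
	 */
	if (priority <= 2)
		level = 100;

	return level;
}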

> But we still want the "global vmpressure" thing, so that we could use it
> without cgroups too.  How to do it -- syscall or sysfs+eventfd -- doesn't
> matter much (in the sense that I can do the eventfd thing if you folks
> like it :).
> 

Most processes aren't going to care that they are running into memory 
pressure and have no mechanism to free memory back to the kernel or to 
ratelimit themselves; they will just continue happily along until they get 
the memory they want or get oom killed.  The ones that do care, however, 
or a job scheduler or monitor watching over the memory usage of a set of 
tasks, will be able to do something when notified.

In the hope of a single API that can do all of this, rather than a 
reimplementation for each type of memory limitation (it seems like what 
you're suggesting is at least three different APIs: system-wide via 
vmpressure_fd, memcg via memcg thresholds, and cpusets through an eventual 
cpuset threshold), I'd like a single interface that can be polled to 
determine when individual processes are encountering memory pressure.  And 
if I'm not running in your oom cpuset, I don't care about your memory 
pressure.
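
For comparison, the memcg thresholds mentioned above are already delivered 
over eventfd.  A minimal sketch of how they are consumed through the 
cgroup v1 interface looks roughly like this; the mount point, the 
"mygroup" name, and the 512M threshold are assumptions about the local 
setup:

/*
 * Minimal sketch of the existing memcg usage-threshold notification:
 * register an eventfd against memory.usage_in_bytes through
 * cgroup.event_control and wait for it to fire.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void)
{
	const char *base = "/sys/fs/cgroup/memory/mygroup";
	char path[128], cmd[64];
	uint64_t ticks;
	int efd, ufd, cfd;

	efd = eventfd(0, 0);

	snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", base);
	ufd = open(path, O_RDONLY);

	snprintf(path, sizeof(path), "%s/cgroup.event_control", base);
	cfd = open(path, O_WRONLY);

	if (efd < 0 || ufd < 0 || cfd < 0) {
		perror("setup");
		return 1;
	}

	/* Format: "<event_fd> <fd of memory.usage_in_bytes> <threshold>" */
	snprintf(cmd, sizeof(cmd), "%d %d %llu", efd, ufd, 512ULL << 20);
	if (write(cfd, cmd, strlen(cmd)) < 0) {
		perror("cgroup.event_control");
		return 1;
	}

	/* Blocks until the group's usage crosses the 512M threshold. */
	if (read(efd, &ticks, sizeof(ticks)) == sizeof(ticks))
		printf("memcg usage crossed the threshold\n");

	return 0;
}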

