Date: Tue, 23 Jul 2024 10:29:50 -0400
From: Johannes Weiner <hannes@cmpxchg.org>
To: David Finkel
Cc: Muchun Song, Tejun Heo, Roman Gushchin, Andrew Morton,
	core-services@vimeo.com, Jonathan Corbet, Michal Hocko,
	Shakeel Butt, Shuah Khan, Zefan Li, cgroups@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-mm@kvack.org,
	linux-kselftest@vger.kernel.org, Waiman Long
Subject: Re: [PATCH 1/2] mm, memcg: cg2 memory{.swap,}.peak write handlers
Message-ID: <20240723142950.GA389003@cmpxchg.org>
References: <20240722235554.2911971-1-davidf@vimeo.com>
	<20240722235554.2911971-2-davidf@vimeo.com>
In-Reply-To: <20240722235554.2911971-2-davidf@vimeo.com>
Hi David, thanks for pursuing this!

A couple of comments below.

On Mon, Jul 22, 2024 at 07:55:53PM -0400, David Finkel wrote:
> @@ -1322,11 +1322,16 @@ PAGE_SIZE multiple when read back.
> 	  reclaim induced by memory.reclaim.
> 
>   memory.peak
> -	A read-only single value file which exists on non-root
> -	cgroups.
> +	A read-write single value file which exists on non-root cgroups.
> +
> +	The max memory usage recorded for the cgroup and its descendants since
> +	either the creation of the cgroup or the most recent reset for that FD.
> 
> -	The max memory usage recorded for the cgroup and its
> -	descendants since the creation of the cgroup.
> +	A write of the string "reset" to this file resets it to the
> +	current memory usage for subsequent reads through the same
> +	file descriptor.
> +	Attempts to write any other non-empty string will return EINVAL
> +	(modulo leading and trailing whitespace).

Why not allow any write to reset? This makes it harder to use, and I'm
not sure accidental writes are a likely mistake to make.

> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 2150ca60394b..7001ed74e339 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -12,6 +12,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -855,4 +856,11 @@ static inline void cgroup_bpf_put(struct cgroup *cgrp) {}
> 
>  struct cgroup *task_get_cgroup1(struct task_struct *tsk, int hierarchy_id);
> 
> +struct memcg_peak_mem_ctx {
> +	long local_watermark;
> +	struct list_head peers;
> +};

Since this is generic cgroup code, and can be conceivably used by
other controllers, let's keep the naming generic as well. How about:

	struct cgroup_of_peak {
		long value;
		struct list_head list;
	};

cgroup-defs.h would be a better place for it.
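The validation the documentation hunk above describes can be sketched
in plain userspace C for illustration; the function name here is
hypothetical, and the real handler would use the kernel's strstrip()
rather than hand-rolled whitespace trimming:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Sketch of the proposed write-side validation: after stripping
 * leading and trailing whitespace, only the exact string "reset"
 * is accepted; anything else fails with -EINVAL. */
static int peak_write_check(const char *buf)
{
	size_t len;

	/* emulate strstrip() on a read-only buffer */
	while (*buf == ' ' || *buf == '\t' || *buf == '\n')
		buf++;
	len = strlen(buf);
	while (len && (buf[len - 1] == ' ' || buf[len - 1] == '\t' ||
		       buf[len - 1] == '\n'))
		len--;

	if (len == strlen("reset") && !strncmp(buf, "reset", len))
		return 0;
	return -EINVAL;
}
```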
> +struct memcg_peak_mem_ctx *memcg_extract_peak_mem_ctx(struct kernfs_open_file *of);

of_peak()

> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 030d34e9d117..cbc390234605 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -198,6 +198,11 @@ struct mem_cgroup {
> 	struct page_counter kmem;		/* v1 only */
> 	struct page_counter tcpmem;		/* v1 only */
> 
> +	/* lists of memcg peak watching contexts on swap and memory */
> +	struct list_head peak_memory_local_watermark_watchers;
> +	struct list_head peak_swap_local_watermark_watchers;
> +	spinlock_t swap_memory_peak_watchers_lock;

These names are too long. How about:

	/* Registered local usage peak watchers */
	struct list_head memory_peaks;
	struct list_head swap_peaks;
	spinlock_t peaks_lock;

> diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
> index 8cd858d912c4..06bb84218960 100644
> --- a/include/linux/page_counter.h
> +++ b/include/linux/page_counter.h
> @@ -26,6 +26,7 @@ struct page_counter {
> 	atomic_long_t children_low_usage;
> 
> 	unsigned long watermark;
> +	unsigned long local_watermark; /* track min of fd-local resets */
> 	unsigned long failcnt;
> 
> 	/* Keep all the read most fields in a separete cacheline. */
> @@ -78,7 +79,15 @@ int page_counter_memparse(const char *buf, const char *max,
> 
>  static inline void page_counter_reset_watermark(struct page_counter *counter)
>  {
> -	counter->watermark = page_counter_read(counter);
> +	unsigned long cur = page_counter_read(counter);

cur -> usage

> @@ -6907,12 +6912,109 @@ static u64 memory_current_read(struct cgroup_subsys_state *css,
> 	return (u64)page_counter_read(&memcg->memory) * PAGE_SIZE;
>  }
> 
> -static u64 memory_peak_read(struct cgroup_subsys_state *css,
> -			    struct cftype *cft)
> +inline int swap_memory_peak_show(
> +	struct seq_file *sf, void *v, bool swap_cg)
>  {

Leave inlining to the compiler. Just static int. The name can be
simply peak_show().
Customary coding style is to line wrap at the last parameter that
fits. Don't wrap if the line fits within 80 cols.

	static int peak_show(struct seq_file *sf, void *v, ..., ...)
	{
		...
	}

> +	struct cgroup_subsys_state *css = seq_css(sf);
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> +	struct page_counter *pc;
> +	struct kernfs_open_file *of = sf->private;
> +	struct memcg_peak_mem_ctx *ctx = memcg_extract_peak_mem_ctx(of);
> +	s64 fd_peak = ctx->local_watermark;
> 
> -	return (u64)memcg->memory.watermark * PAGE_SIZE;
> +	if (swap_cg)
> +		pc = &memcg->swap;
> +	else
> +		pc = &memcg->memory;
> +
> +	if (fd_peak == -1) {
> +		seq_printf(sf, "%llu\n", (u64)pc->watermark * PAGE_SIZE);
> +		return 0;
> +	}
> +
> +	s64 pc_peak = pc->local_watermark;
> +	s64 wm = fd_peak > pc_peak ? fd_peak : pc_peak;
> +
> +	seq_printf(sf, "%lld\n", wm * PAGE_SIZE);
> +	return 0;
> +}

As per Roman's feedback, don't mix decls and code.

You can simplify it by extracting css and memcg in the callers, then
pass the right struct page_counter *pc directly. That should eliminate
most local variables as well.

	static int peak_show(struct seq_file *sf, void *v, struct page_counter *pc)
	{
		struct cgroup_of_peak *ofp = of_peak(sf->private);
		u64 peak;

		/* User wants global or local peak? */
		if (ofp->value == -1)
			peak = pc->watermark;
		else
			peak = max(ofp->value, pc->local_watermark);

		seq_printf(sf, "%lld\n", peak * PAGE_SIZE);

		return 0;
	}

> +static int memory_peak_show(struct seq_file *sf, void *v)
> +{
> +	return swap_memory_peak_show(sf, v, false);

And then do:

	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(sf));

	return peak_show(sf, v, &memcg->memory);

Then do the same with ... &memcg->swap.

> +inline ssize_t swap_memory_peak_write(
> +	struct kernfs_open_file *of,
> +	char *buf, size_t nbytes, loff_t off, bool swap_cg)
> +{

Same feedback as above. Please don't inline explicitly (unless it
really is measurably a performance improvement in a critical path),
and stick to surrounding coding style.
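For illustration, the read-side selection in the peak_show() sketch
above reduces to a simple pure function; this is a userspace model
with hypothetical names, standing in for the ofp->value / watermark /
local_watermark fields:

```c
#include <assert.h>

/* An fd that never wrote "reset" (saved value == -1) reads the
 * global watermark; one that did reads the larger of its saved peak
 * and the counter-local watermark, which may have raced higher. */
static long peak_read(long ofp_value, long watermark, long local_watermark)
{
	if (ofp_value == -1)
		return watermark;
	return ofp_value > local_watermark ? ofp_value : local_watermark;
}
```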
Here too, pass the page_counter directly and save the branches.

> +	unsigned long cur;
> +	struct memcg_peak_mem_ctx *peer_ctx;
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> +	struct memcg_peak_mem_ctx *ctx = memcg_extract_peak_mem_ctx(of);
> +	struct page_counter *pc;
> +	struct list_head *watchers, *pos;
> +
> +	buf = strstrip(buf);
> +	/* Only allow "reset" to keep the API clear */
> +	if (strcmp(buf, "reset"))
> +		return -EINVAL;
> +
> +	if (swap_cg) {
> +		pc = &memcg->swap;
> +		watchers = &memcg->peak_swap_local_watermark_watchers;
> +	} else {
> +		pc = &memcg->memory;
> +		watchers = &memcg->peak_memory_local_watermark_watchers;
> +	}
> +
> +	spin_lock(&memcg->swap_memory_peak_watchers_lock);
> +
> +	page_counter_reset_local_watermark(pc);
> +	cur = pc->local_watermark;
> +
> +	list_for_each(pos, watchers) {

list_for_each_entry()

> +		peer_ctx = list_entry(pos, typeof(*ctx), peers);
> +		if (cur > peer_ctx->local_watermark)
> +			peer_ctx->local_watermark = cur;
> +	}

I don't think this is quite right. local_peak could be higher than the
current usage when a new watcher shows up. The other watchers should
retain the higher local_peak, not the current usage.

> +
> +	if (ctx->local_watermark == -1)
> +		/* only append to the list if we're not already there */
> +		list_add_tail(&ctx->peers, watchers);
> +
> +	ctx->local_watermark = cur;

This makes me think that page_counter_reset_local_watermark() is not a
good helper. It obscures what's going on. Try without it.

AFAICS the list ordering doesn't matter, so keep it simple and use a
plain list_add().

	/*
	 * A new local peak is being tracked in pc->local_watermark.
	 * Save current local peak in all watchers.
	 */
	list_for_each_entry(pos, ...)
		if (pc->local_watermark > pos->value)
			pos->value = pc->local_watermark;

	pc->local_watermark = page_counter_read(pc);

	/* Initial write, register watcher */
	if (ofp->value == -1)
		list_add()

	ofp->value = pc->local_watermark;

> diff --git a/mm/page_counter.c b/mm/page_counter.c
> index db20d6452b71..724d31508664 100644
> --- a/mm/page_counter.c
> +++ b/mm/page_counter.c
> @@ -79,9 +79,22 @@ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages)
> 	/*
> 	 * This is indeed racy, but we can live with some
> 	 * inaccuracy in the watermark.
> +	 *
> +	 * Notably, we have two watermarks to allow for both a globally
> +	 * visible peak and one that can be reset at a smaller scope.
> +	 *
> +	 * Since we reset both watermarks when the global reset occurs,
> +	 * we can guarantee that watermark >= local_watermark, so we
> +	 * don't need to do both comparisons every time.
> +	 *
> +	 * On systems with branch predictors, the inner condition should
> +	 * be almost free.
> 	 */
> -	if (new > READ_ONCE(c->watermark))
> -		WRITE_ONCE(c->watermark, new);
> +	if (new > READ_ONCE(c->local_watermark)) {
> +		WRITE_ONCE(c->local_watermark, new);
> +		if (new > READ_ONCE(c->watermark))
> +			WRITE_ONCE(c->watermark, new);
> +	}
>  	}
>  }
> 
> @@ -131,10 +144,23 @@ bool page_counter_try_charge(struct page_counter *counter,
> 	propagate_protected_usage(c, new);
> 	/*
> 	 * Just like with failcnt, we can live with some
> -	 * inaccuracy in the watermark.
> +	 * inaccuracy in the watermarks.
> +	 *
> +	 * Notably, we have two watermarks to allow for both a globally
> +	 * visible peak and one that can be reset at a smaller scope.
> +	 *
> +	 * Since we reset both watermarks when the global reset occurs,
> +	 * we can guarantee that watermark >= local_watermark, so we
> +	 * don't need to do both comparisons every time.
> +	 *
> +	 * On systems with branch predictors, the inner condition should
> +	 * be almost free.
	/* See comment in page_counter_charge() */

> diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c
> index 432db923bced..1e2d46636a0c 100644
> --- a/tools/testing/selftests/cgroup/cgroup_util.c
> +++ b/tools/testing/selftests/cgroup/cgroup_util.c
> @@ -141,6 +141,16 @@ long cg_read_long(const char *cgroup, const char *control)
> 	return atol(buf);
>  }

This should be in patch #2.
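To double-check the invariant the page_counter comments above rely on
(both watermarks are only reset together by the global reset, so
watermark >= local_watermark always holds and the global comparison
can nest inside the local one), here is a minimal single-threaded
userspace model; names are hypothetical, and the in-kernel code uses
READ_ONCE/WRITE_ONCE because the update is deliberately racy:

```c
#include <assert.h>

/* Userspace model of a page counter with a global watermark and an
 * fd-resettable local watermark. */
struct page_counter_model {
	long usage;
	long watermark;        /* global peak since cgroup creation */
	long local_watermark;  /* peak since the most recent local reset */
};

static void charge(struct page_counter_model *c, long nr_pages)
{
	long new = c->usage + nr_pages;

	c->usage = new;
	/* watermark >= local_watermark, so checking the local peak
	 * first lets us skip the global comparison most of the time */
	if (new > c->local_watermark) {
		c->local_watermark = new;
		if (new > c->watermark)
			c->watermark = new;
	}
}

static void local_reset(struct page_counter_model *c)
{
	/* a local reset restarts the local peak at current usage and
	 * leaves the global watermark alone, preserving the invariant */
	c->local_watermark = c->usage;
}
```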