From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 10 Jul 2020 14:29:17 +0200
From: Michal Hocko
To: Roman Gushchin
Cc: Andrew Morton, Johannes Weiner, Shakeel Butt, linux-mm@kvack.org,
	kernel-team@fb.com, linux-kernel@vger.kernel.org, Domas Mituzas,
	Tejun Heo, Chris Down
Subject: Re: [PATCH] mm: memcontrol: avoid workload stalls when lowering memory.high
Message-ID: <20200710122917.GB3022@dhcp22.suse.cz>
References: <20200709194718.189231-1-guro@fb.com>
In-Reply-To: <20200709194718.189231-1-guro@fb.com>

On Thu 09-07-20 12:47:18, Roman Gushchin wrote:
> The memory.high limit is implemented in a way such that the kernel
> penalizes all threads which are allocating memory over the limit.
> Forcing all threads into synchronous reclaim and adding some
> artificial delays allows us to slow down the memory consumption and
> potentially give some time for userspace oom handlers/resource control
> agents to react.
>
> It works nicely if the memory usage is hitting the limit from below,
> however it works sub-optimally if a user adjusts memory.high to a value
> way below the current memory usage. It basically forces all workload
> threads (doing any memory allocations) into synchronous reclaim
> and sleep. This makes the workload completely unresponsive for
> a long period of time and can also lead to system-wide contention on
> lru locks. It can happen even if the workload is not actually tight on
> memory and has, for example, a ton of cold pagecache.
>
> In the current implementation writing to memory.high causes an atomic
> update of the page counter's high value followed by an attempt to
> reclaim enough memory to fit into the new limit. To fix the problem
> described above, all we need is to change the order of execution: try
> to push the memory usage under the limit first, and only then set the
> new high limit.

Shakeel, would this help with your pro-active reclaim usecase? It would
require resetting the high limit right after the reclaim returns, which
is quite ugly, but it would at least not require a completely new
interface. You would simply do

	high = current - to_reclaim
	echo $high > memory.high
	echo infinity > memory.high	# To prevent direct reclaim
					# allocation stalls

The primary reason to set the high limit in advance was to catch
potential runaways more effectively, because they would just get
throttled while memory_high_write is reclaiming. With this change the
reclaim here might just be playing a never-ending game of catch up. On
the plus side, breaking out of the reclaim loop would still enforce the
limit, so if the operation takes too long the reclaim burden will
eventually move over to the consumers. So I do not see any real danger.
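To make the sequence above concrete, here it is with made-up numbers: a
cgroup currently using 8G from which we want 2G reclaimed. The cgroup
path is hypothetical, and note that cgroup v2 spells "no limit" as
"max":

```shell
# Illustration only: the cgroup path and the sizes are invented.
current=$((8 * 1024 * 1024 * 1024))      # current usage: 8G
to_reclaim=$((2 * 1024 * 1024 * 1024))   # amount to reclaim: 2G
high=$((current - to_reclaim))           # temporary high limit: 6G
echo "high=$high"
# echo $high > /sys/fs/cgroup/workload/memory.high   # triggers reclaim
# echo max   > /sys/fs/cgroup/workload/memory.high   # lift the limit again
```

The window between the two writes is exactly where runaway allocations
would get throttled, which is the trade-off discussed above.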
> Signed-off-by: Roman Gushchin
> Reported-by: Domas Mituzas
> Cc: Johannes Weiner
> Cc: Michal Hocko
> Cc: Tejun Heo
> Cc: Shakeel Butt
> Cc: Chris Down

Acked-by: Michal Hocko

> ---
>  mm/memcontrol.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b8424aa56e14..4b71feee7c42 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6203,8 +6203,6 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>  	if (err)
>  		return err;
>
> -	page_counter_set_high(&memcg->memory, high);
> -
>  	for (;;) {
>  		unsigned long nr_pages = page_counter_read(&memcg->memory);
>  		unsigned long reclaimed;
> @@ -6228,6 +6226,8 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>  		break;
>  	}
>
> +	page_counter_set_high(&memcg->memory, high);
> +
>  	return nbytes;
> }
>
> --
> 2.26.2

-- 
Michal Hocko
SUSE Labs