From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 10 Jul 2020 14:29:17 +0200
From: Michal Hocko
To: Roman Gushchin
Cc: Andrew Morton, Johannes Weiner, Shakeel Butt, linux-mm@kvack.org,
	kernel-team@fb.com, linux-kernel@vger.kernel.org, Domas Mituzas,
	Tejun Heo, Chris Down
Subject: Re: [PATCH] mm: memcontrol: avoid workload stalls when lowering memory.high
Message-ID: <20200710122917.GB3022@dhcp22.suse.cz>
References: <20200709194718.189231-1-guro@fb.com>
In-Reply-To: <20200709194718.189231-1-guro@fb.com>

On Thu 09-07-20 12:47:18, Roman Gushchin wrote:
> The memory.high limit is implemented in a way such that the kernel
> penalizes all threads which are allocating memory over the limit.
> Forcing all threads into synchronous reclaim and adding some
> artificial delays allows us to slow down the memory consumption and
> potentially give some time for userspace oom handlers/resource control
> agents to react.
>
> It works nicely if the memory usage is hitting the limit from below,
> however it works sub-optimally if a user adjusts memory.high to a value
> way below the current memory usage. It basically forces all workload
> threads (doing any memory allocations) into synchronous reclaim
> and sleep. This makes the workload completely unresponsive for
> a long period of time and can also lead to system-wide contention on
> lru locks. It can happen even if the workload is not actually tight on
> memory and has, for example, a ton of cold pagecache.
>
> In the current implementation writing to memory.high causes an atomic
> update of the page counter's high value followed by an attempt to
> reclaim enough memory to fit into the new limit. To fix the problem
> described above, all we need is to change the order of execution: try
> to push the memory usage under the limit first, and only then set the
> new high limit.

Shakeel, would this help with your pro-active reclaim usecase? It would
require resetting the high limit right after the reclaim returns, which
is quite ugly, but it would at least not require a completely new
interface. You would simply do

	high = current - to_reclaim
	echo $high > memory.high
	echo infinity > memory.high	# To prevent direct reclaim
					# allocation stalls

The primary reason to set the high limit in advance was to catch
potential runaways more effectively, because they would just get
throttled while memory_high_write is reclaiming. With this change the
reclaim here might just be playing a never-ending game of catch up. On
the plus side, breaking out of the reclaim loop would still enforce the
limit, so if the operation takes too long the reclaim burden will
eventually move over to the consumers. So I do not see any real danger.
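To make the sequence above concrete, here it is with made-up numbers: a
cgroup currently using 8G from which we want 2G reclaimed. The cgroup
path is hypothetical, and note that cgroup v2 spells "no limit" as
"max":

```shell
# Illustration only: the cgroup path and the sizes are invented.
current=$((8 * 1024 * 1024 * 1024))      # current usage: 8G
to_reclaim=$((2 * 1024 * 1024 * 1024))   # amount to reclaim: 2G
high=$((current - to_reclaim))           # temporary high limit: 6G
echo "high=$high"
# echo $high > /sys/fs/cgroup/workload/memory.high   # triggers reclaim
# echo max   > /sys/fs/cgroup/workload/memory.high   # lift the limit again
```

The window between the two writes is exactly where runaway allocations
would get throttled, which is the trade-off discussed above.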
> Signed-off-by: Roman Gushchin
> Reported-by: Domas Mituzas
> Cc: Johannes Weiner
> Cc: Michal Hocko
> Cc: Tejun Heo
> Cc: Shakeel Butt
> Cc: Chris Down

Acked-by: Michal Hocko

> ---
>  mm/memcontrol.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b8424aa56e14..4b71feee7c42 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6203,8 +6203,6 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>  	if (err)
>  		return err;
>
> -	page_counter_set_high(&memcg->memory, high);
> -
>  	for (;;) {
>  		unsigned long nr_pages = page_counter_read(&memcg->memory);
>  		unsigned long reclaimed;
> @@ -6228,6 +6226,8 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>  		break;
>  	}
>
> +	page_counter_set_high(&memcg->memory, high);
> +
>  	return nbytes;
> }
>
> --
> 2.26.2

-- 
Michal Hocko
SUSE Labs