Date: Mon, 24 Jun 2024 11:59:24 -0700
From: Shakeel Butt
To: Yosry Ahmed
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
    Jesper Dangaard Brouer, Yu Zhao, Muchun Song, Facebook Kernel Team,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] memcg: use ratelimited stats flush in the reclaim
References: <20240615081257.3945587-1-shakeel.butt@linux.dev>

On Mon, Jun 24, 2024 at 10:15:38AM GMT, Yosry Ahmed wrote:
> On Mon, Jun 24, 2024 at 10:02 AM Shakeel Butt wrote:
> >
> > On Mon, Jun 24, 2024 at 05:57:51AM GMT, Yosry Ahmed wrote:
> > > > > and I will explain why below. I know it may be a necessary
> > > > > evil, but I would like us to make sure there is no other option before
> > > > > going forward with this.
> > > >
> > > > Instead of necessary evil, I would call it a pragmatic approach i.e.
> > > > resolve the ongoing pain with good enough solution and work on long term
> > > > solution later.
> > >
> > > It seems like there are a few ideas for solutions that may address
> > > longer-term concerns, let's make sure we try those out first before we
> > > fall back to the short-term mitigation.
> > >
> > Why? More specifically why try out other things before this patch? Both
> > can be done in parallel. This patch has been running in production at
> > Meta for several weeks without issues. Also I don't see how merging this
> > would impact us on working on long term solutions.
>
> The problem is that once this is merged, it will be difficult to
> change this back to a normal flush once other improvements land. We
> don't have a test that reproduces the problem that we can use to make
> sure it's safe to revert this change later, it's only using data from
> prod.
>

I am pretty sure the work on the long-term solution will be iterative
and will involve many reverts and redoing things differently. So I
think it is understood that we may need to revert, or revert the
reverts.

> Once this mitigation goes in, I think everyone will be less motivated
> to get more data from prod about whether it's safe to revert the
> ratelimiting later :)

As I said, I don't expect "safe in prod" to be a strict requirement for
a change.

>
> > > > [...]
> > >
> > > Thanks for explaining this in such detail.
> > > It does make me feel better, but keep in mind that the above
> > > heuristics may change in the future and become more sensitive to
> > > stale stats, and very likely no one will remember that we decided
> > > that stale stats are fine previously.
> > >
> >
> > When was the last time this heuristic changed? This heuristic was
> > introduced in 2008 for anon pages and extended to file pages in 2016. In
> > 2019 the ratio enforcement at 'reclaim root' was introduced. I am pretty
> > sure we will improve the whole rstat flushing thing within a year or so
> > :P
>
> Fair point, although I meant it's easy to miss that the flush is
> ratelimited and the stats are potentially stale in general :)
>
> >
> > > >
> > > > For the cache trim mode, inactive file LRU size is read and the kernel
> > > > scales it down based on the reclaim iteration (file >> sc->priority) and
> > > > only checks if it is zero or not. Again precise information is not
> > > > needed.
> > >
> > > It sounds like it is possible that we enter the cache trim mode when
> > > we shouldn't if the stats are stale. Couldn't this lead to
> > > over-reclaiming file memory?
> > >
> >
> > Can you explain how this over-reclaiming file will happen?
>
> In one reclaim iteration, we could flush the stats, read the inactive
> file LRU size, confirm that (file >> sc->priority) > 0 and enter the
> cache trim mode, reclaiming file memory only. Let's assume that we
> reclaimed enough file memory such that the condition
> (file >> sc->priority) > 0 does not hold anymore.
>
> In a subsequent reclaim iteration, the flush could be skipped due to
> ratelimiting. Now we will enter the cache trim mode again and reclaim
> file memory only, even though the actual amount of file memory is low.
> This will cause over-reclaiming from file memory and dismissing anon
> memory that we should have reclaimed, which means that we will need
> additional reclaim iterations to actually free memory.
>
> I believe this scenario would be possible with ratelimiting, right?
>

So, (old_file >> sc->priority) > 0 is true but
(new_file >> sc->priority) > 0 is false. In the next iteration,
(old_file >> (sc->priority-1)) > 0 will still be true but somehow
(new_file >> (sc->priority-1)) > 0 is false. That can happen if, in the
previous iteration, the kernel somehow reclaimed more than double what
it was supposed to reclaim, or if there are concurrent reclaimers. In
addition, nr_reclaimed must still be less than nr_to_reclaim and there
must be no file deactivation request. Yeah, it can happen, but a lot of
weird conditions need to line up at the same time for it to happen.
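
To make the shift arithmetic concrete, here is a rough userspace sketch
of the check being discussed. It is not the actual mm/vmscan.c code: the
two helper names are made up for illustration, and it models only the
(file >> sc->priority) test, not the deactivation or nr_to_reclaim
conditions.

```c
#include <stdbool.h>

/*
 * Userspace sketch only: models the "file >> sc->priority" test from
 * the discussion. Both helper names are hypothetical, not kernel
 * functions.
 */

/* True when the scaled-down inactive file LRU size is non-zero. */
static bool trim_check(unsigned long inactive_file, int priority)
{
	return (inactive_file >> priority) != 0;
}

/*
 * True when a stale reading still passes the check while the fresh
 * value would fail it, i.e. the case where a skipped flush changes
 * the cache trim decision.
 */
static bool stale_read_flips(unsigned long old_file, unsigned long new_file,
			     int priority)
{
	return trim_check(old_file, priority) &&
	       !trim_check(new_file, priority);
}
```

For example, with old_file = 8192 pages and priority 11, the check only
needs 2048 pages (1 << 11) to pass, so a fresh value of 4096 gives the
same answer as the stale one. The decision differs only if the real
value has dropped below 2048.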