From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2CDE9C2BD09 for ; Mon, 24 Jun 2024 19:07:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B07806B00C4; Mon, 24 Jun 2024 15:07:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AB78E6B00E6; Mon, 24 Jun 2024 15:07:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 930B96B00F5; Mon, 24 Jun 2024 15:07:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 739C26B00C4 for ; Mon, 24 Jun 2024 15:07:08 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 37C7DC073D for ; Mon, 24 Jun 2024 19:07:08 +0000 (UTC) X-FDA: 82266714936.20.FEC7847 Received: from mail-ej1-f49.google.com (mail-ej1-f49.google.com [209.85.218.49]) by imf11.hostedemail.com (Postfix) with ESMTP id 658F84001E for ; Mon, 24 Jun 2024 19:07:06 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=LuFnq8ye; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf11.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.49 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719256015; a=rsa-sha256; cv=none; b=WW175K0PEZcah22CT7MjJ9dDwRTVbw/fpjvAqrM5dQz0gkkXqn0T90gYZ+zxYHSz9j5XW6 0vuSCd+noIRyxl/ZVAb+uG4ZpKSC2RxySHx+RtJ/O0n2T+AfKX0ZGtLM3kzObRpuThmuRl ANDy3Vc0lnfR9m03FvAhICzUmpmbGoo= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=LuFnq8ye; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf11.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.49 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719256015; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=lR62gI/1SAvRGFQwjUeGp5M8TTVr01/DKqtXz/J0W10=; b=B+6X6TA3zVAVnh69xTnkYAQyD3Bfun1Llwt9hYid4tbhzolp5oqyToEQwBtVhrHoyArZ+d B0mrChdkJWwnWxCMxweMUFdQ9BCRFm/oERRPjyiUT1sCijjygcstOEUdEhuXaZlicbze4P EU8zRylwlmeeJxushJBiKW347czaPIo= Received: by mail-ej1-f49.google.com with SMTP id a640c23a62f3a-a72517e6225so188838466b.0 for ; Mon, 24 Jun 2024 12:07:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1719256025; x=1719860825; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=lR62gI/1SAvRGFQwjUeGp5M8TTVr01/DKqtXz/J0W10=; b=LuFnq8yeUcnyxnjTdx3RSk+XW//5Ka6MLi0tFT3XrzMHuyxSrLE8NDTQZIvrG+dIfZ DGxLWRfYI2uTJv5b3BckuO7HCzwBWzRKwg95ha7+x+OauH6CkpdeS9Nk/Qg/LLXwVLNG CuqtL8V90jGxjhHBeZdWi3qmnxNXlOOC+yVYmzlHwPltnpmP9YIMgSNqaWLwpJqgSpCH SDES83jzTneeeOjT+Y6jdCPX6fa+N3Hax/XLihVZDlpYj5FFKoLq/5hb7wkYrBvcqFmd xZ1bb56QDhNyPloo/lGseOCHjWujNm1y+q9GRFA7T0CioKzbLkFTM1yCMKjD6brZnwII TTKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1719256025; x=1719860825; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=lR62gI/1SAvRGFQwjUeGp5M8TTVr01/DKqtXz/J0W10=; b=M3RefZNXaJudJPzk2RP7vm0nt/oLv+O9jLUoyUmPIpd0Wj66hSLg3Un2PjRnFf3Hhx socuddG/mmuvanKxV/C7jFzCrNYJwTjPjikLZ0fi3zGkiSs8tCfbrmeEu3SXMb8xQEfj 5djhaxqNFBGi7sKvkoN/FIz2Tyqu5J+D5fai7mlGOYc4EAliRNK3tjvfWTR24+w8Cdav XxQNB8bm5nTP97aFDRICtvRaUn0DpFPJ/8G6eRo9HnkRgo6VaUGcvbA9SOpDHGOJpDVs xvFyj/DDjX96rs8oFckbQ8J6DNwdRwoNSxmXFzjTNPNSfS/ai5f2FGeH8UkKcNuQr0cR f8hg== X-Forwarded-Encrypted: i=1; AJvYcCUuRclS4C3Op/KMQlArHDvvnDYkYxoekSNVNyBrk4UTjgkidEI45OgHq0lRx9SHXWYR1QmHtlOmXgldAbeqZz7keIA= X-Gm-Message-State: AOJu0Yzjs5NI+E/07elIgjb7OoWOJCMirzIw97fhccLmoJBsbq/jWA4u 5uw8evkH1zY0Hcwni2ud7lnCPNrBO9oWDEx9MDcgHkTvCeLLfK4EFHQlyrnSEgGWtksP257mR7d Yhkv0myEMf9iXEPn5VBlIJpe+pH8HpWgIC64J X-Google-Smtp-Source: AGHT+IGQGMLtBV20gTmfKdSojAfvaC7E4ljNx1sZ1gUfGpKwgiNYyPDTR9/rZjR+aXfJjbjlFCWO2UZ7I6NB4jdknJM= X-Received: by 2002:a17:906:418:b0:a6f:5723:fb11 with SMTP id a640c23a62f3a-a715f9cbb00mr381699066b.58.1719256024286; Mon, 24 Jun 2024 12:07:04 -0700 (PDT) MIME-Version: 1.0 References: <20240615081257.3945587-1-shakeel.butt@linux.dev> In-Reply-To: From: Yosry Ahmed Date: Mon, 24 Jun 2024 12:06:28 -0700 Message-ID: Subject: Re: [PATCH] memcg: use ratelimited stats flush in the reclaim To: Shakeel Butt Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Jesper Dangaard Brouer , Yu Zhao , Muchun Song , Facebook Kernel Team , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 658F84001E X-Stat-Signature: qs5h7adq3xoijbxwqfkh1xdqrbk3jfku X-Rspam-User: X-HE-Tag: 1719256026-917780 X-HE-Meta: U2FsdGVkX18gYLYDLP2lZst/w0gIgG16QXhe7+Ftr49XVqs/oyFshCd4ubfOvzceL7C42rvtvtwW5XcRTO4sdWQCmkQqD4BayzvIyICJzFzd9qWEKdF5i9Qyw5tjKuX+r+J1ufPgGlMxEt/y5TJM623lj5PQ8wC+FbHw5U4KFe+1sGOx0Q92bTo2EiypGuvMNo+x2Ts59mBfF2GchrE70XEReOyngSQlZNlJZh8r6lSqmFw1yuf/o3OJ4ONhyOnT0JJVo+PJ9dQ84yDqK4Dk9DCxm1FDdiKgWFN1zWipVPAQTsjNBaXlfVJWy5B+sm/Zp9cGKntp0u9BJC91YoaiWG+7z/HPi7D/sqipzJKUEmWOLpZcl7gMpauW9gBVpTsC2dC5Fg3r6GjxeANEHHKWvI85+qw/+LFpdtzgkCq0O0Dcgt/xnnzFjDUPlHTdlPqz6Fygw8LmefPvBcV3BfnePhBBULl63rNPx3T+KNJSvJAKT+Mb6cTykFahhymQV6Q3JjqIxMf+NvbORelWAZ5cnqbUCyBGxrQJW27XDIxeVs3Dbj2md/W4tMlHttw67j/2tgO/W2sqdn0TFylEJiicoUjSL0mkwP9/Xd1s/8eKHs3o6x58kiEJzz8+EkARcT6ysky03ue6lVR0L5H1W8Q8uq6ftGkjGh7gIpisFJrXD3512Vmuw53UJfT4xgkDt5s1/+BLiL0vBpRX6aM3WRQFOXE2aCPR5M0Dk7L4jzjGf+taVdR6fifL+udnKgNOsKfslugsUPAq6DKTkPPF/w3mzsDac07MQE4WC14k6UEZwXwoQ3FVYZoNYbWX0Q73GQGR8zD2ZvrAdkZrzExrcBJsaSDHCGbDGqdz5AyvCx92OfNS0ri75jXVDV1qfDMOtDe6+8vpSejk4f8/dVlfXhzILxMHS+ijNVT0VFSfAinrw5i/YqNQE15jT/5uRHkdxMxXqK9TNM89Gbp63ycYa81 FDGxEEWs 78YS1NAiyTPr+lPlcn3gdsy7fvRoayOfNu+ketyTV2fTw+XUCWNFJSgv/ndN3yKRzLRBKU58Q1XMcTtuxKvCRge9+OkDrhRtxuBbuivdeLpADqxoC0KBa/IEqRIWrVqqVnvB6PCzg5fWB2JFhtvZ2rQ2w2BZp4KCsj3TlyxsnGyg+gt0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.043215, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Jun 24, 2024 at 11:59=E2=80=AFAM Shakeel Butt wrote: > > On Mon, Jun 24, 2024 at 10:15:38AM GMT, Yosry Ahmed wrote: > > On Mon, Jun 24, 2024 at 10:02=E2=80=AFAM Shakeel Butt wrote: > > > > > > On Mon, Jun 24, 2024 at 05:57:51AM GMT, Yosry Ahmed wrote: > > > > > > and I will explain why below. I know it may be a necessary > > > > > > evil, but I would like us to make sure there is no other option= before > > > > > > going forward with this. > > > > > > > > > > Instead of necessary evil, I would call it a pragmatic approach i= .e. > > > > > resolve the ongoing pain with good enough solution and work on lo= ng term > > > > > solution later. > > > > > > > > It seems like there are a few ideas for solutions that may address > > > > longer-term concerns, let's make sure we try those out first before= we > > > > fall back to the short-term mitigation. > > > > > > > > > > Why? More specifically why try out other things before this patch? Bo= th > > > can be done in parallel. This patch has been running in production at > > > Meta for several weeks without issues. Also I don't see how merging t= his > > > would impact us on working on long term solutions. > > > > The problem is that once this is merged, it will be difficult to > > change this back to a normal flush once other improvements land. We > > don't have a test that reproduces the problem that we can use to make > > sure it's safe to revert this change later, it's only using data from > > prod. > > > > I am pretty sure the work on long term solution would be iterative which > will involve many reverts and redoing things differently. So, I think it > is understandable that we may need to revert or revert the reverts. > > > Once this mitigation goes in, I think everyone will be less motivated > > to get more data from prod about whether it's safe to revert the > > ratelimiting later :) > > As I said I don't expect "safe in prod" as a strict requirement for a > change. If everyone agrees that we can experiment with reverting this change later without having to prove that it is safe, then I think it's fine. Let's document this in the commit log though, so that whoever tries to revert this in the future (if any) does not have to re-explain all of this :) [..] > > > > > > > > > > For the cache trim mode, inactive file LRU size is read and the k= ernel > > > > > scales it down based on the reclaim iteration (file >> sc->priori= ty) and > > > > > only checks if it is zero or not. Again precise information is no= t > > > > > needed. > > > > > > > > It sounds like it is possible that we enter the cache trim mode whe= n > > > > we shouldn't if the stats are stale. Couldn't this lead to > > > > over-reclaiming file memory? > > > > > > > > > > Can you explain how this over-reclaiming file will happen? > > > > In one reclaim iteration, we could flush the stats, read the inactive > > file LRU size, confirm that (file >> sc->priority) > 0 and enter the > > cache trim mode, reclaiming file memory only. Let's assume that we > > reclaimed enough file memory such that the condition (file >> > > sc->priority) > 0 does not hold anymore. > > > > In a subsequent reclaim iteration, the flush could be skipped due to > > ratelimiting. Now we will enter the cache trim mode again and reclaim > > file memory only, even though the actual amount of file memory is low. > > This will cause over-reclaiming from file memory and dismissing anon > > memory that we should have reclaimed, which means that we will need > > additional reclaim iterations to actually free memory. > > > > I believe this scenario would be possible with ratelimiting, right? > > > > So, the (old_file >> sc->priority) > 0 is true but the (new_file >> > sc->priority) > is false. In the next iteration, (old_file >> > (sc->priority-1)) > 0 will still be true but somehow (new_file >> > (sc->priority-1)) > 0 is false. It can happen if in the previous > iteration, somehow kernel has reclaimed more than double what it was > supposed to reclaim or there are concurrent reclaimers. In addition the > nr_reclaim is still less than nr_to_reclaim and there is no file > deactivation request. > > Yeah it can happen but a lot of wierd conditions need to happen > concurrently for this to happen. Not necessarily sc->priority-1. Consider two separate sequential reclaim attempts. At the same priority, the first reclaim attempt could rightfully enter cache trim mode, while the second one wrongfully enters cache trim mode due to stale stats, over-reclaim file memory, and stall longer to actually reclaim the anon memory. I am sure such a scenario is not going to be common, but I am also sure if it happens it will be a huge pain to debug. If others agree that this is fine, let's document this with a comment and in the commit log. I am not sure how common the cache trim mode is in practice to understand the potential severity of such problems. There may also be other consequences that I am not aware of.