Date: Thu, 15 Dec 2022 09:22:33 +0100
From: Johannes Weiner
To: "Huang, Ying"
Cc: Michal Hocko, Dave Hansen, Yang Shi, Wei Xu, Andrew Morton, linux-mm@kvack.org, LKML
Subject: Re: memcg reclaim demotion wrt. isolation
In-Reply-To: <87edt1dwd2.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, Dec 15, 2022 at 02:17:13PM +0800, Huang, Ying wrote:
> Michal Hocko writes:
>
> > On Tue 13-12-22 17:14:48, Johannes Weiner wrote:
> >> On Tue, Dec 13, 2022 at 04:41:10PM +0100, Michal Hocko wrote:
> >> > Hi,
> >> > I have just noticed that pages allocated for demotion targets
> >> > include __GFP_KSWAPD_RECLAIM (through GFP_NOWAIT). This is the case
> >> > since the code has been introduced by 26aa2d199d6f ("mm/migrate: demote
> >> > pages during reclaim"). I suspect the intention is to trigger the aging
> >> > on the fallback node and either drop or further demote oldest pages.
> >> >
> >> > This makes sense but I suspect that this wasn't intended also for
> >> > memcg triggered reclaim. This would mean that a memory pressure in one
> >> > hierarchy could trigger paging out pages of a different hierarchy if the
> >> > demotion target is close to full.
> >>
> >> This is also true if you don't do demotion. If a cgroup tries to
> >> allocate memory on a full node (i.e. mbind()), it may wake kswapd or
> >> enter global reclaim directly which may push out the memory of other
> >> cgroups, regardless of the respective cgroup limits.
> >
> > You are right on this. But this is describing a slightly different
> > situation IMO.
> >
> >> The demotion allocations don't strike me as any different. They're
> >> just allocations on behalf of a cgroup. I would expect them to wake
> >> kswapd and reclaim physical memory as needed.
> >
> > I am not sure this is an expected behavior. Consider the currently
> > discussed memory.demote interface when the userspace can trigger
> > (almost) arbitrary demotions. This can deplete fallback nodes without
> > over-committing the memory overall yet push out demoted memory from
> > other workloads. From the user POV it would look like a reclaim while
> > the overall memory is far from depleted so it would be considered as
> > premature and warrant a bug report.
> >
> > The reclaim behavior would make more sense to me if it was constrained
> > to the allocating memcg hierarchy so unrelated lruvecs wouldn't be
> > disrupted.
>
> When we reclaim/demote some pages from a memcg proactively, what is our
> goal? To free up some memory in this memcg for other memcgs to use? If
> so, it sounds reasonable to keep as many pages of other memcgs as
> possible.

The goal of proactive aging is to free up any resources that aren't
needed to meet the SLAs (e.g. end-to-end response time of a webserver).
Meaning, to run things as leanly as possible within spec. Into that
free space, another container can then be co-located.

This means that the goal is to free up as many resources as possible,
starting with the coveted high tier. If a container has been using
only high-tier memory but is able to demote to the lower tier, there
are 3 options for existing memory in the lower tier:

1) Colder/stale memory - should be displaced

2) Memory that can be promoted once the high tier is free -
   reclaim/demotion of the coldest pages needs to happen at least
   temporarily, or the tier swap is in stalemate.

3) Equally hot memory - if this exceeds capacity of the lower tier,
   the hottest overall pages should stay, the excess demoted/reclaimed.

You can't know what scenario you're in until you put the demoted pages
in direct LRU competition with what's already there. And in all three
scenarios, direct LRU competition also produces the optimal outcome.
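
The "direct LRU competition" argument can be illustrated with a toy
model. This is only a sketch, not kernel code: the Page type, the
last_access timestamp, the cgroup names "A" and "B" and the compete()
helper are all hypothetical and exist purely for illustration. It
assumes the lower tier simply keeps the most recently used pages up to
its capacity, regardless of which cgroup owns them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    owner: str        # hypothetical cgroup name
    last_access: int  # hypothetical timestamp; larger means hotter


def compete(resident, demoted, capacity):
    """Merge demoted pages with the lower tier's resident pages in one LRU.

    Returns (kept, displaced): the `capacity` most recently used pages
    stay; the rest would be reclaimed or demoted further.
    """
    lru = sorted(resident + demoted, key=lambda p: p.last_access, reverse=True)
    return lru[:capacity], lru[capacity:]


# Scenario 1: the lower tier holds colder/stale memory of cgroup B.
stale_b = [Page("B", 1), Page("B", 2)]
hot_a = [Page("A", 10), Page("A", 11)]
kept1, displaced1 = compete(stale_b, hot_a, capacity=2)

# Scenario 3: both cgroups are about equally hot and exceed capacity.
warm_b = [Page("B", 5), Page("B", 7), Page("B", 9)]
warm_a = [Page("A", 6), Page("A", 8), Page("A", 10)]
kept3, displaced3 = compete(warm_b, warm_a, capacity=4)
```

In scenario 1 the stale pages of B are displaced by the demoted pages
of A. In scenario 3 the hottest pages overall stay and each cgroup
loses its coldest page. The model does not know in advance which
scenario applies; the outcome falls out of the shared ordering.
Scenario 2 (promotion once the high tier is free) is not modeled here.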