Message-ID: <14ae628d-a9ef-42f3-9201-e90c5c88c133@huawei.com>
Date: Fri, 19 Jan 2024 20:59:22 +0800
Subject: Re: [PATCH v2] mm: memory: move mem_cgroup_charge() into alloc_anon_folio()
To: Michal Hocko
CC: Andrew Morton, Matthew Wilcox, David Hildenbrand
References: <20240117103954.2756050-1-wangkefeng.wang@huawei.com>
From: Kefeng Wang

On 2024/1/19 16:00, Michal Hocko wrote:
> On Fri 19-01-24 10:05:15, Kefeng Wang wrote:
>>
>>
>> On 2024/1/18 23:59, Michal Hocko wrote:
>>> On Wed 17-01-24 18:39:54, Kefeng Wang wrote:
>>>> mem_cgroup_charge() uses the GFP flags in a fairly sophisticated way.
>>>> In addition to checking gfpflags_allow_blocking(), it pays attention
>>>> to __GFP_NORETRY and __GFP_RETRY_MAYFAIL to ensure that processes within
>>>> this memcg do not exceed their quotas.
>>>> Using the same GFP flags ensures
>>>> that we handle large anonymous folios correctly, including falling back
>>>> to smaller orders when there is plenty of memory available in the system
>>>> but this memcg is close to its limits.
>>>
>>> The changelog is not really clear about the actual problem you are trying
>>> to fix. Is this a pure consistency fix, or have you actually seen any
>>> misbehavior? From the patch I suspect you are interested in THPs much
>>> more than regular order-0 pages because those are GFP_KERNEL-like when
>>> it comes to charging. THPs have a variety of options on how aggressively
>>> the allocation should try. From that perspective NORETRY and
>>> RETRY_MAYFAIL are not all that interesting because costly allocations
>>> (which THPs are) already do imply MAYFAIL and NORETRY.
>>
>> I haven't hit an actual issue; this was found by code inspection.
>>
>> mTHP was introduced by Ryan (19eaf44954df "mm: thp: support allocation of
>> anonymous multi-size THP"), so alloc_anon_folio() now has a check for mTHP
>> similar to the one for PMD THP: it tries to allocate a large-order folio
>> below PMD_ORDER and falls back to an order-0 folio if that fails. Meanwhile,
>> it gets the GFP flags from vma_thp_gfp_mask() according to user
>> configuration, as PMD THP allocation does, so
>>
>> 1) the memory charge failure check should be moved into the fallback
>> logic, because that lets us allocate as large an order folio as possible
>> even when the memcg's memory usage is close to its limit.
>>
>> 2) use the same GFP flags for allocation and memory charging. First, this
>> is consistent with PMD THP. In addition, according to the GFP flags
>> returned by vma_thp_gfp_mask(), GFP_TRANSHUGE_LIGHT makes us skip direct
>> reclaim, and __GFP_NORETRY makes us skip mem_cgroup_oom(), so we won't
>> kill any process when charging a large-order folio.
>
> OK, makes sense. Please turn that into the changelog.

Sure.
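
As a rough illustration of point 1) above, here is a small user-space model of the fallback idea. It is only a sketch, not the kernel code: struct toy_memcg, toy_charge() and toy_alloc_anon_folio() are hypothetical names, the page allocation itself is assumed to always succeed, and the memcg is reduced to a usage counter with a hard limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memcg: usage and limit, both counted in pages. */
struct toy_memcg {
	unsigned long usage;
	unsigned long limit;
};

/* Hypothetical charge helper: succeeds only if the whole folio fits. */
bool toy_charge(struct toy_memcg *cg, unsigned int order)
{
	unsigned long nr_pages = 1UL << order;

	assert(cg != NULL);
	if (cg->usage + nr_pages > cg->limit)
		return false;
	cg->usage += nr_pages;
	return true;
}

/*
 * Walk the candidate orders from largest to smallest. A charge failure
 * is treated like an allocation failure: give up on this order and try
 * the next smaller one. Returns the charged order, or -1 if even the
 * final order-0 attempt could not be charged.
 */
int toy_alloc_anon_folio(struct toy_memcg *cg,
			 const unsigned int *orders, size_t nr_orders)
{
	for (size_t i = 0; i < nr_orders; i++) {
		/* Allocation is assumed to succeed (plenty of free memory). */
		if (toy_charge(cg, orders[i]))
			return (int)orders[i];
		/* Charge failed: fall through to the next smaller order. */
	}

	/* Last resort: a single order-0 page. */
	return toy_charge(cg, 0) ? 0 : -1;
}
```

With a memcg at 60 of 64 pages and candidate orders 4, 3 and 2, the model ends up charging an order-2 folio instead of dropping straight to order-0, which is the behaviour the patch is after.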
>
>>> GFP_TRANSHUGE_LIGHT is more interesting though because those do not dive
>>> into the direct reclaim at all. With the current code they will reclaim
>>> charges to free up the space for the allocated THP page and that defeats
>>> the light mode. I have a vague recollection of preparing a patch to
>>
>> We are interested in GFP_TRANSHUGE_LIGHT and __GFP_NORETRY, as mentioned
>> above.
>
> If mTHP can be smaller than COSTLY_ORDER then you are correct and
> NORETRY makes a difference. Please mention that in the changelog as
> well.
>

For the memory cgroup charge, __GFP_NORETRY is checked so that we skip
mem_cgroup_oom() directly. The __GFP_NORETRY check in try_charge_memcg()
does not depend on the folio order or COSTLY_ORDER, so I think NORETRY
should always make a difference for all large-order folios.
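
To make the last point concrete, here is a simplified user-space model of the charge decision as described in this thread. It is a sketch under stated assumptions, not a copy of try_charge_memcg(): the TOY_* flags, enum toy_outcome and toy_try_charge() are hypothetical names, and reclaim is assumed to free nothing so that the later checks are actually reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the real GFP flags. */
#define TOY_DIRECT_RECLAIM 0x1u	/* caller may enter direct reclaim */
#define TOY_NORETRY        0x2u	/* caller asked for no retry, no OOM */

enum toy_outcome {
	TOY_CHARGED,		/* the folio fit under the limit */
	TOY_NOMEM,		/* charge failed, no OOM handling invoked */
	TOY_OOM_INVOKED,	/* charge failed, memcg OOM path entered */
};

/*
 * margin_pages is the room left under the memcg limit.
 * *reclaim_attempted reports whether the model entered reclaim.
 */
enum toy_outcome toy_try_charge(unsigned int gfp, unsigned int order,
				unsigned long margin_pages,
				bool *reclaim_attempted)
{
	unsigned long nr_pages = 1UL << order;

	assert(reclaim_attempted != NULL);
	*reclaim_attempted = false;

	if (nr_pages <= margin_pages)
		return TOY_CHARGED;

	/* "Light" callers never reclaim charges on behalf of the folio. */
	if (!(gfp & TOY_DIRECT_RECLAIM))
		return TOY_NOMEM;

	/* Assumption: reclaim runs but frees nothing in this model. */
	*reclaim_attempted = true;

	/* Note that this check does not look at the order at all. */
	if (gfp & TOY_NORETRY)
		return TOY_NOMEM;

	return TOY_OOM_INVOKED;
}
```

In this model an order-2 and an order-9 request with TOY_NORETRY set both end in TOY_NOMEM without reaching the OOM path, while a request without TOY_DIRECT_RECLAIM fails before any reclaim is attempted.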