From: Ritesh Harjani (IBM)
To: Baolin Wang, akpm@linux-foundation.org, david@redhat.com,
 shakeel.butt@linux.dev
Cc: lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
 rppt@kernel.org, surenb@google.com, mhocko@suse.com,
 donettom@linux.ibm.com, aboorvad@linux.ibm.com, sj@kernel.org,
 baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm: fix the inaccurate memory statistics issue for users
Date: Mon, 09 Jun 2025 10:57:41 +0530
Message-ID: <87bjqx4h82.fsf@gmail.com>

Baolin Wang writes:

> On some large machines with a high number of CPUs running a 64K pagesize
> kernel, we found that the 'RES' field is always 0 displayed by the top
> command for some processes, which will
> cause a lot of confusion for users.
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  875525 root      20   0   12480      0      0 R   0.3   0.0   0:00.08 top
>       1 root      20   0  172800      0      0 S   0.0   0.0   0:04.52 systemd
>
> The main reason is that the batch size of the percpu counter is quite large
> on these machines, caching a significant percpu value, since converting mm's
> rss stats into percpu_counter by commit f1a7941243c1 ("mm: convert mm's rss
> stats into percpu_counter"). Intuitively, the batch number should be optimized,
> but on some paths, performance may take precedence over statistical accuracy.
> Therefore, introducing a new interface to add the percpu statistical count
> and display it to users, which can remove the confusion. In addition, this
> change is not expected to be on a performance-critical path, so the modification
> should be acceptable.
>
> In addition, the 'mm->rss_stat' is updated by using add_mm_counter() and
> dec/inc_mm_counter(), which are all wrappers around percpu_counter_add_batch().
> In percpu_counter_add_batch(), there is percpu batch caching to avoid 'fbc->lock'
> contention. This patch changes task_mem() and task_statm() to get the accurate
> mm counters under the 'fbc->lock', but this should not exacerbate kernel
> 'mm->rss_stat' lock contention due to the percpu batch caching of the mm
> counters. The following test also confirm the theoretical analysis.
>
> I run the stress-ng that stresses anon page faults in 32 threads on my 32 cores
> machine, while simultaneously running a script that starts 32 threads to
> busy-loop pread each stress-ng thread's /proc/pid/status interface. From the
> following data, I did not observe any obvious impact of this patch on the
> stress-ng tests.
>
> w/o patch:
> stress-ng: info:  [6848]  4,399,219,085,152 CPU Cycles          67.327 B/sec
> stress-ng: info:  [6848]  1,616,524,844,832 Instructions        24.740 B/sec (0.367 instr. per cycle)
> stress-ng: info:  [6848]         39,529,792 Page Faults Total    0.605 M/sec
> stress-ng: info:  [6848]         39,529,792 Page Faults Minor    0.605 M/sec
>
> w/patch:
> stress-ng: info:  [2485]  4,462,440,381,856 CPU Cycles          68.382 B/sec
> stress-ng: info:  [2485]  1,615,101,503,296 Instructions        24.750 B/sec (0.362 instr. per cycle)
> stress-ng: info:  [2485]         39,439,232 Page Faults Total    0.604 M/sec
> stress-ng: info:  [2485]         39,439,232 Page Faults Minor    0.604 M/sec
>
> Tested-by Donet Tom
> Reviewed-by: Aboorva Devarajan
> Tested-by: Aboorva Devarajan
> Acked-by: Shakeel Butt
> Acked-by: SeongJae Park
> Acked-by: Michal Hocko
> Signed-off-by: Baolin Wang
> ---
> Changes from v1:
>  - Update the commit message to add some measurements.
>  - Add acked tag from Michal. Thanks.
>  - Drop the Fixes tag.

Any reason why we dropped the Fixes tag? I see there was a series of
discussions on v1, and it was concluded that the fix was correct, so why
drop the Fixes tag?

Background: Recently a few folks internally reported this issue on Power
too, e.g.

$ ps -o rss $$
  RSS
    0

So it would be nice if we had a Fixes tag so that the patch gets backported
to all stable releases. Does anybody see any concern with that?

-ritesh
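
For readers following the thread, the batching effect described in the
quoted commit message can be modelled in a few lines of Python. This is
only a toy sketch under stated assumptions, not kernel code: the class
PerCpuCounter, its method names, and the values 128 (CPUs) and 256 (batch)
are made up for illustration, and no locking is modelled. It shows why a
cheap read of the shared count can report 0 while per-CPU deltas are still
cached, and why summing those deltas gives the real value.

```python
class PerCpuCounter:
    """Toy model of a batched per-CPU counter (hypothetical, not the kernel code)."""

    def __init__(self, nr_cpus, batch):
        self.count = 0               # shared value (lock-protected in a real implementation)
        self.percpu = [0] * nr_cpus  # per-CPU deltas not yet folded into self.count
        self.batch = batch           # illustrative batch size, chosen arbitrarily

    def add(self, cpu, amount):
        # Accumulate locally; fold into the shared count only once the
        # local delta reaches the batch size.
        delta = self.percpu[cpu] + amount
        if abs(delta) >= self.batch:
            self.count += delta
            self.percpu[cpu] = 0
        else:
            self.percpu[cpu] = delta

    def read(self):
        # Cheap read: ignores whatever is still cached per CPU.
        return self.count

    def precise_sum(self):
        # Precise read: shared count plus every cached per-CPU delta.
        return self.count + sum(self.percpu)


counter = PerCpuCounter(nr_cpus=128, batch=256)

# A small process faults in 10 pages on each of 4 CPUs.
for cpu in range(4):
    counter.add(cpu, 10)

print(counter.read())         # 0
print(counter.precise_sum())  # 40
```

In this model the process really has 40 pages resident, but none of the
per-CPU deltas has reached the batch size, so the cheap read still returns
0. That mirrors the 'RES 0' symptom above; the precise sum is the kind of
read the patch description says task_mem() and task_statm() switch to.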