Message-ID: <890b825e-b3b1-4d32-83ec-662495e35023@linux.alibaba.com>
Date: Mon, 9 Jun 2025 16:04:38 +0800
Subject: Re: [PATCH v2] mm: fix the inaccurate memory statistics issue for users
To: Michal Hocko, Ritesh Harjani
Cc: akpm@linux-foundation.org, david@redhat.com, shakeel.butt@linux.dev,
 lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
 rppt@kernel.org, surenb@google.com, donettom@linux.ibm.com,
 aboorvad@linux.ibm.com, sj@kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
References: <87bjqx4h82.fsf@gmail.com>
From: Baolin Wang

On 2025/6/9 15:35, Michal Hocko wrote:
> On Mon 09-06-25 10:57:41, Ritesh Harjani wrote:
>> Baolin Wang writes:
>>
>>> On some large machines with a high number of CPUs running a 64K pagesize
>>> kernel, we found that the 'RES' field displayed by the top command is
>>> always 0 for some processes, which will cause a lot of confusion for users.
>>>
>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>  875525 root      20   0   12480      0      0 R   0.3   0.0   0:00.08 top
>>>       1 root      20   0  172800      0      0 S   0.0   0.0   0:04.52 systemd
>>>
>>> The main reason is that the batch size of the percpu counter is quite large
>>> on these machines, caching a significant percpu value, since converting mm's
>>> rss stats into percpu_counter by commit f1a7941243c1 ("mm: convert mm's rss
>>> stats into percpu_counter"). Intuitively, the batch number should be optimized,
>>> but on some paths, performance may take precedence over statistical accuracy.
>>> Therefore, introduce a new interface to add the percpu statistical count
>>> and display it to users, which can remove the confusion. In addition, this
>>> change is not expected to be on a performance-critical path, so the modification
>>> should be acceptable.
>>>
>>> In addition, the 'mm->rss_stat' is updated by using add_mm_counter() and
>>> dec/inc_mm_counter(), which are all wrappers around percpu_counter_add_batch().
>>> In percpu_counter_add_batch(), there is percpu batch caching to avoid 'fbc->lock'
>>> contention. This patch changes task_mem() and task_statm() to get the accurate
>>> mm counters under the 'fbc->lock', but this should not exacerbate kernel
>>> 'mm->rss_stat' lock contention due to the percpu batch caching of the mm
>>> counters. The following test also confirms the theoretical analysis.
>>>
>>> I ran stress-ng, stressing anon page faults in 32 threads on my 32-core
>>> machine, while simultaneously running a script that starts 32 threads to
>>> busy-loop pread each stress-ng thread's /proc/pid/status interface. From the
>>> following data, I did not observe any obvious impact of this patch on the
>>> stress-ng tests.
>>>
>>> w/o patch:
>>> stress-ng: info:  [6848]  4,399,219,085,152 CPU Cycles          67.327 B/sec
>>> stress-ng: info:  [6848]  1,616,524,844,832 Instructions        24.740 B/sec (0.367 instr. per cycle)
>>> stress-ng: info:  [6848]         39,529,792 Page Faults Total    0.605 M/sec
>>> stress-ng: info:  [6848]         39,529,792 Page Faults Minor    0.605 M/sec
>>>
>>> w/patch:
>>> stress-ng: info:  [2485]  4,462,440,381,856 CPU Cycles          68.382 B/sec
>>> stress-ng: info:  [2485]  1,615,101,503,296 Instructions        24.750 B/sec (0.362 instr. per cycle)
>>> stress-ng: info:  [2485]         39,439,232 Page Faults Total    0.604 M/sec
>>> stress-ng: info:  [2485]         39,439,232 Page Faults Minor    0.604 M/sec
>>>
>>> Tested-by: Donet Tom
>>> Reviewed-by: Aboorva Devarajan
>>> Tested-by: Aboorva Devarajan
>>> Acked-by: Shakeel Butt
>>> Acked-by: SeongJae Park
>>> Acked-by: Michal Hocko
>>> Signed-off-by: Baolin Wang
>>> ---
>>> Changes from v1:
>>> - Update the commit message to add some measurements.
>>> - Add acked tag from Michal. Thanks.
>>> - Drop the Fixes tag.
>>
>> Any reason why we dropped the Fixes tag? I see there was a series of
>> discussions on v1 and it was concluded that the fix was correct, so why
>> drop the Fixes tag?
>
> This seems more like an improvement than a bug fix.

Yes. I don't have a strong opinion on this, but we (Alibaba) will
backport it manually, because some user-space monitoring tools depend
on these statistics.
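
For readers who want to see why 'RES' can read as 0, here is a minimal
user-space sketch of the batching effect described in the commit message.
It is a toy model, not the kernel implementation: the class, the method
names read_fast()/read_sum(), and the values of NR_CPUS and BATCH are all
assumptions chosen for illustration. Each CPU keeps a local delta and only
folds it into the shared count once the delta reaches the batch threshold,
so a cheap read of the shared count can miss everything still cached
per CPU, while a full sum over the per-CPU deltas recovers it.

```python
class BatchedCounter:
    """Toy model of a per-CPU batched counter (hypothetical, not kernel code)."""

    def __init__(self, nr_cpus, batch):
        self.count = 0                # shared value, cheap to read
        self.percpu = [0] * nr_cpus   # per-CPU deltas not yet folded in
        self.batch = batch

    def add(self, cpu, amount):
        # Accumulate locally; fold into the shared value only once the
        # local delta reaches the batch threshold.
        delta = self.percpu[cpu] + amount
        if abs(delta) >= self.batch:
            self.count += delta
            self.percpu[cpu] = 0
        else:
            self.percpu[cpu] = delta

    def read_fast(self):
        # Approximate read: ignores the per-CPU deltas, clamped at zero.
        return max(self.count, 0)

    def read_sum(self):
        # Precise read: shared value plus every per-CPU delta.
        return max(self.count + sum(self.percpu), 0)


NR_CPUS = 128         # assumption for illustration
BATCH = 2 * NR_CPUS   # assumption for illustration

counter = BatchedCounter(NR_CPUS, BATCH)

# A small process faults in 100 pages on each of 4 CPUs. Every local
# delta stays below the batch threshold, so nothing is folded.
for cpu in range(4):
    counter.add(cpu, 100)

print(counter.read_fast())  # 0
print(counter.read_sum())   # 400
```

In this model, with 64K pages, the 400 pages that the fast read reports
as 0 would correspond to 25 MiB of resident memory. The precise read has
to walk every per-CPU slot, which is why it makes sense only on paths
that are not performance-critical.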