From: Daniel Dao
Date: Thu, 24 Feb 2022 14:46:27 +0000
Subject: Re: Regression in workingset_refault latency on 5.15
To: Ivan Babrou
Cc: Shakeel Butt, kernel-team, Linux MM, Johannes Weiner, Roman Gushchin,
 Feng Tang, Michal Hocko, Hillf Danton, Michal Koutný, Andrew Morton,
 Linus Torvalds
On Wed, Feb 23, 2022 at 9:16 PM Ivan Babrou wrote:
>
> Thanks for the info. Is it possible to test
> https://lore.kernel.org/all/20210929235936.2859271-1-shakeelb@google.com/

We built the patch on top of 5.15.25 and deployed it with swap-on-zram on
some nodes. Test data follow:

1) Latency of workingset_refault over 60 seconds

$ sudo /usr/share/bcc/tools/funclatency -d 60 workingset_refault
Tracing 1 functions for "workingset_refault"... Hit Ctrl-C to end.

     nsecs               : count     distribution
        0 -> 1        : 0        |                                        |
        2 -> 3        : 0        |                                        |
        4 -> 7        : 0        |                                        |
        8 -> 15       : 0        |                                        |
       16 -> 31       : 0        |                                        |
       32 -> 63       : 0        |                                        |
       64 -> 127      : 0        |                                        |
      128 -> 255      : 0        |                                        |
      256 -> 511      : 4643     |**********************                  |
      512 -> 1023     : 4259     |********************                    |
     1024 -> 2047     : 1178     |*****                                   |
     2048 -> 4095     : 4044     |*******************                     |
     4096 -> 8191     : 1804     |********                                |
     8192 -> 16383    : 109      |                                        |
    16384 -> 32767    : 52       |                                        |
    32768 -> 65535    : 45       |                                        |
    65536 -> 131071   : 15       |                                        |
   131072 -> 262143   : 2        |                                        |
   262144 -> 524287   : 0        |                                        |
   524288 -> 1048575  : 7        |                                        |
  1048576 -> 2097151  : 900      |****                                    |
  2097152 -> 4194303  : 8218     |****************************************|
  4194304 -> 8388607  : 967      |****                                    |
  8388608 -> 16777215 : 89       |                                        |

avg = 1229607 nsecs, total: 32378027676 nsecs, count: 26332

2) Latency of cgroup_rstat_flush_locked over 60 seconds

$ sudo /usr/share/bcc/tools/funclatency -d 60 cgroup_rstat_flush_locked
Tracing 1 functions for "cgroup_rstat_flush_locked"... Hit Ctrl-C to end.

     nsecs               : count     distribution
        0 -> 1        : 0        |                                        |
        2 -> 3        : 0        |                                        |
        4 -> 7        : 0        |                                        |
        8 -> 15       : 0        |                                        |
       16 -> 31       : 0        |                                        |
       32 -> 63       : 0        |                                        |
       64 -> 127      : 0        |                                        |
      128 -> 255      : 0        |                                        |
      256 -> 511      : 0        |                                        |
      512 -> 1023     : 0        |                                        |
     1024 -> 2047     : 36       |                                        |
     2048 -> 4095     : 73       |                                        |
     4096 -> 8191     : 151      |                                        |
     8192 -> 16383    : 606      |**                                      |
    16384 -> 32767    : 387      |*                                       |
    32768 -> 65535    : 49       |                                        |
    65536 -> 131071   : 26       |                                        |
   131072 -> 262143   : 6        |                                        |
   262144 -> 524287   : 3        |                                        |
   524288 -> 1048575  : 14       |                                        |
  1048576 -> 2097151  : 2085     |*********                               |
  2097152 -> 4194303  : 8580     |****************************************|
  4194304 -> 8388607  : 772      |***                                     |
  8388608 -> 16777215 : 117      |                                        |
 16777216 -> 33554431 : 1        |                                        |

avg = 2592894 nsecs, total: 33463898782 nsecs, count: 12906

3) Summary of stack traces when mem_cgroup_flush_stats is over 5ms

@stackcount[
    mem_cgroup_flush_stats+1
    workingset_refault+296
    __read_swap_cache_async+512
    swap_cluster_readahead+440
    shmem_swapin+164
    shmem_swapin_page+396
    shmem_getpage_gfp+1228
    shmem_fault+104
    __do_fault+55
    __handle_mm_fault+3102
    handle_mm_fault+191
    do_user_addr_fault+428
    exc_page_fault+103
    asm_exc_page_fault+30
]: 1

@stackcount[
    mem_cgroup_flush_stats+1
    workingset_refault+296
    __read_swap_cache_async+512
    swap_cluster_readahead+333
    shmem_swapin+164
    shmem_swapin_page+396
    shmem_getpage_gfp+1228
    shmem_fault+104
    __do_fault+55
    __handle_mm_fault+3649
    handle_mm_fault+191
    do_user_addr_fault+428
    exc_page_fault+103
    asm_exc_page_fault+30
]: 1

@stackcount[
    mem_cgroup_flush_stats+1
    workingset_refault+296
    __read_swap_cache_async+512
    swap_cluster_readahead+333
    shmem_swapin+164
    shmem_swapin_page+396
    shmem_getpage_gfp+1228
    shmem_fault+104
    __do_fault+55
    __handle_mm_fault+3102
    handle_mm_fault+191
    do_user_addr_fault+428
    exc_page_fault+103
    asm_exc_page_fault+30
]: 4

@stackcount[
    mem_cgroup_flush_stats+1
    shrink_node+163
    balance_pgdat+791
    kswapd+508
    kthread+295
    ret_from_fork+34
]: 7
@stackcount[
    mem_cgroup_flush_stats+1
    workingset_refault+296
    add_to_page_cache_lru+159
    page_cache_ra_unbounded+340
    filemap_get_pages+569
    filemap_read+164
    xfs_file_buffered_read+152
    xfs_file_read_iter+106
    new_sync_read+280
    vfs_read+242
    __x64_sys_pread64+137
    __noinstr_text_start+59
    entry_SYSCALL_64_after_hwframe+68
]: 10

@stackcount[
    mem_cgroup_flush_stats+1
    workingset_refault+296
    add_to_page_cache_lru+159
    page_cache_ra_unbounded+340
    filemap_get_pages+233
    filemap_read+164
    xfs_file_buffered_read+152
    xfs_file_read_iter+106
    new_sync_read+280
    vfs_read+242
    __x64_sys_pread64+137
    __noinstr_text_start+59
    entry_SYSCALL_64_after_hwframe+68
]: 111

@stackcount[
    mem_cgroup_flush_stats+1
    workingset_refault+296
    add_to_page_cache_lru+159
    page_cache_ra_unbounded+340
    force_page_cache_ra+226
    filemap_get_pages+233
    filemap_read+164
    xfs_file_buffered_read+152
    xfs_file_read_iter+106
    new_sync_read+280
    vfs_read+242
    __x64_sys_pread64+137
    __noinstr_text_start+59
    entry_SYSCALL_64_after_hwframe+68
]: 225

This was collected with the following bpftrace script:

kprobe:mem_cgroup_flush_stats {
  @start[tid] = nsecs;
  @stack[tid] = kstack;
}

kretprobe:mem_cgroup_flush_stats /@start[tid]/ {
  $usecs = (nsecs - @start[tid]) / 1000;
  if ($usecs >= 5000) {
    @stackcount[@stack[tid]]++;
  }
  delete(@start[tid]);
  delete(@stack[tid]);
}

interval:s:60 {
  print(@stackcount);
  exit();
}

>
> If that patch did not help then we either have to optimize rstat
> flushing or further increase the update buffer which is nr_cpus * 32.

I think overall we don't see any significant improvement: the expensive
mem_cgroup_flush_stats calls still show up.
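
As an aside, for anyone who wants to reproduce the funclatency histograms in
1) and 2) without bcc installed, something like the bpftrace one-liner below
should give a similar picture. This is an untested sketch of mine, not part of
the measurements above; swap in cgroup_rstat_flush_locked for the second
histogram.

sudo bpftrace -e '
  // rough equivalent of: funclatency -d 60 workingset_refault
  kprobe:workingset_refault { @start[tid] = nsecs; }

  kretprobe:workingset_refault /@start[tid]/ {
    @ns = hist(nsecs - @start[tid]);   // latency histogram, in nanoseconds
    delete(@start[tid]);
  }

  // after 60s, drop in-flight entries and exit; bpftrace prints @ns on exit
  interval:s:60 { clear(@start); exit(); }
'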