From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: Bharata B Rao
Subject: Re: [RFC PATCH 0/5] Memory access profiler(IBS) driven NUMA balancing
References: <20230208073533.715-1-bharata@amd.com>
	<878rh2b5zt.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<72b6ec8b-f141-3807-d7f2-f853b0f0b76c@amd.com>
	<87zg9i9iw2.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<1547d291-1512-faae-aba5-0f84c3502be4@amd.com>
	<87zg9c7rrf.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<8fea74ec-8feb-1709-14f2-cecb63fdc9ed@amd.com>
	<87v8jnbl22.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 02 Mar 2023 16:10:01 +0800
In-Reply-To: (Bharata B. Rao's message of "Wed, 1 Mar 2023 16:51:25 +0530")
Message-ID: <87jzzz8tgm.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii

Bharata B Rao writes:

> On 27-Feb-23 1:24 PM, Huang, Ying wrote:
>> Thank you very much for the detailed data. Can you provide some
>> analysis of your data?
>
> The overhead numbers I shared earlier weren't correct: I realized
> that while obtaining those numbers from function_graph tracing, the
> trace buffer was silently getting overrun. I had to reduce the
> number of memory access iterations to ensure that I capture the full
> trace buffer. I will summarize the findings based on these new
> numbers below.
>
> Just to recap - the microbenchmark is run on an AMD Genoa two-node
> system. The benchmark has two sets of threads (one set affined to
> each node) accessing two different chunks of memory (chunk size 8G),
> both of which are initially allocated on the first node. The
> benchmark touches each page in its chunk iteratively for a fixed
> number of iterations (384 in the case given below). The benchmark
> score is the amount of time it takes to complete the specified
> number of accesses.
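For concreteness, here is a minimal sketch of that access pattern as I
understand it (one thread per node shown for brevity; the chunk size,
affinity, and iteration count are taken from the description above,
while all names are made up for illustration; build with -lpthread
-lnuma):

  #include <pthread.h>
  #include <string.h>
  #include <stdlib.h>
  #include <numa.h>

  #define CHUNK (8UL << 30)     /* 8G chunk per thread set */
  #define PAGE  4096UL
  #define ITERS 384             /* iterations over each page */

  struct work { char *chunk; int node; };

  static void *worker(void *arg)
  {
          struct work *w = arg;

          numa_run_on_node(w->node);      /* affine thread to its node */
          for (int i = 0; i < ITERS; i++)
                  for (unsigned long off = 0; off < CHUNK; off += PAGE)
                          w->chunk[off]++;        /* touch every page */
          return NULL;
  }

  int main(void)
  {
          pthread_t t0, t1;

          /* First-touch both chunks from node 0, so they start there. */
          numa_run_on_node(0);
          struct work w0 = { malloc(CHUNK), 0 };
          struct work w1 = { malloc(CHUNK), 1 };
          memset(w0.chunk, 0, CHUNK);
          memset(w1.chunk, 0, CHUNK);

          pthread_create(&t0, NULL, worker, &w0);
          pthread_create(&t1, NULL, worker, &w1);
          pthread_join(t0, NULL);
          pthread_join(t1, NULL);
          return 0;
  }

The benchmark score would then be the wall time of such a run, so any
migration of a chunk toward the node that accesses it shows up
directly in the score.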
>
> Here is the data for the benchmark run:
>
> Time taken or overhead (us) for fault, task_work and sched_switch
> handling
>
>                            Default          IBS
> Fault handling          2875354862      2602455
> Task work handling          139023     24008121
> Sched switch handling                     37712
> Total overhead          2875493885     26648288
>
> Default
> -------
>                      Total      Min       Max      Avg
> do_numa_page    2875354862     0.08    392.13    22.11
> task_numa_work      139023     0.14   5365.77   532.66
> Total           2875493885
>
> IBS
> ---
>                           Total      Min      Max     Avg
> ibs_overflow_handler    2602455     0.14   103.91    1.29
> task_ibs_access_work   24008121     0.17   485.09   37.65
> hw_access_sched_in        37712     0.15   287.55    1.35
> Total                  26648288
>
>                               Default           IBS
> Benchmark score(us)       160171762.0    40323293.0
> numa_pages_migrated           2097220        511791
> Overhead per page                1371            52
> Pages migrated per sec          13094         12692
> numa_hint_faults_local        2820311        140856
> numa_hint_faults             38589520        652647

For the default case, numa_hint_faults >> numa_pages_migrated. That is
hard to understand. I guess there aren't many shared pages in the
benchmark? And I guess the free pages in the target node are
sufficient too?

> hint_faults_local/hint_faults      7%           22%
>
> Here is the summary:
>
> - In the IBS case, the benchmark completes 75% faster compared to
>   the default case. The gain varies based on how many iterations of
>   memory accesses we run as part of the benchmark. For 2048
>   iterations of accesses, I have seen a gain of around 50%.
> - The overhead of NUMA balancing (as measured by the time taken in
>   fault handling, task_work handling and sched_switch handling) is
>   seen to be much higher in the default case than in the IBS case.
> - The number of hint faults in the default case is significantly
>   higher than in the IBS case.
> - The local hint-fault percentage is much better in the IBS case
>   than in the default case.
> - As shown in the graphs (in other threads of this mail thread), in
>   the default case the page migrations start slowly, while the IBS
>   case shows steady migrations right from the start.
> - I have also shown (via graphs in other threads of this mail
>   thread) that in the IBS case the benchmark is able to steadily
>   increase the access iterations over time, while in the default
>   case the benchmark doesn't make forward progress for a long time
>   after an initial increase.

This is hard to understand too: pages are migrated to the local node,
but performance doesn't improve.

> - Early migration, due to relevant access sampling from IBS, is most
>   probably the main reason for the uplift that the IBS case gets.

In the original kernel, the NUMA page table scanning is delayed for a
while. Please check the comment below from task_tick_numa():

	/*
	 * Using runtime rather than walltime has the dual advantage that
	 * we (mostly) drive the selection from busy threads and that the
	 * task needs to have done some actual work before we bother with
	 * NUMA placement.
	 */

I think this is generally reasonable, although it's not ideal for this
micro-benchmark.
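For reference, a simplified sketch of that logic (condensed from
task_tick_numa() in kernel/sched/fair.c; early-exit checks and corner
cases elided):

	static void task_tick_numa(struct rq *rq, struct task_struct *curr)
	{
		struct callback_head *work = &curr->numa_work;
		u64 period, now;

		/*
		 * Scanning is driven by accumulated runtime, not
		 * walltime, so a freshly started task sees its first
		 * scan (and hence its first hint faults) delayed.
		 */
		now = curr->se.sum_exec_runtime;
		period = (u64)curr->numa_scan_period * NSEC_PER_MSEC;

		if (now > curr->node_stamp + period) {
			if (!curr->node_stamp)
				curr->numa_scan_period = task_scan_start(curr);
			curr->node_stamp += period;

			/* Queue task_numa_work() for return to userspace. */
			if (!time_before(jiffies, curr->mm->numa_next_scan))
				task_work_add(curr, work, TWA_RESUME);
		}
	}

A task therefore gets no hint faults until it has accumulated
numa_scan_period worth of CPU time, which fits the slow migration
start you see in the default-case graphs.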
Best Regards,
Huang, Ying

> - It is consistently seen that the benchmark in the IBS case manages
>   to complete the specified number of accesses even before the
>   entire chunk of memory gets migrated. The early migrations offset
>   the cost of the remote accesses too.
> - In the IBS case, we re-program the IBS counters for the incoming
>   task in the sched_switch path. This overhead is seen not to be
>   significant enough to slow down the benchmark.
> - One of the differences between the default case and the IBS case
>   is when the faults-since-last-scan are folded into the historical
>   fault stats and the scan period is subsequently updated. Since we
>   don't have the notion of scanning in IBS, I have a threshold
>   (number of access faults) to determine when to update the
>   historical faults and the IBS sample period. I need to check
>   whether quicker migrations could result from this change.
> - Finally, all this is for the above-mentioned microbenchmark. The
>   gains on other benchmarks are yet to be evaluated.
>
> Regards,
> Bharata.