From: Huang Ying
To: Peter Zijlstra, Mel Gorman
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 Mel Gorman, Valentin Schneider, Greg Kroah-Hartman,
 stable@vger.kernel.org
Subject: [PATCH -V3 RESEND] numa balancing: move some document to make it consistent with the code
Date: Fri, 17 Dec 2021 14:04:26 +0800
Message-Id: <20211217060426.3076856-1-ying.huang@intel.com>

After commit 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to
debugfs"), some NUMA balancing sysctls enclosed with SCHED_DEBUG have
been moved to debugfs.  This patch moves the documentation for these
sysctls from Documentation/admin-guide/sysctl/kernel.rst to
Documentation/scheduler/sched-debug.rst to keep the documentation
consistent with the code.

Signed-off-by: "Huang, Ying"
Acked-by: Mel Gorman
Reviewed-by: Valentin Schneider
Cc: Peter Zijlstra (Intel)
Cc: Greg Kroah-Hartman
Cc: stable@vger.kernel.org
---
 Documentation/admin-guide/sysctl/kernel.rst | 46 +----------------
 Documentation/scheduler/index.rst           |  1 +
 Documentation/scheduler/sched-debug.rst     | 54 +++++++++++++++++++++
 3 files changed, 56 insertions(+), 45 deletions(-)
 create mode 100644 Documentation/scheduler/sched-debug.rst

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 0e486f41185e..603469d42fb9 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -609,51 +609,7 @@ be migrated to a local memory node.
 The unmapping of pages and trapping faults incur additional overhead that
 ideally is offset by improved memory locality but there is no universal
 guarantee. If the target workload is already bound to NUMA nodes then this
-feature should be disabled. Otherwise, if the system overhead from the
-feature is too high then the rate the kernel samples for NUMA hinting
-faults may be controlled by the `numa_balancing_scan_period_min_ms,
-numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
-numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls.
-
-
-numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
-===============================================================================================================================
-
-
-Automatic NUMA balancing scans tasks address space and unmaps pages to
-detect if pages are properly placed or if the data should be migrated to a
-memory node local to where the task is running. Every "scan delay" the task
-scans the next "scan size" number of pages in its address space. When the
-end of the address space is reached the scanner restarts from the beginning.
-
-In combination, the "scan delay" and "scan size" determine the scan rate.
-When "scan delay" decreases, the scan rate increases. The scan delay and
-hence the scan rate of every task is adaptive and depends on historical
-behaviour. If pages are properly placed then the scan delay increases,
-otherwise the scan delay decreases. The "scan size" is not adaptive but
-the higher the "scan size", the higher the scan rate.
-
-Higher scan rates incur higher system overhead as page faults must be
-trapped and potentially data must be migrated. However, the higher the scan
-rate, the more quickly a tasks memory is migrated to a local node if the
-workload pattern changes and minimises performance impact due to remote
-memory accesses. These sysctls control the thresholds for scan delays and
-the number of pages scanned.
-
-``numa_balancing_scan_period_min_ms`` is the minimum time in milliseconds to
-scan a tasks virtual memory. It effectively controls the maximum scanning
-rate for each task.
-
-``numa_balancing_scan_delay_ms`` is the starting "scan delay" used for a task
-when it initially forks.
-
-``numa_balancing_scan_period_max_ms`` is the maximum time in milliseconds to
-scan a tasks virtual memory. It effectively controls the minimum scanning
-rate for each task.
-
-``numa_balancing_scan_size_mb`` is how many megabytes worth of pages are
-scanned for a given scan.
-
+feature should be disabled.
 
 oops_all_cpu_backtrace
 ======================
diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
index 88900aabdbf7..30cca8a37b3b 100644
--- a/Documentation/scheduler/index.rst
+++ b/Documentation/scheduler/index.rst
@@ -17,6 +17,7 @@ Linux Scheduler
     sched-nice-design
     sched-rt-group
     sched-stats
+    sched-debug
 
     text_files
 
diff --git a/Documentation/scheduler/sched-debug.rst b/Documentation/scheduler/sched-debug.rst
new file mode 100644
index 000000000000..4d3d24f2a439
--- /dev/null
+++ b/Documentation/scheduler/sched-debug.rst
@@ -0,0 +1,54 @@
+=================
+Scheduler debugfs
+=================
+
+Booting a kernel with CONFIG_SCHED_DEBUG=y will give access to
+scheduler specific debug files under /sys/kernel/debug/sched. Some of
+those files are described below.
+
+numa_balancing
+==============
+
+`numa_balancing` directory is used to hold files to control NUMA
+balancing feature. If the system overhead from the feature is too
+high then the rate the kernel samples for NUMA hinting faults may be
+controlled by the `scan_period_min_ms, scan_delay_ms,
+scan_period_max_ms, scan_size_mb` files.
+
+
+scan_period_min_ms, scan_delay_ms, scan_period_max_ms, scan_size_mb
+-------------------------------------------------------------------
+
+Automatic NUMA balancing scans tasks address space and unmaps pages to
+detect if pages are properly placed or if the data should be migrated to a
+memory node local to where the task is running. Every "scan delay" the task
+scans the next "scan size" number of pages in its address space. When the
+end of the address space is reached the scanner restarts from the beginning.
+
+In combination, the "scan delay" and "scan size" determine the scan rate.
+When "scan delay" decreases, the scan rate increases. The scan delay and
+hence the scan rate of every task is adaptive and depends on historical
+behaviour. If pages are properly placed then the scan delay increases,
+otherwise the scan delay decreases. The "scan size" is not adaptive but
+the higher the "scan size", the higher the scan rate.
+
+Higher scan rates incur higher system overhead as page faults must be
+trapped and potentially data must be migrated. However, the higher the scan
+rate, the more quickly a tasks memory is migrated to a local node if the
+workload pattern changes and minimises performance impact due to remote
+memory accesses. These files control the thresholds for scan delays and
+the number of pages scanned.
+
+``scan_period_min_ms`` is the minimum time in milliseconds to scan a
+tasks virtual memory. It effectively controls the maximum scanning
+rate for each task.
+
+``scan_delay_ms`` is the starting "scan delay" used for a task when it
+initially forks.
+
+``scan_period_max_ms`` is the maximum time in milliseconds to scan a
+tasks virtual memory. It effectively controls the minimum scanning
+rate for each task.
+
+``scan_size_mb`` is how many megabytes worth of pages are scanned for
+a given scan.
-- 
2.30.2
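
For anyone who wants to check the relocated knobs on a running system, here is a minimal sketch, not part of the patch. It assumes debugfs is mounted at /sys/kernel/debug and the kernel was built with CONFIG_SCHED_DEBUG=y, as the new sched-debug.rst describes. The helper names `read_knobs` and `scans_per_pass` are hypothetical and exist only for illustration. `scans_per_pass` is plain arithmetic on the documented "scan size" (how many scans it takes to walk an address space once) and does not model the kernel's adaptive scan delay.

```python
import math
from pathlib import Path

# Assumed path: debugfs mounted at /sys/kernel/debug on a kernel built
# with CONFIG_SCHED_DEBUG=y, as described in sched-debug.rst.
NUMA_DIR = Path("/sys/kernel/debug/sched/numa_balancing")

# File names documented by this patch.
KNOBS = ("scan_delay_ms", "scan_period_min_ms",
         "scan_period_max_ms", "scan_size_mb")


def read_knobs(base=NUMA_DIR):
    """Return {file name: int value} for each knob readable under base."""
    values = {}
    for name in KNOBS:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except OSError:
            # Missing file, debugfs not mounted, or insufficient privileges.
            continue
    return values


def scans_per_pass(address_space_mb, scan_size_mb):
    """Hypothetical helper: scans needed to walk the address space once."""
    return math.ceil(address_space_mb / scan_size_mb)


if __name__ == "__main__":
    knobs = read_knobs()
    for name, value in knobs.items():
        print(f"{name} = {value}")
    if "scan_size_mb" in knobs:
        # 4096 MB is an arbitrary example size, not a kernel default.
        print("scans per pass over 4096 MB:",
              scans_per_pass(4096, knobs["scan_size_mb"]))
```

Reading these files normally requires root, since debugfs is usually not accessible to unprivileged users; in that case the sketch simply reports nothing rather than failing.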