From: Waiman Long
Date: Tue, 13 Aug 2024 14:23:57 -0400
Subject: Re: [PATCH v3 7/7] Docs/admin-guide/mm/workingset_report: document sysfs and memcg interfaces
To: Yuanchu Xie, David Hildenbrand, "Aneesh Kumar K.V", Khalid Aziz,
 Henry Huang, Yu Zhao, Dan Williams, Gregory Price, Huang Ying,
 Andrew Morton, Lance Yang, Randy Dunlap, Muhammad Usama Anjum
Cc: Kalesh Singh, Wei Xu, David Rientjes, Greg Kroah-Hartman,
 "Rafael J. Wysocki", Johannes Weiner, Michal Hocko, Roman Gushchin,
 Muchun Song, Shuah Khan, Yosry Ahmed, Matthew Wilcox,
 Sudarshan Rajagopalan, Kairui Song, "Michael S. Tsirkin", Vasily Averin,
 Nhat Pham, Miaohe Lin, Qi Zheng, Abel Wu, "Vishal Moola (Oracle)",
 Kefeng Wang, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org
Message-ID: <6ca76824-331f-407f-afa6-bf75cdca6d96@redhat.com>
In-Reply-To: <20240813165619.748102-8-yuanchu@google.com>
References: <20240813165619.748102-1-yuanchu@google.com>
 <20240813165619.748102-8-yuanchu@google.com>

On 8/13/24 12:56, Yuanchu Xie wrote:
> Add workingset reporting documentation for better discoverability of
> its sysfs and memcg interfaces. Also document the required kernel
> config to enable workingset reporting.
>
> Change-Id: Ib9dfc9004473baa6ef26ca7277d220b6199517de
> Signed-off-by: Yuanchu Xie
> ---
>  Documentation/admin-guide/mm/index.rst   |   1 +
>  .../admin-guide/mm/workingset_report.rst | 105 ++++++++++++++++++

The new memory cgroup control files need to be documented in
Documentation/admin-guide/cgroup-v2.rst as well.

Cheers,
Longman

>  2 files changed, 106 insertions(+)
>  create mode 100644 Documentation/admin-guide/mm/workingset_report.rst
>
> diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
> index 8b35795b664b..61a2a347fc91 100644
> --- a/Documentation/admin-guide/mm/index.rst
> +++ b/Documentation/admin-guide/mm/index.rst
> @@ -41,4 +41,5 @@ the Linux memory management.
>     swap_numa
>     transhuge
>     userfaultfd
> +   workingset_report
>     zswap
> diff --git a/Documentation/admin-guide/mm/workingset_report.rst b/Documentation/admin-guide/mm/workingset_report.rst
> new file mode 100644
> index 000000000000..ddcc0c33a8df
> --- /dev/null
> +++ b/Documentation/admin-guide/mm/workingset_report.rst
> @@ -0,0 +1,105 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=================
> +Workingset Report
> +=================
> +The workingset report provides a view of memory coldness in user-defined
> +time intervals, i.e. X bytes are Y milliseconds cold. It breaks down
> +the user pages in the system per NUMA node and per memcg, for both
> +anonymous and file pages, into histograms that look like:
> +::
> +
> + 1000 anon=137368 file=24530
> + 20000 anon=34342 file=0
> + 30000 anon=353232 file=333608
> + 40000 anon=407198 file=206052
> + 9223372036854775807 anon=4925624 file=892892
> +
> +The workingset reports can be used to drive proactive reclaim, by
> +identifying the number of cold bytes in a memcg and then writing that
> +amount to ``memory.reclaim``.
> +
> +Quick start
> +===========
> +Build the kernel with the following config options. The report relies
> +on the Multi-gen LRU to track page coldness.
> +
> +* ``CONFIG_LRU_GEN=y``
> +* ``CONFIG_LRU_GEN_ENABLED=y``
> +* ``CONFIG_WORKINGSET_REPORT=y``
> +
> +Optionally, the aging kernel daemon can be enabled with the following
> +configuration.
> +* ``CONFIG_WORKINGSET_REPORT_AGING=y``
> +
> +Sysfs interfaces
> +================
> +``/sys/devices/system/node/nodeX/workingset_report/page_age`` provides
> +a per-node page age histogram, showing an aggregate of the node's lruvecs.
> +Reading this file triggers hierarchical aging of all lruvecs, which scans
> +pages and creates a new Multi-gen LRU generation in each lruvec.
> +For example:
> +::
> +
> + 1000 anon=0 file=0
> + 2000 anon=0 file=0
> + 100000 anon=5533696 file=5566464
> + 18446744073709551615 anon=0 file=0
> +
> +``/sys/devices/system/node/nodeX/workingset_report/page_age_intervals``
> +is a comma-separated list of times, in milliseconds, that configures
> +the age intervals used to aggregate the page age histogram. For the above histogram,
> +the intervals are:
> +::
> + 1000,2000,100000
> +
> +``/sys/devices/system/node/nodeX/workingset_report/refresh_interval``
> +defines how long, in milliseconds, a report remains valid.
> +When a report is still valid, reading the ``page_age`` file shows
> +the existing report instead of generating a new one.
> +
> +``/sys/devices/system/node/nodeX/workingset_report/report_threshold``
> +specifies how often, in milliseconds, a userspace agent can be notified
> +of node memory pressure. When a node reaches its low
> +watermark and wakes up kswapd, programs waiting on ``page_age`` are
> +woken up so they can read the histogram and make policy decisions.
> +
> +Memcg interface
> +===============
> +While ``page_age_intervals`` is defined per node in sysfs, ``page_age``,
> +``refresh_interval`` and ``report_threshold`` are available per memcg.
> +
> +``/sys/fs/cgroup/.../memory.workingset.page_age``
> +The memcg equivalent of the sysfs workingset page age histogram
> +breaks down the workingset of this memcg and its children into
> +page age intervals. Each node's histogram is preceded by a node
> +header line such as ``N0``. Non-proactive direct reclaim on this memcg can also
> +wake up userspace agents that are waiting on this file.
> +For example:
> +::
> +
> + N0
> + 1000 anon=0 file=0
> + 2000 anon=0 file=0
> + 3000 anon=0 file=0
> + 4000 anon=0 file=0
> + 5000 anon=0 file=0
> + 18446744073709551615 anon=0 file=0
> +
> +``/sys/fs/cgroup/.../memory.workingset.refresh_interval``
> +The memcg equivalent of the sysfs refresh interval: a per-node
> +value specifying how long, in milliseconds, a page age histogram
> +remains valid.
> +For example:
> +::
> +
> + echo N0=2000 > memory.workingset.refresh_interval
> +
> +``/sys/fs/cgroup/.../memory.workingset.report_threshold``
> +The memcg equivalent of the sysfs report threshold: a per-node
> +value specifying how often, in milliseconds, a userspace agent waiting
> +on the page age histogram can be woken up.
> +For example:
> +::
> +
> + echo N0=1000 > memory.workingset.report_threshold
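
To illustrate the proactive reclaim flow the patch describes (read a
page age histogram, count the cold bytes, write that amount to
memory.reclaim), here is a minimal shell sketch. It is not part of the
patch, and the helper name cold_bytes is hypothetical. It assumes that
the first column of each histogram row is the upper bound of an age
bucket in milliseconds, and that the anon=/file= values are byte counts
as the document states.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch under review.
# Reads a page age histogram on stdin, in the
# "<age_ms> anon=<bytes> file=<bytes>" format shown in the document,
# and prints the total anon + file bytes in buckets whose age bound
# is greater than the threshold given as $1 (milliseconds).
#
# Assumptions: the first column is the upper bound of each age bucket,
# and the anon=/file= values are byte counts. Node header lines such
# as "N0" (memcg output) are skipped, so a memcg histogram is summed
# across all nodes.
cold_bytes() {
    awk -v t="$1" '
        # Only histogram rows have an "anon=..." second field.
        $2 ~ /^anon=/ && ($1 + 0) > (t + 0) {
            split($2, anon, "=")
            split($3, file, "=")
            total += anon[2] + file[2]
        }
        # awk numbers are doubles; %.0f prints the total as a whole
        # number without depending on the range of %d.
        END { printf "%.0f\n", total }
    '
}
```

Feeding the output of /sys/fs/cgroup/.../memory.workingset.page_age
into cold_bytes and writing the result to memory.reclaim in the same
cgroup would close the loop. The threshold should be one of the
configured intervals: under the bucket assumption above, a bucket that
straddles the threshold is counted in full, which overestimates the
cold bytes.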