From: Randy Dunlap
Date: Thu, 6 Jun 2024 10:02:27 -0700
Subject: Re: [PATCH v2 8/8] Docs/admin-guide/mm/workingset_report: document sysfs and memcg interfaces
To: Yuanchu Xie, David Hildenbrand, "Aneesh Kumar K.V", Khalid Aziz,
 Henry Huang, Yu Zhao, Dan Williams, Gregory Price, Huang Ying,
 Muhammad Usama Anjum
Cc: Kalesh Singh, Wei Xu, David Rientjes, Greg Kroah-Hartman,
 "Rafael J. Wysocki", Andrew Morton, Johannes Weiner, Michal Hocko,
 Roman Gushchin, Muchun Song, Shuah Khan, Yosry Ahmed, Matthew Wilcox,
 Sudarshan Rajagopalan, Kairui Song, "Michael S. Tsirkin", Vasily Averin,
 Nhat Pham, Miaohe Lin, Qi Zheng, Abel Wu, "Vishal Moola (Oracle)",
 Kefeng Wang, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org
In-Reply-To: <20240604020549.1017540-9-yuanchu@google.com>
References: <20240604020549.1017540-1-yuanchu@google.com>
 <20240604020549.1017540-9-yuanchu@google.com>

Hi,

Just a couple of nits below:

On 6/3/24 7:05 PM, Yuanchu Xie wrote:
> Add workingset reporting documentation for better discoverability of
> its sysfs and memcg interfaces. Also document the required kernel
> config to enable workingset reporting.
>
> Signed-off-by: Yuanchu Xie
> ---
>  Documentation/admin-guide/mm/index.rst        |   1 +
>  .../admin-guide/mm/workingset_report.rst      | 105 ++++++++++++++++++
>  2 files changed, 106 insertions(+)
>  create mode 100644 Documentation/admin-guide/mm/workingset_report.rst
>
> diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
> index 1f883abf3f00..fba987de8997 100644
> --- a/Documentation/admin-guide/mm/index.rst
> +++ b/Documentation/admin-guide/mm/index.rst
> @@ -41,4 +41,5 @@ the Linux memory management.
>     swap_numa
>     transhuge
>     userfaultfd
> +   workingset_report
>     zswap
> diff --git a/Documentation/admin-guide/mm/workingset_report.rst b/Documentation/admin-guide/mm/workingset_report.rst
> new file mode 100644
> index 000000000000..f455ae93b30e
> --- /dev/null
> +++ b/Documentation/admin-guide/mm/workingset_report.rst
> @@ -0,0 +1,105 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=================
> +Workingset Report
> +=================
> +Workingset report provides a view of memory coldness in user-defined
> +time intervals, i.e. X bytes are Y milliseconds cold. It breaks down
> +the user pages in the system per-NUMA node, per-memcg, for both
> +anonymous and file pages into histograms that look like:
> +::
> +
> +    1000 anon=137368 file=24530
> +    20000 anon=34342 file=0
> +    30000 anon=353232 file=333608
> +    40000 anon=407198 file=206052
> +    9223372036854775807 anon=4925624 file=892892
> +
> +The workingset reports can be used to drive proactive reclaim, by
> +identifying the number of cold bytes in a memcg, then writing to
> +``memory.reclaim``.
> +
> +Quick start
> +===========
> +Build the kernel with the following configurations. The report relies
> +on Multi-gen LRU for page coldness.
> +
> +* ``CONFIG_LRU_GEN=y``
> +* ``CONFIG_LRU_GEN_ENABLED=y``
> +* ``CONFIG_WORKINGSET_REPORT=y``
> +
> +Optionally, the aging kernel daemon can be enabled with the following
> +configuration.
> +* ``CONFIG_LRU_GEN_ENABLED=y``
> +
> +Sysfs interfaces
> +================
> +``/sys/devices/system/node/nodeX/page_age`` provides a per-node page
> +age histogram, showing an aggregate of the node's lruvecs.
> +Reading this file causes a hierarchical aging of all lruvecs, scanning
> +pages and creates a new Multi-gen LRU generation in each lruvec.
> +For example:
> +::
> +
> +    1000 anon=0 file=0
> +    2000 anon=0 file=0
> +    100000 anon=5533696 file=5566464
> +    18446744073709551615 anon=0 file=0
> +
> +``/sys/devices/system/node/nodeX/page_age_interval`` is a comma
> +separated list of time in milliseconds that configures what the page
> +age histogram uses for aggregation. For the above histogram,
> +the intervals are:
> +::
> +    1000,2000,100000
> +
> +``/sys/devices/system/node/nodeX/workingset_report/refresh_interval``
> +defines the amount of time the report is valid for in milliseconds.
> +When a report is still valid, reading the ``page_age`` file shows
> +the existing valid report, instead of generating a new one.
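
As an aside on the ``page_age`` format quoted above: a minimal Python
sketch of how a userspace agent might parse such a histogram. It is not
part of the patch. It assumes each row is "<age in ms> anon=<bytes>
file=<bytes>", as the quoted examples suggest; the helper names
parse_page_age() and bytes_in_bins_above() are hypothetical, and whether
the first field is the upper bound of its bin is an assumption as well.

```python
def parse_page_age(text):
    """Parse page_age histogram text into (age_ms, anon, file) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and anything that is not a 3-field row
        age_ms = int(fields[0])
        # Turn "anon=123" and "file=456" into {"anon": "123", "file": "456"}.
        counts = dict(field.split("=", 1) for field in fields[1:])
        rows.append((age_ms, int(counts["anon"]), int(counts["file"])))
    return rows


def bytes_in_bins_above(rows, min_age_ms):
    """Sum anon and file counts over rows whose age label exceeds min_age_ms."""
    return sum(anon + file for age_ms, anon, file in rows if age_ms > min_age_ms)


# Sample text copied from the per-node example in the quoted patch.
sample = """\
1000 anon=0 file=0
2000 anon=0 file=0
100000 anon=5533696 file=5566464
18446744073709551615 anon=0 file=0
"""

rows = parse_page_age(sample)
print(rows[2])                          # (100000, 5533696, 5566464)
print(bytes_in_bins_above(rows, 2000))  # 11100160
```

Python integers are arbitrary precision, so the 18446744073709551615
catch-all label parses without overflow; an agent written in C would
need an unsigned 64-bit type for that field.
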
> +
> +``/sys/devices/system/node/nodeX/workingset_report/report_threshold``
> +specifies how often the userspace agent can be notified for node
> +memory pressure, in milliseconds. When a node reaches its low
> +watermarks and wakes up kswapd, programs waiting on ``page_age`` are
> +woken up so they can read the histogram and make policy decisions.
> +
> +Memcg interface
> +===============
> +While ``page_age_interval`` is defined per-node in sysfs. ``page_age``,

                                                      sysfs,

> +``refresh_interval`` and ``report_threshold`` are available per-memcg.
> +
> +``/sys/fs/cgroup/.../memory.workingset.page_age``
> +The memcg equivalent of the sysfs workingset page age histogram,

                                                         no comma ^

> +breaks down the workingset of this memcg and its children into
> +page age intervals. Each node is prefixed with a node header and
> +a newline. Non-proactive direct reclaim on this memcg can also
> +wake up userspace agents that are waiting on this file.
> +e.g.
> +::
> +
> +    N0
> +    1000 anon=0 file=0
> +    2000 anon=0 file=0
> +    3000 anon=0 file=0
> +    4000 anon=0 file=0
> +    5000 anon=0 file=0
> +    18446744073709551615 anon=0 file=0
> +
> +``/sys/fs/cgroup/.../memory.workingset.refresh_interval``
> +The memcg equivalent of the sysfs refresh interval. A per-node
> +number of how much time a page age histogram is valid for, in
> +milliseconds.
> +e.g.
> +::
> +
> +    echo N0=2000 > memory.workingset.refresh_interval
> +
> +``/sys/fs/cgroup/.../memory.workingset.report_threshold``
> +The memcg equivalent of the sysfs report threshold. A per-node
> +number of how often userspace agent waiting on the page age
> +histogram can be woken up, in milliseconds.
> +e.g.
> +::
> +
> +    echo N0=1000 > memory.workingset.report_threshold

-- 
#Randy
https://people.kernel.org/tglx/notes-about-netiquette
https://subspace.kernel.org/etiquette.html