From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F7F5C5B543 for ; Fri, 30 May 2025 03:09:51 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 824A56B0082; Thu, 29 May 2025 23:09:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7FBFD6B0083; Thu, 29 May 2025 23:09:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 739AC6B0085; Thu, 29 May 2025 23:09:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 4F53F6B0082 for ; Thu, 29 May 2025 23:09:51 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id B85E3121FCE for ; Fri, 30 May 2025 03:09:50 +0000 (UTC) X-FDA: 83498094540.16.AFC1594 Received: from out-176.mta1.migadu.com (out-176.mta1.migadu.com [95.215.58.176]) by imf10.hostedemail.com (Postfix) with ESMTP id 9422BC0010 for ; Fri, 30 May 2025 03:09:48 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=YhlVsGlt; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf10.hostedemail.com: domain of ye.liu@linux.dev designates 95.215.58.176 as permitted sender) smtp.mailfrom=ye.liu@linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1748574589; a=rsa-sha256; cv=none; b=U32KqwSyCF9CRKZJOY3Y8amFvNa95QvYMgv/p4vFipqODUEqAipj9vRpE3zczg/TI2f7BX xS37tLR6kQQoW9HoRe1cX75oYwRWhwBk+QzXDE0Zx1/3v9n50fGSbI8U63ak8N3mhTfVm6 yCPj5WDweWz42pe8t+9Zj7KPHRQvuo8= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=YhlVsGlt; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf10.hostedemail.com: domain of ye.liu@linux.dev designates 95.215.58.176 as permitted sender) smtp.mailfrom=ye.liu@linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1748574589; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=2178ioBP+lANadFDk4SVqqHjKen33UU9fPSSsD3ub2I=; b=Ep1JxTF/s5ouYsxUl1QSjyBhKH39Z1nE/na0BWC6Ta/8tMDG3KkVGwGckN621p1agEp7Bw U6XqTm27XzA5qzyKbe+rWPJoRhNfmnprud/UdSpz4Vl0A7Vu6ncfrqb+Wq9LGax/Dmt6Ty ceLg2NYBqma4EFRSd2GiST3716ApeqQ= Content-Type: multipart/alternative; boundary="------------2CrJUTzOUFXivycuE4dDs4Xf" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1748574586; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=2178ioBP+lANadFDk4SVqqHjKen33UU9fPSSsD3ub2I=; b=YhlVsGlt39Cf+pXe7EQWtyASls2W2jQzwiZR6cPmeU1B9v+OmsvBcmgRyXlWmCxhRsTvGY mEkX+mDwFfMxBf9J8Z/4kL1zPkx6hNld3162ct1LhfwBwybKrfULXO6YDjYzVpdvopmjIr WhwwHZrUFwDN3xp95RnlIovxbjALpFM= Message-ID: <1f8af317-2fff-4a1f-ad0e-9d9c6c15f3a1@linux.dev> Date: Fri, 30 May 2025 11:09:12 +0800 MIME-Version: 1.0 Subject: Re: [PATCH v4] tools/mm: Add script to display page state for a given PID and VADDR To: Stephen Brennan , akpm@linux-foundation.org Cc: linux-debuggers@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-toolchains@vger.kernel.org, osandov@osandov.com, paulmck@kernel.org, sweettea-kernel@dorminy.me, liuye@kylinos.cn, fweimer@redhat.com, sj@kernel.org References: <20250528091543.355386-1-ye.liu@linux.dev> <87iklkadzy.fsf@oracle.com> Content-Language: en-US X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Ye Liu In-Reply-To: <87iklkadzy.fsf@oracle.com> X-Migadu-Flow: FLOW_OUT X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 9422BC0010 X-Stat-Signature: 9sgw5uc3p6pfuksunassr6buyp5nfts4 X-Rspam-User: X-HE-Tag: 1748574588-425280 X-HE-Meta: U2FsdGVkX18mvSkukbRey+xUhgwwQBwWG+nhUGQdV+zqQ3fXVqbbfy3ufbHBcgRJ/Oy6VdOblXsrDuzbb1cLxnUanpT55QOxMXy71W0hFaXX6VVSMKLzV8SlpSKzUcKk70+Y8yPSJni9O224lEM2wCaPwzc1JUYEJ9Qyv0hcK74PsZIykKLFNCaqbc8iphH7CxADJLvtrssn05rpgfhTfFl81Ria0jIJG5sGP6d8fPCaVBCnCbdBxiSmHSwBNCRwBUO82H5lzll6NNCFE29+vQ37RVl35QchdJ+yd7Xe7Yve+R+GZ2Uf8OT1K1Bm3boZyeAhkyGCZqYcEVLlG47jRsexOMxlU/fRzAMnghVLDgCa6nkFPxW8EdUOazLQ1+xdci88hFbsvO+/cw7/pCdsZL9RMY104HPMPuR//AhoB5/FTX1mmxc2S3n3iEXr+//JNnxM1+k7EP8NvcZT/8Smz+UoqIow2HzxCUPzT6Ltmxcw1YmU0JfoV8id3zNKojbeK5Hc9iqA7/GlBkWrqmvk4xt06l7QASX4GRLjKx9Vy9W3r6fwjsDpOwo20YzMaaBmfnqPrywSxb/vYIybvkAz0ImFU+x6bqfzvx/Nb/lUC4igLRxO9OwZgNuto7BkG8ywgJVht8+4TUScvd6SeF+zXacD9cmn+gBUUtBhi3JEpWLfehto8giPxy6L6B4AoKWXYZFjP+l2Zhd99XG+h9/ABE1oIt/AGsxhr9pDKnOTqxxE8H2c1J/lBxYWg61XE9+JOMzhkALqeFZibgI6xM3dlDbrSALIoeNPyH9mbXquMTKwkMrMiWSKTaMg33E4g8zihqdsScUZeiJNmul+OTJytgmQWm9XGiaKBi5lAp2QtVMp2NkM4oY5HZtttSbSx2dFQDJkfqFmOyYT6Oo//eiJuRHP5rYHpM3o9J/Xxq+0DUAUi4uiHQay3hoAzrpvcOp4gD9hnJ4y4xQXcEoBheV s6lQDwL6 3XHB029AmsjDcXOnTpXM61rSTRajfYVhndizURSFVU1kw8PhaX+bxq/D5nOVGXUUPi2ZNvGmJm7QNmD8g0EQLgTxbOucPcAJkPV6sue6FT/mlc1xhfk10HLQELimyZ7EmAJP88x/wpf38xQovmBCzlsmf5eMl/Mfnz8ShGDxM9ZDEiWNDk9h3KqndD2DlIcF/kska9RTqek/8X1c562XIy/MSACXFgZ0OUvP2JlgRt7/LJgBGC8e01LBjLAcIuPvWUntBStG2ma9WREIEzghOl/gPuKjWl9iWxtaXFJ7GC7d9XknMburtIfAF21jWf7mntQuuQtGxs4hcN7dG6zNG85Hb9f8JrlLgadIO9XScrNpseQj1DvAr6egevZVwID3SYM8j9LoCC7mN7OQGPkyky/YHi6u+CZKRIhTAj7iLvRoqLNdlDZAkF+VD2qO5HWtF5V+3qCrzvpy/dyuU/MzelNem/TBcf+AKdAkSOAG1aUb9V+Bp04qCVR/cIe3tmyBZEPe+twvz8Gd4/UZ4NLr3yRXZsw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: This is a multi-part message in MIME format. --------------2CrJUTzOUFXivycuE4dDs4Xf Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 在 2025/5/29 00:36, Stephen Brennan 写道: > Hi Ye, > > I just wanted to leave a code review related to the drgn/Python elements > of this patch. I'm no mm expert, and most of the things I'm flagging > here are small changes that I don't think are critical. > > Ye Liu writes: >> From: Ye Liu >> >> Introduces a new drgn script, `show_page_info.py`, which allows users >> to analyze the state of a page given a process ID (PID) and a virtual >> address (VADDR). This can help kernel developers or debuggers easily >> inspect page-related information in a live kernel or vmcore. >> >> The script extracts information such as the page flags, mapping, and >> other metadata relevant to diagnosing memory issues. >> >> Output example: >> sudo ./show_page_info.py 1 0x7fb3eb1b2000 >> PID: 1 Comm: systemd mm: 0xffff8d27279f9cc0 >> Raw: 0017ffffc000416c fffff31105a61b08 fffff31105a63608 ffff8d27121326a8 >> Raw: 0000000000000000 ffff8d271b9dcc40 0000002500000007 ffff8d2711f12700 >> User Virtual Address: 0x7fb3eb1b2000 >> Page Address: 0xfffff31106356a00 >> Page Flags: PG_referenced|PG_uptodate|PG_lru|PG_head|PG_active| >> PG_private|PG_reported|PG_has_hwpoisoned >> Page Size: 4096 >> Page PFN: 0x18d5a8 >> Page Physical: 0x18d5a8000 >> Page Virtual: 0xffff8d274d5a8000 >> Page Refcount: 37 >> Page Mapcount: 7 >> Page Index: 0x0 >> Page Memcg Data: 0xffff8d2711f12700 >> Memcg Name: init.scope >> Memcg Path: /sys/fs/cgroup/memory/init.scope >> Page Mapping: 0xffff8d27121326a8 >> Page Anon/File: File >> Page VMA: 0xffff8d26cac47600 >> VMA Start: 0x7fb3eb1b2000 >> VMA End: 0x7fb3eb1b6000 >> This page is part of a compound page. >> This page is the head page of a compound page. >> Head Page: 0xfffff31106356a00 >> Compound Order: 2 >> Number of Pages: 4 >> >> Signed-off-by: Ye Liu >> >> Changes in v4: >> - Add error and exception handling. >> - Adjust the way to obtain PAGE_SIZE. >> - Fix the acquisition of memcg. >> - Link to v3:https://lore.kernel.org/all/20250423014850.344501-1-ye.liu@linux.dev/ >> >> Changes in v3: >> - Adjust display style. >> - Link to v2:https://lore.kernel.org/all/20250421080748.114750-1-ye.liu@linux.dev/ >> >> Changes in v2: >> - Move the show_page_info.py file to tools/mm. >> - Link to v1: https://lore.kernel.org/all/20250415075024.248232-1-ye.liu@linux.dev/ >> --- >> MAINTAINERS | 5 ++ >> tools/mm/show_page_info.py | 152 +++++++++++++++++++++++++++++++++++++ >> 2 files changed, 157 insertions(+) >> create mode 100755 tools/mm/show_page_info.py >> >> diff --git a/MAINTAINERS b/MAINTAINERS >> index 0cb9e55021cb..3cbd46bf1eab 100644 >> --- a/MAINTAINERS >> +++ b/MAINTAINERS >> @@ -18726,6 +18726,11 @@ F: Documentation/mm/page_table_check.rst >> F: include/linux/page_table_check.h >> F: mm/page_table_check.c >> >> +PAGE STATE DEBUG SCRIPT >> +M: Ye Liu >> +S: Maintained >> +F: tools/mm/show_page_info.py >> + >> PANASONIC LAPTOP ACPI EXTRAS DRIVER >> M: Kenneth Chan >> L: platform-driver-x86@vger.kernel.org >> diff --git a/tools/mm/show_page_info.py b/tools/mm/show_page_info.py >> new file mode 100755 >> index 000000000000..5c46501e24f4 >> --- /dev/null >> +++ b/tools/mm/show_page_info.py >> @@ -0,0 +1,152 @@ >> +#!/usr/bin/env drgn >> +# SPDX-License-Identifier: GPL-2.0-only >> +# Copyright (C) 2025 Ye Liu >> + >> +import argparse >> +from drgn import Object, FaultError >> +from drgn.helpers.linux import find_task, follow_page, page_size >> +from drgn.helpers.linux.mm import ( >> + decode_page_flags, page_to_pfn, page_to_phys, page_to_virt, vma_find, >> + PageSlab, PageCompound, PageHead, PageTail, compound_head, compound_order, compound_nr >> +) >> +from drgn.helpers.linux.cgroup import cgroup_name, cgroup_path > Anything in "drgn.helpers.linux.*" can be imported from > "drgn.helpers.linux" instead, which would help if any helper moved > around from one module to another. I've recently started preferring > that, but I don't know if it's a huge improvement. EG: > > from drgn.helpers.linux import ( > PageCompound, PageHead, PageSlab, PageTail, cgroup_name, > cgroup_path, compound_head, compound_nr, compound_order, > decode_page_flags, find_task, follow_page, page_size, page_to_pfn, > page_to_phys, page_to_virt, vma_find, > ) > > Again, not sure it improves anything :) Thanks for the suggestion! After considering the trade-offs, I prefer keeping the current imports for clarity: Readability: Explicit module paths (e.g., mm/, cgroup/) make helper origins clearer. Debugging: Functional grouping helps when analyzing code. Both styles work, but the current approach aligns better with drgn’s documentation and our workflow. Happy to revisit if needs change. >> +DESC = """ >> +This is a drgn script to show the page state. >> +For more info on drgn, visit https://github.com/osandov/drgn. >> +""" >> + >> +MEMCG_DATA_OBJEXTS = 1 << 0 >> +MEMCG_DATA_KMEM = 1 << 1 >> +__NR_MEMCG_DATA_FLAGS = 1 << 2 > These are available as enums since commit 87944e2992bd2 ("mm: Introduce > page memcg flags"). So you can access them without redeclaring their > values like this: > > prog.constant("MEMCG_DATA_OBJEXTS") > > You can still save them as globals for efficiency. Got it. I'll use prog.constant("MEMCG_DATA_OBJEXTS") directly. >> +def format_page_data(data): >> + """Format raw page data into a readable hex dump.""" >> + try: >> + chunks = [data[i:i+8] for i in range(0, len(data), 8)] >> + hex_chunks = ["".join(f"{b:02x}" for b in chunk[::-1]) for chunk in chunks] >> + lines = [" ".join(hex_chunks[i:i+4]) for i in range(0, len(hex_chunks), 4)] >> + return "\n".join(f"Raw: {line}" for line in lines) >> + except Exception as e: >> + return f"Error formatting page data: {e}" > I'm thinking this would show big-endian addresses backwards, and on > 32-bit architectures it would group the data into 8-byte words, when I > think it would probably be more valuable to show them in the native word > size. > > You could resolve the endianness issue by using Program.read_word() to > read each individual word of memory in the correct byte order, and use > Program.platform.flags to distinguish a 32-bit architecture so that you > can compute each address. In fact, check print_annotated_memory() in > drgn which has very similar logic: > > https://github.com/osandov/drgn/blob/main/drgn/helpers/common/memory.py > I'll refer to print_annotated_memory() and use Program.read_word() to handle endianness and native word size properly. >> +def get_memcg_info(page): >> + """Retrieve memory cgroup information for a page.""" >> + try: >> + memcg_data = page.memcg_data.value_() >> + if memcg_data & MEMCG_DATA_OBJEXTS: >> + slabobj_ext = Object(prog, "struct slabobj_ext *", address=memcg_data & ~(__NR_MEMCG_DATA_FLAGS - 1)) > A slightly shorter and more idiomatic way to do it would be using the > cast() function: > > slabobj_ext = cast("struct slabobj_ext *", memcg_data & ~(__NR_MEMCG_DATA_FLAGS - 1)) > I'll use cast() for cleaner pointer conversion. >> + memcg_value = slabobj_ext.objcg.memcg.value_() >> + elif memcg_data & MEMCG_DATA_KMEM: >> + objcg = Object(prog, "struct obj_cgroup *", address=memcg_data & ~(__NR_MEMCG_DATA_FLAGS - 1)) >> + memcg_value = objcg.memcg.value_() >> + else: >> + memcg_value = memcg_data & ~(__NR_MEMCG_DATA_FLAGS - 1) >> + >> + if memcg_value == 0: >> + return "none", "/sys/fs/cgroup/memory/" >> + >> + memcg = Object(prog, "struct mem_cgroup *", address=memcg_value) >> + cgrp = memcg.css.cgroup >> + return cgroup_name(cgrp).decode(), f"/sys/fs/cgroup/memory{cgroup_path(cgrp).decode()}" >> + except FaultError as e: >> + return "unknown", f"Error retrieving memcg info: {e}" >> + except Exception as e: >> + return "unknown", f"Unexpected error: {e}" >> + >> +def show_page_state(page, addr, mm, pid, task): >> + """Display detailed information about a page.""" >> + try: >> + print(f'PID: {pid} Comm: {task.comm.string_().decode()} mm: {hex(mm)}') >> + try: >> + print(format_page_data(prog.read(page.value_(), 64))) > Rather than hard-code the size of struct page, you can use sizeof(page). > And in fact, all drgn Objects have a .bytes_() that will just give you > the bytes of the object directly, which would even avoid the sizeof(). I didn't find the .bytes_() method. Can you give an example? I used prog.type("struct page").size instead.              > Though in this case, I'd argue for just passing the page into > format_page_data() and letting it use Program.read_word() to read each > word in the correct endianness, like I said above. > >> + except FaultError as e: >> + print(f"Error reading page data: {e}") >> + >> + fields = { >> + "User Virtual Address": hex(addr), >> + "Page Address": hex(page.value_()), >> + "Page Flags": decode_page_flags(page), >> + "Page Size": prog["PAGE_SIZE"].value_(), >> + "Page PFN": hex(page_to_pfn(page).value_()), >> + "Page Physical": hex(page_to_phys(page).value_()), >> + "Page Virtual": hex(page_to_virt(page).value_()), >> + "Page Refcount": page._refcount.counter.value_(), >> + "Page Mapcount": page._mapcount.counter.value_(), >> + "Page Index": hex(page.__folio_index.value_()), >> + "Page Memcg Data": hex(page.memcg_data.value_()), >> + } >> + >> + memcg_name, memcg_path = get_memcg_info(page) >> + fields["Memcg Name"] = memcg_name >> + fields["Memcg Path"] = memcg_path >> + fields["Page Mapping"] = hex(page.mapping.value_()) >> + fields["Page Anon/File"] = "Anon" if page.mapping.value_() & 0x1 else "File" >> + >> + try: >> + vma = vma_find(mm, addr) >> + fields["Page VMA"] = hex(vma.value_()) >> + fields["VMA Start"] = hex(vma.vm_start.value_()) >> + fields["VMA End"] = hex(vma.vm_end.value_()) >> + except FaultError as e: >> + fields["Page VMA"] = "Unavailable" >> + fields["VMA Start"] = "Unavailable" >> + fields["VMA End"] = "Unavailable" >> + print(f"Error retrieving VMA information: {e}") >> + >> + # Calculate the maximum field name length for alignment >> + max_field_len = max(len(field) for field in fields) >> + >> + # Print aligned fields >> + for field, value in fields.items(): >> + print(f"{field}:".ljust(max_field_len + 2) + f"{value}") >> + >> + # Additional information about the page >> + if PageSlab(page): >> + print("This page belongs to the slab allocator.") >> + >> + if PageCompound(page): >> + print("This page is part of a compound page.") >> + if PageHead(page): >> + print("This page is the head page of a compound page.") >> + if PageTail(page): >> + print("This page is the tail page of a compound page.") >> + print(f"{'Head Page:'.ljust(max_field_len + 2)}{hex(compound_head(page).value_())}") >> + print(f"{'Compound Order:'.ljust(max_field_len + 2)}{compound_order(page).value_()}") >> + print(f"{'Number of Pages:'.ljust(max_field_len + 2)}{compound_nr(page).value_()}") >> + else: >> + print("This page is not part of a compound page.") >> + except FaultError as e: >> + print(f"Error accessing page state: {e}") >> + except Exception as e: >> + print(f"Unexpected error: {e}") >> + >> +def main(): >> + """Main function to parse arguments and display page state.""" >> + parser = argparse.ArgumentParser(description=DESC, formatter_class=argparse.RawTextHelpFormatter) >> + parser.add_argument('pid', metavar='PID', type=int, help='Target process ID (PID)') >> + parser.add_argument('vaddr', metavar='VADDR', type=str, help='Target virtual address in hexadecimal format (e.g., 0x7fff1234abcd)') >> + args = parser.parse_args() >> + >> + try: >> + vaddr = int(args.vaddr, 16) >> + except ValueError: >> + print(f"Error: Invalid virtual address format: {args.vaddr}") >> + return > I find it quite useful to replace things like this with: > > sys.exit(f"Error: Invalid virtual address format: {args.vaddr}") > > Which will result in the script exiting with a non-zero exit code, and > it will print the message to stderr, rather than stdout. All while being > one line shorter, for the code golfers :) Agree, I can replace it in the main() function, but in other places, I prefer the script to continue running instead of exiting." > The actual logic looks excellent, and most of my suggestions are just > that: suggestions. Sorry for putting this review on v4, I should have > sat down and done this sooner. Thank you for the above suggestions. These revisions will be incorporated in V5. Thanks, Ye                                                                               > Regards, > Stephen > >> + try: >> + task = find_task(args.pid) >> + mm = task.mm >> + page = follow_page(mm, vaddr) >> + >> + if page: >> + show_page_state(page, vaddr, mm, args.pid, task) >> + else: >> + print(f"Address {hex(vaddr)} is not mapped.") >> + except FaultError as e: >> + print(f"Error accessing task or memory: {e}") >> + except Exception as e: >> + print(f"Unexpected error: {e}") >> + >> +if __name__ == "__main__": >> + main() >> -- >> 2.25.1 --------------2CrJUTzOUFXivycuE4dDs4Xf Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 8bit


在 2025/5/29 00:36, Stephen Brennan 写道:
Hi Ye,

I just wanted to leave a code review related to the drgn/Python elements
of this patch. I'm no mm expert, and most of the things I'm flagging
here are small changes that I don't think are critical.

Ye Liu <ye.liu@linux.dev> writes:
From: Ye Liu <liuye@kylinos.cn>

Introduces a new drgn script, `show_page_info.py`, which allows users
to analyze the state of a page given a process ID (PID) and a virtual
address (VADDR). This can help kernel developers or debuggers easily
inspect page-related information in a live kernel or vmcore.

The script extracts information such as the page flags, mapping, and
other metadata relevant to diagnosing memory issues.

Output example:
sudo ./show_page_info.py 1 0x7fb3eb1b2000
PID: 1 Comm: systemd mm: 0xffff8d27279f9cc0
Raw: 0017ffffc000416c fffff31105a61b08 fffff31105a63608 ffff8d27121326a8
Raw: 0000000000000000 ffff8d271b9dcc40 0000002500000007 ffff8d2711f12700
User Virtual Address: 0x7fb3eb1b2000
Page Address:         0xfffff31106356a00
Page Flags:           PG_referenced|PG_uptodate|PG_lru|PG_head|PG_active|
                      PG_private|PG_reported|PG_has_hwpoisoned
Page Size:            4096
Page PFN:             0x18d5a8
Page Physical:        0x18d5a8000
Page Virtual:         0xffff8d274d5a8000
Page Refcount:        37
Page Mapcount:        7
Page Index:           0x0
Page Memcg Data:      0xffff8d2711f12700
Memcg Name:           init.scope
Memcg Path:           /sys/fs/cgroup/memory/init.scope
Page Mapping:         0xffff8d27121326a8
Page Anon/File:       File
Page VMA:             0xffff8d26cac47600
VMA Start:            0x7fb3eb1b2000
VMA End:              0x7fb3eb1b6000
This page is part of a compound page.
This page is the head page of a compound page.
Head Page:            0xfffff31106356a00
Compound Order:       2
Number of Pages:      4

Signed-off-by: Ye Liu <liuye@kylinos.cn>

Changes in v4:
- Add error and exception handling.
- Adjust the way to obtain PAGE_SIZE.
- Fix the acquisition of memcg.
- Link to v3:https://lore.kernel.org/all/20250423014850.344501-1-ye.liu@linux.dev/

Changes in v3:
- Adjust display style.
- Link to v2:https://lore.kernel.org/all/20250421080748.114750-1-ye.liu@linux.dev/

Changes in v2:
- Move the show_page_info.py file to tools/mm.
- Link to v1: https://lore.kernel.org/all/20250415075024.248232-1-ye.liu@linux.dev/
---
 MAINTAINERS                |   5 ++
 tools/mm/show_page_info.py | 152 +++++++++++++++++++++++++++++++++++++
 2 files changed, 157 insertions(+)
 create mode 100755 tools/mm/show_page_info.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 0cb9e55021cb..3cbd46bf1eab 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18726,6 +18726,11 @@ F:	Documentation/mm/page_table_check.rst
 F:	include/linux/page_table_check.h
 F:	mm/page_table_check.c
 
+PAGE STATE DEBUG SCRIPT
+M:	Ye Liu <liuye@kylinos.cn>
+S:	Maintained
+F:	tools/mm/show_page_info.py
+
 PANASONIC LAPTOP ACPI EXTRAS DRIVER
 M:	Kenneth Chan <kenneth.t.chan@gmail.com>
 L:	platform-driver-x86@vger.kernel.org
diff --git a/tools/mm/show_page_info.py b/tools/mm/show_page_info.py
new file mode 100755
index 000000000000..5c46501e24f4
--- /dev/null
+++ b/tools/mm/show_page_info.py
@@ -0,0 +1,152 @@
+#!/usr/bin/env drgn
+# SPDX-License-Identifier: GPL-2.0-only
+# Copyright (C) 2025 Ye Liu <liuye@kylinos.cn>
+
+import argparse
+from drgn import Object, FaultError
+from drgn.helpers.linux import find_task, follow_page, page_size
+from drgn.helpers.linux.mm import (
+    decode_page_flags, page_to_pfn, page_to_phys, page_to_virt, vma_find,
+    PageSlab, PageCompound, PageHead, PageTail, compound_head, compound_order, compound_nr
+)
+from drgn.helpers.linux.cgroup import cgroup_name, cgroup_path
Anything in "drgn.helpers.linux.*" can be imported from
"drgn.helpers.linux" instead, which would help if any helper moved
around from one module to another. I've recently started preferring
that, but I don't know if it's a huge improvement. EG:

from drgn.helpers.linux import (
    PageCompound, PageHead, PageSlab, PageTail, cgroup_name,
    cgroup_path, compound_head, compound_nr, compound_order,
    decode_page_flags, find_task, follow_page, page_size, page_to_pfn,
    page_to_phys, page_to_virt, vma_find,
)

Again, not sure it improves anything :)
Thanks for the suggestion! After considering the trade-offs, I prefer
keeping the current imports for clarity:
Readability: Explicit module paths (e.g., mm/, cgroup/) make helper
origins clearer.
Debugging: Functional grouping helps when analyzing code.
Both styles work, but the current approach aligns better with drgn’s
documentation and our workflow. Happy to revisit if needs change.


      
+DESC = """
+This is a drgn script to show the page state.
+For more info on drgn, visit https://github.com/osandov/drgn.
+"""
+
+MEMCG_DATA_OBJEXTS = 1 << 0
+MEMCG_DATA_KMEM = 1 << 1
+__NR_MEMCG_DATA_FLAGS = 1 << 2
These are available as enums since commit 87944e2992bd2 ("mm: Introduce
page memcg flags"). So you can access them without redeclaring their
values like this:

prog.constant("MEMCG_DATA_OBJEXTS")

You can still save them as globals for efficiency.
Got it. I'll use prog.constant("MEMCG_DATA_OBJEXTS") directly.

      
+def format_page_data(data):
+    """Format raw page data into a readable hex dump."""
+    try:
+        chunks = [data[i:i+8] for i in range(0, len(data), 8)]
+        hex_chunks = ["".join(f"{b:02x}" for b in chunk[::-1]) for chunk in chunks]
+        lines = [" ".join(hex_chunks[i:i+4]) for i in range(0, len(hex_chunks), 4)]
+        return "\n".join(f"Raw: {line}" for line in lines)
+    except Exception as e:
+        return f"Error formatting page data: {e}"
I'm thinking this would show big-endian addresses backwards, and on
32-bit architectures it would group the data into 8-byte words, when I
think it would probably be more valuable to show them in the native word
size.

You could resolve the endianness issue by using Program.read_word() to
read each individual word of memory in the correct byte order, and use
Program.platform.flags to distinguish a 32-bit architecture so that you
can compute each address. In fact, check print_annotated_memory() in
drgn which has very similar logic:

https://github.com/osandov/drgn/blob/main/drgn/helpers/common/memory.py

I'll refer to print_annotated_memory() and use Program.read_word() to
handle endianness and native word size properly.

+def get_memcg_info(page):
+    """Retrieve memory cgroup information for a page."""
+    try:
+        memcg_data = page.memcg_data.value_()
+        if memcg_data & MEMCG_DATA_OBJEXTS:
+            slabobj_ext = Object(prog, "struct slabobj_ext *", address=memcg_data & ~(__NR_MEMCG_DATA_FLAGS - 1))
A slightly shorter and more idiomatic way to do it would be using the
cast() function:

slabobj_ext = cast("struct slabobj_ext *", memcg_data & ~(__NR_MEMCG_DATA_FLAGS - 1))

I'll use cast() for cleaner pointer conversion.
+            memcg_value = slabobj_ext.objcg.memcg.value_()
+        elif memcg_data & MEMCG_DATA_KMEM:
+            objcg = Object(prog, "struct obj_cgroup *", address=memcg_data & ~(__NR_MEMCG_DATA_FLAGS - 1))
+            memcg_value = objcg.memcg.value_()
+        else:
+            memcg_value = memcg_data & ~(__NR_MEMCG_DATA_FLAGS - 1)
+
+        if memcg_value == 0:
+            return "none", "/sys/fs/cgroup/memory/"
+
+        memcg = Object(prog, "struct mem_cgroup *", address=memcg_value)
+        cgrp = memcg.css.cgroup
+        return cgroup_name(cgrp).decode(), f"/sys/fs/cgroup/memory{cgroup_path(cgrp).decode()}"
+    except FaultError as e:
+        return "unknown", f"Error retrieving memcg info: {e}"
+    except Exception as e:
+        return "unknown", f"Unexpected error: {e}"
+
+def show_page_state(page, addr, mm, pid, task):
+    """Display detailed information about a page."""
+    try:
+        print(f'PID: {pid} Comm: {task.comm.string_().decode()} mm: {hex(mm)}')
+        try:
+            print(format_page_data(prog.read(page.value_(), 64)))
Rather than hard-code the size of struct page, you can use sizeof(page).
And in fact, all drgn Objects have a .bytes_() that will just give you
the bytes of the object directly, which would even avoid the sizeof().
I didn't find the .bytes_() method. Can you give an example?
I used prog.type("struct page").size instead.             

Though in this case, I'd argue for just passing the page into
format_page_data() and letting it use Program.read_word() to read each
word in the correct endianness, like I said above.

+        except FaultError as e:
+            print(f"Error reading page data: {e}")
+
+        fields = {
+            "User Virtual Address": hex(addr),
+            "Page Address": hex(page.value_()),
+            "Page Flags": decode_page_flags(page),
+            "Page Size": prog["PAGE_SIZE"].value_(),
+            "Page PFN": hex(page_to_pfn(page).value_()),
+            "Page Physical": hex(page_to_phys(page).value_()),
+            "Page Virtual": hex(page_to_virt(page).value_()),
+            "Page Refcount": page._refcount.counter.value_(),
+            "Page Mapcount": page._mapcount.counter.value_(),
+            "Page Index": hex(page.__folio_index.value_()),
+            "Page Memcg Data": hex(page.memcg_data.value_()),
+        }
+
+        memcg_name, memcg_path = get_memcg_info(page)
+        fields["Memcg Name"] = memcg_name
+        fields["Memcg Path"] = memcg_path
+        fields["Page Mapping"] = hex(page.mapping.value_())
+        fields["Page Anon/File"] = "Anon" if page.mapping.value_() & 0x1 else "File"
+
+        try:
+            vma = vma_find(mm, addr)
+            fields["Page VMA"] = hex(vma.value_())
+            fields["VMA Start"] = hex(vma.vm_start.value_())
+            fields["VMA End"] = hex(vma.vm_end.value_())
+        except FaultError as e:
+            fields["Page VMA"] = "Unavailable"
+            fields["VMA Start"] = "Unavailable"
+            fields["VMA End"] = "Unavailable"
+            print(f"Error retrieving VMA information: {e}")
+
+        # Calculate the maximum field name length for alignment
+        max_field_len = max(len(field) for field in fields)
+
+        # Print aligned fields
+        for field, value in fields.items():
+            print(f"{field}:".ljust(max_field_len + 2) + f"{value}")
+
+        # Additional information about the page
+        if PageSlab(page):
+            print("This page belongs to the slab allocator.")
+
+        if PageCompound(page):
+            print("This page is part of a compound page.")
+            if PageHead(page):
+                print("This page is the head page of a compound page.")
+            if PageTail(page):
+                print("This page is the tail page of a compound page.")
+            print(f"{'Head Page:'.ljust(max_field_len + 2)}{hex(compound_head(page).value_())}")
+            print(f"{'Compound Order:'.ljust(max_field_len + 2)}{compound_order(page).value_()}")
+            print(f"{'Number of Pages:'.ljust(max_field_len + 2)}{compound_nr(page).value_()}")
+        else:
+            print("This page is not part of a compound page.")
+    except FaultError as e:
+        print(f"Error accessing page state: {e}")
+    except Exception as e:
+        print(f"Unexpected error: {e}")
+
+def main():
+    """Main function to parse arguments and display page state."""
+    parser = argparse.ArgumentParser(description=DESC, formatter_class=argparse.RawTextHelpFormatter)
+    parser.add_argument('pid', metavar='PID', type=int, help='Target process ID (PID)')
+    parser.add_argument('vaddr', metavar='VADDR', type=str, help='Target virtual address in hexadecimal format (e.g., 0x7fff1234abcd)')
+    args = parser.parse_args()
+
+    try:
+        vaddr = int(args.vaddr, 16)
+    except ValueError:
+        print(f"Error: Invalid virtual address format: {args.vaddr}")
+        return
I find it quite useful to replace things like this with:

    sys.exit(f"Error: Invalid virtual address format: {args.vaddr}")

Which will result in the script exiting with a non-zero exit code, and
it will print the message to stderr, rather than stdout. All while being
one line shorter, for the code golfers :)
Agree, I can replace it in the main() function, but in other places,
I prefer the script to continue running instead of exiting."

The actual logic looks excellent, and most of my suggestions are just
that: suggestions. Sorry for putting this review on v4, I should have
sat down and done this sooner.

Thank you for the above suggestions. These revisions will be incorporated in V5.

Thanks,
Ye                                                                              

Regards,
Stephen

+    try:
+        task = find_task(args.pid)
+        mm = task.mm
+        page = follow_page(mm, vaddr)
+
+        if page:
+            show_page_state(page, vaddr, mm, args.pid, task)
+        else:
+            print(f"Address {hex(vaddr)} is not mapped.")
+    except FaultError as e:
+        print(f"Error accessing task or memory: {e}")
+    except Exception as e:
+        print(f"Unexpected error: {e}")
+
+if __name__ == "__main__":
+    main()
-- 
2.25.1
--------------2CrJUTzOUFXivycuE4dDs4Xf--