From: Barry Song <21cnbao@gmail.com>
Date: Wed, 3 Jan 2024 19:44:51 +1300
Subject: Re: [RFC PATCH v1] tools/mm: Add thpmaps script to dump THP usage info
To: Ryan Roberts
Cc: Andrew Morton, Zenghui Yu, Matthew Wilcox, David Hildenbrand,
    Kefeng Wang, John Hubbard, Zi Yan, Alistair Popple, linux-mm@kvack.org
In-Reply-To: <20240102153828.1002295-1-ryan.roberts@arm.com>

On Wed, Jan 3, 2024 at 4:38 AM Ryan Roberts wrote:
>
> With the proliferation of large folios for file-backed memory, and more
> recently the introduction of multi-size THP for anonymous memory, it is
> becoming useful to be able to see exactly how large folios are mapped
> into processes. For some architectures (e.g. arm64), if most memory is
> mapped using contpte-sized and -aligned blocks, TLB usage can be
> optimized so it's useful to see where these requirements are and are not
> being met.
>
> thpmaps is a Python utility that reads /proc/<pid>/smaps,
> /proc/<pid>/pagemap and /proc/kpageflags to print information about how
> transparent huge pages (both file and anon) are mapped to a specified
> process or cgroup. It aims to help users debug and optimize their
> workloads. In future we may wish to introduce stats directly into the
> kernel (e.g. smaps or similar), but for now this provides a short term
> solution without the need to introduce any new ABI.
>
> Run with help option for a full listing of the arguments:
>
> # thpmaps --help
>
> --8<--
> usage: thpmaps [-h] [--pid pid] [--cgroup path] [--summary]
>                [--cont size[KMG]] [--inc-smaps] [--inc-empty]
>                [--periodic sleep_ms]
>
> Prints information about how transparent huge pages are mapped to a
> specified process or cgroup. Shows statistics for fully-mapped THPs of
> every size, mapped both naturally aligned and unaligned for both file
> and anonymous memory. See [anon|file]-thp-[aligned|unaligned]-kB
> keys. Shows statistics for mapped pages that belong to a THP but which
> are not fully mapped. See [anon|file]-thp-partial keys. Optionally
> shows statistics for naturally aligned, contiguous blocks of memory of
> a specified size (when --cont is provided). See [anon|file]-cont-
> aligned-kB keys. Statistics are shown in kB and as a percentage
> of either total anon or file memory as appropriate.
>
> options:
>   -h, --help           show this help message and exit
>   --pid pid            Process id of the target process. Exactly one of
>                        --pid and --cgroup must be provided.
>   --cgroup path        Path to the target cgroup in sysfs. Iterates
>                        over every pid in the cgroup. Exactly one of
>                        --pid and --cgroup must be provided.
>   --summary            Sum the per-vma statistics to provide a summary
>                        over the whole process or cgroup.
>   --cont size[KMG]     Adds anon and file stats for naturally aligned,
>                        contiguously mapped blocks of the specified
>                        size. May be issued multiple times to track
>                        multiple sized blocks. Useful to infer e.g.
>                        arm64 contpte and hpa mappings. Size must be a
>                        power-of-2 number of pages.
>   --inc-smaps          Include all numerical, additive
>                        /proc/<pid>/smaps stats in the output.
>   --inc-empty          Show all statistics including those whose value
>                        is 0.
>   --periodic sleep_ms  Run in a loop, polling every sleep_ms
>                        milliseconds.
>
> Requires root privilege to access pagemap and kpageflags.
> --8<--
>
> Example command to summarise fully and partially mapped THPs and 64K
> contiguous blocks over all VMAs in a single process (--inc-empty forces
> printing stats that are 0):
>
> # ./thpmaps --pid 10837 --cont 64K --summary --inc-empty
>
> --8<--
> anon-thp-aligned-16kB:                16 kB ( 0%)
> anon-thp-aligned-32kB:                 0 kB ( 0%)
> anon-thp-aligned-64kB:           4194304 kB (100%)
> anon-thp-aligned-128kB:                0 kB ( 0%)
> anon-thp-aligned-256kB:                0 kB ( 0%)
> anon-thp-aligned-512kB:                0 kB ( 0%)
> anon-thp-aligned-1024kB:               0 kB ( 0%)
> anon-thp-aligned-2048kB:               0 kB ( 0%)
> anon-thp-unaligned-16kB:               0 kB ( 0%)
> anon-thp-unaligned-32kB:               0 kB ( 0%)
> anon-thp-unaligned-64kB:               0 kB ( 0%)
> anon-thp-unaligned-128kB:              0 kB ( 0%)
> anon-thp-unaligned-256kB:              0 kB ( 0%)
> anon-thp-unaligned-512kB:              0 kB ( 0%)
> anon-thp-unaligned-1024kB:             0 kB ( 0%)
> anon-thp-unaligned-2048kB:             0 kB ( 0%)
> anon-thp-partial:                      0 kB ( 0%)
> file-thp-aligned-16kB:                16 kB ( 1%)
> file-thp-aligned-32kB:                64 kB ( 5%)
> file-thp-aligned-64kB:               640 kB (50%)
> file-thp-aligned-128kB:              128 kB (10%)
> file-thp-aligned-256kB:                0 kB ( 0%)
> file-thp-aligned-512kB:                0 kB ( 0%)
> file-thp-aligned-1024kB:               0 kB ( 0%)
> file-thp-aligned-2048kB:               0 kB ( 0%)
> file-thp-unaligned-16kB:              16 kB ( 1%)
> file-thp-unaligned-32kB:              32 kB ( 3%)
> file-thp-unaligned-64kB:              64 kB ( 5%)
> file-thp-unaligned-128kB:              0 kB ( 0%)
> file-thp-unaligned-256kB:              0 kB ( 0%)
> file-thp-unaligned-512kB:              0 kB ( 0%)
> file-thp-unaligned-1024kB:             0 kB ( 0%)
> file-thp-unaligned-2048kB:             0 kB ( 0%)
> file-thp-partial:                     12 kB ( 1%)
> anon-cont-aligned-64kB:          4194304 kB (100%)
> file-cont-aligned-64kB:              768 kB (61%)
> --8<--
>
> Signed-off-by: Ryan Roberts
> ---

Hi Ryan,

I ran a couple of test cases with different parameters, and it seems to
work correctly. I just don't understand the output below: what is the
meaning of 000000ce at the beginning of each line?

/thpmaps --pid 206 --cont 64K
000000ce 0000aaaadbb20000-0000aaaadbb21000 r-xp 00000000 fe:00 00426969 /root/a.out
000000ce 0000aaaadbb3f000-0000aaaadbb40000 r--p 0000f000 fe:00 00426969 /root/a.out
000000ce 0000aaaadbb40000-0000aaaadbb41000 rw-p 00010000 fe:00 00426969 /root/a.out
000000ce 0000ffff702c0000-0000ffffb02c0000 rw-p 00000000 00:00 00000000
anon-thp-aligned-64kB:            473920 kB (100%)
anon-cont-aligned-64kB:           473920 kB (100%)
000000ce 0000ffffb02c0000-0000ffffb044c000 r-xp 00000000 fe:00 00395429 /usr/lib/aarch64-linux-gnu/libc.so.6
000000ce 0000ffffb044c000-0000ffffb045d000 ---p 0018c000 fe:00 00395429 /usr/lib/aarch64-linux-gnu/libc.so.6
000000ce 0000ffffb045d000-0000ffffb0460000 r--p 0018d000 fe:00 00395429 /usr/lib/aarch64-linux-gnu/libc.so.6
000000ce 0000ffffb0460000-0000ffffb0462000 rw-p 00190000 fe:00 00395429 /usr/lib/aarch64-linux-gnu/libc.so.6
000000ce 0000ffffb0462000-0000ffffb046f000 rw-p 00000000 00:00 00000000
000000ce 0000ffffb0477000-0000ffffb049d000 r-xp 00000000 fe:00 00393893 /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
000000ce 0000ffffb04b0000-0000ffffb04b2000 rw-p 00000000 00:00 00000000
000000ce 0000ffffb04b2000-0000ffffb04b4000 r--p 00000000 00:00 00000000 [vvar]
000000ce 0000ffffb04b4000-0000ffffb04b5000 r-xp 00000000 00:00 00000000 [vdso]
000000ce 0000ffffb04b5000-0000ffffb04b7000 r--p 0002e000 fe:00 00393893 /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
000000ce 0000ffffb04b7000-0000ffffb04b9000 rw-p 00030000 fe:00 00393893 /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
000000ce 0000ffffdaba4000-0000ffffdabc5000 rw-p 00000000 00:00 00000000 [stack]

>
> I've found this very useful for debugging, and I know others have requested a
> way to check if mTHP and contpte is working, so thought this might be a good short
> term solution until we figure out how best to add stats in the kernel?
>
> Thanks,
> Ryan
>
>  tools/mm/Makefile |   9 +-
>  tools/mm/thpmaps  | 573 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 578 insertions(+), 4 deletions(-)
>  create mode 100755 tools/mm/thpmaps
>
> diff --git a/tools/mm/Makefile b/tools/mm/Makefile
> index 1c5606cc3334..7bb03606b9ea 100644
> --- a/tools/mm/Makefile
> +++ b/tools/mm/Makefile
> @@ -3,7 +3,8 @@
>  #
>  include ../scripts/Makefile.include
>
> -TARGETS=page-types slabinfo page_owner_sort
> +BUILD_TARGETS=page-types slabinfo page_owner_sort
> +INSTALL_TARGETS = $(BUILD_TARGETS) thpmaps
>
>  LIB_DIR = ../lib/api
>  LIBS = $(LIB_DIR)/libapi.a
> @@ -11,9 +12,9 @@ LIBS = $(LIB_DIR)/libapi.a
>  CFLAGS += -Wall -Wextra -I../lib/ -pthread
>  LDFLAGS += $(LIBS) -pthread
>
> -all: $(TARGETS)
> +all: $(BUILD_TARGETS)
>
> -$(TARGETS): $(LIBS)
> +$(BUILD_TARGETS): $(LIBS)
>
>  $(LIBS):
>  	make -C $(LIB_DIR)
> @@ -29,4 +30,4 @@ sbindir ?= /usr/sbin
>
>  install: all
>  	install -d $(DESTDIR)$(sbindir)
> -	install -m 755 -p $(TARGETS) $(DESTDIR)$(sbindir)
> +	install -m 755 -p $(INSTALL_TARGETS) $(DESTDIR)$(sbindir)
> diff --git a/tools/mm/thpmaps b/tools/mm/thpmaps
> new file mode 100755
> index 000000000000..af9b19f63eb4
> --- /dev/null
> +++ b/tools/mm/thpmaps
> @@ -0,0 +1,573 @@
> +#!/usr/bin/env python3
> +# SPDX-License-Identifier: GPL-2.0-only
> +# Copyright (C) 2024 ARM Ltd.
> +#
> +# Utility providing smaps-like output detailing transparent hugepage usage.
> +# For more info, run:
> +# ./thpmaps --help
> +#
> +# Requires numpy:
> +# pip3 install numpy
> +
> +
> +import argparse
> +import collections
> +import math
> +import os
> +import re
> +import resource
> +import shutil
> +import sys
> +import time
> +import numpy as np
> +
> +
> +with open('/sys/kernel/mm/transparent_hugepage/hpage_pmd_size') as f:
> +    PAGE_SIZE = resource.getpagesize()
> +    PAGE_SHIFT = int(math.log2(PAGE_SIZE))
> +    PMD_SIZE = int(f.read())
> +    PMD_ORDER = int(math.log2(PMD_SIZE / PAGE_SIZE))
> +
> +
> +def align_forward(v, a):
> +    return (v + (a - 1)) & ~(a - 1)
> +
> +
> +def align_offset(v, a):
> +    return v & (a - 1)
> +
> +
> +def nrkb(nr):
> +    # Convert number of pages to KB.
> +    return (nr << PAGE_SHIFT) >> 10
> +
> +
> +def odkb(order):
> +    # Convert page order to KB.
> +    return nrkb(1 << order)
> +
> +
> +def cont_ranges_all(arrs):
> +    # Given a list of arrays, find the ranges for which values are monotonically
> +    # incrementing in all arrays.
> +    assert(len(arrs) > 0)
> +    sz = len(arrs[0])
> +    for arr in arrs:
> +        assert(arr.shape == (sz,))
> +    r = np.full(sz, 2)
> +    d = np.diff(arrs[0]) == 1
> +    for dd in [np.diff(arr) == 1 for arr in arrs[1:]]:
> +        d &= dd
> +    r[1:] -= d
> +    r[:-1] -= d
> +    return [np.repeat(arr, r).reshape(-1, 2) for arr in arrs]
> +
> +
> +class ArgException(Exception):
> +    pass
> +
> +
> +class FileIOException(Exception):
> +    pass
> +
> +
> +class BinArrayFile:
> +    # Base class used to read /proc/<pid>/pagemap and /proc/kpageflags into a
> +    # numpy array. Use inherited class in a with clause to ensure file is
> +    # closed when it goes out of scope.
> +    def __init__(self, filename, element_size):
> +        self.element_size = element_size
> +        self.filename = filename
> +        self.fd = os.open(self.filename, os.O_RDONLY)
> +
> +    def cleanup(self):
> +        os.close(self.fd)
> +
> +    def __enter__(self):
> +        return self
> +
> +    def __exit__(self, exc_type, exc_val, exc_tb):
> +        self.cleanup()
> +
> +    def _readin(self, offset, buffer):
> +        length = os.preadv(self.fd, (buffer,), offset)
> +        if len(buffer) != length:
> +            raise FileIOException('error: {} failed to read {} bytes at {:x}'
> +                            .format(self.filename, len(buffer), offset))
> +
> +    def _toarray(self, buf):
> +        assert(self.element_size == 8)
> +        return np.frombuffer(buf, dtype=np.uint64)
> +
> +    def getv(self, vec):
> +        sz = 0
> +        for region in vec:
> +            sz += int(region[1] - region[0] + 1) * self.element_size
> +        buf = bytearray(sz)
> +        view = memoryview(buf)
> +        pos = 0
> +        for region in vec:
> +            offset = int(region[0]) * self.element_size
> +            length = int(region[1] - region[0] + 1) * self.element_size
> +            self._readin(offset, view[pos:pos+length])
> +            pos += length
> +        return self._toarray(buf)
> +
> +    def get(self, index, nr=1):
> +        offset = index * self.element_size
> +        length = nr * self.element_size
> +        buf = bytearray(length)
> +        self._readin(offset, buf)
> +        return self._toarray(buf)
> +
> +
> +PM_PAGE_PRESENT = 1 << 63
> +PM_PFN_MASK = (1 << 55) - 1
> +
> +class PageMap(BinArrayFile):
> +    # Read ranges of a given pid's pagemap into a numpy array.
> +    def __init__(self, pid='self'):
> +        super().__init__(f'/proc/{pid}/pagemap', 8)
> +
> +
> +KPF_ANON = 1 << 12
> +KPF_COMPOUND_HEAD = 1 << 15
> +KPF_COMPOUND_TAIL = 1 << 16
> +
> +class KPageFlags(BinArrayFile):
> +    # Read ranges of /proc/kpageflags into a numpy array.
> +    def __init__(self):
> +        super().__init__(f'/proc/kpageflags', 8)
> +
> +
> +VMA = collections.namedtuple('VMA', [
> +    'name',
> +    'start',
> +    'end',
> +    'read',
> +    'write',
> +    'execute',
> +    'private',
> +    'pgoff',
> +    'major',
> +    'minor',
> +    'inode',
> +    'stats',
> +])
> +
> +class VMAList:
> +    # A container for VMAs, parsed from /proc/<pid>/smaps. Iterate over the
> +    # instance to receive VMAs.
> +    head_regex = re.compile(r"^([\da-f]+)-([\da-f]+) ([r-])([w-])([x-])([ps]) ([\da-f]+) ([\da-f]+):([\da-f]+) ([\da-f]+)\s*(.*)$")
> +    kb_item_regex = re.compile(r"(\w+):\s*(\d+)\s*kB")
> +
> +    def __init__(self, pid='self'):
> +        def is_vma(line):
> +            return self.head_regex.search(line) != None
> +
> +        def get_vma(line):
> +            m = self.head_regex.match(line)
> +            if m is None:
> +                return None
> +            return VMA(
> +                name=m.group(11),
> +                start=int(m.group(1), 16),
> +                end=int(m.group(2), 16),
> +                read=m.group(3) == 'r',
> +                write=m.group(4) == 'w',
> +                execute=m.group(5) == 'x',
> +                private=m.group(6) == 'p',
> +                pgoff=int(m.group(7), 16),
> +                major=int(m.group(8), 16),
> +                minor=int(m.group(9), 16),
> +                inode=int(m.group(10), 16),
> +                stats={},
> +            )
> +
> +        def get_value(line):
> +            # Currently only handle the KB stats because they are summed for
> +            # --summary. Core code doesn't know how to combine other stats.
> +            exclude = ['KernelPageSize', 'MMUPageSize']
> +            m = self.kb_item_regex.search(line)
> +            if m:
> +                param = m.group(1)
> +                if param not in exclude:
> +                    value = int(m.group(2))
> +                    return param, value
> +            return None, None
> +
> +        def parse_smaps(file):
> +            vmas = []
> +            i = 0
> +
> +            line = file.readline()
> +
> +            while True:
> +                if not line:
> +                    break
> +                line = line.strip()
> +
> +                i += 1
> +
> +                vma = get_vma(line)
> +                if vma is None:
> +                    raise FileIOException(f'error: could not parse line {i}: "{line}"')
> +
> +                while True:
> +                    line = file.readline()
> +                    if not line:
> +                        break
> +                    line = line.strip()
> +                    if is_vma(line):
> +                        break
> +
> +                    i += 1
> +
> +                    param, value = get_value(line)
> +                    if param:
> +                        vma.stats[param] = {'type': None, 'value': value}
> +
> +                vmas.append(vma)
> +
> +            return vmas
> +
> +        with open(f'/proc/{pid}/smaps', 'r') as file:
> +            self.vmas = parse_smaps(file)
> +
> +    def __iter__(self):
> +        yield from self.vmas
> +
> +
> +def thp_parse(max_order, kpageflags, vfns, pfns, anons, heads):
> +    # Given 4 same-sized arrays representing a range within a page table backed
> +    # by THPs (vfns: virtual frame numbers, pfns: physical frame numbers, anons:
> +    # True if page is anonymous, heads: True if page is head of a THP), return a
> +    # dictionary of statistics describing the mapped THPs.
> +    stats = {
> +        'file': {
> +            'partial': 0,
> +            'aligned': [0] * (max_order + 1),
> +            'unaligned': [0] * (max_order + 1),
> +        },
> +        'anon': {
> +            'partial': 0,
> +            'aligned': [0] * (max_order + 1),
> +            'unaligned': [0] * (max_order + 1),
> +        },
> +    }
> +
> +    indexes = np.arange(len(vfns), dtype=np.uint64)
> +    ranges = cont_ranges_all([indexes, vfns, pfns])
> +    for rindex, rpfn in zip(ranges[0], ranges[2]):
> +        index_next = int(rindex[0])
> +        index_end = int(rindex[1]) + 1
> +        pfn_end = int(rpfn[1]) + 1
> +
> +        folios = indexes[index_next:index_end][heads[index_next:index_end]]
> +
> +        # Account pages for any partially mapped THP at the front. In that case,
> +        # the first page of the range is a tail.
> +        nr = (int(folios[0]) if len(folios) else index_end) - index_next
> +        stats['anon' if anons[index_next] else 'file']['partial'] += nr
> +
> +        # Account pages for any partially mapped THP at the back. In that case,
> +        # the next page after the range is a tail.
> +        if len(folios):
> +            flags = int(kpageflags.get(pfn_end)[0])
> +            if flags & KPF_COMPOUND_TAIL:
> +                nr = index_end - int(folios[-1])
> +                folios = folios[:-1]
> +                index_end -= nr
> +                stats['anon' if anons[index_end - 1] else 'file']['partial'] += nr
> +
> +        # Account fully mapped THPs in the middle of the range.
> +        if len(folios):
> +            folio_nrs = np.append(np.diff(folios), np.uint64(index_end - folios[-1]))
> +            folio_orders = np.log2(folio_nrs).astype(np.uint64)
> +            for index, order in zip(folios, folio_orders):
> +                index = int(index)
> +                order = int(order)
> +                nr = 1 << order
> +                vfn = int(vfns[index])
> +                align = 'aligned' if align_forward(vfn, nr) == vfn else 'unaligned'
> +                anon = 'anon' if anons[index] else 'file'
> +                stats[anon][align][order] += nr
> +
> +    rstats = {}
> +
> +    def flatten_sub(type, subtype, stats):
> +        for od, nr in enumerate(stats[2:], 2):
> +            rstats[f"{type}-thp-{subtype}-{odkb(od)}kB"] = {'type': type, 'value': nrkb(nr)}
> +
> +    def flatten_type(type, stats):
> +        flatten_sub(type, 'aligned', stats['aligned'])
> +        flatten_sub(type, 'unaligned', stats['unaligned'])
> +        rstats[f"{type}-thp-partial"] = {'type': type, 'value': nrkb(stats['partial'])}
> +
> +    flatten_type('anon', stats['anon'])
> +    flatten_type('file', stats['file'])
> +
> +    return rstats
> +
> +
> +def cont_parse(order, vfns, pfns, anons, heads):
> +    # Given 4 same-sized arrays representing a range within a page table backed
> +    # by THPs (vfns: virtual frame numbers, pfns: physical frame numbers, anons:
> +    # True if page is anonymous, heads: True if page is head of a THP), return a
> +    # dictionary of statistics describing the contiguous blocks.
> +    nr_cont = 1 << order
> +    nr_anon = 0
> +    nr_file = 0
> +
> +    ranges = cont_ranges_all([np.arange(len(vfns), dtype=np.uint64), vfns, pfns])
> +    for rindex, rvfn, rpfn in zip(*ranges):
> +        index_next = int(rindex[0])
> +        index_end = int(rindex[1]) + 1
> +        vfn_start = int(rvfn[0])
> +        pfn_start = int(rpfn[0])
> +
> +        if align_offset(pfn_start, nr_cont) != align_offset(vfn_start, nr_cont):
> +            continue
> +
> +        off = align_forward(vfn_start, nr_cont) - vfn_start
> +        index_next += off
> +
> +        while index_next + nr_cont <= index_end:
> +            folio_boundary = heads[index_next+1:index_next+nr_cont].any()
> +            if not folio_boundary:
> +                if anons[index_next]:
> +                    nr_anon += nr_cont
> +                else:
> +                    nr_file += nr_cont
> +            index_next += nr_cont
> +
> +    return {
> +        f"anon-cont-aligned-{nrkb(nr_cont)}kB": {'type': 'anon', 'value': nrkb(nr_anon)},
> +        f"file-cont-aligned-{nrkb(nr_cont)}kB": {'type': 'file', 'value': nrkb(nr_file)},
> +    }
> +
> +
> +def vma_print(vma, pid):
> +    # Prints a VMA instance in a format similar to smaps. The main difference is
> +    # that the pid is included as the first value.
> +    print("{:08x} {:016x}-{:016x} {}{}{}{} {:08x} {:02x}:{:02x} {:08x} {}"
> +        .format(
> +            pid, vma.start, vma.end,
> +            'r' if vma.read else '-', 'w' if vma.write else '-',
> +            'x' if vma.execute else '-', 'p' if vma.private else 's',
> +            vma.pgoff, vma.major, vma.minor, vma.inode, vma.name
> +        ))
> +
> +
> +def stats_print(stats, tot_anon, tot_file, inc_empty):
> +    # Print a statistics dictionary.
> +    label_field = 32
> +    for label, stat in stats.items():
> +        type = stat['type']
> +        value = stat['value']
> +        if value or inc_empty:
> +            pad = max(0, label_field - len(label) - 1)
> +            if type == 'anon':
> +                percent = f' ({value / tot_anon:3.0%})'
> +            elif type == 'file':
> +                percent = f' ({value / tot_file:3.0%})'
> +            else:
> +                percent = ''
> +            print(f"{label}:{' ' * pad}{value:8} kB{percent}")
> +
> +
> +def vma_parse(vma, pagemap, kpageflags, contorders):
> +    # Generate thp and cont statistics for a single VMA.
> +    start = vma.start >> PAGE_SHIFT
> +    end = vma.end >> PAGE_SHIFT
> +
> +    pmes = pagemap.get(start, end - start)
> +    present = pmes & PM_PAGE_PRESENT != 0
> +    pfns = pmes & PM_PFN_MASK
> +    pfns = pfns[present]
> +    vfns = np.arange(start, end, dtype=np.uint64)
> +    vfns = vfns[present]
> +
> +    flags = kpageflags.getv(cont_ranges_all([pfns])[0])
> +    anons = flags & KPF_ANON != 0
> +    heads = flags & KPF_COMPOUND_HEAD != 0
> +    tails = flags & KPF_COMPOUND_TAIL != 0
> +    thps = heads | tails
> +
> +    tot_anon = np.count_nonzero(anons)
> +    tot_file = np.size(anons) - tot_anon
> +    tot_anon = nrkb(tot_anon)
> +    tot_file = nrkb(tot_file)
> +
> +    vfns = vfns[thps]
> +    pfns = pfns[thps]
> +    anons = anons[thps]
> +    heads = heads[thps]
> +
> +    thpstats = thp_parse(PMD_ORDER, kpageflags, vfns, pfns, anons, heads)
> +    contstats = [cont_parse(order, vfns, pfns, anons, heads) for order in contorders]
> +
> +    return {
> +        **thpstats,
> +        **{k: v for s in contstats for k, v in s.items()}
> +    }, tot_anon, tot_file
> +
> +
> +def do_main(args):
> +    pids = set()
> +    summary = {}
> +    summary_anon = 0
> +    summary_file = 0
> +
> +    if args.cgroup:
> +        with open(f'{args.cgroup}/cgroup.procs') as pidfile:
> +            for line in pidfile.readlines():
> +                pids.add(int(line.strip()))
> +    else:
> +        pids.add(args.pid)
> +
> +    for pid in pids:
> +        try:
> +            with PageMap(pid) as pagemap:
> +                with KPageFlags() as kpageflags:
> +                    for vma in VMAList(pid):
> +                        if (vma.read or vma.write or vma.execute) and vma.stats['Rss']['value'] > 0:
> +                            stats, vma_anon, vma_file = vma_parse(vma, pagemap, kpageflags, args.cont)
> +                        else:
> +                            stats = {}
> +                            vma_anon = 0
> +                            vma_file = 0
> +                        if args.inc_smaps:
> +                            stats = {**vma.stats, **stats}
> +                        if args.summary:
> +                            for k, v in stats.items():
> +                                if k in summary:
> +                                    assert(summary[k]['type'] == v['type'])
> +                                    summary[k]['value'] += v['value']
> +                                else:
> +                                    summary[k] = v
> +                            summary_anon += vma_anon
> +                            summary_file += vma_file
> +                        else:
> +                            vma_print(vma, pid)
> +                            stats_print(stats, vma_anon, vma_file, args.inc_empty)
> +        except FileNotFoundError:
> +            if not args.cgroup:
> +                raise
> +        except ProcessLookupError:
> +            if not args.cgroup:
> +                raise
> +
> +    if args.summary:
> +        stats_print(summary, summary_anon, summary_file, args.inc_empty)
> +
> +
> +def main():
> +    def formatter(prog):
> +        width = shutil.get_terminal_size().columns
> +        width -= 2
> +        width = min(80, width)
> +        return argparse.HelpFormatter(prog, width=width)
> +
> +    def size2order(human):
> +        units = {"K": 2**10, "M": 2**20, "G": 2**30}
> +        unit = 1
> +        if human[-1] in units:
> +            unit = units[human[-1]]
> +            human = human[:-1]
> +        try:
> +            size = int(human)
> +        except ValueError:
> +            raise ArgException('error: --cont value must be integer size with optional KMG unit')
> +        size *= unit
> +        order = int(math.log2(size / PAGE_SIZE))
> +        if order < 1:
> +            raise ArgException('error: --cont value must be size of at least 2 pages')
> +        if (1 << order) * PAGE_SIZE != size:
> +            raise ArgException('error: --cont value must be size of power-of-2 pages')
> +        return order
> +
> +    parser = argparse.ArgumentParser(formatter_class=formatter,
> +        description="""Prints information about how transparent huge pages are
> +                    mapped to a specified process or cgroup.
> +
> +                    Shows statistics for fully-mapped THPs of every size, mapped
> +                    both naturally aligned and unaligned for both file and
> +                    anonymous memory. See
> +                    [anon|file]-thp-[aligned|unaligned]-kB keys.
> +
> +                    Shows statistics for mapped pages that belong to a THP but
> +                    which are not fully mapped. See [anon|file]-thp-partial
> +                    keys.
> +
> +                    Optionally shows statistics for naturally aligned,
> +                    contiguous blocks of memory of a specified size (when --cont
> +                    is provided). See [anon|file]-cont-aligned-kB keys.
> +
> +                    Statistics are shown in kB and as a percentage of either
> +                    total anon or file memory as appropriate.""",
> +        epilog="""Requires root privilege to access pagemap and kpageflags.""")
> +
> +    parser.add_argument('--pid',
> +        metavar='pid', required=False, type=int,
> +        help="""Process id of the target process. Exactly one of --pid and
> +            --cgroup must be provided.""")
> +
> +    parser.add_argument('--cgroup',
> +        metavar='path', required=False,
> +        help="""Path to the target cgroup in sysfs. Iterates over every pid in
> +            the cgroup. Exactly one of --pid and --cgroup must be provided.""")
> +
> +    parser.add_argument('--summary',
> +        required=False, default=False, action='store_true',
> +        help="""Sum the per-vma statistics to provide a summary over the whole
> +            process or cgroup.""")
> +
> +    parser.add_argument('--cont',
> +        metavar='size[KMG]', required=False, default=[], action='append',
> +        help="""Adds anon and file stats for naturally aligned, contiguously
> +            mapped blocks of the specified size. May be issued multiple times to
> +            track multiple sized blocks. Useful to infer e.g. arm64 contpte and
> +            hpa mappings. Size must be a power-of-2 number of pages.""")
> +
> +    parser.add_argument('--inc-smaps',
> +        required=False, default=False, action='store_true',
> +        help="""Include all numerical, additive /proc/<pid>/smaps stats in the
> +            output.""")
> +
> +    parser.add_argument('--inc-empty',
> +        required=False, default=False, action='store_true',
> +        help="""Show all statistics including those whose value is 0.""")
> +
> +    parser.add_argument('--periodic',
> +        metavar='sleep_ms', required=False, type=int,
> +        help="""Run in a loop, polling every sleep_ms milliseconds.""")
> +
> +    args = parser.parse_args()
> +
> +    try:
> +        if (args.pid and args.cgroup) or \
> +           (not args.pid and not args.cgroup):
> +            raise ArgException("error: Exactly one of --pid and --cgroup must be provided.")
> +
> +        args.cont = [size2order(cont) for cont in args.cont]
> +    except ArgException as e:
> +        parser.print_usage()
> +        raise
> +
> +    if args.periodic:
> +        while True:
> +            do_main(args)
> +            print()
> +            time.sleep(args.periodic / 1000)
> +    else:
> +        do_main(args)
> +
> +
> +if __name__ == "__main__":
> +    try:
> +        main()
> +    except Exception as e:
> +        prog = os.path.basename(sys.argv[0])
> +        print(f'{prog}: {e}')
> +        exit(1)
> --
> 2.25.1
>
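On the 000000ce question above: vma_print() in the quoted patch is documented as including the pid as the first value. Its format string renders that field with '{:08x}' (zero-padded hex), so the prefix appears to be pid 206 from "--pid 206", since 0xce == 206. The minimal sketch below reuses that format string with a hypothetical stand-in for the patch's VMA namedtuple. It reproduces the first line of the output above.

```python
import collections

# Format string copied from vma_print() in the quoted patch; the first field
# is the pid, printed as zero-padded hexadecimal ({:08x}).
FMT = "{:08x} {:016x}-{:016x} {}{}{}{} {:08x} {:02x}:{:02x} {:08x} {}"

# Hypothetical stand-in for the patch's VMA namedtuple, limited to the fields
# that the format string uses.
VMA = collections.namedtuple(
    'VMA', 'name start end read write execute private pgoff major minor inode')


def vma_line(vma, pid):
    # Same argument order as vma_print(), but returns the string instead.
    return FMT.format(
        pid, vma.start, vma.end,
        'r' if vma.read else '-', 'w' if vma.write else '-',
        'x' if vma.execute else '-', 'p' if vma.private else 's',
        vma.pgoff, vma.major, vma.minor, vma.inode, vma.name)


# Values taken from the first line of the output above (the a.out text VMA).
# The patch parses the smaps inode field with base 16, hence 0x426969 here.
vma = VMA('/root/a.out', 0xaaaadbb20000, 0xaaaadbb21000,
          True, False, True, True, 0, 0xfe, 0x00, 0x426969)
line = vma_line(vma, 206)
print(line)
print(int(line.split()[0], 16))  # prefix decoded back to decimal: 206
```

Going the other way, int('000000ce', 16) evaluates to 206. Printing the pid in decimal instead would only require changing the first field of the format string.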
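Both thp_parse() and cont_parse() in the quoted patch begin by calling cont_ranges_all(). That helper splits the per-page arrays into runs in which every array increments by exactly 1. The standalone sketch below copies its body from the patch and feeds it hypothetical sample frame numbers, to show what it returns. It also shows how the natural-alignment test based on align_forward() classifies a folio start.

```python
import numpy as np


def align_forward(v, a):
    # Round v up to the next multiple of a (a must be a power of 2).
    return (v + (a - 1)) & ~(a - 1)


def cont_ranges_all(arrs):
    # Copied from the quoted patch: for each input array, return an (N, 2)
    # array of [first, last] values for every run in which all arrays
    # increment by exactly 1 from one element to the next.
    assert(len(arrs) > 0)
    sz = len(arrs[0])
    for arr in arrs:
        assert(arr.shape == (sz,))
    r = np.full(sz, 2)
    d = np.diff(arrs[0]) == 1
    for dd in [np.diff(arr) == 1 for arr in arrs[1:]]:
        d &= dd
    r[1:] -= d
    r[:-1] -= d
    return [np.repeat(arr, r).reshape(-1, 2) for arr in arrs]


# Hypothetical sample: virtual frames 16..19 map to physical frames 100..103,
# then virtual frames 20..21 jump to physical frames 500..501.
vfns = np.array([16, 17, 18, 19, 20, 21], dtype=np.uint64)
pfns = np.array([100, 101, 102, 103, 500, 501], dtype=np.uint64)
vranges, pranges = cont_ranges_all([vfns, pfns])
print(vranges.tolist())  # [[16, 19], [20, 21]]
print(pranges.tolist())  # [[100, 103], [500, 501]]

# A 4-page folio starting at virtual frame 16 counts as naturally aligned;
# one starting at 18 would be counted as unaligned.
print(align_forward(16, 4) == 16, align_forward(18, 4) == 18)  # True False
```

Each element starts with a repeat count of 2. Every adjacent pair that continues a run removes one count from each side, so np.repeat() keeps only the first and last element of every run. Those survivors are then paired up by reshape(-1, 2).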
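The --cont option takes a human-readable size, which size2order() in the quoted patch converts into a page order. It rejects anything that is not a power-of-2 number of pages, or that is smaller than 2 pages. The sketch below follows the same logic with one change made for the example: the page size is an explicit parameter with an assumed 4 KiB default, whereas the patch reads it via resource.getpagesize(). This keeps the results independent of the machine.

```python
import math


class ArgException(Exception):
    pass


def size2order(human, page_size=4096):
    # Adapted from size2order() in the quoted patch; page_size is a parameter
    # here (assumption for the example) instead of resource.getpagesize().
    units = {"K": 2**10, "M": 2**20, "G": 2**30}
    unit = 1
    if human[-1] in units:
        unit = units[human[-1]]
        human = human[:-1]
    try:
        size = int(human)
    except ValueError:
        raise ArgException('error: --cont value must be integer size with optional KMG unit')
    size *= unit
    order = int(math.log2(size / page_size))
    if order < 1:
        raise ArgException('error: --cont value must be size of at least 2 pages')
    if (1 << order) * page_size != size:
        raise ArgException('error: --cont value must be size of power-of-2 pages')
    return order


print(size2order('64K'))         # 4: 16 pages of 4 KiB
print(size2order('2M'))          # 9: 512 pages of 4 KiB
print(size2order('64K', 16384))  # 2: 4 pages of 16 KiB
try:
    size2order('48K')            # 12 pages is not a power of 2
except ArgException as e:
    print(e)
```

With 4 KiB pages, "--cont 64K" therefore corresponds to order 4, i.e. 16-page blocks.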