Date: Mon, 8 Jul 2024 09:50:49 -0400
From: Dan Schatzberg
To: David Rientjes
Cc: Michal Hocko, Andrew Morton, Mel Gorman, Balbir Singh, Peter Zijlstra, linux-mm@kvack.org
Subject: Re: Tools for explaining memory mappings/usage/pressure
In-Reply-To: <29c27dab-a590-5df2-c840-279bf9dff090@google.com>

On Sat, Jul 06, 2024 at 01:55:11PM -0700, David Rientjes wrote:
> Rather than hacky scripts that collect things like vmstat, memory.stat,
> buddyinfo, etc, at regular intervals, it would be preferable to hand off
> something more complete.
> Idea is an open source tool that can be run in
> the background to collect metrics for the system, NUMA nodes, and memcg
> hierarchies, as well as potentially from subsystems in the kernel like
> delay accounting. IOW, I want to be able to say "install ${tool} and send
> over the log file."
>
> Are there any open source tools that do a good job of this today that I can
> latch onto? If not, sounds like I'll be writing one from scratch. Let me
> know if there's interest in this as well.
>
> Thanks!
>

Hi David,

At Meta we have built and deployed Below[1] for this purpose. It's a tool
similar to `top`, but it can also record system state periodically and
replay it later. We run it on our production fleet, periodically
recording system state to the local disk. When we need to debug a
machine at some point in the past, we can log in and replay the state.
Replay uses a TUI (see the link for a demo) to make navigating the data
more natural.

I'm aware of a few other organizations that have also deployed Below,
but they tend to run it more in the manner you suggest: have it record
data, then use the snapshot command to export the state (i.e. as if it
were a log file) so that it can be viewed off-host. Some organizations
eschew the TUI altogether and export the data to Prometheus/Grafana.

I'll caution, though, that having the data is one thing; being able to
interpret it is something else entirely. While we try to put the most
useful and easily understood metrics front and center in the TUI,
debugging an issue like the one you describe would probably require
some domain expertise.

[1] https://github.com/facebookincubator/below
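
To make the record-then-replay model above a bit more concrete, here is
a minimal Python sketch. It is not how Below is implemented and it does
not use Below's on-disk format; it only illustrates the idea of
appending timestamped samples to a local log and later looking up the
sample closest to a point in the past. The function names
(take_sample, record, replay), the SOURCES table, and the JSON-lines log
format are all assumptions made up for this example, and the
memory.stat path assumes a cgroup v2 mount at /sys/fs/cgroup.

```python
import json
import time
from pathlib import Path

# Hypothetical set of files to sample; adjust for the machine at hand.
SOURCES = {
    "vmstat": Path("/proc/vmstat"),
    "buddyinfo": Path("/proc/buddyinfo"),
    "memory.stat": Path("/sys/fs/cgroup/memory.stat"),
}


def take_sample(sources, now=None):
    """Read each source file verbatim and return one timestamped sample."""
    sample = {"ts": time.time() if now is None else now, "data": {}}
    for name, path in sources.items():
        try:
            sample["data"][name] = Path(path).read_text()
        except OSError:
            # Not every file exists on every setup; record the gap as None.
            sample["data"][name] = None
    return sample


def record(log_path, sources, interval_s, count):
    """Append `count` samples to log_path as JSON lines, interval_s apart."""
    with open(log_path, "a", encoding="utf-8") as log:
        for i in range(count):
            log.write(json.dumps(take_sample(sources)) + "\n")
            log.flush()
            if i + 1 < count:
                time.sleep(interval_s)


def replay(log_path, at):
    """Return the latest sample recorded at or before time `at`, or None."""
    best = None
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            sample = json.loads(line)
            if sample["ts"] <= at and (best is None or sample["ts"] >= best["ts"]):
                best = sample
    return best
```

Something like record("/tmp/mem.jsonl", SOURCES, 5, 12) would collect a
minute of samples, and replay("/tmp/mem.jsonl", time.time() - 30) would
return the state as of roughly thirty seconds ago. A real tool also has
to deal with log rotation, compression, and per-cgroup traversal, none
of which is attempted here.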