Date: Wed, 14 Feb 2024 01:20:20 -0500
From: Johannes Weiner
To: Suren Baghdasaryan
Cc: akpm@linux-foundation.org, kent.overstreet@linux.dev, mhocko@suse.com,
 vbabka@suse.cz, roman.gushchin@linux.dev, mgorman@suse.de,
 dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com,
 corbet@lwn.net, void@manifault.com, peterz@infradead.org,
 juri.lelli@redhat.com, catalin.marinas@arm.com, will@kernel.org,
 arnd@arndb.de, tglx@linutronix.de, mingo@redhat.com,
 dave.hansen@linux.intel.com, x86@kernel.org, peterx@redhat.com,
 david@redhat.com, axboe@kernel.dk, mcgrof@kernel.org,
 masahiroy@kernel.org, nathan@kernel.org, dennis@kernel.org,
 tj@kernel.org, muchun.song@linux.dev, rppt@kernel.org,
 paulmck@kernel.org, pasha.tatashin@soleen.com, yosryahmed@google.com,
 yuzhao@google.com, dhowells@redhat.com, hughd@google.com,
 andreyknvl@gmail.com, keescook@chromium.org, ndesaulniers@google.com,
 vvvvvv@google.com, gregkh@linuxfoundation.org, ebiggers@google.com,
 ytcoode@gmail.com, vincent.guittot@linaro.org,
 dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
 bristot@redhat.com, vschneid@redhat.com, cl@linux.com,
 penberg@kernel.org, iamjoonsoo.kim@lge.com, 42.hyeyoo@gmail.com,
 glider@google.com, elver@google.com, dvyukov@google.com,
 shakeelb@google.com, songmuchun@bytedance.com, jbaron@akamai.com,
 rientjes@google.com, minchan@google.com, kaleshsingh@google.com,
 kernel-team@android.com, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
 linux-arch@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-modules@vger.kernel.org,
 kasan-dev@googlegroups.com, cgroups@vger.kernel.org
Subject: Re: [PATCH v3 00/35] Memory allocation profiling
Message-ID: <20240214062020.GA989328@cmpxchg.org>
References: <20240212213922.783301-1-surenb@google.com>
In-Reply-To: <20240212213922.783301-1-surenb@google.com>

I'll do a more thorough code review, but before the discussion gets
too sidetracked, I wanted to add my POV on the overall merit of the
direction that is being proposed here.

I have backported and used this code for debugging production issues
before. Logging into a random host with an unfamiliar workload and
being able to get a reliable, comprehensive list of kernel memory
consumers is one of the coolest things I have seen in a long
time. This is a huge improvement to sysadmin quality of life.

It's also a huge improvement for MM developers. We're the first points
of contact for memory regressions that can be caused by pretty much
any driver or subsystem in the kernel.

I encourage anybody who is undecided on whether this is worth doing to
build a kernel with these patches applied and run it on their own
machine. I think you'll be surprised what you'll find - and how myopic
and uninformative /proc/meminfo feels in comparison to this. Did you
know there is a lot more to modern filesystems than the VFS objects we
are currently tracking? :)

Then imagine what this looks like on a production host running a
complex mix of filesystems, enterprise networking, bpf programs, gpus
and accelerators etc.

Backporting the code to a slightly older production kernel wasn't too
difficult. The instrumentation layering is explicit, clean, and fairly
centralized, so resolving minor conflicts around the _noprof renames
and the wrappers was pretty straight-forward.

When we talk about maintenance cost, a fair shake would be to weigh it
against the cost and reliability of our current method: evaluating
consumers in the kernel on a case-by-case basis and annotating the
alloc/free sites by hand; then quibbling with the MM community about
whether that consumer is indeed significant enough to warrant an entry
in /proc/meminfo, and what the catchiest name for the stat would be.

I think we can agree that this is vastly less scalable and more
burdensome than central annotations around a handful of mostly static
allocator entry points. Especially considering the rate of change in
the kernel as a whole, and that not everybody will think of the
comprehensive MM picture when writing a random driver. And I think
that's generous - we don't even have the network stack in meminfo.

So I think what we do now isn't working. In the Meta fleet, at any
given time the p50 for unaccounted kernel memory is several gigabytes
per host. The p99 is between 15% and 30% of total memory. That's a
looot of opaque resource usage we have to accept on faith.

For hunting down regressions, all it takes is one untracked consumer
in the kernel to really throw a wrench into things. It's difficult to
find in the noise with tracing, and if it's not growing after an
initial allocation spike, you're pretty much out of luck finding it at
all. Raise your hand if you've written a drgn script to walk pfns and
try to guess consumers from the state of struct page :)

I agree we should discuss how the annotations are implemented on a
technical basis, but my take is that we need something like this.

In a codebase of our size, I don't think the allocator should be
handing out memory without some basic implied tracking of where it's
going. It's a liability for production environments, and it can hide
bad memory management decisions in drivers and other subsystems for a
very long time.
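
For readers who want a rough feel for the "unaccounted kernel memory"
figure mentioned above, here is a minimal sketch of how one might
estimate it from /proc/meminfo alone. It is not how the numbers in this
mail were produced. The function names and the choice of fields in
ACCOUNTED_FIELDS are assumptions made for illustration; some fields
may overlap or be missing on a given kernel, so treat the result as a
ballpark only.

```python
# Hypothetical sketch: estimate memory that /proc/meminfo does not
# attribute to any listed consumer. ACCOUNTED_FIELDS is an assumed,
# approximate selection, not an authoritative accounting.
ACCOUNTED_FIELDS = (
    "MemFree",
    "Buffers",
    "Cached",
    "SwapCached",
    "AnonPages",
    "Slab",
    "KernelStack",
    "PageTables",
    "Percpu",
)


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def unaccounted_kb(fields):
    """MemTotal minus the sum of the assumed accounted fields, in kB."""
    accounted = sum(fields.get(name, 0) for name in ACCOUNTED_FIELDS)
    return fields.get("MemTotal", 0) - accounted


def unaccounted_fraction(fields):
    """Unaccounted memory as a fraction of MemTotal (0.0 if unknown)."""
    total = fields.get("MemTotal", 0)
    return unaccounted_kb(fields) / total if total else 0.0


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file; Linux only."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On a Linux host, unaccounted_kb(read_meminfo()) gives the estimate in
kB. Note that this only says how much is unattributed, not who owns
it, which is the gap the mail is describing.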