From: Andrei Vagin
Date: Wed, 12 Jun 2024 10:48:42 -0700
Subject: Re: [PATCH v3 3/9] fs/procfs: implement efficient VMA querying API for /proc//maps
To: Andrii Nakryiko
Cc: Andrii Nakryiko, linux-fsdevel@vger.kernel.org, brauner@kernel.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, gregkh@linuxfoundation.org, linux-mm@kvack.org, liam.howlett@oracle.com, surenb@google.com, rppt@kernel.org
References: <20240605002459.4091285-1-andrii@kernel.org> <20240605002459.4091285-4-andrii@kernel.org>

On Mon, Jun 10, 2024 at 1:17 AM Andrii Nakryiko wrote:
>
> On Fri, Jun 7, 2024 at 11:31 PM Andrei Vagin wrote:
> >
> > On Tue, Jun 04, 2024 at 05:24:48PM -0700, Andrii Nakryiko wrote:
> > > /proc//maps file is extremely useful in practice for various tasks
> > > involving figuring out process memory layout, what files are backing any
> > > given memory range, etc.
> > > One important class of applications that absolutely rely on this are
> > > profilers/stack symbolizers (perf tool being one of them). Patterns of
> > > use differ, but they generally would fall into two categories.
> > >
> > > In on-demand pattern, a profiler/symbolizer would normally capture stack
> > > trace containing absolute memory addresses of some functions, and would
> > > then use /proc//maps file to find corresponding backing ELF files
> > > (normally, only executable VMAs are of interest), file offsets within
> > > them, and then continue from there to get yet more information (ELF
> > > symbols, DWARF information) to get human-readable symbolic information.
> > > This pattern is used by Meta's fleet-wide profiler, as one example.
> > >
> > > In preprocessing pattern, application doesn't know the set of addresses
> > > of interest, so it has to fetch all relevant VMAs (again, probably only
> > > executable ones), store or cache them, then proceed with profiling and
> > > stack trace capture. Once done, it would do symbolization based on
> > > stored VMA information. This can happen at much later point in time.
> > > This pattern is used by perf tool, as an example.
> > >
> > > In either case, there are both performance and correctness requirements
> > > involved. This address to VMA information translation has to be done as
> > > efficiently as possible, but also not miss any VMA (especially in the
> > > case of loading/unloading shared libraries). In practice, correctness
> > > can't be guaranteed (due to process dying before VMA data can be
> > > captured, or shared library being unloaded, etc), but any effort to
> > > maximize the chance of finding the VMA is appreciated.
> > >
> > > Unfortunately, for all the /proc//maps file universality and
> > > usefulness, it doesn't fit the above use cases 100%.
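For illustration, the on-demand lookup described in the quoted text (find the VMA containing a captured address, then its backing file and the offset within that file) comes down to a linear pass over the text file. The following is a minimal sketch, assuming Linux and reading only the calling process's own /proc/self/maps; `struct vma_info`, `parse_maps_line()`, `find_vma()` and `addr_to_file_offset()` are hypothetical helpers, not part of the patch under review or of any existing library.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the maps fields a symbolizer cares about. */
struct vma_info {
    uint64_t start, end;  /* [start, end) virtual address range */
    uint64_t file_off;    /* byte offset of 'start' within the backing file */
    char perms[5];        /* e.g. "r-xp" */
    char path[4096];      /* backing file, or "" for anonymous mappings */
};

/* Parse one line: "start-end perms offset dev inode   [pathname]". */
static bool parse_maps_line(const char *line, struct vma_info *v)
{
    int n = 0;

    /* %*s skips the dev and inode columns; %n records where the path starts. */
    if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %*s %*s %n",
               &v->start, &v->end, v->perms, &v->file_off, &n) != 4)
        return false;
    /* The rest of the line (may contain spaces) is the optional pathname. */
    snprintf(v->path, sizeof(v->path), "%s", line + n);
    v->path[strcspn(v->path, "\n")] = '\0';
    return true;
}

/*
 * On-demand pattern: re-read and re-parse the whole text file until the VMA
 * containing 'addr' is found. Lines longer than the buffer are not handled
 * in this sketch. On failure, *out holds whatever line was parsed last.
 */
static bool find_vma(uint64_t addr, struct vma_info *out)
{
    char line[8192];
    bool found = false;
    FILE *f = fopen("/proc/self/maps", "r");

    if (!f)
        return false;
    while (!found && fgets(line, sizeof(line), f))
        found = parse_maps_line(line, out) &&
                addr >= out->start && addr < out->end;
    fclose(f);
    return found;
}

/* Translate a virtual address into an offset within the backing file,
 * which is what ELF/DWARF symbolization then works with. */
static uint64_t addr_to_file_offset(const struct vma_info *v, uint64_t addr)
{
    return addr - v->start + v->file_off;
}
```

Every lookup pays for kernel-side text formatting and user-side parsing of each line up to the match, which is the overhead discussed in the quoted text that follows. A preprocessing-style tool would instead keep the parsed executable entries (the file lists VMAs in ascending address order) and search that cached array later.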
> > >
> > > First, its main purpose is to emit all VMAs sequentially, but in
> > > practice captured addresses would fall only into a smaller subset of all
> > > process' VMAs, mainly containing executable text. Yet, library would
> > > need to parse most or all of the contents to find needed VMAs, as there
> > > is no way to skip VMAs that are of no use. Efficient library can do the
> > > linear pass and it is still relatively efficient, but it's definitely an
> > > overhead that can be avoided, if there was a way to do more targeted
> > > querying of the relevant VMA information.
> > >
> > > Second, it's a text based interface, which makes its programmatic use from
> > > applications and libraries more cumbersome and inefficient due to the
> > > need to handle text parsing to get necessary pieces of information. The
> > > overhead is actually paid both by kernel, formatting originally binary
> > > VMA data into text, and then by user space application, parsing it back
> > > into binary data for further use.
> >
> > I was trying to solve all these issues in a more generic way:
> > https://lwn.net/Articles/683371/
> >
>
> Can you please provide a tl;dr summary of that effort?

task_diag is a generic interface designed to efficiently gather
information about running processes. It addresses the limitations of
traditional /proc/PID/* files. This binary interface utilizes the
netlink protocol, inspired by the socket diag interface. Input is
provided as a netlink message detailing the desired information, and
the kernel responds with a set of netlink messages containing the
results. Compared to struct-based interfaces like this one or statx,
the netlink-based approach can be more flexible, particularly when
dealing with numerous optional parameters.

BTW, David Ahern made some adjustments in task_diag to optimize the
same things that are targeted here.

task_diag hasn't been merged into the kernel. I don't remember all the
arguments; it was some time ago.
The primary concern was the introduction of redundant functionality. It
would have been the second interface offering similar capabilities,
without a plan to deprecate the older interface. Furthermore, there
wasn't sufficient demand to justify the addition of a new interface at
the time.

Thanks,
Andrei