Date: Fri, 7 Jun 2024 15:31:14 -0700
From: Andrei Vagin
To: Andrii Nakryiko
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org,
	viro@zeniv.linux.org.uk, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	gregkh@linuxfoundation.org, linux-mm@kvack.org,
	liam.howlett@oracle.com, surenb@google.com, rppt@kernel.org
Subject: Re: [PATCH v3 3/9] fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps
References: <20240605002459.4091285-1-andrii@kernel.org> <20240605002459.4091285-4-andrii@kernel.org>
In-Reply-To: <20240605002459.4091285-4-andrii@kernel.org>

On Tue, Jun 04, 2024 at 05:24:48PM -0700, Andrii Nakryiko wrote:
> /proc/<pid>/maps file is extremely useful in practice for various tasks
> involving figuring out process memory layout, what files are backing any
> given memory range, etc.
> One important class of applications that
> absolutely rely on this are profilers/stack symbolizers (perf tool being one
> of them). Patterns of use differ, but they generally would fall into two
> categories.
>
> In on-demand pattern, a profiler/symbolizer would normally capture stack
> trace containing absolute memory addresses of some functions, and would
> then use /proc/<pid>/maps file to find corresponding backing ELF files
> (normally, only executable VMAs are of interest), file offsets within
> them, and then continue from there to get yet more information (ELF
> symbols, DWARF information) to get human-readable symbolic information.
> This pattern is used by Meta's fleet-wide profiler, as one example.
>
> In preprocessing pattern, application doesn't know the set of addresses
> of interest, so it has to fetch all relevant VMAs (again, probably only
> executable ones), store or cache them, then proceed with profiling and
> stack trace capture. Once done, it would do symbolization based on
> stored VMA information. This can happen at much later point in time.
> This pattern is used by perf tool, as an example.
>
> In either case, there are both performance and correctness requirements
> involved. This address to VMA information translation has to be done as
> efficiently as possible, but also not miss any VMA (especially in the
> case of loading/unloading shared libraries). In practice, correctness
> can't be guaranteed (due to process dying before VMA data can be
> captured, or shared library being unloaded, etc), but any effort to
> maximize the chance of finding the VMA is appreciated.
>
> Unfortunately, for all the /proc/<pid>/maps file universality and
> usefulness, it doesn't fit the above use cases 100%.
>
> First, its main purpose is to emit all VMAs sequentially, but in
> practice captured addresses would fall only into a smaller subset of all
> process' VMAs, mainly containing executable text.
> Yet, library would
> need to parse most or all of the contents to find needed VMAs, as there
> is no way to skip VMAs that are of no use. Efficient library can do the
> linear pass and it is still relatively efficient, but it's definitely an
> overhead that can be avoided, if there was a way to do more targeted
> querying of the relevant VMA information.
>
> Second, it's a text based interface, which makes its programmatic use from
> applications and libraries more cumbersome and inefficient due to the
> need to handle text parsing to get necessary pieces of information. The
> overhead is actually paid both by kernel, formatting originally binary
> VMA data into text, and then by user space application, parsing it back
> into binary data for further use.

I was trying to solve all these issues in a more generic way:
https://lwn.net/Articles/683371/

We are definitely interested in using this new interface in CRIU.

> +
> +	if (karg.vma_name_size) {
> +		size_t name_buf_sz = min_t(size_t, PATH_MAX, karg.vma_name_size);
> +		const struct path *path;
> +		const char *name_fmt;
> +		size_t name_sz = 0;
> +
> +		get_vma_name(vma, &path, &name, &name_fmt);
> +
> +		if (path || name_fmt || name) {
> +			name_buf = kmalloc(name_buf_sz, GFP_KERNEL);
> +			if (!name_buf) {
> +				err = -ENOMEM;
> +				goto out;
> +			}
> +		}
> +		if (path) {
> +			name = d_path(path, name_buf, name_buf_sz);
> +			if (IS_ERR(name)) {
> +				err = PTR_ERR(name);
> +				goto out;

It always fails if a file path name is longer than PATH_MAX.

Can we add a flag to indicate whether file names need to be resolved?
In CRIU, we use special names like "vvar" and "vdso", but we dump files
via /proc/pid/map_files.

> +			}
> +			name_sz = name_buf + name_buf_sz - name;
> +		} else if (name || name_fmt) {
> +			name_sz = 1 + snprintf(name_buf, name_buf_sz, name_fmt ?: "%s", name);
> +			name = name_buf;
> +		}
> +		if (name_sz > name_buf_sz) {
> +			err = -ENAMETOOLONG;
> +			goto out;
> +		}
> +		karg.vma_name_size = name_sz;
> +	}

Thanks,
Andrei
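
A note on the name-size arithmetic in the quoted hunk: d_path() builds the
path backwards from the end of the buffer and returns a pointer into it,
which is why the size is computed as name_buf + name_buf_sz - name, and why
a path that does not fit in the PATH_MAX-capped buffer comes back as an
error. The following is only a rough userspace sketch of that behaviour in
C, not the kernel code. fill_from_end() and resolved_name_size() are
hypothetical names made up for illustration, and the stand-in returns NULL
with errno set where d_path() would return an ERR_PTR.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for d_path(): copies `path` (including
 * its terminating NUL) to the END of buf and returns a pointer to its
 * first byte. Returns NULL with errno = ENAMETOOLONG if it does not fit.
 */
static char *fill_from_end(const char *path, char *buf, size_t buflen)
{
	size_t need;

	assert(path && buf);
	need = strlen(path) + 1;	/* bytes needed, including the NUL */

	if (need > buflen) {
		errno = ENAMETOOLONG;
		return NULL;
	}
	return memcpy(buf + buflen - need, path, need);
}

/*
 * Mirrors the quoted arithmetic: name_sz = name_buf + name_buf_sz - name.
 * Returns the name size including the NUL, or 0 if the path did not fit.
 */
static size_t resolved_name_size(const char *path, char *buf, size_t buflen)
{
	char *name = fill_from_end(path, buf, buflen);

	if (!name)
		return 0;
	return (size_t)(buf + buflen - name);
}
```

With a 16-byte buffer, resolved_name_size("/bin/ls", buf, sizeof(buf))
gives 8 (seven characters plus the NUL), while any path of 16 or more
characters gives 0 with errno set to ENAMETOOLONG. In this model, making
the caller's buffer larger does not help once the cap is reached, which is
the case a "do not resolve file names" flag would let callers avoid.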