From: Andrii Nakryiko
Date: Mon, 10 Jun 2024 09:17:43 +0100
Subject: Re: [PATCH v3 3/9] fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps
To: Andrei Vagin
Cc: Andrii Nakryiko, linux-fsdevel@vger.kernel.org, brauner@kernel.org,
 viro@zeniv.linux.org.uk, akpm@linux-foundation.org,
 linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
 gregkh@linuxfoundation.org, linux-mm@kvack.org, liam.howlett@oracle.com,
 surenb@google.com, rppt@kernel.org
References: <20240605002459.4091285-1-andrii@kernel.org> <20240605002459.4091285-4-andrii@kernel.org>

On Fri, Jun 7, 2024 at 11:31 PM Andrei Vagin wrote:
>
> On Tue, Jun 04, 2024 at 05:24:48PM -0700, Andrii Nakryiko wrote:
> > /proc/<pid>/maps file is extremely useful in practice for various tasks
> > involving figuring out process memory layout, what files are backing any
> > given memory range, etc.
> > One important class of applications that absolutely rely on this are
> > profilers/stack symbolizers (perf tool being one of them). Patterns of
> > use differ, but they generally would fall into two categories.
> >
> > In the on-demand pattern, a profiler/symbolizer would normally capture a
> > stack trace containing absolute memory addresses of some functions, and
> > would then use the /proc/<pid>/maps file to find corresponding backing ELF
> > files (normally, only executable VMAs are of interest), file offsets within
> > them, and then continue from there to get yet more information (ELF
> > symbols, DWARF information) to get human-readable symbolic information.
> > This pattern is used by Meta's fleet-wide profiler, as one example.
> >
> > In the preprocessing pattern, the application doesn't know the set of
> > addresses of interest, so it has to fetch all relevant VMAs (again,
> > probably only executable ones), store or cache them, then proceed with
> > profiling and stack trace capture. Once done, it would do symbolization
> > based on stored VMA information. This can happen at a much later point in
> > time. This pattern is used by perf tool, as an example.
> >
> > In either case, there are both performance and correctness requirements
> > involved. This address to VMA information translation has to be done as
> > efficiently as possible, but also not miss any VMA (especially in the
> > case of loading/unloading shared libraries). In practice, correctness
> > can't be guaranteed (due to process dying before VMA data can be
> > captured, or shared library being unloaded, etc), but any effort to
> > maximize the chance of finding the VMA is appreciated.
> >
> > Unfortunately, for all the /proc/<pid>/maps file universality and
> > usefulness, it doesn't fit the above use cases 100%.
> >
> > First, its main purpose is to emit all VMAs sequentially, but in
> > practice captured addresses would fall only into a smaller subset of all
> > process' VMAs, mainly containing executable text. Yet, a library would
> > need to parse most or all of the contents to find needed VMAs, as there
> > is no way to skip VMAs that are of no use. An efficient library can do the
> > linear pass and it is still relatively efficient, but it's definitely an
> > overhead that can be avoided, if there was a way to do more targeted
> > querying of the relevant VMA information.
> >
> > Second, it's a text based interface, which makes its programmatic use from
> > applications and libraries more cumbersome and inefficient due to the
> > need to handle text parsing to get necessary pieces of information. The
> > overhead is actually paid both by the kernel, formatting originally binary
> > VMA data into text, and then by the user space application, parsing it back
> > into binary data for further use.
>
> I was trying to solve all these issues in a more generic way:
> https://lwn.net/Articles/683371/
>

Can you please provide a tl;dr summary of that effort?

> We are definitely interested in this new interface to use it in CRIU.
>
> >
> > +
> > +	if (karg.vma_name_size) {
> > +		size_t name_buf_sz = min_t(size_t, PATH_MAX, karg.vma_name_size);
> > +		const struct path *path;
> > +		const char *name_fmt;
> > +		size_t name_sz = 0;
> > +
> > +		get_vma_name(vma, &path, &name, &name_fmt);
> > +
> > +		if (path || name_fmt || name) {
> > +			name_buf = kmalloc(name_buf_sz, GFP_KERNEL);
> > +			if (!name_buf) {
> > +				err = -ENOMEM;
> > +				goto out;
> > +			}
> > +		}
> > +		if (path) {
> > +			name = d_path(path, name_buf, name_buf_sz);
> > +			if (IS_ERR(name)) {
> > +				err = PTR_ERR(name);
> > +				goto out;
>
> It always fails if a file path name is longer than PATH_MAX.
>
> Can we add a flag to indicate whether file names are needed to be

It's already supported. Getting a VMA name is optional.
See a big comment next to the vma_name_size field in the UAPI header. If
vma_name_size is set to zero, VMA name is not retrieved at all, avoiding
the overhead and this issue with PATH_MAX.

> resolved? In criu, we use special names like "vvar", "vdso", but we dump
> files via /proc/pid/map_files.
>
> > +			}
> > +			name_sz = name_buf + name_buf_sz - name;
> > +		} else if (name || name_fmt) {
> > +			name_sz = 1 + snprintf(name_buf, name_buf_sz, name_fmt ?: "%s", name);
> > +			name = name_buf;
> > +		}
> > +		if (name_sz > name_buf_sz) {
> > +			err = -ENAMETOOLONG;
> > +			goto out;
> > +		}
> > +		karg.vma_name_size = name_sz;
> > +	}
>
> Thanks,
> Andrei
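
For readers following the thread, here is a rough user-space sketch of the
text round trip the quoted commit message complains about: turning one line
of the maps file back into binary fields and checking whether an address
falls inside it. It is only an illustration and not code from the patch.
struct maps_entry, parse_maps_line() and maps_entry_contains() are made-up
names, and the 255-byte path limit is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical binary form of one maps line (not a kernel structure). */
struct maps_entry {
	unsigned long long start;	/* first address of the VMA */
	unsigned long long end;		/* one past the last address */
	char perms[5];			/* e.g. "r-xp" */
	unsigned long long offset;	/* offset into the backing file */
	unsigned int dev_major, dev_minor;
	unsigned long long inode;
	char path[256];			/* empty when the line has no name */
};

/* Parse "start-end perms offset maj:min inode [name]" into *e. */
static bool parse_maps_line(const char *line, struct maps_entry *e)
{
	int n;

	assert(line && e);
	e->path[0] = '\0';
	n = sscanf(line, "%llx-%llx %4s %llx %x:%x %llu %255[^\n]",
		   &e->start, &e->end, e->perms, &e->offset,
		   &e->dev_major, &e->dev_minor, &e->inode, e->path);
	/* The name is optional, so seven converted fields are enough. */
	return n >= 7;
}

/* VMA ranges are half-open: [start, end). */
static bool maps_entry_contains(const struct maps_entry *e,
				unsigned long long addr)
{
	return addr >= e->start && addr < e->end;
}
```

A symbolizer using the text interface would run something like this over
every line until maps_entry_contains() matches, which is the linear pass
and the parse-back cost that the commit message describes.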