From: Andrii Nakryiko
Date: Sat, 4 May 2024 14:50:40 -0700
Subject: Re: [PATCH 0/5] ioctl()-based API to query VMAs from /proc/<pid>/maps
To: Christian Brauner
Cc: Andrii Nakryiko, linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, gregkh@linuxfoundation.org, linux-mm@kvack.org
References: <20240504003006.3303334-1-andrii@kernel.org> <20240504-rasch-gekrochen-3d577084beda@brauner>
In-Reply-To: <20240504-rasch-gekrochen-3d577084beda@brauner>

On Sat, May 4, 2024 at 4:24 AM Christian Brauner wrote:
>
> On Fri, May 03, 2024 at 05:30:01PM -0700, Andrii Nakryiko wrote:
> > Implement binary ioctl()-based interface to /proc/<pid>/maps file to allow
> > applications to query VMA information more efficiently than through textual
> > processing of /proc/<pid>/maps contents.
> > See patch #2 for the context, justification, and nuances of the API design.
> >
> > Patch #1 is a refactoring to keep VMA name logic determination in one place.
> > Patch #2 is the meat of kernel-side API.
> > Patch #3 just syncs UAPI header (linux/fs.h) into tools/include.
> > Patch #4 adjusts BPF selftests logic that currently parses /proc/<pid>/maps to
> > optionally use this new ioctl()-based API, if supported.
> > Patch #5 implements a simple C tool to demonstrate intended efficient use (for
> > both textual and binary interfaces) and allows benchmarking them. Patch itself
> > also has performance numbers of a test based on one of the medium-sized
> > internal applications taken from production.
>
> I don't have anything against adding a binary interface for this. But
> it's somewhat odd to do ioctls based on /proc files. I wonder if there
> isn't a more suitable place for this. prctl()? New vmstat() system call
> using a pidfd/pid as reference? ioctl() on fs/pidfs.c?

I did ioctl() on /proc/<pid>/maps because that's the file already used
for the same use cases, and it can be opened from other processes for
any target PID. I'm open to any suggestions that make more sense; this
v1 is mostly to start the conversation.

prctl() probably doesn't make sense, as according to the man page:

    prctl() manipulates various aspects of the behavior of
    the calling thread or process.

And this facility is most often used from another (profiler or
symbolizer) process.

A new syscall feels like overkill, but if that's the only way, so be
it. I do like the idea of ioctl() on top of pidfd (I assume that's what
you mean by "fs/pidfs.c", right?). This seems most promising.

One question/nuance. If I understand correctly, pidfd won't hold a
task_struct (and its mm_struct) reference, right? So if the process
exits, even if I have the pidfd, that task is gone and we won't be able
to query it. Is that right?

If yes, then it's still workable in a lot of situations, but it would
be nice to be able to query VMAs (at least for the binary's own text
segments) even after the process exits. This matters for short-lived
processes that profilers capture stack traces from: by the time those
stack traces are processed, the processes are gone.

This might be a stupid idea and question, but what if ioctl() on the
pidfd itself created another FD representing the mm_struct of that
process, and then we had an ioctl() on *that* sort-of-mm-struct-fd to
query VMAs? Would that work at all? This approach would allow a
long-running profiler application to open the pidfd and this other "mm
fd" once, cache them, and then just query. Meanwhile we can epoll() the
pidfd itself to know when the process exits, so that these mm_structs
are not referenced for longer than necessary. Is this pushing it too
far, or do you think that would work and be acceptable?

But in any case, I think ioctl() on top of pidfd makes total sense for
this, thanks.
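
To make the exit-notification half of the idea above concrete, here is
a minimal userspace sketch using only what exists today: pidfd_open(2)
(invoked through syscall(2), assuming Linux 5.3+ and headers that
define SYS_pidfd_open) and epoll. The helper names pidfd_open_raw(),
wait_for_exit() and demo_short_lived_child() are made up for
illustration. The "mm fd" part is deliberately not shown, since no such
interface exists; it is only the proposal under discussion.

```c
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thin wrapper: older glibc versions ship no pidfd_open() prototype. */
static int pidfd_open_raw(pid_t pid)
{
	return (int)syscall(SYS_pidfd_open, pid, 0);
}

/*
 * Hypothetical helper: block until the process behind pidfd exits.
 * Returns 0 on exit notification, -1 on error or timeout.
 */
static int wait_for_exit(int pidfd, int timeout_ms)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = pidfd };
	struct epoll_event out;
	int epfd, n;

	epfd = epoll_create1(EPOLL_CLOEXEC);
	if (epfd < 0)
		return -1;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pidfd, &ev) < 0) {
		close(epfd);
		return -1;
	}
	/* A pidfd becomes readable once the process has terminated. */
	n = epoll_wait(epfd, &out, 1, timeout_ms);
	close(epfd);
	return n == 1 ? 0 : -1;
}

/* Demo: fork a short-lived child and observe its exit via the pidfd. */
static int demo_short_lived_child(void)
{
	pid_t child = fork();
	int pidfd, ret;

	if (child < 0)
		return -1;
	if (child == 0) {
		usleep(10000);	/* pretend to do a little work */
		_exit(0);
	}
	pidfd = pidfd_open_raw(child);
	if (pidfd < 0) {
		waitpid(child, NULL, 0);
		return -1;
	}
	ret = wait_for_exit(pidfd, 5000);
	close(pidfd);
	waitpid(child, NULL, 0);	/* reap the zombie */
	return ret;
}
```

Calling demo_short_lived_child() from main() should return 0 on a
kernel with pidfd support. A profiler would register many pidfds with a
single epoll instance and release any cached per-process state when the
corresponding pidfd becomes readable.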