From: Andrii Nakryiko
Date: Tue, 7 May 2024 09:27:44 -0700
Subject: Re: [PATCH 5/5] selftests/bpf: a simple benchmark tool for /proc//maps APIs
To: "Liam R. Howlett", Andrii Nakryiko, Greg KH, Andrii Nakryiko, linux-fsdevel@vger.kernel.org, brauner@kernel.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-mm@kvack.org, Suren Baghdasaryan, Matthew Wilcox

On Tue, May 7, 2024 at 8:49 AM Liam R. Howlett wrote:
>
> .. Adding Suren & Willy to the Cc
>
> * Andrii Nakryiko [240504 18:14]:
> > On Sat, May 4, 2024 at 8:32 AM Greg KH wrote:
> > >
> > > On Fri, May 03, 2024 at 05:30:06PM -0700, Andrii Nakryiko wrote:
> > > > I also did an strace run of both cases. In text-based one the tool did
> > > > 68 read() syscalls, fetching up to 4KB of data in one go.
> > >
> > > Why not fetch more at once?
> > >
> >
> > I didn't expect to be interrogated so much on the performance of the
> > text parsing front, sorry. :) You can probably tune this, but where is
> > the reasonable limit? 64KB? 256KB? 1MB? See below for some more
> > production numbers.
>
> The reason the file reads are limited to 4KB is because this file is
> used for monitoring processes. We have a significant number of
> organisations polling this file so frequently that the mmap lock
> contention becomes an issue. (reading a file is free, right?) People
> also tend to try to figure out why a process is slow by reading this
> file - which amplifies the lock contention.
>
> What happens today is that the lock is yielded after 4KB to allow time
> for mmap writes to happen. This also means your data may be
> inconsistent from one 4KB block to the next (the write may be around
> this boundary).
>
> This new interface also takes the lock in do_procmap_query() and does
> the 4kb blocks as well. Extending this size means more time spent
> blocking mmap writes, but a more consistent view of the world (less
> "tearing" of the addresses).

Hold on. There is no 4KB in the new ioctl-based API I'm adding. It does
a single VMA lookup (presumably an O(log N) operation) using a single
vma_iter_init(addr) + vma_next() call on vma_iterator.

As for the mmap_read_lock_killable() (is that what we are talking
about?), I'm happy to use anything else available; please give me a
pointer. But I suspect that, given how fast and small this new API is,
holding mmap_read_lock_killable() in it is not comparable to holding it
while producing /proc//maps contents.

>
> We are working to reduce these issues by switching the /proc//maps
> file to use rcu lookup. I would recommend we do not proceed with this
> interface using the old method and instead, implement it using rcu from
> the start - if it fits your use case (or we can make it fit your use
> case).
>
> At least, for most page faults, we can work around the lock contention
> (since v6.6), but not all and not on all archs.
>
> ...
> > >
> > > > In comparison,
> > > > ioctl-based implementation had to do only 6 ioctl() calls to fetch all
> > > > relevant VMAs.
> > > >
> > > > It is projected that savings from processing big production applications
> > > > would only widen the gap in favor of binary-based querying ioctl API, as
> > > > bigger applications will tend to have even more non-executable VMA
> > > > mappings relative to executable ones.
> > >
> > > Define "bigger applications" please. Is this some "large database
> > > company workload" type of thing, or something else?
> >
> > I don't have a definition.
> > But I had in mind, as one example, an
> > ads-serving service we use internally (it's a pretty large application
> > by pretty much any metric you can come up with). I just randomly
> > picked one of the production hosts, found one instance of that
> > service, and looked at its /proc//maps file. Hopefully it will
> > satisfy your need for specifics.
> >
> > # cat /proc/1126243/maps | wc -c
> > 1570178
> > # cat /proc/1126243/maps | wc -l
> > 28875
> > # cat /proc/1126243/maps | grep ' ..x. ' | wc -l
> > 7347
>
> We have distributions increasing the map_count to an insane number to
> allow games to work [1]. It is, unfortunately, only a matter of time until
> this is regularly an issue as it is being normalised and allowed by an
> increased number of distributions (fedora, arch, ubuntu). So, despite
> my email address, I am not talking about large database companies here.
>
> Also, note that applications that use guard VMAs double the number for
> the guards. Fun stuff.
>
> We are really doing a lot in the VMA area to reduce the mmap locking
> contention and it seems you have a use case for a new interface that can
> leverage these changes.
>
> We have at least two talks around this area at LSF if you are attending.

I am attending LSFMM, yes; I'll try not to miss them.

>
> Thanks,
> Liam
>
> [1] https://lore.kernel.org/linux-mm/8f6e2d69-b4df-45f3-aed4-5190966e2dea@valvesoftware.com/
>