From: Andrii Nakryiko
Date: Thu, 24 Apr 2025 09:03:17 -0700
Subject: Re: [PATCH v3 7/8] mm/maps: read proc/pid/maps under RCU
To: Suren Baghdasaryan
Cc: "Liam R. Howlett", akpm@linux-foundation.org,
 lorenzo.stoakes@oracle.com, david@redhat.com, vbabka@suse.cz,
 peterx@redhat.com, jannh@google.com, hannes@cmpxchg.org,
 mhocko@kernel.org, paulmck@kernel.org, shuah@kernel.org,
 adobriyan@gmail.com, brauner@kernel.org, josef@toxicpanda.com,
 yebin10@huawei.com, linux@weissschuh.net, willy@infradead.org,
 osalvador@suse.de, andrii@kernel.org, ryan.roberts@arm.com,
 christophe.leroy@csgroup.eu, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org
References: <20250418174959.1431962-1-surenb@google.com>
 <20250418174959.1431962-8-surenb@google.com>
 <6ay37xorr35nw4ljtptnfqchuaozu73ffvjpmwopat42n4t6vr@qnr6xvralx2o>

On Thu, Apr 24, 2025 at 8:20 AM Suren Baghdasaryan wrote:
>
> On Wed, Apr 23, 2025 at 5:24 PM Liam R. Howlett wrote:
> >
> > * Andrii Nakryiko [250423 18:06]:
> > > On Wed, Apr 23, 2025 at 2:49 PM Suren Baghdasaryan wrote:
> > > >
> > > > On Tue, Apr 22, 2025 at 3:49 PM Andrii Nakryiko
> > > > wrote:
> > > > >
> > > > > On Fri, Apr 18, 2025 at 10:50 AM Suren Baghdasaryan wrote:
> > > > > >
> > > > > > With maple_tree supporting vma tree traversal under RCU and vma and
> > > > > > its important members being RCU-safe, /proc/pid/maps can be read under
> > > > > > RCU and without the need to read-lock mmap_lock. However vma content
> > > > > > can change from under us, therefore we make a copy of the vma and we
> > > > > > pin pointer fields used when generating the output (currently only
> > > > > > vm_file and anon_name). Afterwards we check for concurrent address
> > > > > > space modifications, wait for them to end and retry. While we take
> > > > > > the mmap_lock for reading during such contention, we do that momentarily
> > > > > > only to record new mm_wr_seq counter. This change is designed to reduce
> > > > >
> > > > > This is probably a stupid question, but why do we need to take a lock
> > > > > just to record this counter?
> > > > > uprobes get away without taking mmap_lock
> > > > > even for reads, and still record this seq counter. And then detect
> > > > > whether there were any modifications in between. Why does this change
> > > > > need more heavy-weight mmap_read_lock to do speculative reads?
> > > >
> > > > Not a stupid question. mmap_read_lock() is used to wait for the writer
> > > > to finish what it's doing and then we continue by recording a new
> > > > sequence counter value and call mmap_read_unlock. This is what
> > > > get_vma_snapshot() does. But your question made me realize that we can
> > > > optimize m_start() further by not taking mmap_read_lock at all.
> > > > Instead of taking mmap_read_lock then doing drop_mmap_lock() we can
> > > > try mmap_lock_speculate_try_begin() and only if it fails do the same
> > > > dance we do in the get_vma_snapshot(). I think that should work.
> > > >
> > > Ok, yeah, it would be great to avoid taking a lock in a common case!
> >
> > We can check this counter once per 4k block and maintain the same
> > 'tearing' that exists today instead of per-vma. Not that anyone said
> > they had an issue with changing it, but since we're on this road anyways
> > I'd thought I'd point out where we could end up.
>
> We would need to run that check on the last call to show_map() right
> before seq_file detects the overflow and flushes the page. On
> contention we will also be throwing away more prepared data (up to a
> page worth of records) vs only the last record. All in all I'm not
> convinced this is worth doing unless increased chances of data tearing
> is identified as a problem.
>

Yep, I agree. When filling out 4K of data per check we run into a much
higher chance of conflict, IMO. Not worth it, I'd say.

> >
> > I am concerned about live locking in either scenario, but I haven't
> > looked too deep into this pattern.
> >
> > I also don't love (as usual) the lack of ensured forward progress.
>
> Hmm.
> Maybe we should add a retry limit on
> mmap_lock_speculate_try_begin() and once the limit is hit we just take
> the mmap_read_lock and proceed with it? That would prevent a
> hyperactive writer from blocking the reader's forward progress
> indefinitely.

Came here to say the same. I'd add a small number of retries (3-5?)
and then fall back to the read-locked approach. The main challenge is
to keep all this logic nicely isolated from the main VMA
search/printing logic.

For a similar pattern in uprobes, we don't even bother to retry; we
just fall back to mmap_read_lock and proceed, under the assumption
that this is going to be very rare and thus not important from the
overall performance perspective.

>
> >
> > It seems like we have four cases for the vm area state now:
> > 1. we want to read a stable vma or set of vmas (per-vma locking)
> > 2. we want to read a stable mm state for reading (the very short named
> > mmap_lock_speculate_try_begin)
>
> and we don't mind retrying on contention. This one should be done
> under RCU protection.
>
> > 3. we ensure a stable vma/mm state for reading (mmap read lock)
> > 4. we are writing - get out of my way (mmap write lock).
>
> I wouldn't call #2 a vma state. More of a usecase when we want to read
> vma under RCU (valid but can change from under us) and then retry if
> it might have been modified from under us.
>
> >
> > Cheers,
> > Liam
> >
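
To make the "few speculative retries, then fall back to the lock" idea
above concrete, here is a minimal userspace sketch. It is not the
kernel code and not what the patch does: every demo_* name and
MAX_SPECULATIVE_RETRIES are hypothetical, a plain pthread mutex stands
in for mmap_lock (so fallback readers are exclusive here, unlike a real
read lock), and a single int stands in for the VMA data being copied.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical retry budget, in the 3-5 range suggested above. */
#define MAX_SPECULATIVE_RETRIES 3

struct demo_mm {
	pthread_mutex_t lock;	/* stand-in for mmap_lock */
	atomic_uint seq;	/* odd while a writer is active */
	atomic_int value;	/* stand-in for the data being read */
};

static void demo_init(struct demo_mm *mm, int value)
{
	pthread_mutex_init(&mm->lock, NULL);
	atomic_init(&mm->seq, 0);
	atomic_init(&mm->value, value);
}

/* Writer side: take the lock and make the sequence count odd. */
static void demo_write_begin(struct demo_mm *mm)
{
	pthread_mutex_lock(&mm->lock);
	unsigned int s = atomic_load_explicit(&mm->seq, memory_order_relaxed);
	atomic_store_explicit(&mm->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Writer side: make the sequence count even again and drop the lock. */
static void demo_write_end(struct demo_mm *mm)
{
	unsigned int s = atomic_load_explicit(&mm->seq, memory_order_relaxed);
	atomic_store_explicit(&mm->seq, s + 1, memory_order_release);
	pthread_mutex_unlock(&mm->lock);
}

/* Returns false if a writer is active, i.e. speculation cannot start. */
static bool demo_speculate_try_begin(struct demo_mm *mm, unsigned int *seq)
{
	*seq = atomic_load_explicit(&mm->seq, memory_order_acquire);
	return (*seq & 1) == 0;
}

/* Returns true if the data may have changed since try_begin. */
static bool demo_speculate_retry(struct demo_mm *mm, unsigned int seq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&mm->seq, memory_order_relaxed) != seq;
}

/* Bounded speculation first, then the locked slow path. */
static int demo_read(struct demo_mm *mm, bool *used_lock)
{
	int v;

	for (int i = 0; i < MAX_SPECULATIVE_RETRIES; i++) {
		unsigned int seq;

		if (!demo_speculate_try_begin(mm, &seq))
			continue;	/* writer active, burn one retry */
		v = atomic_load_explicit(&mm->value, memory_order_relaxed);
		if (!demo_speculate_retry(mm, seq)) {
			*used_lock = false;
			return v;	/* snapshot was stable */
		}
	}

	/* Retry budget exhausted: wait for writers by taking the lock. */
	pthread_mutex_lock(&mm->lock);
	v = atomic_load_explicit(&mm->value, memory_order_relaxed);
	pthread_mutex_unlock(&mm->lock);
	*used_lock = true;
	return v;
}
```

The retry loop and the locked path both live in demo_read(), so callers
never see which path was taken unless they ask via used_lock. That is
one way to keep the fallback logic away from the search/printing code.
Setting the retry budget to zero gives the uprobes-style behaviour of
falling back on the first failed attempt.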