From: Suren Baghdasaryan <surenb@google.com>
Date: Thu, 24 Apr 2025 10:38:50 -0700
Subject: Re: [PATCH v3 7/8] mm/maps: read proc/pid/maps under RCU
To: "Liam R. Howlett", Andrii Nakryiko, Suren Baghdasaryan,
 akpm@linux-foundation.org, lorenzo.stoakes@oracle.com, david@redhat.com,
 vbabka@suse.cz, peterx@redhat.com, jannh@google.com, hannes@cmpxchg.org,
 mhocko@kernel.org, paulmck@kernel.org, shuah@kernel.org,
 adobriyan@gmail.com, brauner@kernel.org, josef@toxicpanda.com,
 yebin10@huawei.com, linux@weissschuh.net, willy@infradead.org,
 osalvador@suse.de, andrii@kernel.org, ryan.roberts@arm.com,
 christophe.leroy@csgroup.eu, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org
References: <20250418174959.1431962-1-surenb@google.com>
 <20250418174959.1431962-8-surenb@google.com>
 <6ay37xorr35nw4ljtptnfqchuaozu73ffvjpmwopat42n4t6vr@qnr6xvralx2o>

On Thu, Apr 24, 2025 at 9:42 AM Liam R. Howlett wrote:
>
> * Andrii Nakryiko [250424 12:04]:
> > On Thu, Apr 24, 2025 at 8:20 AM Suren Baghdasaryan wrote:
> > >
> > > On Wed, Apr 23, 2025 at 5:24 PM Liam R. Howlett wrote:
> > > >
> > > > * Andrii Nakryiko [250423 18:06]:
> > > > > On Wed, Apr 23, 2025 at 2:49 PM Suren Baghdasaryan wrote:
> > > > > >
> > > > > > On Tue, Apr 22, 2025 at 3:49 PM Andrii Nakryiko wrote:
> > > > > > >
> > > > > > > On Fri, Apr 18, 2025 at 10:50 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > > > >
> > > > > > > > With maple_tree supporting vma tree traversal under RCU and vma and
> > > > > > > > its important members being RCU-safe, /proc/pid/maps can be read under
> > > > > > > > RCU and without the need to read-lock mmap_lock. However vma content
> > > > > > > > can change from under us, therefore we make a copy of the vma and we
> > > > > > > > pin pointer fields used when generating the output (currently only
> > > > > > > > vm_file and anon_name). Afterwards we check for concurrent address
> > > > > > > > space modifications, wait for them to end and retry. While we take
> > > > > > > > the mmap_lock for reading during such contention, we do that momentarily
> > > > > > > > only to record new mm_wr_seq counter.
> > > > > > > > This change is designed to reduce
> > > > > > >
> > > > > > > This is probably a stupid question, but why do we need to take a lock
> > > > > > > just to record this counter? uprobes get away without taking mmap_lock
> > > > > > > even for reads, and still record this seq counter. And then detect
> > > > > > > whether there were any modifications in between. Why does this change
> > > > > > > need more heavy-weight mmap_read_lock to do speculative reads?
> > > > > >
> > > > > > Not a stupid question. mmap_read_lock() is used to wait for the writer
> > > > > > to finish what it's doing and then we continue by recording a new
> > > > > > sequence counter value and call mmap_read_unlock. This is what
> > > > > > get_vma_snapshot() does. But your question made me realize that we can
> > > > > > optimize m_start() further by not taking mmap_read_lock at all.
> > > > > > Instead of taking mmap_read_lock then doing drop_mmap_lock() we can
> > > > > > try mmap_lock_speculate_try_begin() and only if it fails do the same
> > > > > > dance we do in the get_vma_snapshot(). I think that should work.
> > > > >
> > > > > Ok, yeah, it would be great to avoid taking a lock in a common case!
> > > >
> > > > We can check this counter once per 4k block and maintain the same
> > > > 'tearing' that exists today instead of per-vma. Not that anyone said
> > > > they had an issue with changing it, but since we're on this road anyways
> > > > I'd thought I'd point out where we could end up.
> > >
> > > We would need to run that check on the last call to show_map() right
> > > before seq_file detects the overflow and flushes the page. On
> > > contention we will also be throwing away more prepared data (up to a
> > > page worth of records) vs only the last record. All in all I'm not
> > > convinced this is worth doing unless increased chances of data tearing
> > > is identified as a problem.
> > >
> >
> > Yep, I agree, with filling out 4K of data we run into much higher
> > chances of conflict, IMO. Not worth it, I'd say.
>
> Sounds good.
>
> If this is an issue we do have a path forward still. Although it's less
> desirable.
>
> > >
> > > >
> > > > I am concerned about live locking in either scenario, but I haven't
> > > > looked too deep into this pattern.
> > > >
> > > > I also don't love (as usual) the lack of ensured forward progress.
> > >
> > > Hmm. Maybe we should add a retry limit on
> > > mmap_lock_speculate_try_begin() and once the limit is hit we just take
> > > the mmap_read_lock and proceed with it? That would prevent a
> > > hyperactive writer from blocking the reader's forward progress
> > > indefinitely.
> >
> > Came here to say the same. I'd add a small number of retries (3-5?)
> > and then fall back to the read-locked approach. The main challenge is
> > to keep all this logic nicely isolated from the main VMA
> > search/printing logic.
> >
> > For a similar pattern in uprobes, we don't even bother to retry, we
> > just fall back to mmap_read_lock and proceed, under the assumption that
> > this is going to be very rare and thus not important from the overall
> > performance perspective.
>
> In this problem space we are dealing with a herd of readers caused by
> writers delaying an ever-growing line of readers, right?

I don't know if we have a herd of readers. The problem statement was
about a low-priority reader (a monitor) blocking a high-priority writer.
Multiple readers are of course possible, so I think we should be ok as
long as we can guarantee forward progress. Is that reasonable?

>
> Assuming there is a backup caused by a writer, then I don't know if the
> retry is going to do anything more than heat the data centre.
>
> The readers that take the read lock will get the data, while the others
> who arrive during read locked time can try lockless, but will most
> likely have a run time that extends beyond the readers holding the lock
> and will probably be interrupted by the writer.
>
> We can predict the new readers will also not make it through in time
> because the earlier ones failed. The new readers will then take the
> lock and grow the line of readers.
>
> Does that make sense?

Yeah. I guess we could guarantee forward progress if, on contention, a
reader took mmap_read_lock and produced one page's worth of output under
that lock before dropping it and going back to speculation. If
contention happens again, it grabs mmap_read_lock again and repeats the
dance. IOW, we start with speculation, on contention we grab the lock to
produce one page, then go back to speculation. Repeat until all vmas are
reported. OTOH I guess Paul's benchmark would show some regression
though... Can't have everything :)

>
> Thanks,
> Liam
>
>
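
To make the last idea a bit more concrete, here is a rough userspace
sketch in C. It is not kernel code and not what the patch does today.
All toy_* names are hypothetical: toy_speculate_try_begin() and
toy_speculate_retry() only mimic the shape of the mmap_lock speculation
helpers, the fallback lock is reduced to a counter, and the "writer" is
a hook called in the middle of the speculative section so the behaviour
is deterministic. Each page gets at most one speculative attempt; on
contention it is regenerated under the (simulated) lock and the next
page goes back to speculation, so the reader always makes progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the mm: seq is even when stable, odd while a writer runs. */
struct toy_mm {
	unsigned int seq;
	unsigned int spec_pages;   /* pages produced speculatively */
	unsigned int locked_pages; /* pages produced under the fallback lock */
};

/* Hypothetical hook that lets a caller inject a writer mid-read. */
typedef void (*toy_writer_fn)(struct toy_mm *mm, size_t page);

/* A complete write: the counter goes odd, then even again. */
static void toy_write(struct toy_mm *mm)
{
	mm->seq++;
	mm->seq++;
}

/* Example writer that modifies the address space during every odd page. */
static void toy_writer_on_odd_pages(struct toy_mm *mm, size_t page)
{
	if (page & 1)
		toy_write(mm);
}

/* Record the counter; fail if a writer is currently active. */
static bool toy_speculate_try_begin(const struct toy_mm *mm, unsigned int *seq)
{
	*seq = mm->seq;
	return !(*seq & 1);
}

/* True if the address space changed since toy_speculate_try_begin(). */
static bool toy_speculate_retry(const struct toy_mm *mm, unsigned int seq)
{
	return mm->seq != seq;
}

/* Produce nr_pages pages of output; returns the number of pages produced. */
static size_t toy_read_maps(struct toy_mm *mm, size_t nr_pages,
			    toy_writer_fn writer)
{
	size_t page;

	for (page = 0; page < nr_pages; page++) {
		unsigned int seq;

		if (toy_speculate_try_begin(mm, &seq)) {
			/* ... fill one page of records without the lock ... */
			if (writer)
				writer(mm, page); /* possible concurrent change */
			if (!toy_speculate_retry(mm, seq)) {
				mm->spec_pages++;
				continue; /* speculation held for this page */
			}
		}

		/*
		 * Contention: take the lock (simulated), regenerate this
		 * one page, drop the lock. Writers are excluded here, so
		 * the counter must be even.
		 */
		assert(!(mm->seq & 1));
		mm->locked_pages++;
	}
	return page;
}
```

With toy_writer_on_odd_pages() and a four-page read, two pages come out
of the speculative path and two are produced under the lock; with no
writer at all, every page stays on the speculative path.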