References: <20250704060727.724817-1-surenb@google.com>
 <20250704060727.724817-8-surenb@google.com>
 <3b3521f6-30c8-419e-9615-9228f539251e@suse.cz>
 <5ec10376-6a5f-4a94-9880-e59f1b6d425f@suse.cz>
In-Reply-To: <5ec10376-6a5f-4a94-9880-e59f1b6d425f@suse.cz>
From: Suren Baghdasaryan
Date: Tue, 15 Jul 2025 13:13:36 -0700
Subject: Re: [PATCH v6 7/8] fs/proc/task_mmu: read proc/pid/maps under per-vma lock
To: Vlastimil Babka
Cc: "Liam R. Howlett", Lorenzo Stoakes, akpm@linux-foundation.org,
 david@redhat.com, peterx@redhat.com, jannh@google.com, hannes@cmpxchg.org,
 mhocko@kernel.org, paulmck@kernel.org, shuah@kernel.org, adobriyan@gmail.com,
 brauner@kernel.org, josef@toxicpanda.com, yebin10@huawei.com,
 linux@weissschuh.net, willy@infradead.org, osalvador@suse.de,
 andrii@kernel.org, ryan.roberts@arm.com, christophe.leroy@csgroup.eu,
 tjmercier@google.com, kaleshsingh@google.com, aha310510@gmail.com,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kselftest@vger.kernel.org

On Tue, Jul 15, 2025 at 1:16 AM Vlastimil Babka wrote:
>
> On 7/10/25 19:02, Suren Baghdasaryan wrote:
> > On Thu, Jul 10, 2025 at 12:03 AM Suren Baghdasaryan wrote:
> >>
> >> On Wed, Jul 9, 2025 at 10:47 AM Suren Baghdasaryan wrote:
> >> >
> >> > On Wed, Jul 9, 2025 at 4:12 PM Liam R. Howlett wrote:
> >> > >
> >> > > * Suren Baghdasaryan [250709 11:06]:
> >> > > > On Wed, Jul 9, 2025 at 3:03 PM Vlastimil Babka wrote:
> >> > > > >
> >> > > > > On 7/9/25 16:43, Suren Baghdasaryan wrote:
> >> > > > > > On Wed, Jul 9, 2025 at 1:57 AM Vlastimil Babka wrote:
> >> > > > > >>
> >> > > > > >> On 7/8/25 01:10, Suren Baghdasaryan wrote:
> >> > > > > >> >>> + rcu_read_unlock();
> >> > > > > >> >>> + vma = lock_vma_under_mmap_lock(mm, iter, address);
> >> > > > > >> >>> + rcu_read_lock();
> >> > > > > >> >> OK I guess we hold the RCU lock the whole time as we traverse except when
> >> > > > > >> >> we lock under mmap lock.
> >> > > > > >> > Correct.
> >> > > > > >>
> >> > > > > >> I wonder if it's really necessary? Can't it be done just inside
> >> > > > > >> lock_next_vma()? It would also avoid the unlock/lock dance quoted above.
> >> > > > > >>
> >> > > > > >> Even if we later manage to extend this approach to smaps and employ rcu
> >> > > > > >> locking to traverse the page tables, I'd think it's best to separate and
> >> > > > > >> fine-grain the rcu lock usage for vma iterator and page tables, if only to
> >> > > > > >> avoid too long time under the lock.
> >> > > > > >
> >> > > > > > I thought we would need to be in the same rcu read section while
> >> > > > > > traversing the maple tree using vma_next() but now looking at it,
> >> > > > > > maybe we can indeed enter only while finding and locking the next
> >> > > > > > vma...
> >> > > > > > Liam, would that work? I see struct ma_state containing a node field.
> >> > > > > > Can it be freed from under us if we find a vma, exit rcu read section
> >> > > > > > then re-enter rcu and use the same iterator to find the next vma?
> >> > > > >
> >> > > > > If the rcu protection needs to be contigous, and patch 8 avoids the issue by
> >> > > > > always doing vma_iter_init() after rcu_read_lock() (but does it really avoid
> >> > > > > the issue or is it why we see the syzbot reports?) then I guess in the code
> >> > > > > quoted above we also need a vma_iter_init() after the rcu_read_lock(),
> >> > > > > because although the iterator was used briefly under mmap_lock protection,
> >> > > > > that was then unlocked and there can be a race before the rcu_read_lock().
> >> > > >
> >> > > > Quite true. So, let's wait for Liam's confirmation and based on his
> >> > > > answer I'll change the patch by either reducing the rcu read section
> >> > > > or adding the missing vma_iter_init() after we switch to mmap_lock.
> >> > >
> >> > > You need to either be under rcu or mmap lock to ensure the node in the
> >> > > maple state hasn't been freed (and potentially, reallocated).
> >> > >
> >> > > So in this case, in the higher level, we can hold the rcu read lock for
> >> > > a series of walks and avoid re-walking the tree then the performance
> >> > > would be better.
> >> >
> >> > Got it. Thanks for confirming!
> >> >
> >> > >
> >> > > When we return to userspace, then we should drop the rcu read lock and
> >> > > will need to vma_iter_set()/vma_iter_invalidate() on return. I thought
> >> > > this was being done (through vma_iter_init()), but syzbot seems to
> >> > > indicate a path that was missed?
> >> >
> >> > We do that in m_start()/m_stop() by calling
> >> > lock_vma_range()/unlock_vma_range() but I think I have two problems
> >> > here:
> >> > 1. As Vlastimil mentioned I do not reset the iterator when falling
> >> > back to mmap_lock and exiting and then re-entering rcu read section;
> >> > 2. I do not reset the iterator after exiting rcu read section in
> >> > m_stop() and re-entering it in m_start(), so the later call to
> >> > lock_next_vma() might be using an iterator with a node that was freed
> >> > (and possibly reallocated).
> >> >
> >> > >
> >> > > This is the same thing that needed to be done previously with the mmap
> >> > > lock, but now under the rcu lock.
> >> > >
> >> > > I'm not sure how to mitigate the issue with the page table, maybe we
> >> > > guess on the number of vmas that we were doing for 4k blocks of output
> >> > > and just drop/reacquire then. Probably a problem for another day
> >> > > anyways.
> >> > >
> >> > > Also, I think you can also change the vma_iter_init() to vma_iter_set(),
> >> > > which is slightly less code under the hood. Vlastimil asked about this
> >> > > and it's probably a better choice.
> >> >
> >> > Ack.
> >> > I'll update my series with these fixes and all comments I received so
> >> > far, will run the reproducers to confirm no issues and repost them
> >> > later today.
> >>
> >> I have the patchset ready but would like to test it some more.
> >> Will post it tomorrow.
> >
> > Ok, I found a couple of issues using the syzbot reproducer [1] (which
> > is awesome BTW!):
> > 1. rwsem_acquire_read() inside vma_start_read() at [2] should be moved
> > after the last check, otherwise the lock is considered taken on
> > vma->vm_refcnt overflow;
>
> I think it's fine because if the last check fails there's a
> vma_refcount_put() that includes rwsem_release(), no?

Ah, yes, you are right. This is fine. Obviously trying to figure out
the issue right before a flight is not a good idea :)

>
> > 2. query_matching_vma() is missing unlock_vma() call when it does
> > "goto next_vma;" and re-issues query_vma_find_by_addr(). The previous
> > vma is left locked;
> >
> > [1] https://syzkaller.appspot.com/x/repro.c?x=101edf70580000
> > [2] https://elixir.bootlin.com/linux/v6.15.5/source/include/linux/mm.h#L747
> >
> > After these fixes it's much harder to fail but I still get one more
> > error copied below. I will continue the investigation and will hold
> > off reposting until this is fixed. That will be next week since I'll
> > be out of town the rest of this week.
> >
> > Andrew, could you please remove this patchset from mm-unstable for now
> > until I fix the issue and re-post the new version?
>
> Andrew can you do that please? We keep getting new syzbot reports.
>
> > The error I got after these fixes is:
>
> I suspect the root cause is the ioctls are not serialized against each other
> (probably not even against read()) and yet we treat m->private as safe to
> work on. Now we have various fields that are dangerous to race on - for
> example locked_vma and iter races would explain a lot of this.
>
> I suspect as long as we used purely seq_file workflow, it did the right
> thing for us wrt serialization, but the ioctl addition violates that. We
> should rather recheck even the code before this series, if dangerous ioctl
> vs read() races are possible.
> And the ioctl implementation should be refactored to use an own
> per-ioctl-call private context, not the seq_file's per-file-open context.

Huh, I completely failed to consider this. In hindsight it is quite
obvious... Thanks Vlastimil, I owe you a beer or two.
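
To make the suspected hazard concrete, here is a minimal userspace sketch, not kernel code and not the actual fix. It assumes a toy model in which a read()-like walker and a query share one per-open structure, the role m->private plays in the discussion above. All names (open_ctx, query_ctx, reader_next, query_shared, query_private, the range tables) are hypothetical and exist only for illustration. The interleaving is driven by hand rather than by threads, so the clobbering is deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for per-open state (the role m->private plays). */
struct open_ctx {
    size_t cursor;     /* next range the sequential reader will visit */
    int locked_idx;    /* range the reader currently holds, or -1 */
};

/* Hypothetical per-call state: lives only for the duration of one query. */
struct query_ctx {
    int locked_idx;
};

/* Toy address ranges standing in for VMAs: [range_start[i], range_end[i]). */
static const unsigned long range_start[] = { 0x1000, 0x4000, 0x9000 };
static const unsigned long range_end[]   = { 0x2000, 0x6000, 0xa000 };
#define NR_RANGES (sizeof(range_end) / sizeof(range_end[0]))

/* Index of the range containing addr, or -1 if none does. */
static int find_range(unsigned long addr)
{
    for (size_t i = 0; i < NR_RANGES; i++)
        if (addr >= range_start[i] && addr < range_end[i])
            return (int)i;
    return -1;
}

/* read()-like step: "lock" the next range and record it in shared state. */
int reader_next(struct open_ctx *oc)
{
    assert(oc != NULL);
    oc->locked_idx = (oc->cursor < NR_RANGES) ? (int)oc->cursor++ : -1;
    return oc->locked_idx;
}

/* Hazard: the query records its result in the same per-open state. */
int query_shared(struct open_ctx *oc, unsigned long addr)
{
    oc->locked_idx = find_range(addr);  /* clobbers the reader's field */
    return oc->locked_idx;
}

/* Alternative: the query keeps its result in a per-call context. */
int query_private(unsigned long addr)
{
    struct query_ctx qc = { .locked_idx = find_range(addr) };
    return qc.locked_idx;
}
```

In this model, calling reader_next() and then query_shared() on the same open_ctx leaves locked_idx pointing at the query's range, so a later "unlock" by the reader would release the wrong entry. With query_private() the reader's field is untouched. A real refactor would also have to give the query its own iterator, which this sketch does not model.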