From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1106C04A94 for ; Thu, 10 Aug 2023 20:20:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0BEA36B0071; Thu, 10 Aug 2023 16:20:24 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 06ECF6B0072; Thu, 10 Aug 2023 16:20:24 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E514F6B0074; Thu, 10 Aug 2023 16:20:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id D61EF6B0071 for ; Thu, 10 Aug 2023 16:20:23 -0400 (EDT) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 991CC1CA181 for ; Thu, 10 Aug 2023 20:20:23 +0000 (UTC) X-FDA: 81109312326.25.ED56E12 Received: from mail-oo1-f43.google.com (mail-oo1-f43.google.com [209.85.161.43]) by imf02.hostedemail.com (Postfix) with ESMTP id C27A880015 for ; Thu, 10 Aug 2023 20:20:21 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=4DziVMSd; spf=pass (imf02.hostedemail.com: domain of surenb@google.com designates 209.85.161.43 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1691698821; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=9H3yyG7XVBmPgpZbRel0dQc6e8lzfwXDnebcZHkrY6I=; b=fxSFM4BpTtAhMb4q3NOyGiPLUXLpE+FdwGeubALeEE6LNw9QV3zVKyRIWJGZ1qjSeR6E6a c7pduh53cUWlzrWrEI4mi/3QBOOm2Z6btrTvnrt9qXFwx4Kcx3CLmgE1DG5lQiWF6/PNxw AjQWu7JFL4oeFN4dL0tF02TJYodR5so= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1691698821; a=rsa-sha256; cv=none; b=0uJkFzAJNPvNG3AMgCYDXK79PQjjdVXinGKW5C6r/L0h/qP7Cfh1w4C8b9uvlx6dpWuVmV h728pd4HdR2vjDxMGumJCxtfTDKxjtfissDBlx9Hu0naIabx02es7Ddcu92roggizaGB44 bQQmq4Kr9dL7l4ZU8mHTcgvRYn/Siuc= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=4DziVMSd; spf=pass (imf02.hostedemail.com: domain of surenb@google.com designates 209.85.161.43 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-oo1-f43.google.com with SMTP id 006d021491bc7-56c74961e0cso1056885eaf.3 for ; Thu, 10 Aug 2023 13:20:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1691698821; x=1692303621; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=9H3yyG7XVBmPgpZbRel0dQc6e8lzfwXDnebcZHkrY6I=; b=4DziVMSdZ/POSPu0fbZ2zw4SkrSPxT5eMr/Yas7q/11gLiHruaFmAIKQ9ZTemrQ9Yn C42Jke3d/CpzK8B8uF8iZM3m6rkpACZbSKuxeX9hQWrrvclyci0tBl9HhBIzMkeC7V4y +lIMnzW80Pk+28WdR5JrGh+yqSu8cwSuGPey4kEkP6P/lp46cuxdLPnQePeiAXQ97PP/ SAQKbd9MP7TeIfVOgfCmUykDJnpg7ggZV/rRfA53nin8lZJ8E8EOgP4ntIDST7GDjwyU 2LluMDhnyZ9nnTu/ct2Voi72FzsfrLZuX1fxAaQa6youcBJ+UhF8v4h++h3JOi+26aPU GuMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691698821; x=1692303621; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9H3yyG7XVBmPgpZbRel0dQc6e8lzfwXDnebcZHkrY6I=; b=Do1iQq1KExQUYUn3h6hmztSz3+oYRO1T3OxGBeZnYX53kpHxU9Orl45MgtxwQ+DXHw jGKG34UZGyPt5ffUT5HmgfzSD+7FVPfsinJARxZtwoigUDrJtaaScTJhBSu8smJMsKvv vBWMqvFi3uocr4RTTqJ0Un+/lQMX8IMbNqQAfALqHfTTGOKtUGRWL2uxe6pdmEpPbQcf tsqx1CBkwkwN/YzQMJVTbrszV0k2EEZjjYgnfDLE8Ppx5a2MsZGXI/zyCjfGXYeJ1Bh3 QfQHgdWyzWPj2SFJ0BVSjrcm5Bx15npHinR3J6AEdI/5ZWxJNhTuAXv34TIS6ollLiCn jLog== X-Gm-Message-State: AOJu0YxFxr7/ykKBUtqzAg3X5YWVA4rHyZlpFC/x3wz52WOd8oG2Pbo/ 6dudm8UU9mdwHIx/ncPmsp7S4i9fdn2jI3Nl3x940A== X-Google-Smtp-Source: AGHT+IFDXwBUtyS42J7TYsiABMXZD6fg+qqeJ68pmlEsx5vvajDiUvI0of9X7T5czmR1RA3Byjofr33NYO6HIAzGOQ0= X-Received: by 2002:a05:6358:7f1a:b0:134:c1e1:3b08 with SMTP id p26-20020a0563587f1a00b00134c1e13b08mr14840rwn.25.1691698820530; Thu, 10 Aug 2023 13:20:20 -0700 (PDT) MIME-Version: 1.0 References: <20230630211957.1341547-1-surenb@google.com> <0ab6524a-6917-efe2-de69-f07fb5cdd9d2@redhat.com> <01e20a4a-35dc-b342-081f-0edaf8780f51@redhat.com> In-Reply-To: <01e20a4a-35dc-b342-081f-0edaf8780f51@redhat.com> From: Suren Baghdasaryan Date: Thu, 10 Aug 2023 13:20:09 -0700 Message-ID: Subject: Re: [PATCH v7 0/6] Per-VMA lock support for swap and userfaults To: David Hildenbrand Cc: willy@infradead.org, akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, michel@lespinasse.org, liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz, minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com, lstoakes@gmail.com, hdanton@sina.com, apopple@nvidia.com, peterx@redhat.com, ying.huang@intel.com, yuzhao@google.com, dhowells@redhat.com, hughd@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, pasha.tatashin@soleen.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: C27A880015 X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: 431mkjqcogwtu5j4dadsmknsunm465s7 X-HE-Tag: 1691698821-118641 X-HE-Meta: U2FsdGVkX1+RwelIHzyFvIOmBwLfxU+DEVyozvCB8K0a6dXnX647jRn18g7Zy2y+921bTcFAmJvo85yxoy88ZQN+TJZ8uw/eUdBKx7z6KEiaYh2pUBQeZgNHCWkJ66Z3PVcrafNiZEhSk6U3q3ekRIv7izOH4TU59FvVYSEUNXXk1KuGli0uV5igiR3F+N6rYp3YGBtbA0CKSYP+1+c2POt7lhNDb91gp79bUCtQ552n7Lov7qLvqtMeOy67sdOWsvg8AutkQt2D4tQiIvrKDmwB9T7xTqC7fpuVmKWCZYoy3nRF/rdm3gapSBL1tM9IK29VpUKB8/5ydcuPcgEZCmvnjpwfXMpl5AALBxt7aW6NfAF0FdZyquKsNorby6hCoAJByGXv8mwIofnt16F+bKV+lrr1/7Ey1wcqTTJYr6lInjXhqLPrZjjQwbJ+Cz4ybKCAgs7vbqxW9UNsmEKiimSqHCD6n2Ka9EjOoOEz8aONRhtGE4O18ZD3j9oRbJi8Mj541oPYv7AJIWqdmHxDOXQGx1AUuAp39UwlDFFtks4rjVEcSJjjiSePA9VokQAl/MzpoBxBDPsVz/XJxpOxay3FLbSnooByi7Qwuh/EfKcyPWQidFZ+KYzwLxHDTJNbw9pzDFpetDJtT3r2ku98La8txmckiUYo1/RDZjxDLMhrY40ReZ/3u5/Gg5zyabzihaOSdyP9nHOvwa63lZlUB6cSORGbqKLwJGP4QSUNAmMkgiXY5Jghug09Q0zoAslGoZPqhycBL7vqGIDVQ5VQvNg8856aFg31tudCA2PmZnovMOZi3HvXi2gNv4lrtNTr1EntTR0CjH5XXY/4T0Gzjyh4ozxVyFwG2kDLGkawyGx0lUg6U3jeGWN8vid2umX5tEaDkcQa9TfzAkiaEeOitYjkqFthfVjVtbxoJTjAd6vMV9AjtwJ0pa8IbEf4DR2y0qULhprSSLbFRDALwTW l5IZUnY4 9MjGLkNt33hF0hgE7oXIZTeyPbjONHX95iRVZTzKitI7TrvpoRvXjB6PpEcA5Lm++30498XOQkXaMIf0IlsRTLZrX7/Wgp7JOlJQ138uLREhdQ5E4V5AonHUhvMPZfICChEhuTJvYbutLIFWO8Ag0ncmOiz6dtvjHRsoFwRJfGNv38Q/4mBsUJYLMZhts8kfJHPJIoNYWuy5R9uJ33Hjh9t3BOBfjIft3TXb89Lc19jWDNa6SxYqwOqQC7/xWe/8uMhw+Js16JgF/ZldciuAyZLOB7VBp4UZyTG2h7wZKqLV0nmfmj4C+w1qf3Mojsz35hxRC X-Bogosity: Ham, tests=bogofilter, spamicity=0.000002, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Aug 10, 2023 at 12:41=E2=80=AFAM David Hildenbrand wrote: > > On 10.08.23 08:24, Suren Baghdasaryan wrote: > > On Wed, Aug 9, 2023 at 10:29=E2=80=AFPM Suren Baghdasaryan wrote: > >> > >> On Wed, Aug 9, 2023 at 11:31=E2=80=AFAM Suren Baghdasaryan wrote: > >>> > >>> On Wed, Aug 9, 2023 at 11:08=E2=80=AFAM Suren Baghdasaryan wrote: > >>>> > >>>> On Wed, Aug 9, 2023 at 11:04=E2=80=AFAM David Hildenbrand wrote: > >>>>> > >>>>>>>>> Which ends up being > >>>>>>>>> > >>>>>>>>> VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); > >>>>>>>>> > >>>>>>>>> I did not check if this is also the case on mainline, and if th= is series is responsible. > >>>>>>>> > >>>>>>>> Thanks for reporting! I'm checking it now. > >>>>>>> > >>>>>>> Hmm. From the code it's not obvious how lock_mm_and_find_vma() en= ds up > >>>>>>> calling find_vma() without mmap_lock after successfully completin= g > >>>>>>> get_mmap_lock_carefully(). lock_mm_and_find_vma+0x3f/0x270 points= to > >>>>>>> the first invocation of find_vma(), so this is not even the lock > >>>>>>> upgrade path... I'll try to reproduce this issue and dig up more = but > >>>>>>> from the information I have so far this issue does not seem to be > >>>>>>> related to this series. > >>>>> > >>>>> I just checked on mainline and it does not fail there. > >>> > >>> Thanks. Just to eliminate the possibility, I'll try reverting my > >>> patchset in mm-unstable and will try the test again. Will do that in > >>> the evening once I'm home. > >>> > >>>>> > >>>>>> > >>>>>> This is really weird. I added mmap_assert_locked(mm) calls into > >>>>>> get_mmap_lock_carefully() right after we acquire mmap_lock read lo= ck > >>>>>> and one of them triggers right after successful > >>>>>> mmap_read_lock_killable(). Here is my modified version of > >>>>>> get_mmap_lock_carefully(): > >>>>>> > >>>>>> static inline bool get_mmap_lock_carefully(struct mm_struct *mm, > >>>>>> struct pt_regs *regs) { > >>>>>> /* Even if this succeeds, make it clear we might have slept= */ > >>>>>> if (likely(mmap_read_trylock(mm))) { > >>>>>> might_sleep(); > >>>>>> mmap_assert_locked(mm); > >>>>>> return true; > >>>>>> } > >>>>>> if (regs && !user_mode(regs)) { > >>>>>> unsigned long ip =3D instruction_pointer(regs); > >>>>>> if (!search_exception_tables(ip)) > >>>>>> return false; > >>>>>> } > >>>>>> if (!mmap_read_lock_killable(mm)) { > >>>>>> mmap_assert_locked(mm); <---- gener= ates a BUG > >>>>>> return true; > >>>>>> } > >>>>>> return false; > >>>>>> } > >>>>> > >>>>> Ehm, that's indeed weird. > >>>>> > >>>>>> > >>>>>> AFAIKT conditions for mmap_read_trylock() and > >>>>>> mmap_read_lock_killable() are checked correctly. Am I missing > >>>>>> something? > >>>>> > >>>>> Weirdly enough, it only triggers during that specific uffd test, ri= ght? > >>>> > >>>> Yes, uffd-unit-tests. I even ran it separately to ensure it's not so= me > >>>> fallback from a previous test and I'm able to reproduce this > >>>> consistently. > >> > >> Yeah, it is somehow related to per-vma locking. Unfortunately I can't > >> reproduce the issue on my VM, so I have to use my host and bisection > >> is slow. I think I'll get to the bottom of this tomorrow. > > > > Ok, I think I found the issue. > > Nice! > > > wp_page_shared() -> > > fault_dirty_shared_page() can drop mmap_lock (see the comment saying > > "Drop the mmap_lock before waiting on IO, if we can...", therefore we > > have to ensure we are not doing this under per-VMA lock. > > I think what happens is that this path is racing with another page > > fault which took mmap_lock for read. fault_dirty_shared_page() > > releases this lock which was taken by another page faulting thread and > > that thread generates an assertion when it finds out the lock it just > > took got released from under it. > > I wonder if we could detect that someone releases the mmap lock that was > not taken by that person, to bail out early at the right place when > debugging such issues. Only with certain config knobs enabled, of course. I think that's doable. If we add tags_struct.mmap_locked =3D RDLOCK | WRLOCK | NONE that could set when a task takes the mmap_lock and reset+checked when it's released. Lockdep would also catch this if the release code did not race with another page faulting task (this would be seen as releasing the lock which was never locked). > > > The following crude change fixed the issue for me but there might be a > > more granular way to deal with this: > > > > --- a/mm/memory.c > > +++ b/mm/memory.c > > @@ -3293,18 +3293,18 @@ static vm_fault_t wp_page_shared(struct > > vm_fault *vmf, struct folio *folio) > > struct vm_area_struct *vma =3D vmf->vma; > > vm_fault_t ret =3D 0; > > > > + if (vmf->flags & FAULT_FLAG_VMA_LOCK) { > > + pte_unmap_unlock(vmf->pte, vmf->ptl); > > + vma_end_read(vmf->vma); > > + return VM_FAULT_RETRY; > > + } > > + > > I won't lie: all of these locking checks are a bit hard to get and > possibly even harder to maintain. > > Maybe better mmap unlock sanity checks as spelled out above might help > improve part of the situation. > > > And maybe some comments regarding the placement might help as well ;) I think comments with explanations why we bail out would help. I had them in some but probably not all the places. Once the code stabilizes I'll review the results and will add more comments with explanations. Thanks, Suren. > > -- > Cheers, > > David / dhildenb > > -- > To unsubscribe from this group and stop receiving emails from it, send an= email to kernel-team+unsubscribe@android.com. >