From mboxrd@z Thu Jan  1 00:00:00 1970
From: Suren Baghdasaryan <surenb@google.com>
Date: Thu, 10 Aug 2023 06:24:15 +0000
Subject: Re: [PATCH v7 0/6] Per-VMA lock support for swap and userfaults
To: David Hildenbrand, willy@infradead.org
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, michel@lespinasse.org, liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz, minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com, lstoakes@gmail.com, hdanton@sina.com, apopple@nvidia.com, peterx@redhat.com, ying.huang@intel.com, yuzhao@google.com, dhowells@redhat.com, hughd@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, pasha.tatashin@soleen.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com
References: <20230630211957.1341547-1-surenb@google.com> <0ab6524a-6917-efe2-de69-f07fb5cdd9d2@redhat.com>
On Wed, Aug 9, 2023 at 10:29 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Aug 9, 2023 at 11:31 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Wed, Aug 9, 2023 at 11:08 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Wed, Aug 9, 2023 at 11:04 AM David Hildenbrand <david@redhat.com> wrote:
> > > >
> > > > >>>> Which ends up being
> > > > >>>>
> > > > >>>> VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
> > > > >>>>
> > > > >>>> I did not check if this is also the case on mainline, and if this series is responsible.
> > > > >>>
> > > > >>> Thanks for reporting! I'm checking it now.
> > > > >>
> > > > >> Hmm. From the code it's not obvious how lock_mm_and_find_vma() ends up
> > > > >> calling find_vma() without mmap_lock after successfully completing
> > > > >> get_mmap_lock_carefully(). lock_mm_and_find_vma+0x3f/0x270 points to
> > > > >> the first invocation of find_vma(), so this is not even the lock
> > > > >> upgrade path... I'll try to reproduce this issue and dig up more, but
> > > > >> from the information I have so far this issue does not seem to be
> > > > >> related to this series.
> > > >
> > > > I just checked on mainline and it does not fail there.
>
> Thanks. Just to eliminate the possibility, I'll try reverting my
> patchset in mm-unstable and will try the test again. Will do that in
> the evening once I'm home.
>
> > > >
> > > > >
> > > > > This is really weird. I added mmap_assert_locked(mm) calls into
> > > > > get_mmap_lock_carefully() right after we acquire the mmap_lock read
> > > > > lock, and one of them triggers right after a successful
> > > > > mmap_read_lock_killable(). Here is my modified version of
> > > > > get_mmap_lock_carefully():
> > > > >
> > > > > static inline bool get_mmap_lock_carefully(struct mm_struct *mm,
> > > > >                                            struct pt_regs *regs) {
> > > > >         /* Even if this succeeds, make it clear we might have slept */
> > > > >         if (likely(mmap_read_trylock(mm))) {
> > > > >                 might_sleep();
> > > > >                 mmap_assert_locked(mm);
> > > > >                 return true;
> > > > >         }
> > > > >         if (regs && !user_mode(regs)) {
> > > > >                 unsigned long ip = instruction_pointer(regs);
> > > > >                 if (!search_exception_tables(ip))
> > > > >                         return false;
> > > > >         }
> > > > >         if (!mmap_read_lock_killable(mm)) {
> > > > >                 mmap_assert_locked(mm);    <---- generates a BUG
> > > > >                 return true;
> > > > >         }
> > > > >         return false;
> > > > > }
> > > >
> > > > Ehm, that's indeed weird.
> > > >
> > > > >
> > > > > AFAICT the conditions for mmap_read_trylock() and
> > > > > mmap_read_lock_killable() are checked correctly. Am I missing
> > > > > something?
> > > >
> > > > Weirdly enough, it only triggers during that specific uffd test, right?
> > >
> > > Yes, uffd-unit-tests. I even ran it separately to ensure it's not some
> > > fallout from a previous test, and I'm able to reproduce this
> > > consistently.
>
> Yeah, it is somehow related to per-VMA locking. Unfortunately I can't
> reproduce the issue on my VM, so I have to use my host, and bisection
> is slow. I think I'll get to the bottom of this tomorrow.

Ok, I think I found the issue. wp_page_shared() -> fault_dirty_shared_page()
can drop mmap_lock (see the comment "Drop the mmap_lock before waiting on
IO, if we can..."), therefore we have to ensure we are not doing this under
a per-VMA lock. I think what happens is that this path races with another
page fault which took mmap_lock for read. fault_dirty_shared_page() releases
the lock that was taken by the other page-faulting thread, and that thread
hits the assertion when it finds that the lock it just took was released
from under it.
The following crude change fixed the issue for me, but there might be a
more granular way to deal with this:

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3293,18 +3293,18 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio)
 	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret = 0;

+	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+		vma_end_read(vmf->vma);
+		return VM_FAULT_RETRY;
+	}
+
 	folio_get(folio);

 	if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
 		vm_fault_t tmp;

 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-			folio_put(folio);
-			vma_end_read(vmf->vma);
-			return VM_FAULT_RETRY;
-		}
-
 		tmp = do_page_mkwrite(vmf, folio);
 		if (unlikely(!tmp || (tmp & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {

Matthew, please check if this fix is valid and if there might be a better
way. I think the issue was introduced by 88e2667632d4 ("mm: handle faults
that merely update the accessed bit under the VMA lock").

Thanks,
Suren.

> > > >
> > > > --
> > > > Cheers,
> > > >
> > > > David / dhildenb
> > > >