From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2F04EC001DB for ; Thu, 10 Aug 2023 07:41:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A69586B0071; Thu, 10 Aug 2023 03:41:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A18E16B0074; Thu, 10 Aug 2023 03:41:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8E1176B0075; Thu, 10 Aug 2023 03:41:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 7EF356B0071 for ; Thu, 10 Aug 2023 03:41:06 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 430DDA0890 for ; Thu, 10 Aug 2023 07:41:06 +0000 (UTC) X-FDA: 81107398932.05.74626B0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf09.hostedemail.com (Postfix) with ESMTP id 0F50314000E for ; Thu, 10 Aug 2023 07:41:03 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=eDdpSnkW; spf=pass (imf09.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1691653264; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=wo0nlmTB+yI/EjPzL+nHf3s+1gZ0NohNOJVrAJSRTwA=; b=4hdix6lF9BZh2pRgZnVdqD2/6QnTrhwjTgiAfjmZtJFVB/J/y+nEoCAtMjCdVY00o++Iyy GOwdRNOhjcToFAlzsx0I9ieCWWBFqOMlfCw/GE/RjH4pTZ+kPEToH+q/IZ/BGm7v/J1axA sMHgC14QHIU1gFxaUfwd3UD4xtSgdMI= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1691653264; a=rsa-sha256; cv=none; b=WNizmOEZnX3atXmKNfcHVS6UgFOLK79chD9t1uvZtjGu0LJPY8cdXScjEUHNnv1TIVbAlo ighvB7w1G7O1hOjLff0+cKYzEBOzb2iyUE6yq25ieJUaTMrf0XLl1dYYaL9mPqo7G1pKZJ k/25m+cnF1gxSGhXdY6I2HvdgJ5HUWw= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=eDdpSnkW; spf=pass (imf09.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1691653263; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wo0nlmTB+yI/EjPzL+nHf3s+1gZ0NohNOJVrAJSRTwA=; b=eDdpSnkWbVi8NmCvo9E8DAMGlwyEgKT+LZ8XcwQsHN1o174BMT+f1QCK00DQvZ6jzJ/8Ws K1Z6Y2qxWXnz21Gxl5kr1WVhAi6iJR/Ekrn9FpdGs0v3vlaSvXicwvIOoEXfukOe7OkAOn EimXn1vNCfRtLhX8vhMW1vjzkNr5MEo= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-96-ubU708nnMPi8lnW0tgpwBg-1; Thu, 10 Aug 2023 03:41:00 -0400 X-MC-Unique: ubU708nnMPi8lnW0tgpwBg-1 Received: by mail-wm1-f70.google.com with SMTP id 5b1f17b1804b1-3fe2fc65f1fso3851345e9.3 for ; Thu, 10 Aug 2023 00:41:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691653259; x=1692258059; h=content-transfer-encoding:in-reply-to:subject:organization:from :references:cc:to:content-language:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=wo0nlmTB+yI/EjPzL+nHf3s+1gZ0NohNOJVrAJSRTwA=; b=BMl5QpzMbTA+vfDv3QRMsVt5CChx+kMmubUk0jt8bq9KS0/yRiawVsPfJrMI/p2JyM 4OsuumNiOX9B10LlagpCtUzC7K8EvqmHfdPBhbJe8f3p1+gvthKrvYkOv4bfjkO8U/0v yIxha3SGwZB8T2ufzPabxsP9ao8pbB86+zdvMko3Pm6scA4I82dFghsADBJ+ck1TmhFF rH7LpAXg8ZSyF0CtVCbHcK6fJ3FN3/l9bl9WYuEoNr4A+f6MYTxf5+TDal8Eppxcii4F 55pnmq06q8hi14tTk3YoiPLfMTvcP3x90axrI6AnzYWnwznnfAIK9f0qixqLel0mNsHG 1++A== X-Gm-Message-State: AOJu0Yxg7oA5JqsPhVK9FpW/d7EUHgQHm3Bz2pUV1jHrNzsz4yTw1dCq /ZfQr/o0DEUQ/qMcOXW3Q17VMAD+JvMGaR06SMvq7mmRv9NI+R3APCG0EZcJkxW2yeneqpjwsOM YvHP4haE5Q9E= X-Received: by 2002:a05:600c:da:b0:3fd:29cf:20c5 with SMTP id u26-20020a05600c00da00b003fd29cf20c5mr1271259wmm.7.1691653259491; Thu, 10 Aug 2023 00:40:59 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEvM10aRE0/c6fVx3Q56OHfaGwLcM5SZ9caahh1IpRqOJJTKbhysWzSoWCzzN7tmB5JyQridg== X-Received: by 2002:a05:600c:da:b0:3fd:29cf:20c5 with SMTP id u26-20020a05600c00da00b003fd29cf20c5mr1271232wmm.7.1691653258988; Thu, 10 Aug 2023 00:40:58 -0700 (PDT) Received: from ?IPV6:2a09:80c0:192:0:5dac:bf3d:c41:c3e7? ([2a09:80c0:192:0:5dac:bf3d:c41:c3e7]) by smtp.gmail.com with ESMTPSA id 19-20020a05600c229300b003fe24da493dsm1214769wmf.41.2023.08.10.00.40.57 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 10 Aug 2023 00:40:58 -0700 (PDT) Message-ID: <01e20a4a-35dc-b342-081f-0edaf8780f51@redhat.com> Date: Thu, 10 Aug 2023 09:40:57 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 To: Suren Baghdasaryan , willy@infradead.org Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, michel@lespinasse.org, liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz, minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com, lstoakes@gmail.com, hdanton@sina.com, apopple@nvidia.com, peterx@redhat.com, ying.huang@intel.com, yuzhao@google.com, dhowells@redhat.com, hughd@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, pasha.tatashin@soleen.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com References: <20230630211957.1341547-1-surenb@google.com> <0ab6524a-6917-efe2-de69-f07fb5cdd9d2@redhat.com> From: David Hildenbrand Organization: Red Hat Subject: Re: [PATCH v7 0/6] Per-VMA lock support for swap and userfaults In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 0F50314000E X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: d8cpnmkp1xncybx7rtbt94m9zbzeiog7 X-HE-Tag: 1691653263-631841 X-HE-Meta: U2FsdGVkX18+/8vSQgqqL7qapvdEh5hnLuImyvo4Mwpd5bnWh+CVDdX2UqkV9P3EXoidoiXbFI2BQjajqbv+O9ixwyKOMLPH4l5cHDznNOvRbrE9LKtyinC2P3cqhhX97p1rOrtOmagLOJ/EIXDgUckBc4GqH0p7dpYZIvlYnMw/xQVaQzggeG7KuzKshRh5oUAkWBGTralFb0UAbtLIEUx5Ia0QtEZWTnKZBsK2FTDz52k3o3c2wGopiDlWpHyI85OsbOF6gKQnj9KH8RSAI5nhENZX+endWcuCDF5Y2ZL3vkajamokSzK59Ao9C3SmmDQ50XEZazPmzenH799yBS1ojMEVL+Kqqo3TcRZ6tqR0XNjAPxyRwloeyPqESfE75+/5bR5HtLxiMe/+iy2stWUV9GXfRr8COgWNdGITE3rx1wSZtevNRdbI9gWyCUJoozlgiLDAAy2FUykl2kiPU1qV85yg8yycsg61dpbiw6MXsZ3SwPQvzmGoy+Wu9ML65fskpl9nfaD0Yu3+ZJBjgnRaTijAI2pLXjWIQpg3CP9teYpk8KXA4oHI44MnXxbclqOhQRbTca965ViLt932LCbli/bTPUZDZ9jZE8LUZ26TJGq0KrT2LJIIAkc8EUB86KNpC9B+W6NdzI/LVs8IVaENpUKcHPYHvhM0DjVpllFyUBQRLPlQF99tUkTfZ/G8qo0kKUf+mqNZI3f03XAcCYRH6eD2v2P31Q4l+H+AvIAsqXt/42EIXlh0mvteeDTALKcUCLz+8BN0IB9hZfy0IgvB/jpVx0W/A10+i8PnQEaqdnonLDVuf/Vu3UN9bmlifEtv+fD3H5IQV+ERpocUfyN8dhM9Rw+NRQbFczB6eFCSC7QOt33g2NK6Bcv0RgRyZ0lhQmEyhp6qih6CFUsoG8W81x9G6st+qIGq/PihddKxhGvGpSozOjMMZeVBr7OZc8Z9Sc2mVGdc4XBb0k3 wMdlfNBM bLYb83WIj3n9Mik80AtZBS6PLdeZKpKj1n9i/zQa3xBHaCvP86Hyo1JFNaEfuX5wTqEaypEp4AxN2gY8DVDDQdyi2yXVIEPe0+lcsCNG6Ma4VOh6xE7N6k29AmxfxNf7I2tNrR28/Ah5XittnJlwe0ARmTXiopqx7Mtg1efzCHKHgbpZ+93xgn4WIkXjRB9ANotEV9O1CV5b5WJEAuqTiWlj3TsQV4fGAY6ORlhNuZrAmBMlDxy9AXuwVJtOFnzJ3TU001Nxwt7yKzr5oJxQAgq+d17VqCX9dzUUocdoW7O3toq9+/gvgx5zmp+dIgcOJZGkEq49vlkdc+I89cpA1jNns5KpVlNEzBN4tJG+m2CVubXFxojg0FPTPZKcKcxOFJaJfRPmCeLWbWuXgRGSmcwgaHY0bAu7wsEXN29nEjW/lHLH8d/2TCeK7OV/jGelvpyJ4yMNNHA8F/U4= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 10.08.23 08:24, Suren Baghdasaryan wrote: > On Wed, Aug 9, 2023 at 10:29 PM Suren Baghdasaryan wrote: >> >> On Wed, Aug 9, 2023 at 11:31 AM Suren Baghdasaryan wrote: >>> >>> On Wed, Aug 9, 2023 at 11:08 AM Suren Baghdasaryan wrote: >>>> >>>> On Wed, Aug 9, 2023 at 11:04 AM David Hildenbrand wrote: >>>>> >>>>>>>>> Which ends up being >>>>>>>>> >>>>>>>>> VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); >>>>>>>>> >>>>>>>>> I did not check if this is also the case on mainline, and if this series is responsible. >>>>>>>> >>>>>>>> Thanks for reporting! I'm checking it now. >>>>>>> >>>>>>> Hmm. From the code it's not obvious how lock_mm_and_find_vma() ends up >>>>>>> calling find_vma() without mmap_lock after successfully completing >>>>>>> get_mmap_lock_carefully(). lock_mm_and_find_vma+0x3f/0x270 points to >>>>>>> the first invocation of find_vma(), so this is not even the lock >>>>>>> upgrade path... I'll try to reproduce this issue and dig up more but >>>>>>> from the information I have so far this issue does not seem to be >>>>>>> related to this series. >>>>> >>>>> I just checked on mainline and it does not fail there. >>> >>> Thanks. Just to eliminate the possibility, I'll try reverting my >>> patchset in mm-unstable and will try the test again. Will do that in >>> the evening once I'm home. >>> >>>>> >>>>>> >>>>>> This is really weird. I added mmap_assert_locked(mm) calls into >>>>>> get_mmap_lock_carefully() right after we acquire mmap_lock read lock >>>>>> and one of them triggers right after successful >>>>>> mmap_read_lock_killable(). Here is my modified version of >>>>>> get_mmap_lock_carefully(): >>>>>> >>>>>> static inline bool get_mmap_lock_carefully(struct mm_struct *mm, >>>>>> struct pt_regs *regs) { >>>>>> /* Even if this succeeds, make it clear we might have slept */ >>>>>> if (likely(mmap_read_trylock(mm))) { >>>>>> might_sleep(); >>>>>> mmap_assert_locked(mm); >>>>>> return true; >>>>>> } >>>>>> if (regs && !user_mode(regs)) { >>>>>> unsigned long ip = instruction_pointer(regs); >>>>>> if (!search_exception_tables(ip)) >>>>>> return false; >>>>>> } >>>>>> if (!mmap_read_lock_killable(mm)) { >>>>>> mmap_assert_locked(mm); <---- generates a BUG >>>>>> return true; >>>>>> } >>>>>> return false; >>>>>> } >>>>> >>>>> Ehm, that's indeed weird. >>>>> >>>>>> >>>>>> AFAIKT conditions for mmap_read_trylock() and >>>>>> mmap_read_lock_killable() are checked correctly. Am I missing >>>>>> something? >>>>> >>>>> Weirdly enough, it only triggers during that specific uffd test, right? >>>> >>>> Yes, uffd-unit-tests. I even ran it separately to ensure it's not some >>>> fallback from a previous test and I'm able to reproduce this >>>> consistently. >> >> Yeah, it is somehow related to per-vma locking. Unfortunately I can't >> reproduce the issue on my VM, so I have to use my host and bisection >> is slow. I think I'll get to the bottom of this tomorrow. > > Ok, I think I found the issue. Nice! > wp_page_shared() -> > fault_dirty_shared_page() can drop mmap_lock (see the comment saying > "Drop the mmap_lock before waiting on IO, if we can...", therefore we > have to ensure we are not doing this under per-VMA lock. > I think what happens is that this path is racing with another page > fault which took mmap_lock for read. fault_dirty_shared_page() > releases this lock which was taken by another page faulting thread and > that thread generates an assertion when it finds out the lock it just > took got released from under it. I wonder if we could detect that someone releases the mmap lock that was not taken by that person, to bail out early at the right place when debugging such issues. Only with certain config knobs enabled, of course. > The following crude change fixed the issue for me but there might be a > more granular way to deal with this: > > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -3293,18 +3293,18 @@ static vm_fault_t wp_page_shared(struct > vm_fault *vmf, struct folio *folio) > struct vm_area_struct *vma = vmf->vma; > vm_fault_t ret = 0; > > + if (vmf->flags & FAULT_FLAG_VMA_LOCK) { > + pte_unmap_unlock(vmf->pte, vmf->ptl); > + vma_end_read(vmf->vma); > + return VM_FAULT_RETRY; > + } > + I won't lie: all of these locking checks are a bit hard to get and possibly even harder to maintain. Maybe better mmap unlock sanity checks as spelled out above might help improve part of the situation. And maybe some comments regarding the placement might help as well ;) -- Cheers, David / dhildenb