From: Suren Baghdasaryan
Date: Wed, 9 Aug 2023 07:28:17 -0700
Subject: Re: [PATCH v7 0/6] Per-VMA lock support for swap and userfaults
To: David Hildenbrand
Cc: akpm@linux-foundation.org, willy@infradead.org, hannes@cmpxchg.org,
 mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com,
 laurent.dufour@fr.ibm.com, michel@lespinasse.org, liam.howlett@oracle.com,
 jglisse@google.com, vbabka@suse.cz, minchan@google.com, dave@stgolabs.net,
 punit.agrawal@bytedance.com, lstoakes@gmail.com, hdanton@sina.com,
 apopple@nvidia.com, peterx@redhat.com, ying.huang@intel.com,
 yuzhao@google.com, dhowells@redhat.com, hughd@google.com,
 viro@zeniv.linux.org.uk, brauner@kernel.org, pasha.tatashin@soleen.com,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, kernel-team@android.com
References: <20230630211957.1341547-1-surenb@google.com>

On Wed, Aug 9, 2023 at 12:49 AM David Hildenbrand wrote:
>
> On 30.06.23 23:19, Suren Baghdasaryan wrote:
> > When per-VMA locks were introduced in [1] several types of page faults
> > would still fall back to mmap_lock to keep the patchset simple. Among them
> > are swap and userfault pages. The main reason for skipping those cases was
> > the fact that mmap_lock could be dropped while handling these faults and
> > that required additional logic to be implemented.
> > Implement the mechanism to allow per-VMA locks to be dropped for these
> > cases.
> > First, change handle_mm_fault to drop per-VMA locks when returning
> > VM_FAULT_RETRY or VM_FAULT_COMPLETED to be consistent with the way
> > mmap_lock is handled. Then change folio_lock_or_retry to accept vm_fault
> > and return vm_fault_t which simplifies later patches. Finally allow swap
> > and uffd page faults to be handled under per-VMA locks by dropping per-VMA
> > and retrying, the same way it's done under mmap_lock.
> > Naturally, once VMA lock is dropped that VMA should be assumed unstable
> > and can't be used.
> >
> > Changes since v6 posted at [2]
> > - 4/6 replaced the ternary operation in folio_lock_or_retry,
> >   per Matthew Wilcox
> > - 4/6 changed return code description for __folio_lock_or_retry
> >   per Matthew Wilcox
> >
> > Note: patch 3/6 will cause a trivial merge conflict in arch/arm64/mm/fault.c
> > when applied over mm-unstable branch due to a patch from ARM64 tree [3]
> > which is missing in mm-unstable.
> >
> > [1] https://lore.kernel.org/all/20230227173632.3292573-1-surenb@google.com/
> > [2] https://lore.kernel.org/all/20230630020436.1066016-1-surenb@google.com/
> > [3] https://lore.kernel.org/all/20230524131305.2808-1-jszhang@kernel.org/
> >
> > Suren Baghdasaryan (6):
> >   swap: remove remnants of polling from read_swap_cache_async
> >   mm: add missing VM_FAULT_RESULT_TRACE name for VM_FAULT_COMPLETED
> >   mm: drop per-VMA lock when returning VM_FAULT_RETRY or
> >     VM_FAULT_COMPLETED
> >   mm: change folio_lock_or_retry to use vm_fault directly
> >   mm: handle swap page faults under per-VMA lock
> >   mm: handle userfaults under VMA lock
>
> On mm/mm-unstable I get running the selftests:
>
> Testing sigbus-wp on shmem... [ 383.215804] mm ffff9666078e5280 task_size 140737488351232
> [ 383.215804] get_unmapped_area ffffffffad03b980
> [ 383.215804] mmap_base 140378441285632 mmap_legacy_base 47254353883136
> [ 383.215804] pgd ffff966608960000 mm_users 1 mm_count 6 pgtables_bytes 126976 map_count 28
> [ 383.215804] hiwater_rss 6183 hiwater_vm 8aa7 total_vm 8aa7 locked_vm 0
> [ 383.215804] pinned_vm 0 data_vm 844 exec_vm 1a4 stack_vm 21
> [ 383.215804] start_code 402000 end_code 408f09 start_data 40ce10 end_data 40d500
> [ 383.215804] start_brk 17fe000 brk 1830000 start_stack 7ffecbbe08e0
> [ 383.215804] arg_start 7ffecbbe1c6f arg_end 7ffecbbe1c81 env_start 7ffecbbe1c81 env_end 7ffecbbe1fe6
> [ 383.215804] binfmt ffffffffaf3efe40 flags 80000cd
> [ 383.215804] ioctx_table 0000000000000000
> [ 383.215804] owner ffff96660d4a4000 exe_file ffff966285501a00
> [ 383.215804] notifier_subscriptions 0000000000000000
> [ 383.215804] numa_next_scan 4295050919 numa_scan_offset 0 numa_scan_seq 0
> [ 383.215804] tlb_flush_pending 0
> [ 383.215804] def_flags: 0x0()
> [ 383.236255] ------------[ cut here ]------------
> [ 383.237537] kernel BUG at include/linux/mmap_lock.h:66!
> [ 383.238897] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [ 383.240114] CPU: 37 PID: 1482 Comm: uffd-unit-tests Not tainted 6.5.0-rc4+ #68
> [ 383.242513] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
> [ 383.244936] RIP: 0010:find_vma+0x3a/0x40
> [ 383.246200] Code: 48 89 34 24 48 85 c0 74 1c 48 83 c7 40 48 c7 c2 ff ff ff ff 48 89 e6 e8 a4 29 ba 00
> [ 383.251084] RSP: 0000:ffffae3745b6beb0 EFLAGS: 00010282
> [ 383.252781] RAX: 0000000000000314 RBX: ffff9666078e5280 RCX: 0000000000000000
> [ 383.255073] RDX: 0000000000000001 RSI: ffffffffae8f69c3 RDI: 00000000ffffffff
> [ 383.257352] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffae3745b6bc48
> [ 383.259369] R10: 0000000000000003 R11: ffff9669fff46fe8 R12: 0000000044401028
> [ 383.261570] R13: ffff9666078e5338 R14: ffffae3745b6bf58 R15: 0000000000000400
> [ 383.263499] FS: 00007fac671c5740(0000) GS:ffff9669efbc0000(0000) knlGS:0000000000000000
> [ 383.265483] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 383.266847] CR2: 0000000044401028 CR3: 0000000488960006 CR4: 0000000000770ee0
> [ 383.268532] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 383.270206] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 383.271905] PKRU: 55555554
> [ 383.272593] Call Trace:
> [ 383.273215]
> [ 383.273774] ? die+0x32/0x80
> [ 383.274510] ? do_trap+0xd6/0x100
> [ 383.275326] ? find_vma+0x3a/0x40
> [ 383.276152] ? do_error_trap+0x6a/0x90
> [ 383.277072] ? find_vma+0x3a/0x40
> [ 383.277899] ? exc_invalid_op+0x4c/0x60
> [ 383.278846] ? find_vma+0x3a/0x40
> [ 383.279675] ? asm_exc_invalid_op+0x16/0x20
> [ 383.280698] ? find_vma+0x3a/0x40
> [ 383.281527] lock_mm_and_find_vma+0x3f/0x270
> [ 383.282570] do_user_addr_fault+0x1e4/0x660
> [ 383.283591] exc_page_fault+0x73/0x170
> [ 383.284509] asm_exc_page_fault+0x22/0x30
> [ 383.285486] RIP: 0033:0x404428
> [ 383.286265] Code: 48 89 85 18 ff ff ff e9 dc 00 00 00 48 8b 15 9f 92 00 00 48 8b 05 80 92 00 00 48 03
> [ 383.290566] RSP: 002b:00007ffecbbe05c0 EFLAGS: 00010206
> [ 383.291814] RAX: 0000000044401028 RBX: 00007ffecbbe08e8 RCX: 00007fac66e93c18
> [ 383.293502] RDX: 0000000044400000 RSI: 0000000000000001 RDI: 0000000000000000
> [ 383.295175] RBP: 00007ffecbbe06c0 R08: 00007ffecbbe05c0 R09: 00007ffecbbe06c0
> [ 383.296857] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
> [ 383.298533] R13: 00007ffecbbe08f8 R14: 000000000040ce18 R15: 00007fac67206000
> [ 383.300203]
> [ 383.300775] Modules linked in: rfkill intel_rapl_msr intel_rapl_common intel_uncore_frequency_commong
> [ 383.309661] ---[ end trace 0000000000000000 ]---
> [ 383.310795] RIP: 0010:find_vma+0x3a/0x40
> [ 383.311771] Code: 48 89 34 24 48 85 c0 74 1c 48 83 c7 40 48 c7 c2 ff ff ff ff 48 89 e6 e8 a4 29 ba 00
> [ 383.316081] RSP: 0000:ffffae3745b6beb0 EFLAGS: 00010282
> [ 383.317346] RAX: 0000000000000314 RBX: ffff9666078e5280 RCX: 0000000000000000
> [ 383.319050] RDX: 0000000000000001 RSI: ffffffffae8f69c3 RDI: 00000000ffffffff
> [ 383.320767] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffae3745b6bc48
> [ 383.322468] R10: 0000000000000003 R11: ffff9669fff46fe8 R12: 0000000044401028
> [ 383.324164] R13: ffff9666078e5338 R14: ffffae3745b6bf58 R15: 0000000000000400
> [ 383.325870] FS: 00007fac671c5740(0000) GS:ffff9669efbc0000(0000) knlGS:0000000000000000
> [ 383.327795] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 383.329177] CR2: 0000000044401028 CR3: 0000000488960006 CR4: 0000000000770ee0
> [ 383.330885] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 383.332592] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 383.334287] PKRU: 55555554
>
>
> Which ends up being
>
> VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
>
> I did not check if this is also the case on mainline, and if this series is responsible.

Thanks for reporting! I'm checking it now.

>
> --
> Cheers,
>
> David / dhildenb
>
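
For readers following the thread, here is a rough userspace model of the locking convention the cover letter describes and of the kind of check quoted above. It is only a sketch under stated assumptions: every name in it (model_mm, model_vma, model_find_vma, model_handle_fault, model_page_fault) is hypothetical, plain booleans stand in for the real locks, and it is not the kernel code and does not claim to explain the reported BUG. It shows only that a handler returning "retry" has already released the lock it was entered with, that the caller must then forget its VMA pointer, and that the fallback lookup asserts mmap_lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
enum model_fault { MODEL_FAULT_DONE, MODEL_FAULT_RETRY };

struct model_vma {
    unsigned long start, end;   /* covers [start, end) */
    bool vma_locked;            /* stands in for the per-VMA read lock */
    bool needs_swap_in;         /* first access has to wait for I/O */
};

struct model_mm {
    bool mmap_locked;           /* stands in for mmap_lock */
    struct model_vma vma;       /* a single VMA keeps the model small */
};

/* Lookup that is only legal with mmap_lock held, like the quoted check. */
static struct model_vma *model_find_vma(struct model_mm *mm, unsigned long addr)
{
    assert(mm->mmap_locked);
    if (addr >= mm->vma.start && addr < mm->vma.end)
        return &mm->vma;
    return NULL;
}

/*
 * Fault handler model. When it returns MODEL_FAULT_RETRY it has already
 * released whichever lock the caller entered with.
 */
static enum model_fault model_handle_fault(struct model_mm *mm,
                                           struct model_vma *vma,
                                           bool under_vma_lock)
{
    if (vma->needs_swap_in) {
        vma->needs_swap_in = false;     /* pretend the I/O was started */
        if (under_vma_lock)
            vma->vma_locked = false;    /* drop the per-VMA lock */
        else
            mm->mmap_locked = false;    /* drop mmap_lock */
        return MODEL_FAULT_RETRY;
    }
    return MODEL_FAULT_DONE;
}

/* Returns the number of handler invocations, or -1 if no VMA covers addr. */
static int model_page_fault(struct model_mm *mm, unsigned long addr)
{
    struct model_vma *vma = &mm->vma;
    int attempts = 0;

    /* Fast path: per-VMA lock only (assumes addr falls inside the VMA). */
    vma->vma_locked = true;
    attempts++;
    if (model_handle_fault(mm, vma, true) == MODEL_FAULT_DONE) {
        vma->vma_locked = false;
        return attempts;
    }
    vma = NULL;  /* lock was dropped for us: the old pointer is unusable */

    /* Slow path: take mmap_lock first, then look the VMA up again. */
    for (;;) {
        mm->mmap_locked = true;
        vma = model_find_vma(mm, addr);
        if (!vma) {
            mm->mmap_locked = false;
            return -1;
        }
        attempts++;
        if (model_handle_fault(mm, vma, false) == MODEL_FAULT_DONE)
            break;
        /* RETRY again: mmap_lock is already released, loop to retake it. */
    }
    mm->mmap_locked = false;
    return attempts;
}
```

Calling model_page_fault() on a model_mm whose VMA has needs_swap_in set takes the fast path once, gets a retry, and completes on the slow path. Skipping the `mm->mmap_locked = true` step in the slow path would trip the assert in model_find_vma(), which is the model's analogue of the rwsem_is_locked() check.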