From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A0CC9EB64D9 for ; Tue, 27 Jun 2023 16:24:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1B9A68D0002; Tue, 27 Jun 2023 12:24:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 16A338D0001; Tue, 27 Jun 2023 12:24:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 033B88D0002; Tue, 27 Jun 2023 12:24:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id E71768D0001 for ; Tue, 27 Jun 2023 12:24:55 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id AE9701C8B08 for ; Tue, 27 Jun 2023 16:24:55 +0000 (UTC) X-FDA: 80949051750.04.2C0D89A Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf19.hostedemail.com (Postfix) with ESMTP id 3151A1A000C for ; Tue, 27 Jun 2023 16:24:51 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="GWeop/eT"; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf19.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1687883093; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Mgfy3ZXXgXJMVAGb4+oxMVfqPwB6JstJnnUKkP8Y35s=; b=CMyDQKPp/pzEP18BVoGOcA0mhbQgWP/nHAz5zp9hnqR9QotjvUe2HnVKIaz+t9J+kRTCex GoGp29Y8oX95dfgxBXRZoDXs+oitzH3osHzlZCul0XVlUNmJ//6NshfLfMZrSElSOjAeRN 2RCh+xZHGUYv135Vkenh7d9aG8z/XmM= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="GWeop/eT"; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf19.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1687883093; a=rsa-sha256; cv=none; b=yfSoY+474PTRq5SmAxy4iItw6eoHplf6kYoG8o1evrO6jZdVpLSrSn6PYIjNCSvYktQDdG 9m3XKUKX304l2UwRAWKnz2lRsmMzBbEAMKpmqg0GBumuF9UlwelHIG0LpM0lypd1A6C3c7 G2KKCHwgTeNxCc54hQGYAf6IlIsqUl4= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1687883090; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Mgfy3ZXXgXJMVAGb4+oxMVfqPwB6JstJnnUKkP8Y35s=; b=GWeop/eTCdFSSxtqA6hSbW/0Skp1h7yd3HHopybcmtgQQ1f/NfZaYNQFRljSQxZJAKtBNu EbMNvqVTgI0PZgbwh4yHFC4dl/lm4B8DVTLEQBNt4su44GL0i/wuDP1uV8FL3OQpAKlyRK Uz8E7azq74JAG/zOgAOaFZKdgv60TxA= Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-648-z2datmUrNTeSCu_SafceWw-1; Tue, 27 Jun 2023 12:24:28 -0400 X-MC-Unique: z2datmUrNTeSCu_SafceWw-1 Received: by mail-qk1-f198.google.com with SMTP id af79cd13be357-76714d4beebso7348585a.0 for ; Tue, 27 Jun 2023 09:24:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687883063; x=1690475063; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Mgfy3ZXXgXJMVAGb4+oxMVfqPwB6JstJnnUKkP8Y35s=; b=PkIXihYT6pN7rSqohAD7QcylTNiBLXvTpJBS8af34J5AwucMrYWRS/fF9AWfhI3cfF pQJTrSXgW3OO7D+s2zTvYiHMYSl7UFrAxvlLdp1g6XKljZHNxApsXk8EY8d6BPazYi4f 0J55KkxjKmFB73kTjjs6Q6nnkH+LZEOcc6lEzOMHo7unLjDyX+9NVLum5RbYzTxT+b9n dApVtp95A4bv65b1i8Wtnnq1hgyRA9cbvrLdmRGb278osBdLq323SyUcR1wHSEtBsQ3y KHFOr3QBIa7VhtqSB4uX972KTAJ/QHi9s/0utVy/tZnEyEAqY3v42UqO/gYLzaMqHyA0 Fblg== X-Gm-Message-State: AC+VfDyXVDo41ZNOZ/58RWSSHCTl7Aga1+LmIrqwQAdfMrs5ITHxwA2/ hjnL0kLK5xf3ZDCq5erOE+yl60DAQif7j3QtDn6veltcnvuGLJxb2xcOdL5mvoUliE0lPdudzeE lq4Mdx0oWXhw= X-Received: by 2002:a05:620a:4502:b0:75b:23a1:69f0 with SMTP id t2-20020a05620a450200b0075b23a169f0mr45160895qkp.7.1687883062651; Tue, 27 Jun 2023 09:24:22 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4N80tCfrDuSYYuboDG5GYUYSpis6DKre0Lz8au633iIZSaQabV/3oaNp9WWlqAr2nLop5NVA== X-Received: by 2002:a05:620a:4502:b0:75b:23a1:69f0 with SMTP id t2-20020a05620a450200b0075b23a169f0mr45160860qkp.7.1687883062320; Tue, 27 Jun 2023 09:24:22 -0700 (PDT) Received: from x1n (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. [99.254.144.39]) by smtp.gmail.com with ESMTPSA id z19-20020a05620a101300b007625382f4d6sm4087674qkj.84.2023.06.27.09.24.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 27 Jun 2023 09:24:22 -0700 (PDT) Date: Tue, 27 Jun 2023 12:24:20 -0400 From: Peter Xu To: Suren Baghdasaryan Cc: akpm@linux-foundation.org, willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, michel@lespinasse.org, liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz, minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com, lstoakes@gmail.com, hdanton@sina.com, apopple@nvidia.com, ying.huang@intel.com, david@redhat.com, yuzhao@google.com, dhowells@redhat.com, hughd@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, pasha.tatashin@soleen.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com Subject: Re: [PATCH v3 6/8] mm: handle swap page faults under per-VMA lock Message-ID: References: <20230627042321.1763765-1-surenb@google.com> <20230627042321.1763765-7-surenb@google.com> MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 3151A1A000C X-Stat-Signature: mo1e567r1t6x6kn4yxwx6bm5msi8z1rw X-HE-Tag: 1687883091-317511 X-HE-Meta: U2FsdGVkX186QJHnXbtOkPHZB9wb5dftHlmYUtSGF++MQtK8U8/Ssil1QpqYA61QB1MtmBGZGK0APe8MYJ+JLACiYXzZebr3oLtpJxKl/vNLA1/qg5WwTvZZYVOj9HRV10iutw21DlVhkhsMwgSxVmN7EtFxILTfxJLlnkyobo+8UMOegWw5MUIa6a9A3oZh5jcJw1IkW+jS6zP8PQa/vbvDzCHuZLaJx2f/WOJbVtPkE9ifBN2t5zLUGUbXso/coN3fo/TiTKPhV3W5puO1rexkhX6srzId6uKabxzUjUOTlV0NJ/tBXJoXmaZ3wb1pPjxaS0A8IKbTonl23d1GA5P6U+8nZLIqJQVWsIw/zyJAFUxz2qqg/p79nEEcf779Z5FU3cQA1M+93gQkV4GlYDSt3O1Q9ORuGFKG+Rhu6lPnRnoaAkA36Y0dwWZeOEUnzlCJCS5pgC5ykTSdt4Wp3lYeeNxDnVk3d3wfa7rHMQe/HbEMp9MdnfOVnYiCSXqKRUqe2Sj4qBVSHSxilSKFghXbT1h8kLnITRwaGtYTfPx3yKlBoCBR0uHyqLzks1tAWO8GwV48mqBGstdVL5tQ0kZuI46Fr4zAWUgwKmb9k78394LVzJOqE1lYSVSKQryoPmdtMJ2iElQi6Dmc5hsg1SVAbXOcC5uaMs9BpYmAWHDqMa9x7eSPbzYqS80HwFicGc0nTk3qSmW3mfcpVAppYnXxj9I91t/U73DbGqowKSxvoa47F0L4pKMeT25qLFA2qqz6SmgpbnB380iLNsCfwNnIrSudTM4Tew83He9KUSGFAGvopg81mqGtRlfoxK/EdPXRB+hHHhwycXLLaCZelQY2SF3NfZIQsnAZ2Ih2RWFmzrRPDdZ1+wk9n0y/Bkwo/sMKqDHpPNy801U3bUJGqFBOjBOReFaJGocNqJdXA9uKId7EibywXgLoy5PtX+AlWe6FQAB0csPuQB6oqCL z7euLC73 naU7iZklfkKK5XDIczmT4VstCnHXzmeigTongqJIZrov4eIQc4072zlc+gmcLwmNAfDQit8qohetKcj7WkbAzdknUqZ22iD1pBBh8mcdkG0m1NFEjDeiTxhTPtJ0buAwqhQQJI4bqaBu/GNX7aPk3p/RlvjEHawoXAmBthiUkBTWBgv0JxA3ZTTR0QVyKDUIYqZGwhgeluv6HzzYVlU/5PT9ArzJp+1iH5in9fcs3ZJgb34A4+2zu4ubLFsWv8uxBjgmAlCxNOcANoIel9EuTYP4gsDQa74Ltluflu7byU+7xHyib8bnbeL5pBUN7KcaAKMRILivA+IN6ltGqhPr0hiImRKDEOZ/tNQIQqSoqwA/GBTCIPxujKekO9GLO5/Be7mSrUoWfqRIJABr5g0aWiXUdaVUtaQWik4q2 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Jun 27, 2023 at 09:05:32AM -0700, Suren Baghdasaryan wrote: > On Tue, Jun 27, 2023 at 8:41 AM Peter Xu wrote: > > > > On Mon, Jun 26, 2023 at 09:23:19PM -0700, Suren Baghdasaryan wrote: > > > When page fault is handled under per-VMA lock protection, all swap page > > > faults are retried with mmap_lock because folio_lock_fault (formerly > > > known as folio_lock_or_retry) had to drop and reacquire mmap_lock > > > if folio could not be immediately locked. > > > Follow the same pattern as mmap_lock to drop per-VMA lock when waiting > > > for folio in folio_lock_fault and retrying once folio is available. > > > With this obstacle removed, enable do_swap_page to operate under > > > per-VMA lock protection. Drivers implementing ops->migrate_to_ram might > > > still rely on mmap_lock, therefore we have to fall back to mmap_lock in > > > that particular case. > > > Note that the only time do_swap_page calls synchronous swap_readpage > > > is when SWP_SYNCHRONOUS_IO is set, which is only set for > > > QUEUE_FLAG_SYNCHRONOUS devices: brd, zram and nvdimms (both btt and > > > pmem). Therefore we don't sleep in this path, and there's no need to > > > drop the mmap or per-VMA lock. > > > > > > Signed-off-by: Suren Baghdasaryan > > > --- > > > mm/filemap.c | 24 ++++++++++++++++-------- > > > mm/memory.c | 21 ++++++++++++++------- > > > 2 files changed, 30 insertions(+), 15 deletions(-) > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > index 8ad06d69895b..683f11f244cd 100644 > > > --- a/mm/filemap.c > > > +++ b/mm/filemap.c > > > @@ -1703,12 +1703,14 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait) > > > * Return values: > > > * 0 - folio is locked. > > > * VM_FAULT_RETRY - folio is not locked. > > > - * mmap_lock has been released (mmap_read_unlock(), unless flags had both > > > - * FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in > > > - * which case mmap_lock is still held. > > > + * FAULT_FLAG_LOCK_DROPPED bit in vmf flags will be set if mmap_lock or > > > > This "FAULT_FLAG_LOCK_DROPPED" should belong to that patch when introduced. > > But again I still think this flag as a whole with that patch is not needed > > and should be dropped, unless I miss something important.. > > > > > + * per-VMA lock got dropped. mmap_lock/per-VMA lock is dropped when > > > + * function fails to lock the folio, unless flags had both > > > + * FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in which case > > > + * the lock is still held. > > > * > > > * If neither ALLOW_RETRY nor KILLABLE are set, will always return 0 > > > - * with the folio locked and the mmap_lock unperturbed. > > > + * with the folio locked and the mmap_lock/per-VMA lock unperturbed. > > > */ > > > vm_fault_t __folio_lock_fault(struct folio *folio, struct vm_fault *vmf) > > > { > > > @@ -1716,13 +1718,16 @@ vm_fault_t __folio_lock_fault(struct folio *folio, struct vm_fault *vmf) > > > > > > if (fault_flag_allow_retry_first(vmf->flags)) { > > > /* > > > - * CAUTION! In this case, mmap_lock is not released > > > - * even though return VM_FAULT_RETRY. > > > + * CAUTION! In this case, mmap_lock/per-VMA lock is not > > > + * released even though returning VM_FAULT_RETRY. > > > */ > > > if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT) > > > return VM_FAULT_RETRY; > > > > > > - mmap_read_unlock(mm); > > > + if (vmf->flags & FAULT_FLAG_VMA_LOCK) > > > + vma_end_read(vmf->vma); > > > + else > > > + mmap_read_unlock(mm); > > > vmf->flags |= FAULT_FLAG_LOCK_DROPPED; > > > if (vmf->flags & FAULT_FLAG_KILLABLE) > > > folio_wait_locked_killable(folio); > > > @@ -1735,7 +1740,10 @@ vm_fault_t __folio_lock_fault(struct folio *folio, struct vm_fault *vmf) > > > > > > ret = __folio_lock_killable(folio); > > > if (ret) { > > > - mmap_read_unlock(mm); > > > + if (vmf->flags & FAULT_FLAG_VMA_LOCK) > > > + vma_end_read(vmf->vma); > > > + else > > > + mmap_read_unlock(mm); > > > vmf->flags |= FAULT_FLAG_LOCK_DROPPED; > > > return VM_FAULT_RETRY; > > > } > > > diff --git a/mm/memory.c b/mm/memory.c > > > index 3c2acafcd7b6..5caaa4c66ea2 100644 > > > --- a/mm/memory.c > > > +++ b/mm/memory.c > > > @@ -3712,11 +3712,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > > > if (!pte_unmap_same(vmf)) > > > goto out; > > > > > > - if (vmf->flags & FAULT_FLAG_VMA_LOCK) { > > > > So if with my imagination, here we'll already have the vma_read_end() and > > this patch will remove it, which makes sense. Then... > > > > > - ret = VM_FAULT_RETRY; > > > - goto out; > > > - } > > > - > > > entry = pte_to_swp_entry(vmf->orig_pte); > > > if (unlikely(non_swap_entry(entry))) { > > > if (is_migration_entry(entry)) { > > > @@ -3726,6 +3721,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > > > vmf->page = pfn_swap_entry_to_page(entry); > > > ret = remove_device_exclusive_entry(vmf); > > > } else if (is_device_private_entry(entry)) { > > > + if (vmf->flags & FAULT_FLAG_VMA_LOCK) { > > > + /* > > > + * migrate_to_ram is not yet ready to operate > > > + * under VMA lock. > > > + */ > > > > ... here we probably just do vma_read_end(), then... > > > > > + ret |= VM_FAULT_RETRY; > > > + goto out; > > > + } > > > + > > > vmf->page = pfn_swap_entry_to_page(entry); > > > vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, > > > vmf->address, &vmf->ptl); > > > @@ -5089,9 +5093,12 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, > > > /* > > > * In case of VM_FAULT_RETRY or VM_FAULT_COMPLETED we might > > > * be still holding per-VMA lock to keep the vma stable as long > > > - * as possible. Drop it before returning. > > > + * as possible. In this situation vmf.flags has > > > + * FAULT_FLAG_VMA_LOCK set and FAULT_FLAG_LOCK_DROPPED unset. > > > + * Drop the lock before returning when this happens. > > > */ > > > - if (vmf.flags & FAULT_FLAG_VMA_LOCK) > > > + if ((vmf.flags & (FAULT_FLAG_VMA_LOCK | FAULT_FLAG_LOCK_DROPPED)) == > > > + FAULT_FLAG_VMA_LOCK) > > > vma_end_read(vma); > > > > This whole chunk should have been dropped altogether with my comment in > > previous patch, iiuc, and it should be no-op anyway for swap case. For the > > real "waiting for page lock during swapin" phase we should always 100% > > release the vma lock in folio_lock_or_retry() - just like mmap lock. > > Yep, we drop FAULT_FLAG_LOCK_DROPPED, release vma lock when we return > RETRY and that makes all this unnecessary. I just need to make sure we > do not access VMA after we drop its lock since we will be releasing it > now earlier than before. Yes. Hopefully that's not a problem, because mmap lock has it the same way. And that's also the good thing if we can always keep both locks behaving the same if possible, because the problem is really shared, imho. -- Peter Xu