From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 14 Apr 2023 21:31:59 +0100
From: Matthew Wilcox <willy@infradead.org>
To: Suren Baghdasaryan
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.com,
	josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com,
	laurent.dufour@fr.ibm.com, michel@lespinasse.org,
	liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz,
	minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com,
	lstoakes@gmail.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH 1/1] mm: handle swap page faults if the faulting page can be locked
Message-ID:
References: <20230414180043.1839745-1-surenb@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:

On Fri, Apr 14, 2023 at 12:48:54PM -0700, Suren Baghdasaryan wrote:
> > - We can call migration_entry_wait().  This will wait for PG_locked to
> > become clear (in migration_entry_wait_on_locked()).
> > As previously
> > discussed offline, I think this is safe to do while holding the VMA
> > locked.

Just to be clear, this particular use of PG_locked is not during I/O,
it's during page migration.  This is a few orders of magnitude different.

> > - We can call swap_readpage() if we allocate a new folio.  I haven't
> > traced through all this code to tell if it's OK.

... whereas this will wait for I/O.  If we decide that's not OK, we'll
need to test for FAULT_FLAG_VMA_LOCK and bail out of this path.

> > So ... I believe this is all OK, but we're definitely now willing to
> > wait for I/O from the swap device while holding the VMA lock when we
> > weren't before.  And maybe we should make a bigger deal of it in the
> > changelog.
> >
> > And maybe we shouldn't just be failing the folio_lock_or_retry(),
> > maybe we should be waiting for the folio lock with the VMA locked.
>
> Wouldn't that cause holding the VMA lock for the duration of swap I/O
> (something you said we want to avoid in the previous paragraph) and
> effectively undo d065bd810b6d ("mm: retry page fault when blocking on
> disk transfer") for VMA locks?

I'm not certain we want to avoid holding the VMA lock for the duration
of an I/O.  Here's how I understand the rationale for avoiding holding
the mmap_lock while we perform I/O (before the existence of the VMA lock):

 - If everybody is doing page faults, there is no specific problem;
   we all hold the lock for read and multiple page faults can be
   handled in parallel.
 - As soon as one thread attempts to manipulate the tree (eg calls
   mmap()), all new readers must wait (as the rwsem is fair), and the
   writer must wait for all existing readers to finish.  That's
   potentially milliseconds for an I/O during which time all page
   faults stop.

Now we have the per-VMA lock, faults which can be handled without taking
the mmap_lock can still be satisfied, as long as that VMA is not being
modified.
It is rare for a real application to take a page fault on a VMA
which is being modified.  So modifications to the tree will generally
not take VMA locks on VMAs which are currently handling faults, and
new faults will generally not find a VMA which is write-locked.

When we find a locked folio (presumably for I/O, although folios are
locked for other reasons), if we fall back to taking the mmap_lock for
read, we increase contention on the mmap_lock and make the page fault
wait on any mmap() operation.  If we simply sleep waiting for the I/O,
we make any mmap() operation _which touches this VMA_ wait for the I/O
to complete.  But I think that's OK, because new page faults can
continue to be serviced ... as long as they don't need to take the
mmap_lock.

So ... I think what we _really_ want here is ...

+++ b/mm/filemap.c
@@ -1690,7 +1690,8 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
 bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
 			 unsigned int flags)
 {
-	if (fault_flag_allow_retry_first(flags)) {
+	if (!(flags & FAULT_FLAG_VMA_LOCK) &&
+	    fault_flag_allow_retry_first(flags)) {
 		/*
 		 * CAUTION! In this case, mmap_lock is not released
 		 * even though return 0.
@@ -1710,7 +1711,8 @@ bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
 		ret = __folio_lock_killable(folio);
 		if (ret) {
-			mmap_read_unlock(mm);
+			if (!(flags & FAULT_FLAG_VMA_LOCK))
+				mmap_read_unlock(mm);
 			return false;
 		}
 	} else {