Date: Thu, 23 Mar 2023 21:31:40 +0800
From: Baoquan He
To: David Hildenbrand
Cc: Lorenzo Stoakes, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, Andrew Morton, Uladzislau Rezki,
 Matthew Wilcox, Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro
Subject: Re: [PATCH v7 4/4] mm: vmalloc: convert vread() to vread_iter()
References: <941f88bc5ab928e6656e1e2593b91bf0f8c81e1b.1679511146.git.lstoakes@gmail.com>
 <7aee68e9-6e31-925f-68bc-73557c032a42@redhat.com>
In-Reply-To: <7aee68e9-6e31-925f-68bc-73557c032a42@redhat.com>

On 03/23/23 at 11:38am, David Hildenbrand wrote:
> On 23.03.23 11:36, Baoquan He wrote:
> > On 03/23/23 at 06:44am, Lorenzo Stoakes wrote:
> > > On Thu, Mar 23, 2023 at 10:52:09AM +0800, Baoquan He wrote:
> > > > On 03/22/23 at 06:57pm, Lorenzo Stoakes wrote:
> > > > > Having previously laid the foundation for converting vread() to an iterator
> > > > > function, pull the trigger and do so.
> > > > >
> > > > > This patch attempts to provide minimal refactoring and to reflect the
> > > > > existing logic as best we can, for example we continue to zero portions of
> > > > > memory not read, as before.
> > > > >
> > > > > Overall, there should be no functional difference other than a performance
> > > > > improvement in /proc/kcore access to vmalloc regions.
> > > > >
> > > > > Now we have eliminated the need for a bounce buffer in read_kcore_iter(),
> > > > > we dispense with it, and try to write to user memory optimistically but
> > > > > with faults disabled via copy_page_to_iter_nofault(). We already have
> > > > > preemption disabled by holding a spin lock. We continue faulting in until
> > > > > the operation is complete.
> > > >
> > > > I don't understand the sentences here. In vread_iter(), the actual
> > > > content reading is done in aligned_vread_iter(), otherwise we zero-fill
> > > > the region. In aligned_vread_iter(), we will use
> > > > vmalloc_to_page() to get the mapped page and read out, otherwise zero
> > > > fill.
> > > > While in this patch, fault_in_iov_iter_writeable() faults in memory
> > > > of iter one time and will bail out if it fails. I am wondering why we
> > > > continue faulting in until the operation is complete, and how that is done.
> > >
> > > This is referring to what's happening in kcore.c, not vread_iter(),
> > > i.e. the looped read/faultin.
> > >
> > > The reason we bail out if fault_in_iov_iter_writeable() fails is that this
> > > would indicate an error had occurred.
> > >
> > > The whole point is to _optimistically_ try to perform the operation
> > > assuming the pages are faulted in. Ultimately we fault in via
> > > copy_to_user_nofault() which will either copy data or fail if the pages are
> > > not faulted in (will discuss this below a bit more in response to your
> > > other point).
> > >
> > > If this fails, then we fault in, and try again. We loop because there could
> > > be some extremely unfortunate timing with a race on e.g. swapping out or
> > > migrating pages between faulting in and trying to write out again.
> > >
> > > This is extremely unlikely, but to avoid any chance of breaking userland we
> > > repeat the operation until it completes. In nearly all real-world
> > > situations it'll either work immediately or loop once.
> >
> > Thanks a lot for patiently providing these helpful details. I got it now. I was
> > mainly confused by the while(true) loop in the KCORE_VMALLOC case of
> > read_kcore_iter().
> >
> > Now is there any chance that the faulted-in memory is swapped out or
> > migrated again before vread_iter()? Will fault_in_iov_iter_writeable()
> > pin the memory? I didn't find that in the code or documentation. It seems
> > it only faults in memory. If so, there's a window between faulting in and
> > copy_to_user_nofault().
>
> See the documentation of fault_in_safe_writeable():
>
> "Note that we don't pin or otherwise hold the pages referenced that we fault
> in. There's no guarantee that they'll stay in memory for any duration of
> time."

Thanks for the info. Then swapping out/migration could happen again, so
that's why the while(true) loop is meaningful.
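
To make the retry behaviour concrete, here is a minimal userspace sketch in C
of the pattern discussed above: copy optimistically without faulting, and on a
short copy fault the pages in and try again. It is not the kernel code. The
names sim_user_buf, sim_copy_nofault(), sim_fault_in() and sim_read_loop() are
hypothetical and only model the roles of the no-fault copy and of
fault_in_iov_iter_writeable(); the "eviction" counter is an assumption used to
imitate pages being swapped out or migrated right after a fault-in, since
nothing pins them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a user buffer whose pages may not be resident. */
struct sim_user_buf {
	char *data;
	bool resident;      /* pages currently faulted in? */
	bool unmapped;      /* fault-in can never succeed (error case) */
	int evictions_left; /* fault-ins that are immediately undone by a race */
	int fault_ins;      /* number of fault-in attempts made */
};

/* Models a copy with faults disabled: copies nothing if not resident. */
static size_t sim_copy_nofault(struct sim_user_buf *dst, size_t off,
			       const char *src, size_t n)
{
	if (!dst->resident)
		return 0;
	memcpy(dst->data + off, src, n);
	return n;
}

/*
 * Models a fault-in that does not pin pages. Returns the number of bytes
 * NOT faulted in (0 on success). The pages may be evicted again before
 * the next copy attempt.
 */
static size_t sim_fault_in(struct sim_user_buf *dst, size_t n)
{
	dst->fault_ins++;
	if (dst->unmapped)
		return n;
	if (dst->evictions_left > 0) {
		dst->evictions_left--;
		dst->resident = false; /* raced with swap-out/migration */
	} else {
		dst->resident = true;
	}
	return 0;
}

/* Returns 0 once everything is copied, -1 if fault-in reports an error. */
static int sim_read_loop(struct sim_user_buf *dst, const char *src,
			 size_t total)
{
	size_t done = 0;

	while (true) {
		done += sim_copy_nofault(dst, done, src + done, total - done);
		assert(done <= total);
		if (done == total)
			return 0;
		/* Short copy: fault in the remainder, then retry. */
		if (sim_fault_in(dst, total - done) != 0)
			return -1;
	}
}
```

With evictions_left set to 1, the first fault-in is lost to the simulated race
and the loop goes around a second time before the copy succeeds, which is the
case the while(true) loop exists for. With unmapped set, the fault-in reports
failure and the loop bails out instead of spinning.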