From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A3C5EC6FD1C for ; Thu, 23 Mar 2023 06:44:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 309DF6B0072; Thu, 23 Mar 2023 02:44:16 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2B9216B0074; Thu, 23 Mar 2023 02:44:16 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1819F6B0075; Thu, 23 Mar 2023 02:44:16 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 08CA76B0072 for ; Thu, 23 Mar 2023 02:44:16 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id AF98B14056D for ; Thu, 23 Mar 2023 06:44:15 +0000 (UTC) X-FDA: 80599223670.01.FCE77B4 Received: from mail-wr1-f47.google.com (mail-wr1-f47.google.com [209.85.221.47]) by imf07.hostedemail.com (Postfix) with ESMTP id BB12C4000B for ; Thu, 23 Mar 2023 06:44:13 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=GhmjoBqZ; spf=pass (imf07.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.221.47 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679553853; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=XMt6IiIj1g/DqQnbQSAvbNGtbOJm5tLZxMcMsHALdIk=; b=ftODavTriFYbSB8tguk2PW7DacuhuA83ZRguawnhRt5XpLaNJ2NJKvFAKjEmHobLu/Tano j4G+hsKQEEcnNiCt8LX6ZGcvSUI1gIfGs75Y6X+nyiUP9y8corNYbU1rYcjryKTg6+2lRH 4I6zEhhZTEiTru8uYgYU1BMYjgiXLFU= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=GhmjoBqZ; spf=pass (imf07.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.221.47 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679553853; a=rsa-sha256; cv=none; b=Va7aYepUp+14lMtfXFBH+d8UFSUKtKVf4pnLB2mq0++5HGvlTSaqDsLgfJVLYWFu4SjTEa 5fB2lfYyh//5M1m04Bo84lHxHe1XFB7GDUTC0tfhdLJY/eOj4tFPQYZEvyUtCYj13i2RgC /qqTfqnzFam7kGK3GK4ItxNwXwm9kiE= Received: by mail-wr1-f47.google.com with SMTP id l12so19298574wrm.10 for ; Wed, 22 Mar 2023 23:44:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679553852; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=XMt6IiIj1g/DqQnbQSAvbNGtbOJm5tLZxMcMsHALdIk=; b=GhmjoBqZnTla9JBkjF1UZtu8wu6QjffqlHVeeBSYX2nsCx2EjNUPd9PoBLsyD6Fvay PUDva7C4NGKsmwEbwBCxAOGd9EL87aiHG+kQudl9u9UlFpmv9e/4RjhfDSdSVwnwvcBm baBY0/IoZRR49XX2VcDjY1RfhNbEJkmSxuN/ffwm0Mm67KsM6hDo5BFwoP77jL1yua9J 37n9J27adgwFwke7WNhN8IzlQlAPD55U+bklIM3krb05IsmeD8vkf20E9UygrsGyoZ9R Tc00rBGRQW6lUANqYbvaQ+qdXnh/WeUiS1MpV6MziUy55YOmt4caiqeTmvENycbM24So ACTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679553852; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=XMt6IiIj1g/DqQnbQSAvbNGtbOJm5tLZxMcMsHALdIk=; b=V0LZOxo2jhheMLiSKvcyBgcvaiS5l2LKR96HuiwN3qq7pe2TQQXj1RAn/EWhrgqQHB YwKgvzbc7pbmTtSneNYu5LBZj3OQHJrGPZo3htP4dDabCntAFFbZWrHpsucoX7Q2MirT OfZBH5xmHXk4qXr7DX4Ur3c8B4n+UCoeWv5Va+gLHs3sRwpHNbZ1oqSNbSQHMsC68KhD mbGzVAy/FXzkVaP63oh2PxZY1o+0GzTN53L5ZTXs1PWCj6nqjwFxmXirMZxW2BaBG3oS Nxts8bRcsZeJgEa2qY/zGBSQh9ZVy+EGQqZ2W6+cBQchEP+0Ujitg1EOjG426SeqCiBE R8Jw== X-Gm-Message-State: AAQBX9dCrc5Icx3WpB88ust+Z0EZz9GQzJv8b7ZhFYdjWS0+XeEsJW4m 6ucOqTj//qLi0vUJ2I/O+oI= X-Google-Smtp-Source: AKy350ZGd5btCaPavQ2r8rRtJiNmG4yjfpo1Ok0NUgSUne0wsPU91oXttK7RYC1dOoDQRW6+1/L2hg== X-Received: by 2002:adf:ee84:0:b0:2ce:ae54:1592 with SMTP id b4-20020adfee84000000b002ceae541592mr1469269wro.38.1679553851918; Wed, 22 Mar 2023 23:44:11 -0700 (PDT) Received: from localhost ([2a00:23c5:dc8c:8701:1663:9a35:5a7b:1d76]) by smtp.gmail.com with ESMTPSA id w4-20020a05600c474400b003edc9a5f98asm845384wmo.44.2023.03.22.23.44.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 22 Mar 2023 23:44:10 -0700 (PDT) Date: Thu, 23 Mar 2023 06:44:10 +0000 From: Lorenzo Stoakes To: Baoquan He Cc: David Hildenbrand , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Andrew Morton , Uladzislau Rezki , Matthew Wilcox , Liu Shixin , Jiri Olsa , Jens Axboe , Alexander Viro Subject: Re: [PATCH v7 4/4] mm: vmalloc: convert vread() to vread_iter() Message-ID: References: <941f88bc5ab928e6656e1e2593b91bf0f8c81e1b.1679511146.git.lstoakes@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: BB12C4000B X-Rspam-User: X-Stat-Signature: iu113tsxngu4g43tsfti8gripf3dxuki X-HE-Tag: 1679553853-594000 X-HE-Meta: U2FsdGVkX1+wXbYaDoebZVnfB4CEKMWRt38upO8I4pl9Yk3eWRCmNypLKecdgZE4hLYgvuThayQTPaz/ppigjNW/8GaO2k6vL7Hbv9S4zOemWl0POEOPlMM4ETQZHmHflF0xCBd1NyLctGNqmG5ZHyWjDUMGZCF5jJXAfbQ/UeTqg43Q6tXEZuJLoIFx4V6Rd4Bp1ncSEOSjI9s3VbG9r5rtHHe+MAELC+rUG+mTqfxPlvQOKPVxwOBaxGc9tzMBRKkG3CWkuJDCtAfYGEQ3k63/ncZZdNQyhizREEWfOL4G6QZYlz7RG4QuMf20XuH9wLJxyXBT/448ftcREbUa28OEkS22IJU5Us+DYx6RzfBYB5wLUT5+m0dbEAiPKo6ELpR8vSJ++HiTqtyrPnrraVm1MgLQdh8MRa5pqqkrYyyobQ5asXFElvFK7GeO97WEK6Y9nM+JRWNJaT7ibU0jVmloSKoYSAmPM/kXqHjNW+epzTV3PPhIed4hBkop51qYy6312XaK++o7RJa7En2tAtWuNw1fT5LIOMYDvdBc0NZACVCOLKcE97AGVLe5xgxaGxQSrJ7l5sIDOo0lZdg7/Mzc/c28HuY7H7ZMwNzWH/eiQU1sXrSYpMS+PiKFX2DG4D0geiMg57xRhX1QoQj/MM3rHJoexCGivnPcyLiVafGPKEOcfrh0ECNGdpY3xism2gNoScMjJksrSSQnmLR//FeFguzHk+/Ub9aFq8KOBkTd6kT3+UIAy5Ju85T8B6FMbBjsukgckst/9tvHVMjS6NJP/JQyXxztRbxMzI1w0YmjKo6A8A6lyY4gHeieiM/PA25Y4UspXjV3Z3cN/5LHqGCSVUmw7Cht7aOhrmxrg/B0Z34wgCyDyULOJaoYC99niqHYnq4XTswLCGqNK+unTWucvjoIe74PzjnxCtZzCDjbmmFzeOio8zKxb7MJjWrXhRrgdiGBcT15gFCODp1 6DzaTk7q J3Fwyv/0gJalbnKHQHn8m3U3X8vshnxDUFK2nFpTmQlL1hq98Hz60A58MeXJY+vpf/ow+NW+7hoV9fpZtEkiysdQCWbjtBSITb/CN+EVUpn98/xKmhQdIJpXUNbGpKcVg6vjU9bIJ6scJ1B7RhsWLrRUKgZoYPGt2FJDGUe/3/wDOtOcy50u29OKApSaup6SokZMhNNaz45HvJMmio9KdYqs2caRCLRFIsCGcNuSnrMB2vVXWyN71MGTuuMM3gJPdFBKohmTFZndZFxvhsFh/khUzpOPCJvYA+pRwmdEJGgmoiHmRADA1iU5hiEbPSXQ3U0H2enFHjjopbR3CeTjrFD5ntU66gY12oigIJMzMUH3xUChi99mo7SZmi9N42U+bMwYH5El2yvkY1IFTcAe4EgV/l7d6fqwFRRtDPVFyBHMvECyOEci1/NMANcwoPe7lz8j7Pct0NkcizvyHYF0o+01YUA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Mar 23, 2023 at 10:52:09AM +0800, Baoquan He wrote: > On 03/22/23 at 06:57pm, Lorenzo Stoakes wrote: > > Having previously laid the foundation for converting vread() to an iterator > > function, pull the trigger and do so. > > > > This patch attempts to provide minimal refactoring and to reflect the > > existing logic as best we can, for example we continue to zero portions of > > memory not read, as before. > > > > Overall, there should be no functional difference other than a performance > > improvement in /proc/kcore access to vmalloc regions. > > > > Now we have eliminated the need for a bounce buffer in read_kcore_iter(), > > we dispense with it, and try to write to user memory optimistically but > > with faults disabled via copy_page_to_iter_nofault(). We already have > > preemption disabled by holding a spin lock. We continue faulting in until > > the operation is complete. > > I don't understand the sentences here. In vread_iter(), the actual > content reading is done in aligned_vread_iter(), otherwise we zero > filling the region. In aligned_vread_iter(), we will use > vmalloc_to_page() to get the mapped page and read out, otherwise zero > fill. While in this patch, fault_in_iov_iter_writeable() fault in memory > of iter one time and will bail out if failed. I am wondering why we > continue faulting in until the operation is complete, and how that is done. This is refererrring to what's happening in kcore.c, not vread_iter(), i.e. the looped read/faultin. The reason we bail out if failt_in_iov_iter_writeable() is that would indicate an error had occurred. The whole point is to _optimistically_ try to perform the operation assuming the pages are faulted in. Ultimately we fault in via copy_to_user_nofault() which will either copy data or fail if the pages are not faulted in (will discuss this below a bit more in response to your other point). If this fails, then we fault in, and try again. We loop because there could be some extremely unfortunate timing with a race on e.g. swapping out or migrating pages between faulting in and trying to write out again. This is extremely unlikely, but to avoid any chance of breaking userland we repeat the operation until it completes. In nearly all real-world situations it'll either work immediately or loop once. > > If we look into the failing point in vread_iter(), it's mainly coming > from copy_page_to_iter_nofault(), e.g page_copy_sane() checking failed, > i->data_source checking failed. If these conditional checking failed, > should we continue reading again and again? And this is not related to > memory faulting in. I saw your discussion with David, but I am still a > little lost. Hope I can learn it, thanks in advance. > Actually neither of these are going to happen. page_copy_sane() checks the sanity of the _source_ pages, and the 'sanity' is defined by whether your offset and length sit within the (possibly compound) folio. Since we control this, we can arrange for it never to happen. i->data_source is checking that it's an output iterator, however we would already have checked this when writing ELF headers at the bare minimum, so we cannot reach this point with an invalid iterator. Therefore it is not possible either cause a failure. What could cause a failure, and what we are checking for, is specified in copyout_nofault() (in iov_iter.c) which we pass to the iterate_and_advance() macro. Now we have a fault-injection should_fail_usercopy() which would just trigger a redo, or copy_to_user_nofault() returning < 0 (e.g. -EFAULT). This code is confusing as this function returns the number of bytes _not copied_ rather than copied. I have tested this to be sure by the way :) Therefore the only way for a failure to occur is for memory to not be faulted in and thus the loop only triggers in this situation. If we fail to fault in pages for any reason, the whole operation aborts so this should cover all angles. > ...... > > diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c > > index 08b795fd80b4..25b44b303b35 100644 > > --- a/fs/proc/kcore.c > > +++ b/fs/proc/kcore.c > ...... > > @@ -507,13 +503,30 @@ read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter) > > > > switch (m->type) { > > case KCORE_VMALLOC: > > - vread(buf, (char *)start, tsz); > > - /* we have to zero-fill user buffer even if no read */ > > - if (copy_to_iter(buf, tsz, iter) != tsz) { > > - ret = -EFAULT; > > - goto out; > > + { > > + const char *src = (char *)start; > > + size_t read = 0, left = tsz; > > + > > + /* > > + * vmalloc uses spinlocks, so we optimistically try to > > + * read memory. If this fails, fault pages in and try > > + * again until we are done. > > + */ > > + while (true) { > > + read += vread_iter(iter, src, left); > > + if (read == tsz) > > + break; > > + > > + src += read; > > + left -= read; > > + > > + if (fault_in_iov_iter_writeable(iter, left)) { > > + ret = -EFAULT; > > + goto out; > > + } > > } > > break; > > + } > > case KCORE_USER: > > /* User page is handled prior to normal kernel page: */ > > if (copy_to_iter((char *)start, tsz, iter) != tsz) { >