From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sun, 25 Aug 2024 14:55:41 -0700 (PDT)
From: Hugh Dickins
To: Baolin Wang
cc: akpm@linux-foundation.org, hughd@google.com, willy@infradead.org,
    david@redhat.com, wangkefeng.wang@huawei.com, chrisl@kernel.org,
    ying.huang@intel.com, 21cnbao@gmail.com, ryan.roberts@arm.com,
    shy828301@gmail.com, ziy@nvidia.com, ioworker0@gmail.com,
    da.gomez@samsung.com, p.raghav@samsung.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 4/9] mm: filemap: use xa_get_order() to get the swap entry order
In-Reply-To: <6876d55145c1cc80e79df7884aa3a62e397b101d.1723434324.git.baolin.wang@linux.alibaba.com>
References: <6876d55145c1cc80e79df7884aa3a62e397b101d.1723434324.git.baolin.wang@linux.alibaba.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Mon, 12 Aug 2024, Baolin Wang wrote:

> In the following patches, shmem will support the swap out of large folios,
> which means the shmem mappings may contain large order swap entries, so
> using xa_get_order() to get the folio order of the shmem swap entry to
> update the '*start' correctly.
> 
> Signed-off-by: Baolin Wang
> ---
>  mm/filemap.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 4130be74f6fd..4c312aab8b1f 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2056,6 +2056,8 @@ unsigned find_get_entries(struct address_space *mapping, pgoff_t *start,
>  		folio = fbatch->folios[idx];
>  		if (!xa_is_value(folio))
>  			nr = folio_nr_pages(folio);
> +		else
> +			nr = 1 << xa_get_order(&mapping->i_pages, indices[idx]);
>  		*start = indices[idx] + nr;
>  	}
>  	return folio_batch_count(fbatch);
> @@ -2120,6 +2122,8 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t *start,
>  		folio = fbatch->folios[idx];
>  		if (!xa_is_value(folio))
>  			nr = folio_nr_pages(folio);
> +		else
> +			nr = 1 << xa_get_order(&mapping->i_pages, indices[idx]);
>  		*start = indices[idx] + nr;
>  	}
>  	return folio_batch_count(fbatch);
> --

Here we have a problem, but I'm not suggesting a fix for it yet: I need
to get other fixes out first, then turn to thinking about this - it's
not easy.

That code is almost always right, so it works well enough for most
people not to have noticed, but there are two issues with it.

The first issue is that it's assuming indices[idx] is already aligned
to nr: not necessarily so. I believe it was already wrong in the
folio_nr_pages() case, but the time I caught it wrong with a printk
was in the swap (value) case. (I may not be stating this correctly:
again more thought needed, but I can't spend so long writing.)

And immediately afterwards that kernel build failed with a corrupted
(all 0s) .o file - I'm building on ext4 on /dev/loop0 on huge tmpfs
while swapping, and happen to be using the "-o discard" option to ext4
mount. I've been chasing these failures (maybe a few minutes in, maybe
half an hour) for days, then had the idea of trying without the
"-o discard": whereupon these builds can be repeated successfully for
many hours. IIRC ext4 discard to /dev/loop0 entails hole-punch to the
tmpfs.
The alignment issue can easily be corrected, but that might not help.
(And those two functions would look better with the rcu_read_unlock()
moved down to just before returning: but again, wouldn't really help.)

The second issue is that swap is more slippery to work with than folios
or pages: in the folio_nr_pages() case, that number is stable because we
hold a refcount (which stops a THP from being split), and later we'll be
taking folio lock too. None of that in the swap case, so (depending on
how a large entry gets split) the xa_get_order() result is not reliable.
Likewise for other uses of xa_get_order() in this series.

There needs to be some kind of locking or retry to make the order
usable, and to avoid shmem_free_swap() occasionally freeing more than it
ought. I'll give it thought after.

Hugh