From: Barry Song <21cnbao@gmail.com>
Date: Tue, 30 Jul 2024 01:18:58 +1200
Subject: Re: [PATCH v5 3/4] mm: support large folios swapin as a whole for
 zRAM-like swapfile
To: Matthew Wilcox
Cc: Chuanhua Han, akpm@linux-foundation.org, linux-mm@kvack.org,
 ying.huang@intel.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org,
 david@redhat.com, hannes@cmpxchg.org, hughd@google.com,
 kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org,
 mhocko@suse.com, minchan@kernel.org, nphamcs@gmail.com,
 ryan.roberts@arm.com, senozhatsky@chromium.org, shakeel.butt@linux.dev,
 shy828301@gmail.com, surenb@google.com, v-songbaohua@oppo.com,
 xiang@kernel.org, yosryahmed@google.com, Chuanhua Han
References: <20240726094618.401593-1-21cnbao@gmail.com>
 <20240726094618.401593-4-21cnbao@gmail.com>

On Tue, Jul 30, 2024 at 12:55 AM Matthew Wilcox wrote:
>
> On Mon, Jul 29, 2024 at 02:36:38PM +0800, Chuanhua Han wrote:
> > Matthew Wilcox wrote on Mon, Jul 29, 2024 at 11:51:
> > >
> > > On Fri, Jul 26, 2024 at 09:46:17PM +1200, Barry Song wrote:
> > > > -                     folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
> > > > -                                             vma, vmf->address, false);
> > > > +                     folio = alloc_swap_folio(vmf);
> > > >                       page = &folio->page;
> > >
> > > This is no longer correct.  You need to set 'page' to the precise page
> > > that is being faulted rather than the first page of the folio.  It was
> > > fine before because it always allocated a single-page folio, but now it
> > > must use folio_page() or folio_file_page() (whichever has the correct
> > > semantics for you).
> > >
> > > Also you need to fix your test suite to notice this bug.  I suggest
> > > doing that first so that you know whether you've got the calculation
> > > correct.
> > >
> >
> > This is no problem now, we support large folios swapin as a whole, so
> > the head page is used here instead of the page that is being faulted.
> > You can also refer to the current code context, now support large
> > folios swapin as a whole, and previously only support small page
> > swapin is not the same.
>
> You have completely failed to understand the problem.  Let's try it this
> way:
>
> We take a page fault at address 0x123456789000.
> If part of a 16KiB folio, that's page 1 of the folio at 0x123456788000.
> If you now map page 0 of the folio at 0x123456789000, you've
> given the user the wrong page!  That looks like data corruption.
>
> The code in
>         if (folio_test_large(folio) && folio_test_swapcache(folio)) {
> as Barry pointed out will save you -- but what if those conditions fail?
> What if the mmap has been mremap()ed and the folio now crosses a PMD
> boundary?  mk_pte() will now be called on the wrong page.

Chuanhua understood everything correctly. I think you might have missed
that we have very strict checks both before allocating large folios and
before mapping them in this newly allocated mTHP swap-in case.

To allocate a large folio, we check all alignment requirements: the PTEs
must have an aligned swap offset and all be physically contiguous, which
is how an mTHP is swapped out.
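
The allocation-side checks described above can be sketched in userspace
C. This is only an illustration under stated assumptions, not the code
from the patch: swapin_range_ok() and the plain-integer stand-in for swap
offsets are hypothetical names invented for the example. It checks that
the swap offsets held by the nr PTEs of the aligned window start at a
multiple of nr and run without gaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model: offsets[i] stands in for the swap offset
 * stored in the i-th PTE of the naturally aligned nr-page window around
 * the faulting address. None of these names come from the kernel.
 */
static bool swapin_range_ok(const unsigned long *offsets, size_t nr)
{
	/* nr must be a power of two, as large folio sizes are. */
	assert(nr != 0 && (nr & (nr - 1)) == 0);

	/* The first swap offset must be aligned to the folio size. */
	if (offsets[0] % nr != 0)
		return false;

	/* Every following offset must continue the run without gaps. */
	for (size_t i = 1; i < nr; i++) {
		if (offsets[i] != offsets[0] + i)
			return false;
	}
	return true;
}
```

In this model, a window whose offsets are 8, 9, 10, 11 passes for nr = 4,
while 9, 10, 11, 12 (unaligned start) and 8, 9, 11, 12 (a gap) are both
rejected, so a small folio would be used instead.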

If an mTHP has been mremap()ed to an unaligned address, we won't swap it
in as an mTHP, for two reasons:

1. we have no way to figure out the start address of the previous mTHP
   in the non-swapcache case;
2. mremap() to unaligned addresses is rare.

To map a large folio, we check that all PTEs are still there by
double-checking that can_swapin_thp() is true. If the PTEs have changed,
this is a "goto out_nomap" case.

        /* allocated large folios for SWP_SYNCHRONOUS_IO */
        if (folio_test_large(folio) && !folio_test_swapcache(folio)) {
                unsigned long nr = folio_nr_pages(folio);
                unsigned long folio_start = ALIGN_DOWN(vmf->address,
                                                       nr * PAGE_SIZE);
                unsigned long idx = (vmf->address - folio_start) / PAGE_SIZE;
                pte_t *folio_ptep = vmf->pte - idx;

                if (!can_swapin_thp(vmf, folio_ptep, nr))
                        goto out_nomap;

                page_idx = idx;
                address = folio_start;
                ptep = folio_ptep;
                goto check_folio;
        }

Thanks
Barry
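
For reference, the folio_start/idx arithmetic in the snippet above can
be tried out in userspace with the addresses from Matthew's example. This
is a standalone sketch, not kernel code: SKETCH_PAGE_SIZE (assumed to be
4 KiB), align_down(), folio_start_of() and page_index_of() are
hypothetical names that only mirror the ALIGN_DOWN() computation.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096)	/* assumed 4 KiB base pages */

/* Round addr down to a multiple of size; size must be a power of two. */
static uint64_t align_down(uint64_t addr, uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return addr & ~(size - 1);
}

/* Start of the naturally aligned nr-page window containing addr. */
static uint64_t folio_start_of(uint64_t addr, uint64_t nr)
{
	return align_down(addr, nr * SKETCH_PAGE_SIZE);
}

/* Index of the faulting page within that window. */
static uint64_t page_index_of(uint64_t addr, uint64_t nr)
{
	return (addr - folio_start_of(addr, nr)) / SKETCH_PAGE_SIZE;
}
```

For a fault at 0x123456789000 in a 16KiB (nr = 4) folio,
folio_start_of() gives 0x123456788000 and page_index_of() gives 1, which
matches the "page 1 of the folio at 0x123456788000" case: the mapping
starts at the folio's first PTE and the faulting page is tracked by its
index.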