From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kairui Song <ryncsn@gmail.com>
Date: Mon, 4 Nov 2024 01:11:30 +0800
Subject: Re: cluster_alloc_swap_entry() may change per-cpu cluster of another CPU randomly
To: "Huang, Ying"
Cc: linux-mm, Chris Li
In-Reply-To: <87a5es3f1f.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <871q0641xd.fsf@yhuang6-desk2.ccr.corp.intel.com> <87a5es3f1f.fsf@yhuang6-desk2.ccr.corp.intel.com>
Content-Type: text/plain; charset="UTF-8"
Hi Ying,

On Fri, Oct 25, 2024 at 1:56 PM Huang, Ying wrote:
>
> Kairui Song writes:
>
> > On Thu, Oct 24, 2024 at 11:29 AM Huang, Ying wrote:
> >>
> >> Hi, Kairui,
> >>
> >> I was reading the new swap cluster allocation code. Sorry, I didn't
> >> find time to review your patch earlier.
> >> I found that in cluster_alloc_swap_entry(), the following code path
> >> is possible:
> >>
> >>   cluster = this_cpu_ptr(si->percpu_cluster);
> >>   offset = alloc_swap_scan_cluster();
> >>   ...
> >>   spin_unlock(&ci->lock);
> >>   spin_unlock(&si->lock);
> >>   /* migrate to another cpu */
> >>   spin_lock(&ci->lock);
> >>   spin_lock(&si->lock);
> >>   cluster->next[order] = offset;
> >>
> >> That is, the per-cpu cluster of one CPU may be changed on another
> >> CPU. I guess this will not cause a functionality issue, but it makes
> >> the code harder to reason about. Is it possible to avoid unlocking
> >> before changing the per-cpu cluster? Or to give up updating the
> >> per-cpu cluster if we need to unlock?
> >
> > Hi Ying,
> >
> > Yes, I noticed this when working on the new si->lock series. It may
> > cause fragmentation, but only if the CPU the current allocation
> > migrated from requests another swap allocation; in that case it may
> > waste one per-cpu cluster.
> >
> > There is no functionality or leak issue, but it does make things
> > harder to reason about, as you mentioned.
> >
> > In the new series I used a local lock to prevent migration, so the
> > CPU is stable during the whole process. However, that is not doable
> > here unless si->lock is removed from the allocation path, or there
> > will be lock dependency issues.
> >
> > Do we need to fix this separately in the current tree? The
> > fragmentation is overall better with these patches, so I don't think
> > this is causing any issues. Before these patches, a per-CPU cluster
> > could also be stolen by other CPUs, so it was never guaranteed to be
> > accurate.
> >
> > One easy fix is to set cluster->next[order] to 0 before scanning for
> > a new cluster to use, and to check whether it is still 0 before
> > setting next[order] again.
> >
> > That may still cause fragmentation, as migration will cause extra
> > clusters to be scanned and used, which is hard to avoid without
> > changing too much code (like the things being done in the new
> > si->lock series).
>
> I found another possibly more serious problem. After the unlock/lock
> cycle, the state of the swap device may have changed (for example, it
> may have been swapped off). In the previous code,
>
>         si->flags += SWP_SCANNING;
>         ...
>         if (!(si->flags & SWP_WRITEOK))
>
> were used for that. I don't see them used after the unlock/lock cycle.
> Can you check it?

Thanks for your review.

Yes, I think you are right. I did notice the flags were not updated
properly in the latest mm branch, but did not notice it was a newly
introduced issue.

So I fixed the issue in
https://lore.kernel.org/linux-mm/20241022192451.38138-8-ryncsn@gmail.com/

But a quick fix can be posted before the series. The following patch
should be enough, what do you think?

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 46bd4b1a3c07..6b7d4ac1d4a3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1041,7 +1041,9 @@ static int cluster_alloc_swap(struct swap_info_struct *si,

 	VM_BUG_ON(!si->cluster_info);

-	while (n_ret < nr) {
+	si->flags += SWP_SCANNING;
+	while (n_ret < nr && si->flags & SWP_WRITEOK) {
 		unsigned long offset = cluster_alloc_swap_entry(si, order, usage);

 		if (!offset)
@@ -1049,6 +1051,8 @@ static int cluster_alloc_swap(struct swap_info_struct *si,
 		slots[n_ret++] = swp_entry(si->type, offset);
 	}

+	si->flags -= SWP_SCANNING;
+
 	return n_ret;
 }

The "+= SWP_SCANNING" looks ugly, but it has to be done this way, just
like in the original allocation code: multiple scanners can update the
flag, and since SCANNING is the highest defined bit in the flags field,
it is not strictly a bit flag in this case. This will be cleaned up
properly in the patch mentioned above.

>
> --
> Best Regards,
> Huang, Ying