From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AC459CF9C6B for ; Tue, 24 Sep 2024 15:48:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 417E06B00A0; Tue, 24 Sep 2024 11:48:25 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3A0A56B00A2; Tue, 24 Sep 2024 11:48:25 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 24B1C6B00A3; Tue, 24 Sep 2024 11:48:25 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 041406B00A0 for ; Tue, 24 Sep 2024 11:48:24 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 76C6641B92 for ; Tue, 24 Sep 2024 15:48:24 +0000 (UTC) X-FDA: 82600063728.20.DFBBC16 Received: from mail-qt1-f172.google.com (mail-qt1-f172.google.com [209.85.160.172]) by imf05.hostedemail.com (Postfix) with ESMTP id B329810000A for ; Tue, 24 Sep 2024 15:48:21 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=GVQOy0Dq; spf=pass (imf05.hostedemail.com: domain of nphamcs@gmail.com designates 209.85.160.172 as permitted sender) smtp.mailfrom=nphamcs@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1727192782; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=04wmdnC8h/x5cUtDCBufySp5GXdDQG3ibVfSh0oRqk0=; b=qdS2wRyJCY+oMDrGqdYNpDc2VrmuChLirwgtbL8FMZo8n2mZXphM7Ff0UbGg4P6ZwGf7SZ RnrFwC00+RyUpQjywf9nq0o0uk54+ay8NdEK8+faeMO12MbKMxUDQAVH6OKwot55yF1Ag8 4iXmAW/o27vPSMaog+Ft8/1wO8DUrZM= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1727192782; a=rsa-sha256; cv=none; b=f6UuQmZIgs6oZvW3rG4t23KI4Z1seEhaRx9SdBmfV8ravd9msvtJfMbmJmALZjDY2VXkS8 K0fX9cxKKP4QKF8TJLSFUn3hO7yIGgXHqCaNGevc80boCi9dMyHbziQb5UMv9sV8acApWB ApOUINw1+UH7E3tl5slpn+lAClLPV6k= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=GVQOy0Dq; spf=pass (imf05.hostedemail.com: domain of nphamcs@gmail.com designates 209.85.160.172 as permitted sender) smtp.mailfrom=nphamcs@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-qt1-f172.google.com with SMTP id d75a77b69052e-4585e25f42bso54770021cf.2 for ; Tue, 24 Sep 2024 08:48:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727192901; x=1727797701; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=04wmdnC8h/x5cUtDCBufySp5GXdDQG3ibVfSh0oRqk0=; b=GVQOy0DqSL2fgKKrSZarTJvYywtuGQlsNYOt9AlZtai6t1CgAcJUvg3em7UMWPpmLr SjuFmR7FgGdl4e9qhNYmR1Laf04EVMPO7vrSW/QzC2Y0O/abSF1EwZzEvblW55qyTmDG 5kEBRa/JHpSDBMig/tJkttBXZ8bThd9WucdHGRrgoqYWynrOVh4BKwWAVcABl9sevF1b bgOCvl1NwLi79DwZtDPgb6+BFCYSMnNn3FmZoC53ZXpCkW181ZXxJQy9OFFuKqixF/fJ 1OD+zYhLuJKbVkVk1pg9bN+pukU23PPgLKtLrl49b3T9S6aD+KClrUDVWliEVGvhlD/O w8Tw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727192901; x=1727797701; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=04wmdnC8h/x5cUtDCBufySp5GXdDQG3ibVfSh0oRqk0=; b=GklhQXgC88hKB/TejQfXYFAGPPZS77yPWBiGd6K67IAeTJlhdkwzQk/hp1ApnmUKic pP2rPXTzT6Vtqeh2ZBhvyCCGaHgrCMeF4B/brtqaWTZwhFopOlPpjqSD/S1w/oVUe3L4 evotBvnsDS5ulyX4glG4z/eNyweJeBf3t/iwavCKDxqL8g193M0shNjUq8Vb4pQ6Du4w ffGlPOD+HfG7U73Rni39SRUbgIUeafWFPqIC9Z90Rxp01cmjoZcLqfq3H8IrLYii69e0 +6T93etca1W2sN2yLFe27bKv/9rY+B9agHY00DV+2xtM8cPHT+XfY5Rno5Dg7h2an57n u4Sw== X-Forwarded-Encrypted: i=1; AJvYcCUNqkFI9VxhndE/63bBSp1sE7KCSOLILcgalruThuxKfJm/OXa9EOpwVtROgSWh6Uy6uUTK9nigmQ==@kvack.org X-Gm-Message-State: AOJu0YxdipG+7dbZlBXkwZliKQD//+ZxRs0NxXNkSwTbfBJD5tvPRYZF EdTg089yazVQRYoPm60/8dwjMZcrGmFJUAC3EG6oC/BxP6ahMhGf9giGUOxDn9AS1CLd9AoodVC hJeyoBQ/T68wFNS4uwhBzch0sP80= X-Google-Smtp-Source: AGHT+IHJAhU8h0nLzZdcIkTMioVsHewyXYTfciPeaGEmHAvNWxX1jPN0ETXQY9ayQlnsSIrq7KmpomE/DlvGRPrXyRs= X-Received: by 2002:a05:6214:3d87:b0:6c4:6f3e:d955 with SMTP id 6a1803df08f44-6c7bc6ce13cmr291090366d6.10.1727192900615; Tue, 24 Sep 2024 08:48:20 -0700 (PDT) MIME-Version: 1.0 References: <20240923231142.4155415-1-nphamcs@gmail.com> <4d38c65d-760c-43a5-bb47-8e0235c13a51@linux.alibaba.com> <9a110f20-42ad-468b-96c6-683e162452a9@linux.alibaba.com> In-Reply-To: From: Nhat Pham Date: Tue, 24 Sep 2024 08:48:09 -0700 Message-ID: Subject: Re: [RFC PATCH 0/2] remove SWAP_MAP_SHMEM To: Yosry Ahmed Cc: Baolin Wang , akpm@linux-foundation.org, hannes@cmpxchg.org, hughd@google.com, shakeel.butt@linux.dev, ryan.roberts@arm.com, ying.huang@intel.com, chrisl@kernel.org, david@redhat.com, kasong@tencent.com, willy@infradead.org, viro@zeniv.linux.org.uk, baohua@kernel.org, chengming.zhou@linux.dev, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: wk174tpwcmdtohrxuyi5d1d967fqxy69 X-Rspamd-Queue-Id: B329810000A X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1727192901-919596 X-HE-Meta: U2FsdGVkX1+00+LHTy6oxzPPGim5v/WALM0GjO1/DO/78bkDyZIQ3AeqdlfQQXk2Vh1asLTXdu/uWIklBPs3dLM3STepv1YGeBxcWD293q9cCm71qJWk7aoEQp0g/7nfCwP6khv1tr4D9LNAGJSbHZYOn55VhNcyJyjEA7hrs0M5/OhVce0fa2Tjk0YgwknEdr+rfOxDYbgIkHsb2xiulAtrC/R7S4X37wun0oHcbAU3Vb0jh3uV5RRvNt0n49tvk6UHYaEz3K/i2AFLsw3moXr4A4MQTXfp4a5Yj1t4q282Fa63nU2osFXDx1DA/hE9kmeS413+xVbG/GpbwI/7VjH6B07oJNlZiT9rKa56Ebk7kWUSOSeibEvgUYluT6h8psKVtPDukVoY7kRcHhDzm1DDMJh/7EVkBe9ubatk00oQYKou416Oa/rbz7ZxK9gc7cH/ZLtG/gG4I2emmI6rhMz7JtfdRYjoN3FKus2QYUsYa/Q0tqojIphYeL+XlGZFTOeLXZRN4VAl73KE5R6WKHUkvzKWZuYrBpPaSvit1OgT/arSjkjO4bwQSlxud64c3rlbuotiIY+YU82eXQI+eSiyETKo9iVMLN8l+FgThaVAL9GwDOxur9qbhgJesjMgqiWBj8UYABM2oodrAPFlxLGG3owMtapri5FS395AiNg8G5plv+q7UuRL0/WozRQRUtNn8ZXpAmczE32lwf+2uMQtY2dXikRbTNU9PXmQ42TWaO8GDSnk8edbu30iWcMQLaFu2N7cUyFJ2PEpIVSPljY35z9+qnWU+ECUBW3r++sOmuMYSnh3onCFSl7Df6LZZNPS47sMeN5hItQNp5w0VofsJmkg0Jg039FGhVjo39FKK47VD5iYZqLF2Yp8MMyVwyb1cnMEa2IDZ/fAw44cQsO7gzIZlg1Zc2acY6+8Uf8GBREhP/gYqkcFlfddIN9QIB2cNZaalmsxfvlCeXY 9PC5Tt8M kG7OY37sfC4s+OGFwtIrIrFwhUgP0AmAv2e5Z2ikga4xcMi4D82qaW8aqS4FmEbwbZJKa1Pee/lT6swRDJdZTG7avqG5EvApRtHuveaxUENAxgL4Ri2dCkBJSGXuSQnaKcG0iwVZOvU4p/1HPz4vEimPdoXthRKuYPVDyYQZXXnGF3qGOje3mVmJwaRI3pY6NCpWxgQehRtELRusnCEIycvONBs5l1zUVeAU0Uk3fwPa6x4yNdfukn3IyucG1pn87/Uc91k2ViOwGIs9jj3FodYfA7egHn2/opeeklQWjdc0tHQYjWqdNu6juPlIO4BullMRqUMPQiqcDUKwBnr11MzlHjarwnRWbBrmwcJaHLrkdjV5foYebAIcPE/aNNvRUEsW2AkZZz197HzWfuQ1GTnY+tCDhVTjfGNDXxTEwxIOQyJv75wwG/FQ3VA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Sep 24, 2024 at 8:08=E2=80=AFAM Yosry Ahmed = wrote: > > On Tue, Sep 24, 2024 at 7:32=E2=80=AFAM Nhat Pham wro= te: > > > > On Mon, Sep 23, 2024 at 8:25=E2=80=AFPM Baolin Wang > > wrote: > > > > > > > > > On 2024/9/24 10:15, Yosry Ahmed wrote: > > > > On Mon, Sep 23, 2024 at 6:55=E2=80=AFPM Baolin Wang > > > > wrote: > > > >> > > > >> > > > >> > > > >> On 2024/9/24 07:11, Nhat Pham wrote: > > > >>> The SWAP_MAP_SHMEM state was originally introduced in the commit > > > >>> aaa468653b4a ("swap_info: note SWAP_MAP_SHMEM"), to quickly deter= mine if a > > > >>> swap entry belongs to shmem during swapoff. > > > >>> > > > >>> However, swapoff has since been rewritten drastically in the comm= it > > > >>> b56a2d8af914 ("mm: rid swapoff of quadratic complexity"). Now > > > >>> having swap count =3D=3D SWAP_MAP_SHMEM value is basically the sa= me as having > > > >>> swap count =3D=3D 1, and swap_shmem_alloc() behaves analogously t= o > > > >>> swap_duplicate() > > > >>> > > > >>> This RFC proposes the removal of this state and the associated he= lper to > > > >>> simplify the state machine (both mentally and code-wise). We will= also > > > >>> have an extra state/special value that can be repurposed (for swa= p entries > > > >>> that never gets re-duplicated). > > > >>> > > > >>> Another motivation (albeit a bit premature at the moment) is the = new swap > > > >>> abstraction I am currently working on, that would allow for swap/= zswap > > > >>> decoupling, swapoff optimization, etc. The fewer states and swap = API > > > >>> functions there are, the simpler the conversion will be. > > > >>> > > > >>> I am sending this series first as an RFC, just in case I missed s= omething > > > >>> or misunderstood this state, or if someone has a swap optimizatio= n in mind > > > >>> for shmem that would require this special state. > > > >> > > > >> The idea makes sense to me. I did a quick test with shmem mTHP, an= d > > > >> encountered the following warning which is triggered by > > > >> 'VM_WARN_ON(usage =3D=3D 1 && nr > 1)' in __swap_duplicate(). > > > > > > > > Apparently __swap_duplicate() does not currently handle increasing = the > > > > swap count for multiple swap entries by 1 (i.e. usage =3D=3D 1) bec= ause it > > > > does not handle rolling back count increases when > > > > swap_count_continued() fails. > > > > > > > > I guess this voids my Reviewed-by until we sort this out. Technical= ly > > > > swap_count_continued() won't ever be called for shmem because we on= ly > > > > ever increment the count by 1, but there is no way to know this in > > > > __swap_duplicate() without SWAP_HAS_SHMEM. > > > > Ah this is my bad. I compiled with CONFIG_THP_SWAP, but forgot to > > remove the swapfile check (that's another can of worms, but I need > > data before submitting the patch to remove it...) > > > > One thing we can do is instead of warning here, we can handle it in > > the for loop check, where we have access to count - that's the point > > of having that for-loop check anyway? :) > > > > There's a couple of ways to go about it: > > > > 1. VM_WARN_ON(usage =3D=3D 1 && nr > 1 && count !=3D 0 ); > > Hmm that should work, although it's a bit complicated tbh. > > > (or more accurately, (count & ~COUNT_CONTINUED) >=3D SWAP_MAP_MAX)) > > I think this will make the warning very hard to hit if there's a > misuse of __swap_duplicate(). It will only be hit when an entry needs > count continuation, which I am not sure is very common. If there's a > bug, the warning will potentially catch it too late, if ever. > > The side effect here is failing to decrement the swap count of some > swap entries which will lead to them never being freed, essentially > leaking swap capacity slowly over time. I am not sure if there are > more detrimental effects. > > > > > 2. Alternatively, instead of warning here, we can simply return > > -ENOMEM. Then, at shmem callsite, have a VM_WARN_ON/VM_BUG_ON(), since > > this MUST succeed. > > We still fail to rollback incremented counts though when we return > -ENOMEM, right? Maybe I didn't get what you mean. My understanding now is that there are two for loops. One for loop that checks the entry's states, and one for loop that does the actual incrementing work (or state modification). We can check in the first for loop, if it is safe to proceed: if (!count && !has_cache) { err =3D -ENOENT; } else if (usage =3D=3D SWAP_HAS_CACHE) { if (has_cache) err =3D -EEXIST; } else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) { err =3D -EINVAL; } else if (usage =3D=3D 1 && nr > 1 && (count & ~COUNT_CONTINUED) >=3D SWAP_MAP_MAX)) { /* the batched variants currently do not support rollback */ err =3D -ENOMEM; } At this point, IIUC, we have not done any incrementing, so no rollback needed? :) > > > > > Either solutions should follow with careful documentation to make it > > clear the expectation/guarantee of the new API. > > > > Yosry, Baolin, how do you two feel about this? Would something like > > this work? I need to test it first, but let me know if I'm missing > > something. > > > > If this does not work, we can do what Baolin is suggesting, and > > perhaps maintain the swap_shmem_alloc() helper. It's less than ideal, > > but at least we still lose a state... > > Depending on the complexity tbh, right now removing SWAP_MAP_SHMEM is > just a cleanup with small wins, so if it's too complicated to remove > it it may not be worth it. I am assuming with your ongoing work, it > becomes much more valuable, so maybe if it's too complicated we can > defer it until the benefits are realizable? I agree :)