From mboxrd@z Thu Jan  1 00:00:00 1970
From: Chuanhua Han <chuanhuahan@gmail.com>
Date: Fri, 15 Mar 2024 16:34:31 +0800
Subject: Re: [RFC PATCH v3 2/5] mm: swap: introduce swap_nr_free() for batched swap_free()
To: Ryan Roberts
Cc: Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, linux-mm@kvack.org,
 chengming.zhou@linux.dev, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org,
 kasong@tencent.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 mhocko@suse.com, nphamcs@gmail.com, shy828301@gmail.com, steven.price@arm.com,
 surenb@google.com, wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org,
 ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com, Chuanhua Han, Barry Song
In-Reply-To: <716bb29c-d2a2-4eef-b300-b037f08f458f@arm.com>
References: <20240304081348.197341-1-21cnbao@gmail.com>
 <20240304081348.197341-3-21cnbao@gmail.com>
 <499a60c6-eeb8-4bbd-8563-9717c0d2e43d@arm.com>
 <716bb29c-d2a2-4eef-b300-b037f08f458f@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

Ryan Roberts wrote on Thu, 14 Mar 2024 at 21:43:
>
> On 14/03/2024 13:12, Chuanhua Han wrote:
> > Ryan Roberts wrote on Tue, 12 Mar 2024 at 02:51:
> >>
> >> On 04/03/2024 08:13, Barry Song wrote:
> >>> From: Chuanhua Han
> >>>
> >>> While swapping in a large folio, we need to free swaps related to the whole
> >>> folio. To avoid frequently acquiring and releasing swap locks, it is better
> >>> to introduce an API for batched free.
> >>>
> >>> Signed-off-by: Chuanhua Han
> >>> Co-developed-by: Barry Song
> >>> Signed-off-by: Barry Song
> >>> ---
> >>>  include/linux/swap.h |  6 ++++++
> >>>  mm/swapfile.c        | 35 +++++++++++++++++++++++++++++++++++
> >>>  2 files changed, 41 insertions(+)
> >>>
> >>> diff --git a/include/linux/swap.h b/include/linux/swap.h
> >>> index 2955f7a78d8d..d6ab27929458 100644
> >>> --- a/include/linux/swap.h
> >>> +++ b/include/linux/swap.h
> >>> @@ -481,6 +481,7 @@ extern void swap_shmem_alloc(swp_entry_t);
> >>>  extern int swap_duplicate(swp_entry_t);
> >>>  extern int swapcache_prepare(swp_entry_t);
> >>>  extern void swap_free(swp_entry_t);
> >>> +extern void swap_nr_free(swp_entry_t entry, int nr_pages);
> >>
> >> nit: In my swap-out v4 series, I've created a batched version of
> >> free_swap_and_cache() and called it free_swap_and_cache_nr(). Perhaps it is
> >> preferable to align the naming schemes - i.e. call this swap_free_nr(). Your
> >> scheme doesn't really work when applied to free_swap_and_cache().
> > Thanks for your suggestion; for the next version we'll see which
> > naming is more appropriate!
> >>
> >>>  extern void swapcache_free_entries(swp_entry_t *entries, int n);
> >>>  extern int free_swap_and_cache(swp_entry_t);
> >>>  int swap_type_of(dev_t device, sector_t offset);
> >>> @@ -561,6 +562,11 @@ static inline void swap_free(swp_entry_t swp)
> >>>  {
> >>>  }
> >>>
> >>> +void swap_nr_free(swp_entry_t entry, int nr_pages)
> >>> +{
> >>> +
> >>> +}
> >>> +
> >>>  static inline void put_swap_folio(struct folio *folio, swp_entry_t swp)
> >>>  {
> >>>  }
> >>> diff --git a/mm/swapfile.c b/mm/swapfile.c
> >>> index 3f594be83b58..244106998a69 100644
> >>> --- a/mm/swapfile.c
> >>> +++ b/mm/swapfile.c
> >>> @@ -1341,6 +1341,41 @@ void swap_free(swp_entry_t entry)
> >>>  	__swap_entry_free(p, entry);
> >>>  }
> >>>
> >>> +/*
> >>> + * Called after swapping in a large folio, batched free swap entries
> >>> + * for this large folio; entry should be for the first subpage and
> >>> + * its offset is aligned with nr_pages
> >>> + */
> >>> +void swap_nr_free(swp_entry_t entry, int nr_pages)
> >>> +{
> >>> +	int i;
> >>> +	struct swap_cluster_info *ci;
> >>> +	struct swap_info_struct *p;
> >>> +	unsigned type = swp_type(entry);
> >>
> >> nit: checkpatch.py will complain about bare "unsigned", preferring "unsigned
> >> int" - or at least it did for me when I did something similar in my swap-out
> >> patch set.
> > Gee, thanks for pointing that out!
> >>
> >>> +	unsigned long offset = swp_offset(entry);
> >>> +	DECLARE_BITMAP(usage, SWAPFILE_CLUSTER) = { 0 };
> >>
> >> I don't love this, as it could blow the stack if SWAPFILE_CLUSTER ever
> >> increases. But the only other way I can think of is to explicitly loop over
> >> fixed size chunks, and that's not much better.
> > Is it possible to save kernel stack by using a bitmap here? If
> > SWAPFILE_CLUSTER=512, we consume only (512/64)*8 = 64 bytes.
>
> I'm not sure I've understood what you are saying? You're already using
> DECLARE_BITMAP(), so it's already consuming 64 bytes if SWAPFILE_CLUSTER=512, no?
>
> I actually did a bad job of trying to express a couple of different points:
>
> - Are there any configurations today where SWAPFILE_CLUSTER > 512? I'm not sure.
> Certainly not for arm64, but not sure about other architectures. For example, if
> an arch had 64K pages with 8192 entries per THP and supports SWAP_THP, that's 1K
> for the bitmap, which is now looking pretty big for the stack.
I agree with you. The current bitmap grows linearly with SWAPFILE_CLUSTER,
which may cause the kernel stack to swell. I need to think of a way to save
more memory.
>
> - Would it be better to decouple stack usage from SWAPFILE_CLUSTER and instead
> define a fixed stack size (e.g. 64 bytes -> 512 entries)? Then free the range of
> entries in batches no bigger than this size. This approach could also allow
> removing the constraint that the range has to be aligned and fit in a single
> cluster. Personally I think an approach like this would be much more robust, in
> return for a tiny bit more complexity.
Because we cannot determine how many swap entries a cluster has for a given
architecture or configuration, how large should the fixed-size variable be?
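If I follow the idea, something like the rough, untested sketch below is what
you have in mind (SWAP_BATCH_NR and the exact batching structure are only my
placeholder assumptions, not part of the posted patch): keep a fixed-size
bitmap on the stack and walk the range in batches that never cross a cluster
boundary, so stack usage no longer depends on SWAPFILE_CLUSTER:

#define SWAP_BATCH_NR   512     /* 64 bytes of bitmap on the stack */

void swap_nr_free(swp_entry_t entry, int nr_pages)
{
        struct swap_info_struct *p;
        struct swap_cluster_info *ci;
        unsigned int type = swp_type(entry);
        unsigned long offset = swp_offset(entry);
        DECLARE_BITMAP(usage, SWAP_BATCH_NR);
        int i, nr;

        p = _swap_info_get(entry);
        if (!p)
                return;

        while (nr_pages) {
                /* Fixed-size batch, clamped so we never cross a cluster. */
                nr = min_t(int, nr_pages, SWAP_BATCH_NR);
                nr = min_t(int, nr, SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
                bitmap_zero(usage, SWAP_BATCH_NR);

                /* Drop one reference on each entry under the cluster/swap_info lock. */
                ci = lock_cluster_or_swap_info(p, offset);
                for (i = 0; i < nr; i++) {
                        if (__swap_entry_free_locked(p, offset + i, 1))
                                __bitmap_set(usage, i, 1);
                }
                unlock_cluster_or_swap_info(p, ci);

                /* Entries whose usage dropped to zero go back via the slot cache. */
                for_each_clear_bit(i, usage, nr)
                        free_swap_slot(swp_entry(type, offset + i));

                offset += nr;
                nr_pages -= nr;
        }
}

With a fixed batch like this, the bitmap stays at 64 bytes however large
SWAPFILE_CLUSTER is for a given architecture or configuration, and the
aligned-within-one-cluster restriction could be dropped as you suggest.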
>
> >>
> >>> +
> >>> +	/* all swap entries are within a cluster for mTHP */
> >>> +	VM_BUG_ON(offset % SWAPFILE_CLUSTER + nr_pages > SWAPFILE_CLUSTER);
> >>> +
> >>> +	if (nr_pages == 1) {
> >>> +		swap_free(entry);
> >>> +		return;
> >>> +	}
> >>> +
> >>> +	p = _swap_info_get(entry);
> >>
> >> You need to handle this returning NULL, like swap_free() does.
> > Yes, you're right! We did forget to check for NULL here.
> >>
> >>> +
> >>> +	ci = lock_cluster(p, offset);
> >>
> >> The existing swap_free() calls lock_cluster_or_swap_info(). So if swap is backed
> >> by rotating media, and clusters are not in use, it will lock the whole swap
> >> info. But your new version only calls lock_cluster(), which won't lock anything
> >> if clusters are not in use. So I think this is a locking bug.
> > Again, you're right, it's a bug!
> >>
> >>> +	for (i = 0; i < nr_pages; i++) {
> >>> +		if (__swap_entry_free_locked(p, offset + i, 1))
> >>> +			__bitmap_set(usage, i, 1);
> >>> +	}
> >>> +	unlock_cluster(ci);
> >>> +
> >>> +	for_each_clear_bit(i, usage, nr_pages)
> >>> +		free_swap_slot(swp_entry(type, offset + i));
> >>> +}
> >>> +
> >>>  /*
> >>>   * Called after dropping swapcache to decrease refcnt to swap entries.
> >>>   */
> >>
> >> Thanks,
> >> Ryan
> >>
> >>
> >
> >
>

-- 
Thanks,
Chuanhua