From: "Huang, Ying"
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
 baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com,
 hannes@cmpxchg.org, hughd@google.com, kaleshsingh@google.com,
 kasong@tencent.com, linux-kernel@vger.kernel.org, mhocko@suse.com,
 minchan@kernel.org, nphamcs@gmail.com, ryan.roberts@arm.com,
 senozhatsky@chromium.org, shakeel.butt@linux.dev, shy828301@gmail.com,
 surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org,
 xiang@kernel.org, yosryahmed@google.com
Subject: Re: [PATCH 1/1] mm: swap: add nr argument in swapcache_prepare and
 swapcache_clear to support large folios
In-Reply-To: (Barry Song's message of "Thu, 1 Aug 2024 10:42:49 +0800")
References: <20240730071339.107447-1-21cnbao@gmail.com>
 <20240730071339.107447-2-21cnbao@gmail.com>
 <874j86oubf.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87zfpynf2r.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <871q39kq0e.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87wml1j7l9.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 01 Aug 2024 10:46:58 +0800
Message-ID: <87sevpj6xp.fsf@yhuang6-desk2.ccr.corp.intel.com>

Barry Song <21cnbao@gmail.com> writes:

> On Thu, Aug 1, 2024 at 10:37 AM Huang, Ying wrote:
>>
>> Barry Song <21cnbao@gmail.com> writes:
>>
>> > On Thu, Aug 1, 2024 at 9:13 AM Huang, Ying wrote:
>> >>
>> >> Barry Song <21cnbao@gmail.com> writes:
>> >>
>> >> > On Wed, Jul 31, 2024 at 4:28 PM Huang, Ying wrote:
>> >> >>
>> >> >> Barry Song <21cnbao@gmail.com> writes:
>> >> >>
>> >> >> > On Wed, Jul 31, 2024 at 4:14 PM
Huang, Ying wrote:
>> >> >> >>
>> >> >> >> Hi, Barry,
>> >> >> >>
>> >> >> >> Barry Song <21cnbao@gmail.com> writes:
>> >> >> >>
>> >> >> >> > From: Barry Song
>> >> >> >> >
>> >> >> >> > Right now, swapcache_prepare() and swapcache_clear() supports one entry
>> >> >> >> > only, to support large folios, we need to handle multiple swap entries.
>> >> >> >> >
>> >> >> >> > To optimize stack usage, we iterate twice in __swap_duplicate(): the
>> >> >> >> > first time to verify that all entries are valid, and the second time
>> >> >> >> > to apply the modifications to the entries.
>> >> >> >> >
>> >> >> >> > Currently, we're using nr=1 for the existing users.
>> >> >> >> >
>> >> >> >> > Reviewed-by: Baolin Wang
>> >> >> >> > Signed-off-by: Barry Song
>> >> >> >> > ---
>> >> >> >> >  include/linux/swap.h |   4 +-
>> >> >> >> >  mm/memory.c          |   6 +--
>> >> >> >> >  mm/swap.h            |   5 ++-
>> >> >> >> >  mm/swap_state.c      |   2 +-
>> >> >> >> >  mm/swapfile.c        | 101 +++++++++++++++++++++++++------------------
>> >> >> >> >  5 files changed, 68 insertions(+), 50 deletions(-)
>> >> >> >> >
>> >> >> >> > diff --git a/include/linux/swap.h b/include/linux/swap.h
>> >> >> >> > index ba7ea95d1c57..5b920fa2315b 100644
>> >> >> >> > --- a/include/linux/swap.h
>> >> >> >> > +++ b/include/linux/swap.h
>> >> >> >> > @@ -480,7 +480,7 @@ extern int get_swap_pages(int n, swp_entry_t swp_entries[], int order);
>> >> >> >> >  extern int add_swap_count_continuation(swp_entry_t, gfp_t);
>> >> >> >> >  extern void swap_shmem_alloc(swp_entry_t);
>> >> >> >> >  extern int swap_duplicate(swp_entry_t);
>> >> >> >> > -extern int swapcache_prepare(swp_entry_t);
>> >> >> >> > +extern int swapcache_prepare(swp_entry_t entry, int nr);
>> >> >> >> >  extern void swap_free_nr(swp_entry_t entry, int nr_pages);
>> >> >> >> >  extern void swapcache_free_entries(swp_entry_t *entries, int n);
>> >> >> >> >  extern void free_swap_and_cache_nr(swp_entry_t entry, int nr);
>> >> >> >> > @@ -554,7 +554,7 @@ static inline int swap_duplicate(swp_entry_t swp)
>> >> >> >> >  	return 0;
>> >> >> >> >  }
>> >> >> >> >
>> >> >> >> > -static inline int swapcache_prepare(swp_entry_t swp)
>> >> >> >> > +static inline int swapcache_prepare(swp_entry_t swp, int nr)
>> >> >> >> >  {
>> >> >> >> >  	return 0;
>> >> >> >> >  }
>> >> >> >> > diff --git a/mm/memory.c b/mm/memory.c
>> >> >> >> > index 833d2cad6eb2..b8675617a5e3 100644
>> >> >> >> > --- a/mm/memory.c
>> >> >> >> > +++ b/mm/memory.c
>> >> >> >> > @@ -4081,7 +4081,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>> >> >> >> >  			 * reusing the same entry. It's undetectable as
>> >> >> >> >  			 * pte_same() returns true due to entry reuse.
>> >> >> >> >  			 */
>> >> >> >> > -			if (swapcache_prepare(entry)) {
>> >> >> >> > +			if (swapcache_prepare(entry, 1)) {
>> >> >> >> >  				/* Relax a bit to prevent rapid repeated page faults */
>> >> >> >> >  				schedule_timeout_uninterruptible(1);
>> >> >> >> >  				goto out;
>> >> >> >> > @@ -4387,7 +4387,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>> >> >> >> >  out:
>> >> >> >> >  	/* Clear the swap cache pin for direct swapin after PTL unlock */
>> >> >> >> >  	if (need_clear_cache)
>> >> >> >> > -		swapcache_clear(si, entry);
>> >> >> >> > +		swapcache_clear(si, entry, 1);
>> >> >> >> >  	if (si)
>> >> >> >> >  		put_swap_device(si);
>> >> >> >> >  	return ret;
>> >> >> >> > @@ -4403,7 +4403,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>> >> >> >> >  		folio_put(swapcache);
>> >> >> >> >  	}
>> >> >> >> >  	if (need_clear_cache)
>> >> >> >> > -		swapcache_clear(si, entry);
>> >> >> >> > +		swapcache_clear(si, entry, 1);
>> >> >> >> >  	if (si)
>> >> >> >> >  		put_swap_device(si);
>> >> >> >> >  	return ret;
>> >> >> >> > diff --git a/mm/swap.h b/mm/swap.h
>> >> >> >> > index baa1fa946b34..7c6330561d84 100644
>> >> >> >> > --- a/mm/swap.h
>> >> >> >> > +++ b/mm/swap.h
>> >> >> >> > @@ -59,7 +59,7 @@ void __delete_from_swap_cache(struct folio *folio,
>> >> >> >> >  void delete_from_swap_cache(struct folio *folio);
>> >> >> >> >  void clear_shadow_from_swap_cache(int type, unsigned long begin,
>> >> >> >> >  				  unsigned long end);
>> >> >> >> > -void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry);
>> >> >> >> > +void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr);
>> >> >> >> >  struct folio *swap_cache_get_folio(swp_entry_t entry,
>> >> >> >> >  		struct vm_area_struct *vma, unsigned long addr);
>> >> >> >> >  struct folio *filemap_get_incore_folio(struct address_space *mapping,
>> >> >> >> > @@ -120,7 +120,7 @@ static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
>> >> >> >> >  	return 0;
>> >> >> >> >  }
>> >> >> >> >
>> >> >> >> > -static inline void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
>> >> >> >> > +static inline void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr)
>> >> >> >> >  {
>> >> >> >> >  }
>> >> >> >> >
>> >> >> >> > @@ -172,4 +172,5 @@ static inline unsigned int folio_swap_flags(struct folio *folio)
>> >> >> >> >  	return 0;
>> >> >> >> >  }
>> >> >> >> >  #endif /* CONFIG_SWAP */
>> >> >> >> > +
>> >> >> >>
>> >> >> >> NITPICK: Is it necessary to add a blank line here? But I don't think a
>> >> >> >> new version is necessary if this is the only change needed.
>> >> >> >
>> >> >> > No need to add a blank line; it was probably a mistake I made in Vim.
>> >> >> >
>> >> >> >>
>> >> >> >> >  #endif /* _MM_SWAP_H */
>> >> >> >> > diff --git a/mm/swap_state.c b/mm/swap_state.c
>> >> >> >> > index a1726e49a5eb..b06f2a054f5a 100644
>> >> >> >> > --- a/mm/swap_state.c
>> >> >> >> > +++ b/mm/swap_state.c
>> >> >> >> > @@ -477,7 +477,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>> >> >> >> >  		/*
>> >> >> >> >  		 * Swap entry may have been freed since our caller observed it.
>> >> >> >> >  		 */
>> >> >> >> > -		err = swapcache_prepare(entry);
>> >> >> >> > +		err = swapcache_prepare(entry, 1);
>> >> >> >> >  		if (!err)
>> >> >> >> >  			break;
>> >> >> >> >
>> >> >> >> > diff --git a/mm/swapfile.c b/mm/swapfile.c
>> >> >> >> > index 5f73a8553371..757d38a86f56 100644
>> >> >> >> > --- a/mm/swapfile.c
>> >> >> >> > +++ b/mm/swapfile.c
>> >> >> >> > @@ -3363,7 +3363,7 @@ void si_swapinfo(struct sysinfo *val)
>> >> >> >> >  }
>> >> >> >> >
>> >> >> >> >  /*
>> >> >> >> > - * Verify that a swap entry is valid and increment its swap map count.
>> >> >> >> > + * Verify that nr swap entries are valid and increment their swap map counts.
>> >> >> >> >   *
>> >> >> >> >   * Returns error code in following case.
>> >> >> >> >   * - success -> 0
>> >> >> >> > @@ -3373,60 +3373,77 @@ void si_swapinfo(struct sysinfo *val)
>> >> >> >> >   * - swap-cache reference is requested but the entry is not used. -> ENOENT
>> >> >> >> >   * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
>> >> >> >> >   */
>> >> >> >> > -static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
>> >> >> >> > +static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
>> >> >> >> >  {
>> >> >> >> >  	struct swap_info_struct *p;
>> >> >> >> >  	struct swap_cluster_info *ci;
>> >> >> >> >  	unsigned long offset;
>> >> >> >> >  	unsigned char count;
>> >> >> >> >  	unsigned char has_cache;
>> >> >> >> > -	int err;
>> >> >> >> > +	int err, i;
>> >> >> >> >
>> >> >> >> >  	p = swp_swap_info(entry);
>> >> >> >> >
>> >> >> >> >  	offset = swp_offset(entry);
>> >> >> >> > +	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
>> >> >> >> >  	ci = lock_cluster_or_swap_info(p, offset);
>> >> >> >> >
>> >> >> >> > -	count = p->swap_map[offset];
>> >> >> >> > +	err = 0;
>> >> >> >> > +	for (i = 0; i < nr; i++) {
>> >> >> >> > +		count = p->swap_map[offset + i];
>> >> >> >> >
>> >> >> >> > -	/*
>> >> >> >> > -	 * swapin_readahead() doesn't check if a swap entry is valid, so the
>> >> >> >> > -	 * swap entry could be SWAP_MAP_BAD. Check here with lock held.
>> >> >> >> > -	 */
>> >> >> >> > -	if (unlikely(swap_count(count) == SWAP_MAP_BAD)) {
>> >> >> >> > -		err = -ENOENT;
>> >> >> >> > -		goto unlock_out;
>> >> >> >> > -	}
>> >> >> >> > +		/*
>> >> >> >> > +		 * swapin_readahead() doesn't check if a swap entry is valid, so the
>> >> >> >> > +		 * swap entry could be SWAP_MAP_BAD. Check here with lock held.
>> >> >> >> > +		 */
>> >> >> >> > +		if (unlikely(swap_count(count) == SWAP_MAP_BAD)) {
>> >> >> >> > +			err = -ENOENT;
>> >> >> >> > +			goto unlock_out;
>> >> >> >> > +		}
>> >> >> >> >
>> >> >> >> > -	has_cache = count & SWAP_HAS_CACHE;
>> >> >> >> > -	count &= ~SWAP_HAS_CACHE;
>> >> >> >> > -	err = 0;
>> >> >> >> > +		has_cache = count & SWAP_HAS_CACHE;
>> >> >> >> > +		count &= ~SWAP_HAS_CACHE;
>> >> >> >> >
>> >> >> >> > -	if (usage == SWAP_HAS_CACHE) {
>> >> >> >> > +		if (usage == SWAP_HAS_CACHE) {
>> >> >> >> > +			/* set SWAP_HAS_CACHE if there is no cache and entry is used */
>> >> >> >> > +			if (!has_cache && count)
>> >> >> >> > +				continue;
>> >> >> >> > +			else if (has_cache)		/* someone else added cache */
>> >> >> >> > +				err = -EEXIST;
>> >> >> >> > +			else				/* no users remaining */
>> >> >> >> > +				err = -ENOENT;
>> >> >> >> >
>> >> >> >> > -		/* set SWAP_HAS_CACHE if there is no cache and entry is used */
>> >> >> >> > -		if (!has_cache && count)
>> >> >> >> > -			has_cache = SWAP_HAS_CACHE;
>> >> >> >> > -		else if (has_cache)		/* someone else added cache */
>> >> >> >> > -			err = -EEXIST;
>> >> >> >> > -		else				/* no users remaining */
>> >> >> >> > -			err = -ENOENT;
>> >> >> >> > +		} else if (count || has_cache) {
>> >> >> >> >
>> >> >> >> > -	} else if (count || has_cache) {
>> >> >> >> > +			if ((count & ~COUNT_CONTINUED) < SWAP_MAP_MAX)
>> >> >> >> > +				continue;
>> >> >> >> > +			else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX)
>> >> >> >> > +				err = -EINVAL;
>> >> >> >> > +			else if (swap_count_continued(p, offset + i, count))
>> >> >> >> > +				continue;
>> >> >> >>
>> >> >> >> IIUC, this will make the change to swap map directly instead of
>> >> >> >> verification. If the verification failed for some entry later, the
>> >> >> >> count will be wrong? Or I missed something?
>> >> >> >
>> >> >> > To avoid using a bitmap or a larger stack, we actually verify during
>> >> >> > the first iteration.
>> >> >> > This ensures that by the second iteration, we can safely commit the
>> >> >> > modification.
>> >> >> >
>> >> >> > I actually put some words in the changelog :-)
>> >> >> >
>> >> >> > To optimize stack usage, we iterate twice in __swap_duplicate(): the
>> >> >> > first time to verify that all entries are valid, and the second time
>> >> >> > to apply the modifications to the entries.
>> >> >>
>> >> >> Yes, I have seen it and I think that it is a good strategy.
>> >> >>
>> >> >> But, IIUC, swap_count_continued() will change the higher bits of the
>> >> >> swap_map instead of verifying. Or, my understanding is wrong?
>> >> >>
>> >> >
>> >> > Ying, your understanding is 100% correct. but the code also has nothing
>> >> > broken. we didn't extend swap_duplicate() to have argument nr,
>> >> > so all users which can set usage=1 will definitely have nr=1.
>> >> >
>> >> > int swap_duplicate(swp_entry_t entry)
>> >> > {
>> >> >         int err = 0;
>> >> >
>> >> >         while (!err && __swap_duplicate(entry, 1, 1) == -ENOMEM)
>> >> >                 err = add_swap_count_continuation(entry, GFP_ATOMIC);
>> >> >         return err;
>> >> > }
>> >>
>> >> I understand that we don't have requirements to support "usage == 1 &&
>> >> nr > 1" case for __swap_duplicate() at least for now.
>> >>
>> >> > Maybe I can add a VM_WARN_ON to warn those people who might
>> >> > want to extend swap_duplicate()? in that case, things could be quite
>> >> > tricky.
>> >> >
>> >> > --- a/mm/swapfile.c
>> >> > +++ b/mm/swapfile.c
>> >> > @@ -3386,6 +3386,7 @@ static int __swap_duplicate(swp_entry_t entry,
>> >> > unsigned char usage, int nr)
>> >> >
>> >> >         offset = swp_offset(entry);
>> >> >         VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
>> >> > +       VM_WARN_ON(usage == 1 && nr > 1);
>> >> >         ci = lock_cluster_or_swap_info(p, offset);
>> >> >
>> >> >         err = 0;
>> >>
>> >> Please add this. And, I think that we need to make it explicit in patch
>> >> description and comments to avoid potential confusing.
>> >
>> > cool. make sense to me. I will post something for Andrew to squash into.
>> >
>> >>
>> >> And, because it's hard to implement the verify and change strategy if
>> >> "usage == 1". Can we only use that strategy for "usage ==
>> >> SWAP_HAS_CACHE"?
>> >
>> > I believe Baolin also needs the case for shmem. I don't feel a strong
>> > need to split two logics(1 and non-1) as the code will be quite ugly :-)
>>
>> Don't need to split like that, it could be something like
>>
>> for (i = 0; i < nr; i++) {
>>         if (usage == SWAP_HAS_CACHE) {
>>                 /* Only verify for SWAP_HAS_CACHE */
>>         }
>> }
>>
>> for (i = 0; i < nr; i++) {
>>         if (usage == SWAP_HAS_CACHE) {
>>         } else {
>>                 /* Verify and change for usage == 1 */
>>         }
>> }
>>
>
> but we also have cases where nr can be > 1
> __swap_duplicate(entry, SWAP_MAP_SHMEM, 1);

If we can do verification for "usage == SWAP_MAP_SHMEM", we can add that
to the first loop. That is, the first loop only verifies and never
commits. In the second loop, we can skip verification for any case that
the first loop has already verified. IMHO, this makes the code easier
to understand.
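
To illustrate the idea, here is a rough userspace sketch of the
verify-then-commit structure, for the SWAP_HAS_CACHE case only. It is
not the kernel code: prepare_range_model() and the flat map array are
hypothetical names made up for this example, the constant values are
illustrative, and locking as well as the "usage == 1" path (where
swap_count_continued() modifies state) are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative values for this userspace model only. */
#define SWAP_HAS_CACHE	0x40	/* flag bit: entry has a swap cache page */
#define SWAP_MAP_BAD	0x3f	/* marker for an unusable entry */

/*
 * Hypothetical model of the SWAP_HAS_CACHE path: pin nr consecutive
 * entries, or fail without touching any of them.
 */
static int prepare_range_model(unsigned char *map, size_t offset, int nr)
{
	int i;

	assert(nr >= 1);

	/* First loop: verify only, never write to the map. */
	for (i = 0; i < nr; i++) {
		unsigned char count = map[offset + i];
		unsigned char has_cache = count & SWAP_HAS_CACHE;

		count &= ~SWAP_HAS_CACHE;
		if (count == SWAP_MAP_BAD)
			return -ENOENT;
		if (has_cache)
			return -EEXIST;	/* someone else added cache */
		if (!count)
			return -ENOENT;	/* no users remaining */
	}

	/* Second loop: every entry passed, so committing cannot fail. */
	for (i = 0; i < nr; i++)
		map[offset + i] |= SWAP_HAS_CACHE;

	return 0;
}
```

If any entry fails in the first loop, the function returns before
anything is written, so no rollback is needed.
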
-- 
Best Regards,
Huang, Ying

>>
>> >>
>> >> >>
>> >> >> >> > +			else
>> >> >> >> > +				err = -ENOMEM;
>> >> >> >> > +		} else
>> >> >> >> > +			err = -ENOENT;			/* unused swap entry */
>> >> >> >> >
>> >> >> >> > -		if ((count & ~COUNT_CONTINUED) < SWAP_MAP_MAX)
>> >> >> >> > +		if (err)
>> >> >> >> > +			goto unlock_out;
>> >> >> >> > +	}
>> >> >> >> > +
>> >> >> >> > +	for (i = 0; i < nr; i++) {
>> >> >> >> > +		count = p->swap_map[offset + i];
>> >> >> >> > +		has_cache = count & SWAP_HAS_CACHE;
>> >> >> >> > +		count &= ~SWAP_HAS_CACHE;
>> >> >> >> > +
>> >> >> >> > +		if (usage == SWAP_HAS_CACHE)
>> >> >> >> > +			has_cache = SWAP_HAS_CACHE;
>> >> >> >> > +		else if ((count & ~COUNT_CONTINUED) < SWAP_MAP_MAX)
>> >> >> >> >  			count += usage;
>> >> >> >> > -		else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX)
>> >> >> >> > -			err = -EINVAL;
>> >> >> >> > -		else if (swap_count_continued(p, offset, count))
>> >> >> >> > -			count = COUNT_CONTINUED;
>> >> >> >> >  		else
>> >> >> >> > -			err = -ENOMEM;
>> >> >> >> > -	} else
>> >> >> >> > -		err = -ENOENT;			/* unused swap entry */
>> >> >> >> > +			count = COUNT_CONTINUED;
>> >> >> >> >
>> >> >> >> > -	if (!err)
>> >> >> >> > -		WRITE_ONCE(p->swap_map[offset], count | has_cache);
>> >> >> >> > +		WRITE_ONCE(p->swap_map[offset + i], count | has_cache);
>> >> >> >> > +	}
>> >> >> >> >
>> >> >> >> >  unlock_out:
>> >> >> >> >  	unlock_cluster_or_swap_info(p, ci);
>> >> >> >> > @@ -3439,7 +3456,7 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
>> >> >> >> >   */
>> >> >> >> >  void swap_shmem_alloc(swp_entry_t entry)
>> >> >> >> >  {
>> >> >> >> > -	__swap_duplicate(entry, SWAP_MAP_SHMEM);
>> >> >> >> > +	__swap_duplicate(entry, SWAP_MAP_SHMEM, 1);
>> >> >> >> >  }
>> >> >> >> >
>> >> >> >> >  /*
>> >> >> >> > @@ -3453,29 +3470,29 @@ int swap_duplicate(swp_entry_t entry)
>> >> >> >> >  {
>> >> >> >> >  	int err = 0;
>> >> >> >> >
>> >> >> >> > -	while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
>> >> >> >> > +	while (!err && __swap_duplicate(entry, 1, 1) == -ENOMEM)
>> >> >> >> >  		err = add_swap_count_continuation(entry, GFP_ATOMIC);
>> >> >> >> >  	return err;
>> >> >> >> >  }
>> >> >> >> >
>> >> >> >> >  /*
>> >> >> >> > - * @entry: swap entry for which we allocate swap cache.
>> >> >> >> > + * @entry: first swap entry from which we allocate nr swap cache.
>> >> >> >> >   *
>> >> >> >> > - * Called when allocating swap cache for existing swap entry,
>> >> >> >> > + * Called when allocating swap cache for existing swap entries,
>> >> >> >> >   * This can return error codes. Returns 0 at success.
>> >> >> >> >   * -EEXIST means there is a swap cache.
>> >> >> >> >   * Note: return code is different from swap_duplicate().
>> >> >> >> >   */
>> >> >> >> > -int swapcache_prepare(swp_entry_t entry)
>> >> >> >> > +int swapcache_prepare(swp_entry_t entry, int nr)
>> >> >> >> >  {
>> >> >> >> > -	return __swap_duplicate(entry, SWAP_HAS_CACHE);
>> >> >> >> > +	return __swap_duplicate(entry, SWAP_HAS_CACHE, nr);
>> >> >> >> >  }
>> >> >> >> >
>> >> >> >> > -void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
>> >> >> >> > +void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr)
>> >> >> >> >  {
>> >> >> >> >  	unsigned long offset = swp_offset(entry);
>> >> >> >> >
>> >> >> >> > -	cluster_swap_free_nr(si, offset, 1, SWAP_HAS_CACHE);
>> >> >> >> > +	cluster_swap_free_nr(si, offset, nr, SWAP_HAS_CACHE);
>> >> >> >> >  }
>> >> >> >> >
>> >> >> >> >  struct swap_info_struct *swp_swap_info(swp_entry_t entry)
>> >> >> >>
>> >> >> >> --
>> >> >> >> Best Regards,
>> >> >> >> Huang, Ying
>> >> >> >
>> >
>> > Thanks
>> > Barry