From: "Huang, Ying"
To: Kairui Song
Cc: linux-mm@kvack.org, Chris Li, Minchan Kim, Barry Song, Ryan Roberts,
    Yu Zhao, SeongJae Park, David Hildenbrand, Yosry Ahmed,
    Johannes Weiner, Matthew Wilcox, Nhat Pham, Chengming Zhou,
    Andrew Morton, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 10/10] mm/swap: optimize synchronous swapin
In-Reply-To: (Kairui Song's message of "Wed, 27 Mar 2024 15:14:03 +0800")
References: <20240326185032.72159-1-ryncsn@gmail.com>
    <20240326185032.72159-11-ryncsn@gmail.com>
    <87zfukmbwz.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87r0fwmar4.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 27 Mar 2024 16:16:30 +0800
Message-ID: <87il18m6n5.fsf@yhuang6-desk2.ccr.corp.intel.com>

Kairui Song writes:

> On Wed, Mar 27, 2024 at 2:49 PM Huang, Ying wrote:
>>
>> Kairui Song writes:
>>
>> > On Wed, Mar 27, 2024 at 2:24 PM Huang, Ying wrote:
>> >>
>> >> Kairui Song writes:
>> >>
>> >> > From: Kairui Song
>> >> >
>> >> > Interestingly, the major performance overhead of synchronous swapin is
>> >> > actually from the workingset nodes update. That's because synchronous
>> >> > swapin
>> >>
>> >> If it's the major overhead, why not make it the first optimization?
>> >
>> > This performance issue became much more obvious after doing other
>> > optimizations, and those other optimizations are for swapin in general,
>> > not only for synchronous swapin. That's also the order in which I
>> > optimized things step by step, so I kept my patch order...
>> >
>> > And it is easier to do this after Patch 8/10, which introduces the new
>> > interface for swap cache.
>> >
>> >>
>> >> > keeps adding single folios into an xa_node, making the node no longer
>> >> > a shadow node, so it has to be removed from shadow_nodes; then the
>> >> > folio is removed very shortly afterwards, making the node a shadow
>> >> > node again, so it has to be added back to shadow_nodes.
>> >>
>> >> The folio is removed only if should_try_to_free_swap() returns true?
>> >>
>> >> > Mark a synchronously swapped-in folio with a special bit in the swap
>> >> > entry embedded in folio->swap, as we still have some usable bits
>> >> > there. Skip the workingset node update on insertion of such a folio,
>> >> > because it will be removed very quickly, and the removal will trigger
>> >> > the update, ensuring the workingset info is eventually consistent.
>> >>
>> >> Is this safe? Is it possible for the shadow node to be reclaimed after
>> >> the folio is added into the node and before it is removed?
>> >
>> > If an xa node contains any non-shadow entry, it can't be reclaimed;
>> > shadow_lru_isolate will check and skip such nodes in case of a race.
>>
>> In shadow_lru_isolate(),
>>
>> 	/*
>> 	 * The nodes should only contain one or more shadow entries,
>> 	 * no pages, so we expect to be able to remove them all and
>> 	 * delete and free the empty node afterwards.
>> 	 */
>> 	if (WARN_ON_ONCE(!node->nr_values))
>> 		goto out_invalid;
>> 	if (WARN_ON_ONCE(node->count != node->nr_values))
>> 		goto out_invalid;
>>
>> So this isn't considered normal and will trigger a warning now.
>
> Yes, I added an exception in this patch:
> -	if (WARN_ON_ONCE(node->count != node->nr_values))
> +	if (WARN_ON_ONCE(node->count != node->nr_values &&
> mapping->host != NULL))
>
> The code is not a good final solution, but the idea might not be that
> bad. list_lru provides many operations like LRU_ROTATE; we could even
> lazily remove all the nodes as a general optimization, or add a
> threshold for adding/removing a node from the LRU.

We can compare different solutions. For this one, we still need to deal
with the cases where the folio isn't removed from the swap cache, that
is, where should_try_to_free_swap() returns false.

>> >>
>> >> If so, we may consider some other methods. Make shadow_nodes per-cpu?
>> >
>> > That's also an alternative solution if there are other risks.
>>
>> This appears to be a general optimization, and a cleaner one.
>
> I'm not sure whether synchronization between CPUs will add more burden,
> because shadow nodes are globally shared and one node can be referenced
> by multiple CPUs. I can give it a try to see if this is doable. Maybe a
> per-cpu batch is better, but synchronization might still be an issue.

Yes. Per-CPU shadow_nodes would need to find the list from the shadow
node, which has some overhead. If lock contention on the list_lru lock
is the root cause, we can use hashed shadow node lists. That can reduce
lock contention effectively.

-- 
Best Regards,
Huang, Ying