From: "Huang, Ying"
To: Kairui Song
Cc: linux-mm@kvack.org, Andrew Morton, David Hildenbrand, Hugh Dickins,
    Johannes Weiner, Matthew Wilcox, Michal Hocko,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 08/24] mm/swap: check readahead policy per entry
In-Reply-To: (Kairui Song's message of "Mon, 20 Nov 2023 19:17:12 +0800")
References: <20231119194740.94101-1-ryncsn@gmail.com>
    <20231119194740.94101-9-ryncsn@gmail.com>
    <87r0klarjp.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Tue, 21 Nov 2023 09:10:06 +0800
Message-ID: <87a5r7c3o1.fsf@yhuang6-desk2.ccr.corp.intel.com>

Kairui Song writes:

> Huang, Ying wrote on Mon, Nov 20, 2023 at 14:07:
>>
>> Kairui Song writes:
>>
>> > From: Kairui Song
>> >
>> > Currently VMA readahead is globally disabled when any rotate disk is
>> > used as swap backend. So multiple swap devices are enabled, if a slower
>> > hard disk is set as a low priority fallback, and a high performance SSD
>> > is used and high priority swap device, vma readahead is disabled globally.
>> > The SSD swap device performance will drop by a lot.
>> >
>> > Check readahead policy per entry to avoid such problem.
>> >
>> > Signed-off-by: Kairui Song
>> > ---
>> >  mm/swap_state.c | 12 +++++++-----
>> >  1 file changed, 7 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/mm/swap_state.c b/mm/swap_state.c
>> > index ff6756f2e8e4..fb78f7f18ed7 100644
>> > --- a/mm/swap_state.c
>> > +++ b/mm/swap_state.c
>> > @@ -321,9 +321,9 @@ static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_
>> >  	return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1;
>> >  }
>> >
>> > -static inline bool swap_use_vma_readahead(void)
>> > +static inline bool swap_use_vma_readahead(struct swap_info_struct *si)
>> >  {
>> > -	return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
>> > +	return data_race(si->flags & SWP_SOLIDSTATE) && READ_ONCE(enable_vma_readahead);
>> >  }
>> >
>> >  /*
>> > @@ -341,7 +341,7 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
>> >
>> >  	folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry));
>> >  	if (!IS_ERR(folio)) {
>> > -		bool vma_ra = swap_use_vma_readahead();
>> > +		bool vma_ra = swap_use_vma_readahead(swp_swap_info(entry));
>> >  		bool readahead;
>> >
>> >  		/*
>> > @@ -920,16 +920,18 @@ static struct page *swapin_no_readahead(swp_entry_t entry, gfp_t gfp_mask,
>> >  struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
>> >  			      struct vm_fault *vmf, bool *swapcached)
>> >  {
>> > +	struct swap_info_struct *si;
>> >  	struct mempolicy *mpol;
>> >  	struct page *page;
>> >  	pgoff_t ilx;
>> >  	bool cached;
>> >
>> > +	si = swp_swap_info(entry);
>> >  	mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
>> > -	if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
>> > +	if (swap_use_no_readahead(si, entry)) {
>> >  		page = swapin_no_readahead(entry, gfp_mask, mpol, ilx, vmf->vma->vm_mm);
>> >  		cached = false;
>> > -	} else if (swap_use_vma_readahead()) {
>> > +	} else if (swap_use_vma_readahead(si)) {
>>
>> It's possible that some pages are swapped out to SSD while others are
>> swapped out to HDD in a readahead window.
>>
>> I suspect that there are practical requirements to use swap on SSD and
>> HDD at the same time.
>
> Hi Ying,
>
> Thanks for the review!
>
> For the first issue "fragmented readahead window", I was planning to
> do an extra check in readahead path to skip readahead entries that are
> on different swap devices, which is not hard to do,

This is a possible solution.

> but this series is growing too long so I thought it will be better
> done later.

You don't need to keep everything in one series.  Just use multiple
series, even if they are all swap-related.  They are in fact dealing
with different problems.

> For the second issue, "is there any practical use for multiple swap",
> I think actually there are. For example we are trying to use multi
> layer swap for offloading memory of different hotness on servers. And
> we also tried to implement a mechanism to migrate long sleep swap
> entries from high performance SSD/RAMDISK swap to cheap HDD swap
> device, with more than two layers of swap, which worked except the
> upstream issue, that readahead policy will no longer work as expected.

Thanks for the information.

>> >  		page = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
>> >  		cached = true;
>> >  	} else {

--
Best Regards,
Huang, Ying
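
For reference, the two ideas discussed in this thread (a per-entry
readahead policy, and skipping readahead entries that sit on a different
swap device than the faulting one) can be modelled in user space roughly
as below.  This is only a sketch under stated assumptions, not kernel
code: struct toy_swap_entry, struct toy_swap_dev, toy_use_vma_readahead()
and toy_filter_window() are all hypothetical names made up for
illustration, and the real readahead path has locking and swap cache
details that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a swap entry; not the kernel's swp_entry_t. */
struct toy_swap_entry {
	unsigned int dev;	/* index of the swap device holding the entry */
	unsigned long offset;	/* slot offset within that device */
};

/* Hypothetical per-device state: only a "solid state" flag is modelled. */
struct toy_swap_dev {
	bool solid_state;
};

/*
 * Per-entry policy: VMA readahead is considered only when the device
 * that holds this particular entry is solid state, instead of being
 * switched off globally as soon as any rotational device is in use.
 */
static bool toy_use_vma_readahead(const struct toy_swap_dev *devs,
				  const struct toy_swap_entry *e)
{
	return devs[e->dev].solid_state;
}

/*
 * Walk a readahead window and keep only the entries that live on the
 * same device as the faulting entry.  Indices of the kept entries are
 * stored in out[], which must have room for n elements.  Returns the
 * number of indices stored.
 */
static size_t toy_filter_window(const struct toy_swap_entry *win, size_t n,
				size_t fault_idx, size_t *out)
{
	size_t picked = 0;

	assert(fault_idx < n);
	for (size_t i = 0; i < n; i++) {
		if (win[i].dev != win[fault_idx].dev)
			continue;	/* different swap device: skip it */
		out[picked++] = i;
	}
	return picked;
}
```

With a window that mixes entries from an SSD (device 0) and an HDD
(device 1), a fault on an SSD entry would keep only the SSD entries in
the window, and a fault on the HDD entry would not use VMA readahead at
all under the per-entry policy.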