From: Chris Li <chrisl@kernel.org>
Date: Mon, 5 Feb 2024 11:10:05 -0800
Subject: Re: [PATCH v2] mm: swap: async free swap slot cache entries
To: Tim Chen
Cc: "Huang, Ying", Andrew Morton, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Wei Xu, Yu Zhao, Greg Thelen, Chun-Tse Shao,
 Suren Baghdasaryan, Yosry Ahmed, Brian Geffon, Minchan Kim,
 Michal Hocko, Mel Gorman, Nhat Pham, Johannes Weiner, Kairui Song,
 Zhongkun He, Kemeng Shi, Barry Song
References: <20240131-async-free-v2-1-525f03e07184@kernel.org>
 <87sf2ceoks.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <7f19b4d69ff20efe8260a174c7866b4819532b1f.camel@linux.intel.com>
 <1fa1da19b0b929efec46bd02a6fc358fef1b9c42.camel@linux.intel.com>
In-Reply-To: <1fa1da19b0b929efec46bd02a6fc358fef1b9c42.camel@linux.intel.com>

On Mon, Feb 5, 2024 at 10:15 AM Tim Chen wrote:
>
> On Sat, 2024-02-03 at 10:12 -0800, Chris Li wrote:
> >
> > > > >  {
> > > > >         struct swap_slots_cache *cache;
> > > > > @@ -282,17 +298,14 @@ void free_swap_slot(swp_entry_t entry)
> > > > >                         goto direct_free;
> > > > >                 }
> > > > >                 if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE) {
> > > > > -                       /*
> > > > > -                        * Return slots to global pool.
> > > > > -                        * The current swap_map value is SWAP_HAS_CACHE.
> > > > > -                        * Set it to 0 to indicate it is available for
> > > > > -                        * allocation in global pool
> > > > > -                        */
> > > > > -                       swapcache_free_entries(cache->slots_ret, cache->n_ret);
> > > > > -                       cache->n_ret = 0;
> > > > > +                       spin_unlock_irq(&cache->free_lock);
> > > > > +                       schedule_work(&cache->async_free);
> > > > > +                       goto direct_free;
> > > > >                 }
> > > > >                 cache->slots_ret[cache->n_ret++] = entry;
> > > > >                 spin_unlock_irq(&cache->free_lock);
> > > > > +               if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE)
> > > > > +                       schedule_work(&cache->async_free);
> > > >
> > >
> > > I have some concerns about the current patch with the change above.
> > > We could hit the direct_free path very often.
> > >
> > > By delaying the freeing of entries in the return cache, we have to
> > > do more freeing of swap entries one at a time. When we try to free
> > > an entry, we can find the return cache still full, waiting to be
> > > freed.
> >
> > You are describing the case where the async free is not working. In
> > that case it will always hit the direct free path, one entry at a
> > time.
> >
> > > So we have fewer batch frees of swap entries, resulting in an
> > > increase in the number of sis->lock acquisitions overall. This
> > > could have the effect of reducing swap throughput overall when
> > > swap is under heavy operations and sis->lock is contended.
> >
> > I can change the direct free path to free all entries if the async
> > free hasn't freed up the batch by the time the next swap fault comes
> > in. The new swap fault will take the hit and just free the whole
> > batch. It will behave closer to the original batch free behavior in
> > this path.
> >
> Will that negate the benefit you are looking for?

It should not. In our deployment, the rate of swap faults isn't that
high; it is one of the metrics that gets monitored and controlled. If
the swap fault path gets that busy, the app's performance is most
likely already suffering from it, and we should back off from swapping
out that much. In the normal case, I expect the async free to have
already freed the 64 entries before the next swap fault on the same
CPU hits.

> A hack is to double SWAP_SLOTS_CACHE_SIZE to 128 and trigger the
> background reclaim when entries reach 64. This will allow you to
> avoid the one-by-one direct reclaim path, and hopefully the delayed
> job will have done its work while slots accumulate between 64 and
> 128.

I would have some concern about that due to the higher per-CPU memory
usage. We have machines with high CPU counts; that would mean more
memory reserved.

I actually have a variant of the patch that starts the async free
before the 64-entry limit is reached, e.g. at 60 entries. That gives
some headroom to avoid the direct free path for another 4 entries. I
did not include it in this patch because it makes things more
complicated, and this code path isn't taken much at all. If it helps,
I can resurrect it in V3.
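To make that concrete, the variant is roughly the following delta on
top of this patch (a sketch from memory, not the actual diff; the
headroom of 4 entries is just the 60-of-64 example above):

                cache->slots_ret[cache->n_ret++] = entry;
                spin_unlock_irq(&cache->free_lock);
        -       if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE)
        +       /* Kick the worker a few entries early, e.g. at 60 of 64. */
        +       if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE - 4)
                        schedule_work(&cache->async_free);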
> However, I am unsure how well this hack works under really heavy
> swap load. It means that the background reclaim will need to work
> through a larger backlog and hold the sis->lock longer. So if you
> hit the direct path while the background reclaim is underway, you
> may have longer tail latency to acquire the sis->lock.

In our system, a really heavy swap load is rare, and it means
something is already wrong. At that point the app's SLO is likely at
risk regardless of long-tail swap latency. It is already too late to
address it at the swap fault end; we need to address the source of
the problem, which is swapping out too much.

Chris
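P.S. For context, the async_free work item scheduled above just does
the batch free that the deleted lines used to do inline. Roughly, as a
sketch (the worker's function name here is assumed, not quoted from
the patch):

        static void swap_slots_cache_async_free(struct work_struct *work)
        {
                struct swap_slots_cache *cache =
                        container_of(work, struct swap_slots_cache, async_free);

                spin_lock_irq(&cache->free_lock);
                if (cache->n_ret) {
                        /* Return the whole batch to the global pool in one go. */
                        swapcache_free_entries(cache->slots_ret, cache->n_ret);
                        cache->n_ret = 0;
                }
                spin_unlock_irq(&cache->free_lock);
        }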