Date: Thu, 15 Feb 2024 16:11:14 -0800
From: Andrew Morton
To: Chris Li
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Wei Xu, Yu Zhao,
 Greg Thelen, Chun-Tse Shao, Yosry Ahmed, Michal Hocko, Mel Gorman,
 Huang Ying, Nhat Pham, Kairui Song, Barry Song, Tim Chen
Subject: Re: [PATCH v4] mm: swap: async free swap slot cache entries
Message-Id: <20240215161114.6bd444ed839f778eefdf6e0a@linux-foundation.org>
In-Reply-To: <20240214-async-free-v4-1-6abe0d59f85f@kernel.org>
References: <20240214-async-free-v4-1-6abe0d59f85f@kernel.org>

On Wed, 14 Feb 2024 17:02:13 -0800 Chris Li wrote:

> We discovered that 1% swap page fault is 100us+ while 50% of
> the swap fault is under 20us.
>
> Further investigation shows that a large portion of the time
> spent in the free_swap_slots() function for the long tail case.
>
> The percpu cache of swap slots is freed in a batch of 64 entries
> inside free_swap_slots(). These cache entries are accumulated
> from previous page faults, which may not be related to the current
> process.
>
> Doing the batch free in the page fault handler causes longer
> tail latencies and penalizes the current process.
>
> When the swap cache slot is full, schedule async free cached
> swap slots in a work queue, before the next swap fault comes in.
> If the next swap fault comes in very fast, before the async
> free gets a chance to run. It will directly free all the swap
> cache in the swap fault the same way as previously.
>
> Testing:
>
> Chun-Tse did some benchmark in chromebook, showing that
> zram_wait_metrics improve about 15% with 80% and 95% confidence.
>
> I recently ran some experiments on about 1000 Google production
> machines. It shows swapin latency drops in the long tail
> 100us - 500us bucket dramatically.
>
> platform    (100-500us)       (0-100us)
> A           1.12% -> 0.36%    98.47% -> 99.22%
> B           0.65% -> 0.15%    98.96% -> 99.46%
> C           0.61% -> 0.23%    98.96% -> 99.38%
>

What this description lacks is any description of why anyone cares.

The patch clearly decreases overall throughput (speed-vs-latency is a
common tradeoff).

And the "we don't know how to fix this properly so punt it into a
kernel thread" approach remains lame. For example, the risk that the
now-liberated allocator can outpace the async freeing, resulting in
unlimited object windup.

And here's a fun one: what happens if the producer of these objects has
SCHED_FIFO policy and it's a uniprocessor machine? If the producer
sits there allocating objects and the freeing thread never executes?

Has this been considered, and tested for?

All these concerns, risks and complexity and the changelog offers us no
reason to take any of this on. What's wrong with the existing code?
Please exhaustively describe the issues which are being seen. And
explain why those issues are sufficiently serious to leave the above
issues and risks unaddressed.
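
For readers following the thread, here is a minimal toy model of the scheme as the quoted changelog describes it: a 64-entry cache, a deferred drain queued when the cache fills, and a synchronous drain in the caller if the cache is still full at the next free. It is a user-space sketch in Python, not the kernel code. The function `simulate`, its `worker_gets_cpu` callback (a stand-in for the scheduler) and the constant name `CACHE_SIZE` are hypothetical. Whether the real patch bounds the cache this way is what the questions above ask.

```python
CACHE_SIZE = 64  # batch size quoted in the patch description


def simulate(num_frees, worker_gets_cpu):
    """Toy model of a slot cache with deferred draining.

    worker_gets_cpu(step) -> bool is a hypothetical stand-in for the
    scheduler: True means the deferred worker runs before the free
    issued at `step`.

    Returns (max_occupancy, sync_drains, async_drains).
    """
    cached = 0
    work_pending = False
    max_occupancy = 0
    sync_drains = 0
    async_drains = 0

    for step in range(num_frees):
        # The deferred worker drains the cache if it was queued and
        # the scheduler lets it run before this free.
        if work_pending and worker_gets_cpu(step):
            cached = 0
            work_pending = False
            async_drains += 1

        # Fallback from the changelog: the cache is still full when the
        # next free arrives, so drain it in the caller's context.
        if cached == CACHE_SIZE:
            cached = 0
            work_pending = False
            sync_drains += 1

        cached += 1
        max_occupancy = max(max_occupancy, cached)

        # A full cache queues the deferred drain.
        if cached == CACHE_SIZE:
            work_pending = True

    return max_occupancy, sync_drains, async_drains


if __name__ == "__main__":
    # Worker always runs promptly.
    print(simulate(640, lambda step: True))
    # Worker is starved forever (the SCHED_FIFO uniprocessor scenario).
    print(simulate(640, lambda step: False))
```

In this model the starved case never holds more than 64 entries, but every drain is paid for in the caller's path, which is the old behaviour. The model says nothing about throughput or about the cost of the work item itself.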