From: Yosry Ahmed
Date: Wed, 30 Oct 2024 14:31:51 -0700
Subject: Re: [PATCH RFC] mm: mitigate large folios usage and swap thrashing for nearly full memcg
To: Barry Song <21cnbao@gmail.com>
Cc: Usama Arif, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song, Kanchana P Sridhar, David Hildenbrand, Baolin Wang, Chris Li, "Huang, Ying", Kairui Song, Ryan Roberts, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song
References: <20241027001444.3233-1-21cnbao@gmail.com> <33c5d5ca-7bc4-49dc-b1c7-39f814962ae0@gmail.com> <03b37d84-c167-48f2-9c18-24268b0e73e2@gmail.com>

On Wed, Oct 30, 2024 at 2:21 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Thu, Oct 31, 2024 at 10:10 AM Yosry Ahmed wrote:
> >
> > [..]
> > > >>> A crucial component is still missing: managing the compression and decompression
> > > >>> of multiple pages as a larger block. This could significantly reduce
> > > >>> system time and
> > > >>> potentially resolve the kernel build issue within a small memory
> > > >>> cgroup, even with
> > > >>> swap thrashing.
> > > >>>
> > > >>> I'll send an update ASAP so you can rebase for zswap.
> > > >>
> > > >> Did you mean https://lore.kernel.org/all/20241021232852.4061-1-21cnbao@gmail.com/?
> > > >> That won't benefit zswap, right?
> > > >
> > > > That's right. I assume we can also make it work with zswap?
> > >
> > > Hopefully yes. That's mainly why I was looking at that series, to try and find
> > > a way to do something similar for zswap.
> >
> > I would prefer for these things to be done separately. We still need
> > to evaluate the compression/decompression of large blocks. I am mainly
> > concerned about having to decompress a large chunk to fault in one
> > page.
> >
> > The obvious problems are fault latency, and wasted work having to
> > consistently decompress the large chunk to take one page from it. We
> > also need to decide if we'd rather split it after decompression and
> > compress the parts that we didn't swap in separately.
> >
> > This can cause problems beyond the fault latency. Imagine the case
> > where the system is under memory pressure, so we fall back to order-0
> > swapin to avoid reclaim. Now we want to decompress a chunk that used
> > to be 64K.
>
> Yes, this could be an issue.
>
> We had actually tried to utilize several buffers for those partial
> swap-in cases,
> where the decompressed data was held in anticipation of the upcoming
> swap-in. This approach could address the majority of partial swap-ins for
> fallback scenarios.
>
> >
> > We need to allocate 64K of contiguous memory for a temporary
> > allocation to be able to fault a 4K page. Now we either need to:
> > - Go into reclaim, which we were trying to avoid to begin with.
> > - Dip into reserves to allocate the 64K as it's a temporary
> > allocation. This is probably risky because under memory pressure, many
> > CPUs may be doing this concurrently.
>
> This has been addressed by using contiguous memory that is prepared on
> a per-CPU basis. Search for the following:
> "alloc_pages() might fail, so we don't depend on allocation:"
> https://lore.kernel.org/all/20241021232852.4061-1-21cnbao@gmail.com/

Thanks. I think this is reasonable, but it makes it difficult to
increase the size of the chunk.

I would still prefer for both series to remain separate. If we want to
wait for the large folio zswap loads until your series goes in, to
offset the thrashing, that's fine, but I really think we should try to
address the thrashing on its own.
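
To put rough numbers on the concern above, here is a back-of-the-envelope
sketch. It is illustrative only and is not taken from either series: the
helper names (fault_cost, worst_case_scratch) are made up, and it assumes a
4K base page and that a chunk must be decompressed in full to reach any one
page inside it.

```python
PAGE_SIZE = 4 * 1024  # assumed base page size (4K), as in the example above


def fault_cost(chunk_bytes, page_size=PAGE_SIZE):
    """Cost of faulting in ONE page from a chunk compressed as a single unit.

    Returns (pages_decompressed, pages_discarded, scratch_bytes).
    Hypothetical helper for illustration, not a kernel interface.
    """
    pages = chunk_bytes // page_size
    # Only one page is wanted; the rest is wasted work unless it is cached
    # or recompressed separately.
    return pages, pages - 1, chunk_bytes


def worst_case_scratch(chunk_bytes, nr_cpus):
    """Scratch memory if every CPU holds one chunk-sized buffer at once."""
    return chunk_bytes * nr_cpus


if __name__ == "__main__":
    for chunk_kib in (4, 16, 64, 256):
        pages, wasted, scratch = fault_cost(chunk_kib * 1024)
        print(f"{chunk_kib:>4}K chunk: decompress {pages} pages, "
              f"discard {wasted}, scratch {scratch // 1024}K per fault")
```

With a 64K chunk, an order-0 swapin decompresses 16 pages to keep one, and
preallocated per-CPU scratch space grows linearly with both the chunk size
and the CPU count, which is why growing the chunk gets harder.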