From: Chris Li
Date: Thu, 22 Feb 2024 19:46:24 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Large folios, swap and fscache
To: Andreas Dilger
Cc: David Howells, lsf-pc@lists.linux-foundation.org, Matthew Wilcox, netfs@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org

Hi Andreas,

On Thu, Feb 22, 2024 at 7:03 PM Andreas Dilger wrote:
>
> On Feb 22, 2024, at 3:45 PM, Chris Li wrote:
> >
> > Hi David,
> >
> > On Fri, Feb 2, 2024 at 1:10 AM David Howells wrote:
> >>
> >> Hi,
> >>
> >> The topic came up in a recent discussion about how to deal with large folios
> >> when it comes to swap as a swap device is normally considered a simple array
> >> of PAGE_SIZE-sized elements that can be indexed by a single integer.
> >
> > Sorry for being late to the party. I think I was the one who brought
> > this topic up in the online discussion with Will and you. Let me know
> > if you are referring to a different discussion.
> >
> >>
> >> With the advent of large folios, however, we might need to change this in
> >> order to be better able to swap out a compound page efficiently.
> >> Swap fragmentation raises its head, as does the need to potentially
> >> save multiple indices per folio. Does swap need to grow more
> >> filesystem features?
> >
> > Yes, with a large folio it is harder to allocate contiguous swap
> > entries when 4K swap entries are allocated and freed all the time. The
> > fragmentation will likely leave the swap file with very few contiguous
> > runs of swap entries.
>
> One option would be to reuse the multi-block allocator (mballoc) from
> ext4, which has quite efficient power-of-two buddy allocation. That
> would naturally aggregate contiguous pages as they are freed. Since
> the swap partition does not contain anything useful across a remount,
> there is no need to save allocation bitmaps persistently.

That is a very interesting idea. I see two ways to solve this problem,
and a buddy allocation system is one of them. A buddy allocator can keep
the assumption that the swap entries of a folio are contiguous.

The buddy system also has its own limits due to external fragmentation.
For one, there is no easy way to relocate a swap entry to another
location: we don't have an rmap for swap entries, which makes them hard
to compact. Still, I do expect the buddy allocator to reduce the
fragmentation greatly.

The other way is to add an indirection that maps a folio's swap entry to
discontiguous swap entries. That would break more of the current code's
assumptions about contiguous swap entries.

If we can reuse the ext4 mballoc for swap entries, that would be great.
I will take a look at that and report back.

Thanks for the great suggestion.

Chris
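To make the buddy-allocation idea above concrete, here is a minimal userspace sketch of a power-of-two buddy allocator over swap slots. It is not ext4's mballoc and not kernel code; the names (swap_buddy_init(), swap_buddy_alloc(), swap_buddy_free(), SWAP_MAX_ORDER) and the tiny 16-slot area are hypothetical. It keeps a folio's 2^order entries contiguous and naturally aligned, and on free it merges a block with its buddy whenever the buddy is also free, so freed entries re-aggregate into larger blocks. All state is in memory only, matching the point that nothing has to persist across a remount.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical tiny swap area of 2^SWAP_MAX_ORDER slots; each slot
 * stands for one PAGE_SIZE-sized swap entry. */
#define SWAP_MAX_ORDER 4
#define SWAP_NR_SLOTS (1u << SWAP_MAX_ORDER)

/* free_block[order][i] is true when the aligned block of 2^order slots
 * starting at slot (i << order) is free and not merged into a larger
 * free block. */
static bool free_block[SWAP_MAX_ORDER + 1][SWAP_NR_SLOTS];

/* Start with the whole area as a single free block of maximum order. */
void swap_buddy_init(void)
{
	memset(free_block, 0, sizeof(free_block));
	free_block[SWAP_MAX_ORDER][0] = true;
}

/* Allocate 2^order contiguous, naturally aligned slots for one folio.
 * Returns the first slot offset, or -1 if no free block is big enough. */
long swap_buddy_alloc(unsigned int order)
{
	for (unsigned int o = order; o <= SWAP_MAX_ORDER; o++) {
		for (unsigned long i = 0; i < (SWAP_NR_SLOTS >> o); i++) {
			if (!free_block[o][i])
				continue;
			free_block[o][i] = false;
			/* Split down to the requested order: keep the lower
			 * half and put the upper half (its buddy) on the free
			 * list one order below. */
			unsigned int cur = o;
			unsigned long idx = i;
			while (cur > order) {
				cur--;
				idx <<= 1;
				free_block[cur][idx + 1] = true;
			}
			return (long)(idx << order);
		}
	}
	return -1;
}

/* Free 2^order slots starting at offset, merging with the buddy block
 * for as long as the buddy is free too. No double-free checking. */
void swap_buddy_free(unsigned long offset, unsigned int order)
{
	unsigned long i = offset >> order;

	assert((offset & ((1ul << order) - 1)) == 0); /* must be aligned */
	while (order < SWAP_MAX_ORDER && free_block[order][i ^ 1]) {
		free_block[order][i ^ 1] = false; /* absorb the free buddy */
		i >>= 1;
		order++;
	}
	free_block[order][i] = true;
}
```

For example, after swap_buddy_init(), an order-2 allocation returns slot 0 and a following order-0 allocation returns slot 4; once every outstanding block is freed, the area merges back into one block of order SWAP_MAX_ORDER. What the sketch cannot do is move an allocated block, which is the external-fragmentation limit described above: without an rmap for swap entries there is nothing to drive compaction.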
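The indirection alternative mentioned above can be sketched the same way. Again this is a hypothetical userspace illustration, not an existing kernel API: struct folio_swap_map, folio_swap_alloc(), folio_swap_slot() and folio_swap_free() are invented names. Each page of a folio gets its own, possibly discontiguous, swap slot, and a small per-folio table records where each page went. The cost is that page i of the folio is no longer simply at first_slot + i, which is the kind of contiguity assumption the current code relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical tiny swap area: NR_SWAP_SLOTS PAGE_SIZE-sized slots. */
#define NR_SWAP_SLOTS 16u

/* slot_used[s] is true while swap slot s holds some page. */
static bool slot_used[NR_SWAP_SLOTS];

/* Hypothetical indirection record: slot[i] is the swap slot holding
 * page i of the folio; the slots need not be contiguous. */
struct folio_swap_map {
	unsigned int nr_pages;
	unsigned int slot[]; /* C99 flexible array member */
};

/* Give each of nr_pages pages any free slot, scanning from slot 0.
 * Returns NULL, leaving slot_used unchanged, if there are not enough
 * free slots or no memory for the map. */
struct folio_swap_map *folio_swap_alloc(unsigned int nr_pages)
{
	struct folio_swap_map *map =
		malloc(sizeof(*map) + nr_pages * sizeof(map->slot[0]));

	if (!map)
		return NULL;
	map->nr_pages = 0;
	for (unsigned int s = 0; s < NR_SWAP_SLOTS && map->nr_pages < nr_pages; s++) {
		if (!slot_used[s]) {
			slot_used[s] = true;
			map->slot[map->nr_pages++] = s;
		}
	}
	if (map->nr_pages < nr_pages) {
		/* Not enough free slots: undo the partial allocation. */
		for (unsigned int i = 0; i < map->nr_pages; i++)
			slot_used[map->slot[i]] = false;
		free(map);
		return NULL;
	}
	return map;
}

/* Look up the swap slot that holds page page_idx of the folio. */
unsigned int folio_swap_slot(const struct folio_swap_map *map,
			     unsigned int page_idx)
{
	assert(page_idx < map->nr_pages);
	return map->slot[page_idx];
}

/* Release every slot recorded in the map, then the map itself. */
void folio_swap_free(struct folio_swap_map *map)
{
	for (unsigned int i = 0; i < map->nr_pages; i++)
		slot_used[map->slot[i]] = false;
	free(map);
}
```

With every even slot already in use, a 4-page folio still fits here and lands in slots 1, 3, 5 and 7, where a contiguous allocator would fail for lack of 4 adjacent free slots. The trade-off is one extra table per swapped-out folio and an extra lookup on swap-in.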