From: Kairui Song
Date: Mon, 29 Apr 2024 01:45:49 +0800
Subject: Re: [PATCH 0/8] mm/swap: optimize swap cache search space
To: Chris Li
Cc: "Huang, Ying", Matthew Wilcox, linux-mm@kvack.org, Andrew Morton, Barry Song, Ryan Roberts, Neil Brown, Minchan Kim, Hugh Dickins, David Hildenbrand, Yosry Ahmed, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org

On Mon, Apr 29, 2024 at 1:37 AM Kairui Song wrote:
>
> On Sat, Apr 27, 2024 at 7:16 AM Chris Li wrote:
> >
> > Hi Ying,
> >
> > On Tue, Apr 23, 2024 at 7:26 PM Huang, Ying wrote:
> > >
> > > Hi, Matthew,
> > >
> > > Matthew Wilcox writes:
> > >
> > > > On Mon, Apr 22, 2024 at 03:54:58PM +0800, Huang, Ying wrote:
> > > >> Is it possible to add "start_offset" support in xarray, so "index"
> > > >> will subtract "start_offset" before looking up / inserting?
> > > >
> > > > We kind of have that with XA_FLAGS_ZERO_BUSY which is used for
> > > > XA_FLAGS_ALLOC1. But that's just one bit for the entry at 0.
> > > > We could generalise it, but then we'd have to store that somewhere and
> > > > there's no obvious good place to store it that wouldn't enlarge struct
> > > > xarray, which I'd be reluctant to do.
> > > >
> > > >> Is it possible to use multiple range locks to protect one xarray to
> > > >> improve the lock scalability? This is why we have multiple "struct
> > > >> address_space" for one swap device. And, we may have same lock
> > > >> contention issue for large files too.
> > > >
> > > > It's something I've considered. The issue is search marks. If we delete
> > > > an entry, we may have to walk all the way up the xarray clearing bits as
> > > > we go and I'd rather not grab a lock at each level. There's a convenient
> > > > 4 byte hole between nr_values and parent where we could put it.
> > > >
> > > > Oh, another issue is that we use i_pages.xa_lock to synchronise
> > > > address_space.nrpages, so I'm not sure that a per-node lock will help.
> > >
> > > Thanks for looking at this.
> > >
> > > > But I'm conscious that there are workloads which show contention on
> > > > xa_lock as their limiting factor, so I'm open to ideas to improve all
> > > > these things.
> > >
> > > I have no idea so far because my very limited knowledge about xarray.
> >
> > For the swap file usage, I have been considering an idea to remove the
> > index part of the xarray from swap cache. Swap cache is different from
> > file cache in a few aspects.
> > For one if we want to have a folio equivalent of "large swap entry".
> > Then the natural alignment of those swap offset on does not make
> > sense. Ideally we should be able to write the folio to un-aligned swap
> > file locations.
> >
>
> Hi Chris,
>
> This sound interesting, I have a few questions though...
>
> Are you suggesting we handle swap on file and swap on device
> differently? Swap on file is much less frequently used than swap on
> device I think.
>
> And what do you mean "index part of the xarray"?
> If we need a cache, xarray still seems one of the best choices to hold
> the content.
>
> > The other aspect for swap files is that, we already have different
> > data structures organized around swap offset, swap_map and
> > swap_cgroup. If we group the swap related data structure together. We
> > can add a pointer to a union of folio or a shadow swap entry. We can
> > use atomic updates on the swap struct member or breakdown the access
> > lock by ranges just like swap cluster does.

Oh, and BTW, I'm also trying to break down the swap address space range
(from 64M to 16M, i.e. SWAP_ADDRESS_SPACE_SHIFT from 14 to 12). It's a
simple approach, but the coupling and the increased memory usage of the
address_space structure cause a performance regression (about 2% in the
worst real-world workload). I found this part very performance
sensitive, so I'm basically not making much progress on the future items
I mentioned in the cover letter. New ideas could be very helpful!

> >
> > I want to discuss those ideas in the upcoming LSF/MM meet up as well.
>
> Looking forward to it!
>
> >
> > Chris