From: Chris Li
Date: Thu, 30 May 2024 11:31:44 -0700
Subject: Re: [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster order
To: Kairui Song
Cc: "Huang, Ying", Andrew Morton, Ryan Roberts, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song
References: <20240524-swap-allocator-v1-0-47861b423b26@kernel.org> <87cyp5575y.fsf@yhuang6-desk2.ccr.corp.intel.com> <875xuw1062.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, May 30, 2024 at 1:08 AM Kairui Song wrote:
>
> On Thu, May 30, 2024 at 10:54 AM Huang, Ying wrote:
> >
> > Chris Li writes:
> >
> > > Hi Ying,
> > >
> > > On Wed, May 29, 2024 at 1:57 AM Huang, Ying wrote:
> > >>
> > >> Chris Li writes:
> > >>
> > >> > I am spinning a new version for this series to address two issues
> > >> > found in this series:
> > >> >
> > >> > 1) Oppo discovered a bug in the following line:
> > >> > + ci = si->cluster_info + tmp;
> > >> > Should be "tmp / SWAPFILE_CLUSTER" instead of "tmp".
> > >> > That is a serious bug but trivial to fix.
> > >> >
> > >> > 2) order 0 allocation currently blindly scans swap_map disregarding
> > >> > the cluster->order.
> > >>
> > >> IIUC, now, we only scan swap_map[] only if
> > >> !list_empty(&si->free_clusters) && !list_empty(&si->nonfull_clusters[order]).
> > >> That is, if you doesn't run low swap free space, you will not do that.
> > >
> > > You can still swap space in order 0 clusters while order 4 runs out of
> > > free_cluster
> > > or nonfull_clusters[order]. For Android that is a common case.
> >
> > When we fail to allocate order 4, we will fallback to order 0. Still
> > don't need to scan swap_map[]. But after looking at your below reply, I
> > realized that the swap space is almost full at most times in your cases.
> > Then, it's possible that we run into scanning swap_map[].
> > list_empty(&si->free_clusters) &&
> > list_empty(&si->nonfull_clusters[order]) will become true, if we put too
> > many clusters in si->percpu_cluster. So, if we want to avoid to scan
> > swap_map[], we can stop add clusters in si->percpu_cluster when swap
> > space runs low. And maybe take clusters out of si->percpu_cluster
> > sometimes.
>
> Stop adding when it runs low seems too late, there could still be a
> free cluster stuck on a CPU, and not getting scanned, right?

Only a small number of free clusters can be stuck on the CPUs, a handful
at most. Preventing low order swap entries from polluting the high order
clusters is more urgent.

>
> > Another issue is nonfull_cluster[order1] cannot be used for
> > nonfull_cluster[order2]. In definition, we should not fail order 0
> > allocation, we need to steal nonfull_cluster[order>0] for order 0
> > allocation. This can avoid to scan swap_map[] too. This may be not
> > perfect, but it is the simplest first step implementation. You can
> > optimize based on it further.
>
> This can be extended to allow any order < MAX_ORDER to steal from
> higher order, which might increase fragmentation though.

Stealing from a higher order is a bad thing, because the value of the
allocator lies in being able to allocate from the higher orders. Going
from high to low is always trivial; going from low to high is impossible.
See the other email about having a "knob" to reserve some swap space for
high order allocations. That is not perfect but more useful.

>
> So this is looking more and more like a buddy allocator, and that
> should be the long term solution.
>

In Barry's test case, there is a huge swing between order 0 and order 4
allocations caused by the low memory killer. Apps get killed, and it takes
a while for an app to launch and swap out high order entries. A buddy
allocator will be of limited help there, because once a cluster is used
for order 0, the fragmentation will prevent higher order allocation.

We do need a way to swap out large folios using discontiguous swap
entries. That is the longer term solution for Barry's usage situation.

Chris
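
For readers following issue 1) quoted above, here is a minimal userspace
sketch of why the index has to be "tmp / SWAPFILE_CLUSTER" rather than
"tmp". It is not the kernel code: the struct layout, the helper names and
the cluster size of 512 slots are assumptions made only for illustration.
The point is that a swap slot offset must be divided by the number of
slots per cluster before it is used to index the cluster array.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 512 swap slots per cluster; an illustrative value only. */
#define SWAPFILE_CLUSTER 512UL

/* Hypothetical stand-in for the per-cluster metadata. */
struct cluster_info {
	unsigned int order;
};

/* Map a swap slot offset to the index of the cluster containing it. */
static size_t offset_to_cluster_idx(unsigned long offset)
{
	return offset / SWAPFILE_CLUSTER;
}

/*
 * Return the cluster that owns a slot offset. Using "base + offset"
 * directly (the bug described above) would index the array by slot
 * number and point far past the intended cluster.
 */
static struct cluster_info *offset_to_cluster(struct cluster_info *base,
					      size_t nr_clusters,
					      unsigned long offset)
{
	size_t idx = offset_to_cluster_idx(offset);

	assert(idx < nr_clusters);
	return base + idx;
}
```

With four clusters, slot offset 1030 resolves to cluster index 2, whereas
adding the raw offset to the array base would land well outside the array.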