From mboxrd@z Thu Jan  1 00:00:00 1970
From: Barry Song <21cnbao@gmail.com>
Date: Tue, 30 Jul 2024 09:56:03 +1200
Subject: Re: [PATCH v5 3/4] mm: support large folios swapin as a whole for zRAM-like swapfile
To: Matthew Wilcox
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, ying.huang@intel.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, hughd@google.com, kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, mhocko@suse.com, minchan@kernel.org, nphamcs@gmail.com, ryan.roberts@arm.com, senozhatsky@chromium.org, shakeel.butt@linux.dev, shy828301@gmail.com, surenb@google.com, v-songbaohua@oppo.com, xiang@kernel.org, yosryahmed@google.com, Chuanhua Han
References: <20240726094618.401593-1-21cnbao@gmail.com> <20240726094618.401593-4-21cnbao@gmail.com>
Content-Type: text/plain; charset="UTF-8"
On Tue, Jul 30, 2024 at 8:03 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Jul 30, 2024 at 3:13 AM Matthew Wilcox wrote:
> >
> > On Tue, Jul 30, 2024 at 01:11:31AM +1200, Barry Song wrote:
> > > for this zRAM case, it is a newly allocated large folio; only
> > > when all conditions are met do we allocate and map
> > > the whole folio.
> > > You can check can_swapin_thp() and
> > > thp_swap_suitable_orders().
> >
> > YOU ARE DOING THIS WRONGLY!
> >
> > All of you anonymous memory people are utterly fixated on TLBs AND THIS
> > IS WRONG.  Yes, TLB performance is important, particularly with crappy
> > ARM designs, which I know a lot of you are paid to work on.  But you
> > seem to think this is the only consideration, and you're making bad
> > design choices as a result.  It's overly complicated, and you're leaving
> > performance on the table.
> >
> > Look back at the results Ryan showed in the early days of working on
> > large anonymous folios.  Half of the performance win on his system came
> > from using larger TLBs.  But the other half came from _reduced software
> > overhead_.  The LRU lock is a huge problem, and using large folios cuts
> > the length of the LRU list, hence LRU lock hold time.
> >
> > Your _own_ data on how hard it is to get hold of a large folio due to
> > fragmentation should be enough to convince you that the more large folios
> > in the system, the better the whole system runs.  We should not decline to
> > allocate large folios just because they can't be mapped with a single TLB!
>
> I am not convinced. For a newly allocated large folio, even alloc_anon_folio()
> in do_anonymous_page() does exactly the same thing:
>
> alloc_anon_folio()
> {
>         /*
>          * Get a list of all the (large) orders below PMD_ORDER that are enabled
>          * for this vma. Then filter out the orders that can't be allocated over
>          * the faulting address and still be fully contained in the vma.
>          */
>         orders = thp_vma_allowable_orders(vma, vma->vm_flags,
>                         TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER) - 1);
>         orders = thp_vma_suitable_orders(vma, vmf->address, orders);
> }
>
> You are not going to allocate an mTHP for an unaligned address on a new
> PF. Please point out where it is wrong.
Let's assume we have a folio whose virtual address range is
0x500000000000 ~ 0x500000000000 + 64KB, swapped out to swap offsets
0x10000 ~ 0x10000 + 64KB. The current code will swap it in as an mTHP
if a page fault occurs at any address within
0x500000000000 ~ 0x500000000000 + 64KB.

In this case, the mTHP enjoys both the decreased TLB pressure and the
reduced software overhead such as shorter LRU lock hold times. So it
sounds like we lose nothing here.

But if the folio is mremap-ed to an unaligned range like
0x600000000000 + 16KB ~ 0x600000000000 + 80KB while its swap offsets
remain 0x10000 ~ 0x10000 + 64KB, the current code won't swap it in as
an mTHP. Sounds like a loss?

If this is the performance problem you are trying to address, my point
is that it is not worth increasing the complexity at this stage, though
it might be doable. We once tracked hundreds of phones running apps
randomly for a couple of days, and we never encountered such a case.
So this is pretty much a corner case.

If your concern goes beyond this, for example, if you want to swap in
large folios even when the swap slots are completely non-contiguous,
that is a different story. I agree this is a potential optimization
direction, but in that case you still need to find an aligned boundary
to handle page faults, just like do_anonymous_page() does; otherwise,
you can end up with all kinds of pointless intersections where one PF
covers the address ranges of other PFs, making PTE checks such as
pte_range_none() completely disordered:

static struct folio *alloc_anon_folio(struct vm_fault *vmf)
{
        ...
        /*
         * Find the highest order where the aligned range is completely
         * pte_none(). Note that all remaining orders will be completely
         * pte_none().
         */
        order = highest_order(orders);
        while (orders) {
                addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
                if (pte_range_none(pte + pte_index(addr), 1 << order))
                        break;
                order = next_order(&orders, order);
        }
}

>
> Thanks
> Barry