From: Yu Zhao
Date: Wed, 16 Nov 2022 17:12:58 -0700
Subject: Re: [PATCH 1/2] mm: multi-gen LRU: retry folios written back while isolated
To: Andrew Morton
Cc: linux-mm@kvack.org

On Wed, Nov 16, 2022 at 3:59 PM Andrew Morton wrote:
>
> On Tue, 15 Nov 2022 18:38:07 -0700 Yu Zhao wrote:
>
> > The page reclaim isolates a batch of folios from the tail of one of
> > the LRU lists and works on those folios one by one. For a suitable
> > swap-backed folio, if the swap device is async, it queues that folio
> > for writeback. After the page reclaim finishes an entire batch, it
> > puts the folios it queued for writeback back at the head of the
> > original LRU list.
> >
> > In the meantime, the page writeback also flushes the queued folios in
> > batches.
> > Its batching logic is independent of that of the page reclaim. For
> > each of the folios it writes back, the page writeback calls
> > folio_rotate_reclaimable(), which tries to rotate the folio to the
> > tail.
> >
> > folio_rotate_reclaimable() only works for a folio after the page
> > reclaim has put it back. If an async swap device is fast enough, the
> > page writeback can finish with that folio while the page reclaim is
> > still working on the rest of the batch containing it. In this case,
> > that folio will remain at the head, and the page reclaim will not
> > retry it until its scan works its way up to the head.
> >
> > This patch adds a retry to evict_folios(). After evict_folios() has
> > finished an entire batch and before it puts back the folios it cannot
> > free immediately, it retries those that may have missed the rotation.
> >
> > Before this patch, ~60% of folios swapped to an Intel Optane missed
> > folio_rotate_reclaimable(). After this patch, ~99% of the missed
> > folios were reclaimed upon retry.
> >
> > This problem affects relatively slow async swap devices like the
> > Samsung 980 Pro much less and does not affect sync swap devices like
> > zram or zswap at all.
>
> As I understand it, this approach has an implicit assumption that by
> the time evict_folios() has completed its first pass, write IOs will
> have completed and the resulting folios are available for processing on
> evict_folios()'s second pass, yes?

Correct.

> If so, it all kinda works by luck of timing.

Yes, it's betting on luck. But it's a very good bet, because the race
window on the second pass is probably 100 times smaller. The race window
on the first pass is the while() loop in shrink_folio_list(), which has a
lot of work to do. The race window on the second pass is a simple
list_for_each_entry_safe_reverse() loop, and it closes as soon as we put
the folios that are still under writeback back on the LRU list. Then we
call shrink_folio_list() again for the retry.
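A toy user-space model in C may help make the race and the retry concrete. It is a sketch under heavy simplifying assumptions, not the patch itself: struct toy_folio, toy_writeback_done() and toy_evict_batch() are hypothetical names, the LRU list is reduced to a per-folio state, and "freeing" a folio on retry stands in for the second shrink_folio_list() call. toy_writeback_done() plays the role of folio_rotate_reclaimable(), which only has an effect once the folio is back on the LRU list. The window_io parameter models the small window between the retry and the put-back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
enum toy_state { TOY_ISOLATED, TOY_ON_LRU, TOY_FREED };

struct toy_folio {
	enum toy_state state;
	bool writeback;	/* swap I/O still in flight */
	bool rotated;	/* moved to the LRU tail when its I/O completed */
};

/* Stand-in for folio_rotate_reclaimable(): the rotation only takes
 * effect if reclaim has already put the folio back on the LRU list. */
static void toy_writeback_done(struct toy_folio *f)
{
	f->writeback = false;
	if (f->state == TOY_ON_LRU)
		f->rotated = true;
}

/*
 * Reclaims one isolated batch of n dirty swap-backed folios.
 * fast_io:   folios whose I/O completes during the first pass
 * window_io: folios whose I/O completes after the retry check but
 *            before the put-back (the small residual window)
 * retry:     whether to retry the batch before putting folios back
 * Returns how many folios end up at the head of the LRU list with
 * their I/O done but their rotation missed.
 */
static size_t toy_evict_batch(struct toy_folio *batch, size_t n,
			      size_t fast_io, size_t window_io, bool retry)
{
	size_t i, stranded = 0;

	assert(fast_io + window_io <= n);

	/* First pass: isolate the batch and queue async writeback. */
	for (i = 0; i < n; i++) {
		batch[i].state = TOY_ISOLATED;
		batch[i].writeback = true;
		batch[i].rotated = false;
	}

	/* A fast device finishes some I/O while the batch is isolated. */
	for (i = 0; i < fast_io; i++)
		toy_writeback_done(&batch[i]);

	/* Retry: folios no longer under writeback can be freed now. */
	if (retry) {
		for (i = 0; i < n; i++)
			if (!batch[i].writeback)
				batch[i].state = TOY_FREED;
	}

	/* I/O finishing in the residual window still misses the rotation. */
	for (i = fast_io; i < fast_io + window_io; i++)
		toy_writeback_done(&batch[i]);

	/* Put the rest back at the head of the LRU list. */
	for (i = 0; i < n; i++) {
		if (batch[i].state == TOY_FREED)
			continue;
		batch[i].state = TOY_ON_LRU;
		if (!batch[i].writeback)
			stranded++;
	}

	/* Later completions find their folios on the LRU and rotate them. */
	for (i = 0; i < n; i++)
		if (batch[i].writeback)
			toy_writeback_done(&batch[i]);

	return stranded;
}
```

In this model, a 10-folio batch with six completions during the first pass and one in the residual window leaves 7 folios stranded without the retry and 1 with it. These are illustrative toy numbers, not measurements.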
> If the swap device is even slower, the number of folios which are
> unavailable on the second pass will increase?

Correct.

> Can we make this more deterministic? For example, change evict_folios()
> to recognize this situation and to then do folio_rotate_reclaimable()'s
> work for it? Or if that isn't practical, do something else?

There are multiple options, but none of them is a better tradeoff:
1) the page reclaim telling the page writeback exactly when to flush
   pro: more reliable
   con: the page reclaim doesn't know any better when to flush
2) adding a synchronization mechanism between the two
   pro: more reliable
   con: a lot more complexity
3) unlocking folios and submitting bios after they are put back on the
   LRU (my second choice)
   pro: more reliable
   con: more complexity (within mm)

> (Is folio_rotate_reclaimable() actually useful? That concept must be
> 20 years old. What breaks if we just delete it and leave the pages
> wherever they are?)

Most people use zram (with rw_page) or zswap nowadays, and they don't
need folio_rotate_reclaimable(). But we still need that function to
support swapping to SSDs. (Optane is discontinued.)
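Option 3 in the list above can be sketched as a toy user-space model in C, assuming it amounts to deferring bio submission until the isolated folios are back on the LRU list. The names struct toy_deferred_folio, toy_io_done() and toy_evict_batch_deferred() are hypothetical, folio locking is not modeled, and the sketch does not show the extra complexity within mm that the con refers to. It only shows why that ordering removes the dependence on timing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
struct toy_deferred_folio {
	bool on_lru;	/* false while isolated by reclaim */
	bool io_queued;	/* needs writeback, bio not yet submitted */
	bool rotated;	/* moved to the LRU tail when its I/O completed */
};

/* Stand-in for folio_rotate_reclaimable() on I/O completion. */
static void toy_io_done(struct toy_deferred_folio *f)
{
	if (f->on_lru)
		f->rotated = true;
}

/*
 * While the batch is isolated, reclaim only records which folios need
 * writeback; it submits the I/O after putting the batch back on the LRU
 * list. Completion is modeled as instantaneous (the worst case for the
 * original race), yet no rotation is missed.
 * Returns the number of folios whose rotation was missed.
 */
static size_t toy_evict_batch_deferred(struct toy_deferred_folio *batch,
				       size_t n)
{
	size_t i, missed = 0;

	/* Isolate the batch and record, but do not submit, the I/O. */
	for (i = 0; i < n; i++) {
		batch[i].on_lru = false;
		batch[i].io_queued = true;
		batch[i].rotated = false;
	}

	/* Put the whole batch back first. */
	for (i = 0; i < n; i++)
		batch[i].on_lru = true;

	/* Only now submit; even an instant completion sees the folio on the LRU. */
	for (i = 0; i < n; i++) {
		assert(batch[i].io_queued);
		batch[i].io_queued = false;
		toy_io_done(&batch[i]);
		if (!batch[i].rotated)
			missed++;
	}
	return missed;
}
```

Because completion can only run after the put-back in this model, the function returns 0 for any batch size, which corresponds to the "more reliable" pro listed for option 3.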