From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D07C4C6FD1F for ; Thu, 21 Mar 2024 18:36:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6445B6B0092; Thu, 21 Mar 2024 14:36:14 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5F5366B0093; Thu, 21 Mar 2024 14:36:14 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 495F36B0095; Thu, 21 Mar 2024 14:36:14 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 3532D6B0092 for ; Thu, 21 Mar 2024 14:36:14 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 0C12312081B for ; Thu, 21 Mar 2024 18:36:14 +0000 (UTC) X-FDA: 81921901068.28.9D54920 Received: from mail-lj1-f180.google.com (mail-lj1-f180.google.com [209.85.208.180]) by imf02.hostedemail.com (Postfix) with ESMTP id C867F80022 for ; Thu, 21 Mar 2024 18:36:11 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=NS4Rt7vx; spf=pass (imf02.hostedemail.com: domain of ryncsn@gmail.com designates 209.85.208.180 as permitted sender) smtp.mailfrom=ryncsn@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1711046171; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=OYHW2im/7U2eInaWrdKWsru+Lb393o8hz20jHiFRKek=; b=5LzXlvLH+L4Haca5I0cBPiCrIL4m+ozzFpiMTtpwSrCbhSUYtc7CD/sxLFU/QxYGTTgD0v f0eDuKc4Uh9uJWSTvLYEyufXBomBJQVqy28ObcRr9vaZQRC1wmRLR/s6o+EJpfA99qRMNS xarJLNkvn4DJMx/DThlKRf0nZQcoQDA= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=NS4Rt7vx; spf=pass (imf02.hostedemail.com: domain of ryncsn@gmail.com designates 209.85.208.180 as permitted sender) smtp.mailfrom=ryncsn@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1711046171; a=rsa-sha256; cv=none; b=BOVnP8XcC/trkf4JZpwiEpOCX+wDPhDPhMKDbp5o9QWTaxmCJ01SGz8/mWitYwZWvfMKXI 7uEeG4dn+V3Dn0FgLZ9Wr3LonnikzaNBcu++3KXmST9L+wPwUfCcozPUKCApwNSGSOG5Gz XPvT5NfFXQVB2I9P6xtp+MONdqeAQcQ= Received: by mail-lj1-f180.google.com with SMTP id 38308e7fff4ca-2d46c44dcc0so15025751fa.2 for ; Thu, 21 Mar 2024 11:36:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1711046170; x=1711650970; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=OYHW2im/7U2eInaWrdKWsru+Lb393o8hz20jHiFRKek=; b=NS4Rt7vxXXqVBgK+lYr59afp0f4U/028UYqFs8j6qoUxAaQ7b/e32bTzP+d5aoyllL TrJ+ynczsnnGKHaKRGR8HPbTWZktxn6P9a/OqU82DCaA69RvHynB+tCMAKuhETi6eKPa YQrPN4S5STiySLQ6+m5y1bWFRwfrbmv5KK0LRG7mpaS94LCP3Q+PI0HdQElRlWhJ4dqd ZYPUd14u75q5TdSDFRa6v7LEM8UvLteiS299cOeqvsZa+bl/isfuxDZlkZll1uX2gJ0z ZsKPv/ZrevuqkuSFmMyLNDs4BlORbhQryB773zW0MpU5KygnA9aTGzqtE/wvoTKJkKI9 i/cw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1711046170; x=1711650970; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=OYHW2im/7U2eInaWrdKWsru+Lb393o8hz20jHiFRKek=; b=q3rj4yDpajImsEnqKoCNm1pGqAmdVOrjVtvqKoDNOlkcYUw9qHd8ExOH6Bwp0zTxPq SItmwFBS/9uoEZLJt2nVT43pB2942P29jGFF0XZalSfcM2CIR9Df3JZ77Uh5jgJ8dpzX JBzSUFAiPuESPMHPhpb1JgvOHetLP1Pj7Gi1F5lAobZU39hUnmF20O97vsBKy/xHyT3b Jd8SAYlFJ5p4eWH6lwa8KHRZbYH5zHKXYejziXgeeGMdTejdDtQwM0SP6/2oUp+UOboP tTvO5pSb9rqB39HRMbZnuJKh4r9LJJbhW3KXzw3mTJE9g0xxvmAx6t9p+kG0bMj+uK08 ghtg== X-Gm-Message-State: AOJu0YyGqGYYJYQej9bvCzmDrRIXLq93QG/Rvl6MS5QAxcHumc0CebJw 52bZ49xJOzSIJME1jqVp9NxTGgGmYrvQOOf3wihVa2VpK9lPj1WtRoHXUcjEOf5b2COBwISTqit pV87AEBmIQXI8PVJp0spKF7QX450= X-Google-Smtp-Source: AGHT+IH5zSN7HMRx94vNa7NxxF1rY4ak+DSJW/1cvfXlwrhalqT1PKrc39Y7uWo4N6iLqV05t9okNOcU7hhYB2pG3CY= X-Received: by 2002:a2e:9784:0:b0:2d4:ad34:85a7 with SMTP id y4-20020a2e9784000000b002d4ad3485a7mr245084lji.29.1711046169621; Thu, 21 Mar 2024 11:36:09 -0700 (PDT) MIME-Version: 1.0 References: <20240319092733.4501-1-ryncsn@gmail.com> <20240319092733.4501-5-ryncsn@gmail.com> In-Reply-To: From: Kairui Song Date: Fri, 22 Mar 2024 02:35:52 +0800 Message-ID: Subject: Re: [PATCH 4/4] mm/filemap: optimize filemap folio adding To: Matthew Wilcox Cc: linux-mm@kvack.org, Andrew Morton , linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: C867F80022 X-Rspam-User: X-Stat-Signature: oqdz5rbk98jaqo435t8kjouz6pifnuoe X-Rspamd-Server: rspam01 X-HE-Tag: 1711046171-748292 X-HE-Meta: U2FsdGVkX1+LVLWzxuPf+OgMlNU1xCG66XMKOFu33Jg1h7KYEHZRfqdSZbhXNppIRHIy3sULn4rhVAu4D6/OGoA3JRjWKwZDTGAOMAhqHNx+Ax8KK70SRPfXIjqPs/BPV2y6VEvAaY3tmHHHIlx9FDj9uMYcN/cOccuUh2iWWYKQaxRG0X5700Nezn/2lzgt9ic0wZUCiUPHj1nFZA/h0BE/38TpELE0jle35a3waWmpTF6y53VyMTMDUVJtEw7Ho/TuRKiqK3xh4/Kk8GZcTwHhiIkS4G4CE5/sEFzkjQt+puZkEPjDh2Li6K/EK97lhgIcJng8eXOqknEjLXaXsH5KC/6QYLippjaKg5Q45zRJlFzIQ74HlVRciwfA5r6xgnDfi4YyMwZIgfi136Ia3w2Q3unllS94ogmuY1K+ih1YARBGXpRVHhgaBBHdaBYvbUrtHAkp7ucWhcqhrOmfEEybjlTzUh/MjbOZlu+EO3N139RPygEguhUkTqMDuCl5bdJCzHS7HyOQP0Gn5dGLA2mzJ5SNLHmq1HjWbLvhz2pv3bU1t/0HqUwxU6nAlemLVtX/6bXWEA+WVPRikDuIJ4jvSyBH87QCGhkpVSJa11u8YH5oT2ncn7gornWD4OIST3vb6R9F24qwFOJMtoul8MLC4IZLuRtrOhBTSMUEk2ycfaIpZe7G/xZAgZXvGxngv8bAqZKBWlkfQ0bcsdhfdku+Sxf3f/o/D84MbOKtWqZsFaqa8P5jAn9HgaA68sXSCmgxzpA5sdWr35V0LgYMlRf+XJz2gss9U4ZB11f9pNWSuVc9kCkhoSWSdo3DepCaxprZZSvHmQoa2DIZQf38JIzGTz+CbIS1Y1705i+ImjjAIUcvw3SlT656xnizXqEuOdfOVM/VTSpiXYD4Smd2H9JL+MX5G0jiIqWX5CxuT0oWYg6MlC4UxChiuZ8YoqOZGJQ2F2Xn1+VUkvKdmeb VgV/50Lx mieQGV5PAtBlSzIqOGe1fdDyEhVSWjUuPEGyHzQBtCXdaVtG6/PWI8NiIHkWqYzpqgSA2Yh9e7IoDUN7J5NkNvtAfB+wxT2p+yFJqFGsze9KoeDF5QbmUfaNg+0MvVZTkZw8sray5fkxxm8GNMSBAWvL9AKjsLHUGDZ4QgcKMYOhIfdHW/18byk2tBbOUroisgJdTvza8YLB94poGWC+wNfTcHz5A9WSn/O35IavF6m0eJqq1eBNM6h8QMPZP4SYitptaq1Nia39JmLmhdawwttSd6NPmTKV1ydnY/HMug+CswHgA7j3kKWo8fiwloqAVyhYOCM1Fo/6YhJcf8TU/8/Wh9s7UZ+CALYERZ/La5D91g+wgH5EtJMu8V4SGlmixPHaSMAcOMcHXz9h6BRexm8xw4uAgyl5vjH0wDbKV1lhLIIrHZtWcS+1xKs6cWd9GscRgS8LKOFgQ/EP56CZXyb+hBUnaaubbGl+qk/6+fHUCvquQsIkyCNGXdQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Mar 20, 2024 at 5:06=E2=80=AFPM Kairui Song wrot= e: > > On Wed, Mar 20, 2024 at 6:20=E2=80=AFAM Matthew Wilcox wrote: > > > > On Tue, Mar 19, 2024 at 05:27:33PM +0800, Kairui Song wrote: > > > From: Kairui Song > > > > > > Instead of doing multiple tree walks, do one optimism range check > > > with lock hold, and exit if raced with another insertion. If a shadow > > > exists, check it with a new xas_get_order helper before releasing the > > > lock to avoid redundant tree walks for getting its order. > > > > > > Drop the lock and do the allocation only if a split is needed. > > > > > > In the best case, it only need to walk the tree once. If it needs > > > to alloc and split, 3 walks are issued (One for first ranced > > > conflict check and order retrieving, one for the second check after > > > allocation, one for the insert after split). > > > > > > Testing with 4k pages, in an 8G cgroup, with 20G brd as block device: > > > > > > fio -name=3Dcached --numjobs=3D16 --filename=3D/mnt/test.img \ > > > --buffered=3D1 --ioengine=3Dmmap --rw=3Drandread --time_based \ > > > --ramp_time=3D30s --runtime=3D5m --group_reporting > > > > > > Before: > > > bw ( MiB/s): min=3D 790, max=3D 3665, per=3D100.00%, avg=3D2499.17,= stdev=3D20.64, samples=3D8698 > > > iops : min=3D202295, max=3D938417, avg=3D639785.81, stdev=3D52= 84.08, samples=3D8698 > > > > > > After (+4%): > > > bw ( MiB/s): min=3D 451, max=3D 3868, per=3D100.00%, avg=3D2599.83,= stdev=3D23.39, samples=3D8653 > > > iops : min=3D115596, max=3D990364, avg=3D665556.34, stdev=3D59= 88.20, samples=3D8653 > > > > > > Test result with THP (do a THP randread then switch to 4K page in hop= e it > > > issues a lot of splitting): > > > > > > fio -name=3Dcached --numjobs=3D16 --filename=3D/mnt/test.img \ > > > --buffered=3D1 --ioengine mmap -thp=3D1 --readonly \ > > > --rw=3Drandread --random_distribution=3Drandom \ > > > --time_based --runtime=3D5m --group_reporting > > > > > > fio -name=3Dcached --numjobs=3D16 --filename=3D/mnt/test.img \ > > > --buffered=3D1 --ioengine mmap --readonly \ > > > --rw=3Drandread --random_distribution=3Drandom \ > > > --time_based --runtime=3D5s --group_reporting > > > > > > Before: > > > bw ( KiB/s): min=3D28071, max=3D62359, per=3D100.00%, avg=3D53542.44= , stdev=3D179.77, samples=3D9520 > > > iops : min=3D 7012, max=3D15586, avg=3D13379.39, stdev=3D44.94= , samples=3D9520 > > > bw ( MiB/s): min=3D 2457, max=3D 6193, per=3D100.00%, avg=3D3923.21,= stdev=3D82.48, samples=3D144 > > > iops : min=3D629220, max=3D1585642, avg=3D1004340.78, stdev=3D= 21116.07, samples=3D144 > > > > > > After (+-0.0%): > > > bw ( KiB/s): min=3D30561, max=3D63064, per=3D100.00%, avg=3D53635.82= , stdev=3D177.21, samples=3D9520 > > > iops : min=3D 7636, max=3D15762, avg=3D13402.82, stdev=3D44.29= , samples=3D9520 > > > bw ( MiB/s): min=3D 2449, max=3D 6145, per=3D100.00%, avg=3D3914.68,= stdev=3D81.15, samples=3D144 > > > iops : min=3D627106, max=3D1573156, avg=3D1002158.11, stdev=3D= 20774.77, samples=3D144 > > > > > > The performance is better (+4%) for 4K cached read and unchanged for = THP. > > > > > > Signed-off-by: Kairui Song > > > --- > > > mm/filemap.c | 127 ++++++++++++++++++++++++++++++-------------------= -- > > > 1 file changed, 76 insertions(+), 51 deletions(-) > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > index 6bbec8783793..c1484bcdbddb 100644 > > > --- a/mm/filemap.c > > > +++ b/mm/filemap.c > > > @@ -848,12 +848,77 @@ void replace_page_cache_folio(struct folio *old= , struct folio *new) > > > } > > > EXPORT_SYMBOL_GPL(replace_page_cache_folio); > > > > > > +static int __split_add_folio_locked(struct xa_state *xas, struct fol= io *folio, > > > + pgoff_t index, gfp_t gfp, void **sh= adowp) > > > > Thanks for the very helpful review! > > > I don't love the name of this function. Splitting is a rare thing that > > it does. I'd suggest it's more filemap_store(). > > Yes, the function name is a bit misleading indeed, I can rename it as > you suggested, eg. __filemap_store_locked ? > > > > > > +{ > > > + void *entry, *shadow, *alloced_shadow =3D NULL; > > > + int order, alloced_order =3D 0; > > > + > > > + gfp &=3D GFP_RECLAIM_MASK; > > > + for (;;) { > > > + shadow =3D NULL; > > > + order =3D 0; > > > + > > > + xas_for_each_conflict(xas, entry) { > > > + if (!xa_is_value(entry)) > > > + return -EEXIST; > > > + shadow =3D entry; > > > + } > > > + > > > + if (shadow) { > > > + if (shadow =3D=3D xas_reload(xas)) { > > > > Why do you need the xas_reload here? > > This part of code works based on the guarantee that If there is a > larger entry, it will be the first and only entry iterated by > xas_for_each_conflict/xas_find_conflict. I added an xas_reload is here > to ensure that. But on second thought, this seems not needed indeed. > > Will it be better if I write this part this way? > > + shadow =3D NULL; > + order =3D -1; > + xas_for_each_conflict(xas, entry) { > + if (!xa_is_value(entry)) > + return -EEXIST; I noticed this should release potential alloced xas data, will fix this in = v2. > + /* > + * If there is a larger entry, it will be the first > + * and only entry iterated. > + */ > + if (order =3D=3D -1) > + order =3D xas_get_order(xas); > + shadow =3D entry; > + } > + > + if (shadow) { > + /* check if alloc & split need, or if previous alloc is > still valid */ > + if (order > 0 && order > folio_order(folio)) { > + if (shadow !=3D alloced_shadow || order !=3D alloced_= order) > + goto unlock; > + xas_split(xas, shadow, order); > + xas_reset(xas); > + } > + order =3D -1; > + if (shadowp) > + *shadowp =3D shadow; > + } > Besides this, this should be OK? I think I can add more tests for xas_for_each_conflict and xas_get_order to ensure this works, need to export xas_get_order as GPL symbol for that. > > If there is a larger slot, xas_for_each_conflict and check above > should catch that? > > > > > > + if (shadowp) > > > + *shadowp =3D shadow; > > > + } > > > + > > > + xas_store(xas, folio); > > > + /* Success, return with mapping locked */ > > > + if (!xas_error(xas)) > > > + return 0; > > > +unlock: > > > + /* > > > + * Unlock path, if errored, return unlocked. > > > + * If allocation needed, alloc and retry. > > > + */ > > > + xas_unlock_irq(xas); > > > + if (order) { > > > + if (unlikely(alloced_order)) > > > + xas_destroy(xas); > > > + xas_split_alloc(xas, shadow, order, gfp); > > > + if (!xas_error(xas)) { > > > + alloced_shadow =3D shadow; > > > + alloced_order =3D order; > > > + } > > > + goto next; > > > + } > > > + /* xas_nomem result checked by xas_error below */ > > > + xas_nomem(xas, gfp); > > > +next: > > > + xas_lock_irq(xas); > > > + if (xas_error(xas)) > > > + return xas_error(xas); > > > + > > > + xas_reset(xas); > > > + } > > > +} > > > > Splitting this out into a different function while changing the logic > > really makes this hard to review ;-( > > Sorry about this :( > > This patch basically rewrites the logic of __filemap_add_folio and the > function is getting long, so I thought it would be easier to > understand if we split it out. > I initially updated the code in place but that change diff seems more > messy to me. > > > > > I don't object to splitting the function, but maybe two patches; one > > to move the logic and a second to change it? > > > > I can keep it in place in V2.