From: Mateusz Guzik
Date: Mon, 2 Dec 2024 11:08:47 +0100
Subject: Re: [RFC PATCH 0/1] Large folios in block buffered IO path
To: Bharata B Rao
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, nikunj@amd.com,
 willy@infradead.org, vbabka@suse.cz, david@redhat.com,
 akpm@linux-foundation.org, yuzhao@google.com, axboe@kernel.dk,
 viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
 joshdon@google.com, clm@meta.com
In-Reply-To: <2220e327-5587-4d3c-917d-d0a2728a0f73@amd.com>
References: <20241127054737.33351-1-bharata@amd.com>
 <3947869f-90d4-4912-a42f-197147fe64f0@amd.com>
 <5a517b3a-51b2-45d6-bea3-4a64b75dfd30@amd.com>
 <2220e327-5587-4d3c-917d-d0a2728a0f73@amd.com>

On Mon, Dec 2, 2024 at 10:37 AM Bharata B Rao wrote:
>
> On 28-Nov-24 10:01 AM, Mateusz Guzik wrote:
>
> > Willy mentioned the folio wait queue hash table could be grown, you
> > can find it in mm/filemap.c:
> > #define PAGE_WAIT_TABLE_BITS 8
> > #define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS)
> > static wait_queue_head_t folio_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned;
> >
> > static wait_queue_head_t *folio_waitqueue(struct folio *folio)
> > {
> >         return &folio_wait_table[hash_ptr(folio, PAGE_WAIT_TABLE_BITS)];
> > }
> >
> > Can you collect off cpu time? offcputime-bpfcc -K > /tmp/out
>
> Flamegraph for "perf record --off-cpu -F 99 -a -g --all-kernel
> --kernel-callchains -- sleep 120" is attached.
>
> Off-cpu samples were collected for 120s at around 45th minute run of the
> FIO benchmark that actually runs for 1hr. This run was with kernel that
> had your inode_lock fix but no changes to PAGE_WAIT_TABLE_BITS.
>
> Hopefully this captures the representative sample of the scalability
> issue with folio lock.
>

I'm not familiar with the --off-cpu option. FWIW, it does not look like
any of that off-cpu time got graphed. The tool that I know to work is
offcputime-bpfcc.

Regardless, per your own graph, over half of the *on* cpu time is spent
spinning on the folio hash table locks. If bumping the table size does
not resolve the problem, then most likely the contention has shifted
again to something else. So what we need is profiling data collected
with the bigger table in place.

-- 
Mateusz Guzik