Message-ID: <3cc3e652-e058-4995-8347-337ae605ebab@suse.com>
Date: Thu, 18 Jul 2024 17:22:41 +0930
Subject: Re: [PATCH 0/2] mm: skip memcg for certain address space
From: Qu Wenruo
To: "Vlastimil Babka (SUSE)", Qu Wenruo, Michal Hocko
Cc: linux-btrfs@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, Cgroups, Matthew Wilcox
References: <8faa191c-a216-4da0-a92c-2456521dcf08@kernel.org> <9c0d7ce7-b17d-4d41-b98a-c50fd0c2c562@gmx.com> <9572fc2b-12b0-41a3-82dc-bb273bfdd51d@kernel.org>
In-Reply-To: <9572fc2b-12b0-41a3-82dc-bb273bfdd51d@kernel.org>

On 2024/7/18 16:47, Vlastimil Babka (SUSE) wrote:
> On 7/18/24 12:38 AM, Qu Wenruo wrote:
[...]
>> Another question is, I only see this hang with larger folios (order 2 vs
>> the old order 0) when adding to the same address space.
>>
>> Does the folio order have anything to do with the problem, or does a
>> higher order just make it more likely?
>
> I didn't spot anything in the memcg charge path that would depend on the
> order directly, hm. Also what kernel version was showing these soft lockups?

The previous rc kernel, IIRC v6.10-rc6. But that needs extra btrfs patches;
otherwise btrfs still only does order-0 allocations and then adds the
order-0 folios into the filemap.

The extra patch simply directs btrfs to allocate an order-2 folio (matching
the default 16K nodesize) and attach that folio to the metadata filemap,
with extra code handling corner cases like different folio sizes etc.
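To make that concrete, below is a minimal sketch of the allocation path
(the helper name is hypothetical, and this is not the actual btrfs patch);
the memcg charge that this series wants to be able to skip happens inside
filemap_add_folio():

#include <linux/mm.h>
#include <linux/pagemap.h>

/*
 * Sketch only: allocate a single order-2 folio (16K with 4K pages,
 * matching the default nodesize) and attach it to the metadata inode's
 * mapping.  Error handling, eviction races and the fallback to smaller
 * folios are all omitted.
 */
static int attach_metadata_folio(struct address_space *mapping, pgoff_t index)
{
	struct folio *folio;
	int ret;

	/* The mapping must accept large folios before order > 0 is used. */
	mapping_set_large_folios(mapping);

	folio = filemap_alloc_folio(GFP_NOFS, 2);	/* order 2 == 16K */
	if (!folio)
		return -ENOMEM;

	/*
	 * filemap_add_folio() charges the folio to the current memcg before
	 * inserting it into the page cache; this is the path the cover
	 * letter proposes to skip memcg charging for.
	 */
	ret = filemap_add_folio(mapping, folio, index, GFP_NOFS);
	if (ret)
		folio_put(folio);
	return ret;
}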
>
>> And finally, even without the hang problem, does it make any sense to
>> skip all the possible memcg charging completely, either to reduce latency
>> or just to reduce GFP_NOFAIL usage, for those user-inaccessible inodes?
>
> Is it common to even use the filemap code for such metadata that can't be
> really mapped to userspace?

At least XFS/EXT4 don't use the filemap code to handle their metadata.

One of the reasons is that btrfs has a pretty large amount of metadata, not
only for the regular filesystem structures but also for data checksums.
Even with the default CRC32C algorithm, that is 4 bytes per 4K of data (a
1:1024 ratio, so 1 TiB of data already implies roughly 1 GiB of checksums
alone).

Thus things can grow pretty large pretty easily, and that's the reason why
btrfs is still sticking to the filemap solution.

> How does it even interact with reclaim, do they
> become part of the page cache and are scanned by reclaim together with data
> that is mapped?

Yes, it's handled just like any other filemap: it also uses the page cache
and all the LRU/scanning machinery.

The major difference is that we only implement a small subset of the
address space operations:

- write
- release
- invalidate
- migrate
- dirty (debug only, otherwise it falls back to filemap_dirty_folio())

Note there is no read operation, as btrfs itself triggers the metadata
reads, so there is no read/readahead path. Thus we are in full control of
the page cache, e.g. we determine the folio size to be added into the
filemap.

The filemap infrastructure provides two useful things for us:

- (Page) cache
  So that we can easily determine whether we really need to read from disk,
  which saves us a lot of random IO.

- Reclaiming

And of course the page cache of the metadata inode is never cloned/shared
into any user-accessible inode.

> How are the lru decisions handled if there are no references
> for PTE access bits. Or can they be even reclaimed, or because there may
> e.g. other open inodes pinning this metadata, the reclaim is impossible?

If I understand it correctly, we have implemented the release_folio()
callback, which does the btrfs metadata checks to determine whether we can
release the current folio, and avoids releasing folios that are still under
IO etc.

> (sorry if the questions seem noob, I'm not that much familiar with the page
> cache side of mm)

No worries at all, I'm also a newbie on the whole mm part.

Thanks,
Qu

>
>> Thanks,
>> Qu
>
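For readers less familiar with this corner, the subset of address space
operations listed above corresponds roughly to an ops table like the sketch
below. The handler names and bodies are hypothetical placeholders, not the
exact btrfs callbacks; the point is that there is deliberately no
.read_folio/.readahead entry:

#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/migrate.h>
#include <linux/writeback.h>

/* Hypothetical handlers -- a real filesystem supplies its own logic. */
static int meta_writepages(struct address_space *mapping,
			   struct writeback_control *wbc)
{
	/* Write back dirty metadata folios. */
	return 0;
}

static bool meta_release_folio(struct folio *folio, gfp_t gfp)
{
	/* Refuse to release folios that are dirty or still under IO. */
	return !folio_test_dirty(folio) && !folio_test_writeback(folio);
}

static void meta_invalidate_folio(struct folio *folio, size_t offset,
				  size_t len)
{
	/* Drop any private metadata state attached to the folio. */
}

static int meta_migrate_folio(struct address_space *mapping,
			      struct folio *dst, struct folio *src,
			      enum migrate_mode mode)
{
	/* A plain page-cache migration is enough for this sketch. */
	return filemap_migrate_folio(mapping, dst, src, mode);
}

static const struct address_space_operations meta_aops = {
	.writepages	  = meta_writepages,
	.release_folio	  = meta_release_folio,
	.invalidate_folio = meta_invalidate_folio,
	.migrate_folio	  = meta_migrate_folio,
	/* No .read_folio/.readahead: the fs drives its own metadata reads. */
	.dirty_folio	  = filemap_dirty_folio,	/* or a debug-only wrapper */
};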