Date: Thu, 14 Nov 2024 15:23:48 +0000
From: Matthew Wilcox
To: David Hildenbrand
Cc: "linux-mm@kvack.org"
Subject: Re: PageOffline: refcount, flags and memdesc

On Thu, Nov 14, 2024 at 12:18:15PM +0100, David Hildenbrand wrote:
> I'm currently staring again at PageOffline and wonder how we could prepare
> it for the memdesc future, and if we can remove refcount handling.

Thanks for bringing it up.  As a memdesc, I currently have PageOffline as
being type 0 (Misc), subtype 5 (Offline).  That's bits 0-10 and then bit
11 is for "may be mapped to userspace".  Bits 12-17 are the order.
With the top bits being used for section/node/zone, that could be
25 + 12 + 3 = 40 bits, so we'd have 7 bits remaining for use as flags.

> I'd like to stop using the refcount for PageOffline pages, and keep the
> refcount always at 0.

I think this makes sense.

> But the refcount, it is currently used to detect whether we are allowed to
> offline memory blocks that contain PageOffline pages, because only selected
> drivers support re-onlining. Well, and it is used when returning the pages
> to the buddy where free_page()/free_contig_range().... expect a refcount of
> 1.
>
> Further, virtio-mem currently uses the PageDirty() bit to remember if a
> PageOffline page was already exposed to the buddy before, or if we must use
> generic_online_page().
>
> For now we would need the following information, that could be stored in 2
> flags, leaving the refcount at 0:
>
> (1) Was it obtained from the buddy or never exposed it to the buddy
>
>     PageOffline() && PageOfflineNeverOnlined()
>
> (2) The driver does support actual memory offlining+reonlining, they can
>     be skipped when offlining.
>
>     PageOffline() && PageOfflineSkippable
>
>
> But when allocating/freeing pages we would still mess with the refcount,
> which is bad.
>
> We could have a dedicated interface for freeing them, where we abstract the
> generic_online_page() bits, and leave the refcount at 0:
>
>     free_offline_page()
>     free_offline_page_range()
>
> And
>
>     alloc_offline_page()
>     alloc_offline_page_range()
>     alloc_offline_pages
>
> I'm not super happy about the "alloc/free" terminology, but nothing better
> came to mind.

If I resurrect
https://lore.kernel.org/linux-mm/20220809171854.3725722-1-willy@infradead.org/
would the frozen terminology work for you here?

> There is one complication to sort out: balloon_compaction.h supports moving
> PageOffline pages, and seems to use the page lock, page refcount, page lru,
> page private... which is all rather nasty.
> I wonder if these should get their own page type, like PageMovableOffline,
> and we'd mostly leave them alone for now. This would mean that
> virtio-balloon, vmware-balloon and ppc CMM would keep doing the old
> refcount-based thing but with a new page type.

It's fairly clear to me now that we have a sane story for moving file/anon
folios.  The current way we handle movable pages looks mostly insane
because it's hammered into that framework.  I think we need something
entirely different to handle movable non-folio pages, but I don't know
what that story is yet.
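
For reference, the memdesc bit budget described near the top of this reply
can be tallied as in the sketch below.  Every macro and function name here is
hypothetical, and the sketch assumes a 64-bit word with the inclusive bit
ranges read literally (0-10, 11, 12-17).  Read that way, the low fields take
18 bits and the tally leaves 64 - 40 - 18 = 6 bits, one fewer than the 7
mentioned above, so one of the fields is presumably a bit narrower than this
reading assumes.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical names; widths as read from the mail, 64-bit word assumed. */
#define MEMDESC_WORD_BITS      64u
#define MEMDESC_TYPE_BITS      11u              /* bits 0-10: type + subtype */
#define MEMDESC_MAPPABLE_BITS   1u              /* bit 11: may be mapped to userspace */
#define MEMDESC_ORDER_BITS      6u              /* bits 12-17: order */
#define MEMDESC_LOCATION_BITS  (25u + 12u + 3u) /* top bits: section/node/zone */

/* Each field starts where the previous one ends. */
#define MEMDESC_MAPPABLE_SHIFT MEMDESC_TYPE_BITS
#define MEMDESC_ORDER_SHIFT    (MEMDESC_MAPPABLE_SHIFT + MEMDESC_MAPPABLE_BITS)
#define MEMDESC_FLAGS_SHIFT    (MEMDESC_ORDER_SHIFT + MEMDESC_ORDER_BITS)

static_assert(MEMDESC_ORDER_SHIFT == 12, "order field starts at bit 12");
static_assert(MEMDESC_FLAGS_SHIFT == 18, "first bit above the order field");

/* Bits left between the low fields and the section/node/zone bits. */
static inline unsigned int memdesc_free_flag_bits(void)
{
	return MEMDESC_WORD_BITS - MEMDESC_LOCATION_BITS - MEMDESC_FLAGS_SHIFT;
}

/* Extract the order field from a memdesc word laid out as above. */
static inline unsigned int memdesc_order(uint64_t word)
{
	uint64_t mask = (UINT64_C(1) << MEMDESC_ORDER_BITS) - 1;

	return (unsigned int)((word >> MEMDESC_ORDER_SHIFT) & mask);
}
```

Whatever the final widths turn out to be, flags such as the two proposed in
the quoted text would have to fit in the range starting at
MEMDESC_FLAGS_SHIFT.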