From: Frank van der Linden <fvdl@google.com>
Date: Fri, 7 Jun 2024 09:55:55 -0700
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize
To: David Hildenbrand
Cc: Yu Zhao, Muchun Song, Matthew Wilcox, Jane Chu, Will Deacon,
 Nanyong Sun, Catalin Marinas, akpm@linux-foundation.org,
 anshuman.khandual@arm.com, wangkefeng.wang@huawei.com,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
References: <20240113094436.2506396-1-sunnanyong@huawei.com>
 <20240207111252.GA22167@willie-the-truck>
 <20240207121125.GA22234@willie-the-truck>
 <908066c7-b749-4f95-b006-ce9b5bd1a909@oracle.com>
 <917FFC7F-0615-44DD-90EE-9F85F8EA9974@linux.dev>
I had an offline discussion with Yu on this, and he pointed out something
I hadn't realized: the x86 cmpxchg instruction always produces a write
cycle, even if it doesn't modify the data - it just writes back the
original data in that case. So, get_page_unless_zero() will always
produce a fault on RO-mapped page structures on x86.

Maybe this was obvious to other people, but I didn't see it explicitly
mentioned, so I figured I'd add the data point.

- Frank

On Thu, Jun 6, 2024 at 1:30 AM David Hildenbrand wrote:
>
> >> Additionally, we also should alter RO permission of those 7 tail pages
> >> to RW to avoid panic().
> >
> > We can use RCU, which IMO is a better choice, as the following:
> >
> > get_page_unless_zero()
> > {
> >         int rc = false;
> >
> >         rcu_read_lock();
> >
> >         if (page_is_fake_head(page) || !page_ref_count(page)) {
> >                 smp_mb(); // implied by atomic_add_unless()
> >                 goto unlock;
> >         }
> >
> >         rc = page_ref_add_unless();
> >
> > unlock:
> >         rcu_read_unlock();
> >
> >         return rc;
> > }
> >
> > And on the HVO/de-HVO sides:
> >
> >         folio_ref_unfreeze();
> >         synchronize_rcu();
> >         HVO/de-HVO;
> >
> > I think this is a lot better than making tail page metadata RW because:
> > 1. it helps debug, IMO, a lot;
> > 2. I don't think HVO is the only one that needs this.
> >
> > David (we missed you in today's THP meeting),
>
> Sorry, I had a private meeting conflict :)
>
> > Please correct me if I'm wrong -- I think virtio-mem also suffers from
> > the same problem when freeing offlined struct pages, since I wasn't
> > able to find anything that would prevent a **speculative** struct page
> > walker from trying to access struct pages belonging to pages being
> > concurrently offlined.
>
> virtio-mem does not currently optimize fake-offlined memory the way HVO
> would. So the only way we really remove "struct page" metadata is by
> actually offlining+removing a complete Linux memory block, like ordinary
> memory hotunplug would.
>
> It might be an interesting project to optimize "struct page" metadata
> consumption for fake-offlined memory chunks within an online Linux
> memory block.
>
> The biggest challenge might be the interaction with memory hotplug, which
> requires all "struct page" metadata to be allocated. So that would make
> cases where virtio-mem hot-plugs a Linux memory block but keeps parts of
> it fake-offline a bit more problematic to handle.
>
> In a world with memdescs, this might all be nicer to handle, I think :)
>
> There is one possible interaction between virtio-mem and speculative
> page references: all fake-offline chunks in a Linux memory block have,
> on each page, a refcount of 1 and PageOffline() set. When actually
> offlining the Linux memory block to remove it, virtio-mem will drop that
> reference during MEM_GOING_OFFLINE, such that memory offlining can
> proceed (seeing refcount==0 and PageOffline()).
>
> In virtio_mem_fake_offline_going_offline() we have:
>
>         if (WARN_ON(!page_ref_dec_and_test(page)))
>                 dump_page(page, "fake-offline page referenced");
>
> which would trigger on a speculative reference.
>
> We never saw that trigger so far because quite a long time must have
> passed since a page was last part of the page cache / page tables,
> before virtio-mem fake-offlined it (using alloc_contig_range()) and
> the Linux memory block actually gets offlined.
>
> But yes, RCU (e.g., on the memory offlining path) would likely be the
> right approach to make sure GUP-fast and the pagecache will no longer
> grab this page by accident.
>
> > If this is true, we might want to map a "zero struct page" rather than
> > leave a hole in the vmemmap when offlining pages. And the logic on the
> > hot removal side would be similar to that of HVO.
>
> Once virtio-mem does something like HVO, yes. Right now virtio-mem
> only removes struct-page metadata by removing/unplugging its owned Linux
> memory blocks once they are fully "logically offline".
>
> --
> Cheers,
>
> David / dhildenb
>
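[Editorial note: the cmpxchg behavior Frank describes is easy to observe
from user space. The following stand-alone sketch is not from the thread;
it assumes x86-64, a 4 KiB page size, and the GCC/Clang __sync builtins.
It issues a cmpxchg whose comparison is guaranteed to fail against a page
that has been made read-only, mimicking HVO's RO tail-page mapping. A
"pure read" model of a failed cmpxchg would finish silently; on x86,
LOCK CMPXCHG writes back the original value even on failure, so the
access faults.]

```c
/*
 * Hypothetical demo (not from the thread): a failing cmpxchg on an
 * RO page still faults on x86, because LOCK CMPXCHG always performs
 * a write cycle. LL/SC architectures may behave differently.
 */
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096UL

/* One page-aligned page of zero-initialized data. */
static unsigned long page[PAGE_SIZE / sizeof(unsigned long)]
        __attribute__((aligned(PAGE_SIZE)));

static void segv_handler(int sig)
{
        (void)sig;
        /* Async-signal-safe output: the failed cmpxchg still wrote. */
        static const char msg[] =
                "SIGSEGV: failing cmpxchg still drove a write cycle\n";
        write(STDOUT_FILENO, msg, sizeof(msg) - 1);
        _exit(0);
}

int main(void)
{
        signal(SIGSEGV, segv_handler);

        /* Map the page read-only, like HVO's shared tail pages. */
        if (mprotect(page, PAGE_SIZE, PROT_READ))
                return 1;

        /*
         * page[0] is 0; compare against 1 so the exchange must fail.
         * No data would change, yet on x86 this faults anyway.
         */
        __sync_val_compare_and_swap(&page[0], 1UL, 2UL);

        printf("no fault: cmpxchg behaved like a read on failure\n");
        return 0;
}
```

This is the same access pattern as get_page_unless_zero(), whose
page_ref_add_unless() boils down to an atomic compare-and-exchange loop
on the refcount: even when the "unless" condition makes it a no-op, the
RO vmemmap mapping faults on x86.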