Date: Thu, 6 Jun 2024 10:30:23 +0200
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize
To: Yu Zhao, Muchun Song, Frank van der Linden
Cc: Matthew Wilcox, Jane Chu, Will Deacon, Nanyong Sun, Catalin Marinas,
    akpm@linux-foundation.org, anshuman.khandual@arm.com,
    wangkefeng.wang@huawei.com, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org

>> Additionally, we also should alter RO permission of those 7 tail pages
>> to RW to avoid panic().
>
> We can use RCU, which IMO is a better choice, as the following:
>
> get_page_unless_zero()
> {
> 	int rc = false;
>
> 	rcu_read_lock();
>
> 	if (page_is_fake_head(page) || !page_ref_count(page)) {
> 		smp_mb(); // implied by atomic_add_unless()
> 		goto unlock;
> 	}
>
> 	rc = page_ref_add_unless();
>
> unlock:
> 	rcu_read_unlock();
>
> 	return rc;
> }
>
> And on the HVO/de-HVO sides:
>
> 	folio_ref_unfreeze();
> 	synchronize_rcu();
> 	HVO/de-HVO;
>
> I think this is a lot better than making tail page metadata RW because:
> 1. it helps debug, IMO, a lot;
> 2. I don't think HVO is the only one that needs this.
>
> David (we missed you in today's THP meeting),

Sorry, I had a private meeting conflict :)

>
> Please correct me if I'm wrong -- I think virtio-mem also suffers from
> the same problem when freeing offlined struct page, since I wasn't
> able to find anything that would prevent a **speculative** struct page
> walker from trying to access struct pages belonging to pages being
> concurrently offlined.

virtio-mem does not currently optimize fake-offlined memory the way HVO
would.
So the only way we really remove "struct page" metadata is by
actually offlining+removing a complete Linux memory block, like ordinary
memory hotunplug would.

It might be an interesting project to optimize "struct page" metadata
consumption for fake-offlined memory chunks within an online Linux
memory block.

The biggest challenge might be the interaction with memory hotplug,
which requires all "struct page" metadata to be allocated. That would
make cases where virtio-mem hot-plugs a Linux memory block but keeps
parts of it fake-offline a bit more problematic to handle.

In a world with memdesc this might all be nicer to handle, I think :)

There is one possible interaction between virtio-mem and speculative
page references: every page in a fake-offline chunk of a Linux memory
block has a refcount of 1 and PageOffline() set. When actually offlining
the Linux memory block to remove it, virtio-mem will drop that reference
during MEM_GOING_OFFLINE, such that memory offlining can proceed (seeing
refcount==0 and PageOffline()).

In virtio_mem_fake_offline_going_offline() we have:

	if (WARN_ON(!page_ref_dec_and_test(page)))
		dump_page(page, "fake-offline page referenced");

which would trigger on a speculative reference.

We have never seen that trigger so far, because quite a long time must
have passed between the last point at which a page could have been part
of the page cache / page tables (before virtio-mem fake-offlined it
using alloc_contig_range()) and the point at which the Linux memory
block actually gets offlined.

But yes, RCU (e.g., on the memory offlining path) would likely be the
right approach to make sure GUP-fast and the pagecache will no longer
grab this page by accident.

>
> If this is true, we might want to map a "zero struct page" rather than
> leave a hole in vmemmap when offlining pages. And the logic on the hot
> removal side would be similar to that of HVO.

Once virtio-mem does something like HVO, yes.
Right now virtio-mem only removes struct-page metadata by
removing/unplugging its owned Linux memory blocks once they are fully
"logically offline".

-- 
Cheers,

David / dhildenb
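
As an illustration of the virtio_mem_fake_offline_going_offline() check
discussed above, here is a minimal userspace sketch in C11. It is not
kernel code: struct fake_page and every fake_*() helper are hypothetical
stand-ins that model only the refcount, using <stdatomic.h> in place of
the kernel's page_ref_*() helpers. It is meant to show why a speculative
reference taken before MEM_GOING_OFFLINE would make the dec-and-test
fail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page: only the refcount is modelled. */
struct fake_page {
	atomic_int refcount;
	bool offline;		/* stands in for PageOffline() */
};

/* Models get_page_unless_zero(): take a reference only if refcount != 0. */
static bool fake_get_page_unless_zero(struct fake_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded and the zero check repeats. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Models dropping a reference; nothing is freed in this sketch. */
static void fake_put_page(struct fake_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	assert(old > 0);
}

/* Models page_ref_dec_and_test(): true if the count dropped to zero. */
static bool fake_page_ref_dec_and_test(struct fake_page *p)
{
	return atomic_fetch_sub(&p->refcount, 1) == 1;
}

/*
 * Models the MEM_GOING_OFFLINE step: drop the fake-offline reference and
 * report (instead of WARN_ON + dump_page) if another reference remains.
 * Returns true when the page reached refcount == 0.
 */
static bool fake_going_offline(struct fake_page *p)
{
	if (!fake_page_ref_dec_and_test(p)) {
		fprintf(stderr, "fake-offline page referenced\n");
		return false;
	}
	return true;
}
```

With a page at refcount 1, calling fake_going_offline() directly returns
true, and a later fake_get_page_unless_zero() fails because the count is
already 0. If fake_get_page_unless_zero() runs first, it succeeds and
fake_going_offline() then returns false, which corresponds to the
WARN_ON() case. The sketch does not model RCU; it only shows the window
that an RCU grace period on the offlining path would be meant to close.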