Date: Fri, 5 Jul 2024 16:49:08 +0100
From: Catalin Marinas
To: Yu Zhao
Cc: Nanyong Sun, will@kernel.org, mike.kravetz@oracle.com, muchun.song@linux.dev, akpm@linux-foundation.org, anshuman.khandual@arm.com, willy@infradead.org, wangkefeng.wang@huawei.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize

On Thu, Jun 27, 2024 at 03:19:55PM -0600, Yu Zhao wrote:
> On Wed, Feb 7, 2024 at 5:44 AM Catalin Marinas wrote:
> > On Sat, Jan 27, 2024 at 01:04:15PM +0800, Nanyong Sun wrote:
> > > On 2024/1/26 2:06, Catalin Marinas wrote:
> > > > On Sat, Jan 13, 2024 at 05:44:33PM +0800, Nanyong Sun wrote:
> > > > > HVO was previously disabled on arm64 [1] due to the lack of
> > > > > necessary BBM(break-before-make) logic when changing page tables.
> > > > > This set of patches fix this by adding necessary BBM sequence when
> > > > > changing page table, and supporting vmemmap page fault handling to
> > > > > fixup kernel address translation fault if vmemmap is concurrently accessed.
> > [...]
> > > > How often is this code path called? I wonder whether a stop_machine()
> > > > approach would be simpler.
> > >
> > > As long as allocating or releasing hugetlb is called. We cannot
> > > limit users to only allocate or release hugetlb when booting or
> > > not running any workload on all other cpus, so if use
> > > stop_machine(), it will be triggered 8 times every 2M and 4096
> > > times every 1G, which is probably too expensive.
> >
> > I'm hoping this can be batched somehow and not do a stop_machine() (or
> > 8) for every 2MB huge page.
>
> Theoretically, all hugeTLB vmemmap operations from a single user
> request can be done in one batch. This would require the preallocation
> of the new copy of vmemmap so that the old copy can be replaced with
> one BBM.

Do we ever re-create pmd block entries for the vmemmap range that was
split, or do they remain pmd table + pte entries? If the latter, I guess
we could do a stop_machine() only for a pmd; it should be self-limiting
after a while. I don't want user-space to DoS the system by triggering
stop_machine() when mapping/unmapping hugetlbfs pages.

If I did the maths right, for a 2MB hugetlb page, we have about 8
vmemmap pages (32K). Once we split a 2MB vmemmap range, whatever else
needs to be touched in this range won't require a stop_machine().

> > Just to make sure I understand - is the goal to be able to free struct
> > pages corresponding to hugetlbfs pages?
>
> Correct, if you are referring to the pages holding struct page[].
>
> > Can we not leave the vmemmap in
> > place and just release that memory to the page allocator?
>
> We cannot, since the goal is to reuse those pages for something else,
> i.e., reduce the metadata overhead for hugeTLB.

What I meant is that we can leave the vmemmap alias in place and just
reuse those pages via the linear map etc. The kernel shouldn't touch
those struct pages and thereby corrupt the data. The only problem would
be if we physically unplug those pages, but I don't think that's the
case here.

-- 
Catalin