From: David Hildenbrand
Organization: Red Hat
Date: Fri, 2 May 2025 17:18:54 +0200
Subject: Re: [PATCH v5 07/12] khugepaged: add mTHP support
To: Jann Horn, Nico Pache
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org,
 mhiramat@kernel.org, mathieu.desnoyers@efficios.com, baohua@kernel.org,
 baolin.wang@linux.alibaba.com, ryan.roberts@arm.com, willy@infradead.org,
 peterx@redhat.com, ziy@nvidia.com, wangkefeng.wang@huawei.com,
 usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com,
 thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
 kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com,
 dev.jain@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com,
 tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz,
 cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com,
 hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com,
 rdunlap@infradead.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com
References: <20250428181218.85925-1-npache@redhat.com>
 <20250428181218.85925-8-npache@redhat.com>

On 02.05.25 14:50, Jann Horn wrote:
> On Fri, May 2, 2025 at 8:29 AM David Hildenbrand wrote:
>> On 02.05.25 00:29, Nico Pache wrote:
>>> On Wed, Apr 30, 2025 at 2:53 PM Jann Horn wrote:
>>>>
>>>> On Mon, Apr 28, 2025 at 8:12 PM Nico Pache wrote:
>>>>>
>>>>> Introduce the ability for khugepaged to collapse to different mTHP sizes.
>>>>> While scanning PMD ranges for potential collapse candidates, keep track
>>>>> of pages in KHUGEPAGED_MIN_MTHP_ORDER chunks via a bitmap. Each bit
>>>>> represents a utilized region of order KHUGEPAGED_MIN_MTHP_ORDER ptes. If
>>>>> mTHPs are enabled we remove the restriction of max_ptes_none during the
>>>>> scan phase so we don't bail out early and miss potential mTHP candidates.
>>>>>
>>>>> After the scan is complete we will perform binary recursion on the
>>>>> bitmap to determine which mTHP size would be most efficient to collapse
>>>>> to. max_ptes_none will be scaled by the attempted collapse order to
>>>>> determine how full a THP must be to be eligible.
>>>>>
>>>>> If an mTHP collapse is attempted, but contains swapped out or shared
>>>>> pages, we don't perform the collapse.
>>>> [...]
>>>>> @@ -1208,11 +1211,12 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
>>>>>         vma_start_write(vma);
>>>>>         anon_vma_lock_write(vma->anon_vma);
>>>>>
>>>>> -       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
>>>>> -                               address + HPAGE_PMD_SIZE);
>>>>> +       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, _address,
>>>>> +                               _address + (PAGE_SIZE << order));
>>>>>         mmu_notifier_invalidate_range_start(&range);
>>>>>
>>>>>         pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
>>>>> +
>>>>>         /*
>>>>>          * This removes any huge TLB entry from the CPU so we won't allow
>>>>>          * huge and small TLB entries for the same virtual address to
>>>>
>>>> It's not visible in this diff, but we're about to do a
>>>> pmdp_collapse_flush() here. pmdp_collapse_flush() tears down the
>>>> entire page table, meaning it tears down 2MiB of address space; and it
>>>> assumes that the entire page table exclusively corresponds to the
>>>> current VMA.
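The scan-then-select scheme in the quoted patch description, and how small the chosen range can be next to the 2MiB that pmdp_collapse_flush() covers, can be pictured with a small userspace sketch. It is not the patch's code. Assumptions: 4 KiB pages, PMD order 9, a minimum order of 2, a chunk counting as either fully populated or fully empty, and max_ptes_none scaled by a right shift of the order difference (the description only says it is "scaled by the attempted collapse order"). All names (chunk_used, choose_orders, SKETCH_*) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, for illustration only: 4 KiB pages, 2 MiB PMD (order 9),
 * and a hypothetical minimum mTHP order of 2 (16 KiB). */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PMD_ORDER 9
#define SKETCH_MIN_ORDER 2
#define SKETCH_NR_CHUNKS (1U << (SKETCH_PMD_ORDER - SKETCH_MIN_ORDER))

/* One entry per chunk of (1 << SKETCH_MIN_ORDER) PTEs. Simplification:
 * set = every PTE in the chunk is populated, clear = every PTE is none. */
static bool chunk_used[SKETCH_NR_CHUNKS];

struct collapse_choice {
	unsigned int first_pte;	/* index of the first PTE in the range */
	unsigned int order;	/* collapse order chosen for the range */
};

static struct collapse_choice choices[SKETCH_NR_CHUNKS];
static unsigned int nr_choices;

/* Assumed scaling of the PMD-level max_ptes_none (0..511) to @order. */
static unsigned int scaled_max_ptes_none(unsigned int max_ptes_none,
					 unsigned int order)
{
	return max_ptes_none >> (SKETCH_PMD_ORDER - order);
}

/* Count none PTEs in a run of chunks, at chunk granularity. */
static unsigned int count_none(unsigned int first_chunk, unsigned int nr_chunks)
{
	unsigned int none = 0;

	for (unsigned int i = first_chunk; i < first_chunk + nr_chunks; i++)
		if (!chunk_used[i])
			none += 1U << SKETCH_MIN_ORDER;
	return none;
}

/* Binary recursion: take the whole range at @order if it is full enough,
 * otherwise try each half at @order - 1, down to SKETCH_MIN_ORDER.
 * With max_ptes_none <= 511 an all-empty range is never selected. */
static void choose_orders(unsigned int first_chunk, unsigned int order,
			  unsigned int max_ptes_none)
{
	unsigned int nr_chunks = 1U << (order - SKETCH_MIN_ORDER);

	assert(order >= SKETCH_MIN_ORDER && order <= SKETCH_PMD_ORDER);
	if (count_none(first_chunk, nr_chunks) <=
	    scaled_max_ptes_none(max_ptes_none, order)) {
		choices[nr_choices].first_pte = first_chunk << SKETCH_MIN_ORDER;
		choices[nr_choices].order = order;
		nr_choices++;
		return;
	}
	if (order == SKETCH_MIN_ORDER)
		return;	/* too sparse even at the smallest order */
	choose_orders(first_chunk, order - 1, max_ptes_none);
	choose_orders(first_chunk + nr_chunks / 2, order - 1, max_ptes_none);
}

/* Bytes covered by a collapse of @order, like PAGE_SIZE << order in the diff. */
static unsigned long collapse_bytes(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}
```

For example, with max_ptes_none = 0 and only the first 32 chunks populated, choose_orders(0, SKETCH_PMD_ORDER, 0) records a single order-7 range starting at PTE 0. That is 512 KiB, a quarter of the 2MiB page table that a PMD-level flush would tear down. With max_ptes_none = 511 and only chunk 0 populated, the same call selects order 9 immediately.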
>>>>
>>>> I think you'll need to ensure that the pmdp_collapse_flush() only
>>>> happens for full-size THP, and that mTHP only tears down individual
>>>> PTEs in the relevant range. (That code might get a bit messy, since
>>>> the existing THP code tears down PTEs in a detached page table, while
>>>> mTHP would have to do it in a still-attached page table.)
>>> Hi Jann!
>>>
>>> I was under the impression that this is needed to prevent GUP-fast
>>> races (and potentially others).
>
> Why would you need to touch the PMD entry to prevent GUP-fast races for mTHP?
>
>>> As you state here, conceptually the PMD case is, detach the PMD, do
>>> the collapse, then reinstall the PMD (similarly to how the system
>>> recovers from a failed PMD collapse). I tried to keep the current
>>> locking behavior as it seemed the easiest way to get it right (and not
>>> break anything). So I keep the PMD detaching and reinstalling for the
>>> mTHP case too. As Hugh points out I am releasing the anon lock too
>>> early. I will comment further on his response.
>
> As I see it, you're not "keeping" the current locking behavior; you're
> making a big implicit locking change by reusing a codepath designed
> for PMD THP for mTHP, where the page table may not be exclusively
> owned by one VMA.

That is not the intention. The intention in this series (at least as we
discussed) was to not do it across VMAs; that is considered the next
logical step (which will be especially relevant on arm64 IMHO).

>
>>> As I familiarize myself with the code more, I do see potential code
>>> improvements/cleanups and locking improvements, but I was going to
>>> leave those to a later series.
>>
>> Right, the simplest approach on top of the current PMD collapse is to do
>> exactly what we do in the PMD case, including the locking: which
>> apparently is not completely the same yet :).
>>
>> Instead of installing a PMD THP, we modify the page table and remap that.
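The two teardown strategies discussed here differ mainly in what has to be saved for rollback. Jann's suggestion clears only the PTEs of the mTHP range in a still-attached page table, so every old PTE value must be kept in case the collapse fails; the existing PMD path detaches, and on failure reinstalls, one PMD value that hides all 512 entries at once. The sketch below is a toy userspace model under stated assumptions: 512 entries per table, 0 as the "none" value, and hypothetical toy_* names. It ignores locking, TLB flushing and GUP-fast entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model, not kernel code: 512 PTE values plus one PMD value that
 * references them. 0 stands for "none". */
#define TOY_PTRS_PER_PTE 512

typedef uint64_t toy_pte_t;
typedef uint64_t toy_pmd_t;

struct toy_pmd_table {
	toy_pmd_t pmd;				/* entry referencing the table */
	toy_pte_t ptes[TOY_PTRS_PER_PTE];	/* the page table itself */
};

/* PMD approach: saving one value detaches all 512 PTEs, including the
 * ones outside the range being collapsed. */
static toy_pmd_t toy_pmd_detach(struct toy_pmd_table *t)
{
	toy_pmd_t saved = t->pmd;

	t->pmd = 0;	/* walkers now see "none" for the whole PMD span */
	return saved;
}

static void toy_pmd_restore(struct toy_pmd_table *t, toy_pmd_t saved)
{
	t->pmd = saved;
}

/* PTE approach: clear only [start, start + nr) in the attached table,
 * saving every old value so the range can be restored on failure. */
static void toy_ptes_clear(struct toy_pmd_table *t, unsigned int start,
			   unsigned int nr, toy_pte_t *saved)
{
	assert(start + nr <= TOY_PTRS_PER_PTE);
	memcpy(saved, &t->ptes[start], nr * sizeof(*saved));
	memset(&t->ptes[start], 0, nr * sizeof(*saved));
}

static void toy_ptes_restore(struct toy_pmd_table *t, unsigned int start,
			     unsigned int nr, const toy_pte_t *saved)
{
	assert(start + nr <= TOY_PTRS_PER_PTE);
	memcpy(&t->ptes[start], saved, nr * sizeof(*saved));
}
```

The single saved PMD value is the simpler bookkeeping; the price is that clearing it also hides the entries outside the mTHP range, which is only harmless if no other VMA maps any part of that page table.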
>>
>> Moving from the PMD lock to the PTE lock will not make a big change in
>> practice for most cases: we already must disable essentially all page
>> table walkers (vma lock, mmap lock in write, rmap lock in write).
>>
>> The PMDP clear+flush is primarily to disable the last possible set of
>> page table walkers: (1) HW modifications and (2) GUP-fast.
>>
>> So after the PMDP clear+flush we know that (A) HW cannot modify the
>> pages concurrently and (B) GUP-fast cannot succeed anymore.
>>
>> The issue with PTEP clear+flush is that we will have to remember all PTE
>> values, to reset them if anything goes wrong. Using a single PMD value
>> is arguably simpler. And then, the benefit vs. complexity is unclear.
>>
>> Certainly something to look into later, but not a requirement for the
>> first support,
>
> As I understand, one rule we currently have in MM is that an operation
> that logically operates on one VMA (VMA 1) does not touch the page
> tables of other VMAs (VMA 2) in any way, except that it may walk page
> tables that cover address space that intersects with both VMA 1 and
> VMA 2, and create such page tables if they are missing.

Yes, absolutely. That must not happen. And I think I raised it as a
problem in reply to one of Dev's series. If this series does not respect
that rule, it must be fixed.

>
> This proposed patch changes that, without explicitly discussing this
> locking change.

Yes, that must not happen. We must not zap a PMD to temporarily replace
it with a pmd_none() entry if any other sane page table walker could
stumble over it. This includes another VMA that is not write-locked that
could span the PMD.

-- 
Cheers,

David / dhildenb
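The rule Jann states, and the conclusion that a PMD must not be zapped while another VMA may span it, reduce to a containment check: the mTHP range itself may fit inside the VMA while the PMD-aligned 2MiB span around it does not. A minimal sketch, assuming 4 KiB pages and 2 MiB PMDs; toy_vma is a hypothetical stand-in for struct vm_area_struct, and real code would additionally need the VMA, mmap and rmap locking discussed in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed geometry: 4 KiB pages and 2 MiB PMDs. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PMD_SIZE  (2UL * 1024 * 1024)
#define SKETCH_PMD_MASK  (~(SKETCH_PMD_SIZE - 1))

/* Hypothetical stand-in for a VMA: the half-open range [start, end). */
struct toy_vma {
	unsigned long start;
	unsigned long end;
};

/* Bytes covered by an mTHP of @order. */
static unsigned long toy_mthp_bytes(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}

/* Does [addr, addr + size) lie entirely inside @vma? */
static bool range_in_vma(const struct toy_vma *vma, unsigned long addr,
			 unsigned long size)
{
	assert(vma->start <= vma->end);
	return addr >= vma->start && addr + size <= vma->end;
}

/* Is the whole PMD-aligned span around @addr owned by @vma? If not,
 * clearing that PMD would also hide another VMA's page table entries. */
static bool pmd_span_in_vma(const struct toy_vma *vma, unsigned long addr)
{
	return range_in_vma(vma, addr & SKETCH_PMD_MASK, SKETCH_PMD_SIZE);
}
```

For a 1 MiB VMA at 0x200000, a 64 KiB (order-4) mTHP at 0x200000 fits inside the VMA, yet the PMD span [0x200000, 0x400000) does not, so a PMD-level teardown would reach address space that may belong to a neighbouring VMA.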