From: James Houghton
Date: Thu, 19 Jan 2023 14:35:12 -0800
Subject: Re: [PATCH 21/46] hugetlb: use struct hugetlb_pte for walk_hugetlb_range
To: Peter Xu
Cc: Mike Kravetz, David Hildenbrand, Muchun Song, David Rientjes,
    Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra,
    Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)",
    Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <06423461-c543-56fe-cc63-cabda6871104@redhat.com>
    <6548b3b3-30c9-8f64-7d28-8a434e0a0b80@redhat.com>
Content-Type: text/plain; charset="UTF-8"

On Thu, Jan 19, 2023 at 2:23 PM Peter Xu wrote:
>
> On Thu, Jan 19, 2023 at 02:00:32PM -0800, Mike Kravetz wrote:
> > I do not know much about the (primary) live migration use case.  My
> > guess is that page table lock contention may be an issue?  In this use
> > case, HGM is only enabled for the duration of the live migration operation,
> > then a MADV_COLLAPSE is performed.  If contention is likely to be an
> > issue during this time, then yes we would need to pass around with
> > something like hugetlb_pte.
>
> I'm not aware of any such contention issue.  IMHO the migration problem is
> majorly about being too slow transferring a page being so large.  Shrinking
> the page size should resolve the major problem already here IIUC.

Contention will be a problem once you scale VMs up to be quite large.
Google upstreamed the "TDP MMU" for KVM/x86, which removed the need to
take the MMU lock for writing in the EPT violation path. We found that
this change is required for VMs with more than 200 or so vCPUs to
consistently avoid CPU soft lockups in the guest.

Requiring each UFFDIO_CONTINUE (in the post-copy path) to serialize on
the same PTL would be problematic in the same way.

>
> AFAIU 4K-only solution should only reduce any lock contention because locks
> will always be pte-level if VM_HUGETLB_HGM set.  When walking and creating
> the intermediate pgtable entries we can use atomic ops just like generic
> mm, so no lock needed at all.  With uncertainty on the size of mappings,
> we'll need to take any of the multiple layers of locks.
>

Other than taking the HugeTLB VMA lock for reading, walking and
allocating page tables won't need any additional locking. We take the
PTL to allocate the next level down, but so does generic mm (look at
__pud_alloc and __pmd_alloc, for example).

Maybe I am misunderstanding.

- James
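
To make the lock-granularity point in this thread concrete, here is a
toy user-space model (not kernel code, and not how the kernel actually
selects a PTL). It assumes x86-64 sizes (4K base pages, 2M covered by
one PTE-level page table, 1G huge pages), and every name in it is
hypothetical. It only counts how many distinct locks are touched when
each 4K page of a region is faulted once, under two assumed schemes:
one lock for the whole 1G page, or one lock per PTE-level page table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative x86-64 constants; assumptions, not taken from the thread. */
#define PAGE_SHIFT 12   /* 4K base page */
#define PMD_SHIFT  21   /* one PTE-level page table maps 2M */
#define PUD_SHIFT  30   /* one 1G HugeTLB page */

/* Hypothetical model: a single lock shared by the whole 1G huge page. */
static uint64_t lock_id_per_hugepage(uint64_t addr)
{
    return addr >> PUD_SHIFT;
}

/* Hypothetical model: one lock per PTE-level page table (2M of VA). */
static uint64_t lock_id_per_pte_table(uint64_t addr)
{
    return addr >> PMD_SHIFT;
}

/*
 * Count the distinct locks touched when every 4K page in
 * [start, start + len) is faulted once, in ascending address order.
 */
static size_t distinct_locks(uint64_t start, uint64_t len,
                             uint64_t (*lock_id)(uint64_t))
{
    size_t count = 0;
    uint64_t prev = 0;

    /* The region must be non-empty and start on a 4K boundary. */
    assert(len > 0 && (start & ((1ULL << PAGE_SHIFT) - 1)) == 0);

    for (uint64_t off = 0; off < len; off += 1ULL << PAGE_SHIFT) {
        uint64_t id = lock_id(start + off);

        /* Addresses ascend, so a changed id is a lock not seen before. */
        if (count == 0 || id != prev) {
            count++;
            prev = id;
        }
    }
    return count;
}
```

Under these assumptions, faulting a full 1G region touches a single
lock in the first scheme and 512 locks in the second, so concurrent
4K faults from many vCPUs would all queue on one lock in the first
case and could spread across many in the second.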