Message-ID: <141b7088-684b-32dc-efe4-03713d38ae28@redhat.com>
Date: Thu, 15 Jun 2023 10:29:54 +0200
To: Michal Hocko, Mike Kravetz
Cc: David Rientjes, linux-mm@kvack.org, James Houghton, John Hubbard, Matthew Wilcox, Peter Xu, Vlastimil Babka, Zi Yan
References: <20230614230458.GB3559@monkey>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [Invitation] Linux MM Alignment Session on HugeTLB Core MM Convergence on Wednesday

On 15.06.23 10:04, Michal Hocko wrote:
> On Wed 14-06-23 16:04:58, Mike Kravetz wrote:
>> On 06/12/23 18:59, David Rientjes wrote:
>>> This week's topic will be a technical brainstorming session on HugeTLB
>>> convergence with the core MM.
>>> This has been discussed most recently in
>>> this thread:
>>> https://lore.kernel.org/linux-mm/ZIOEDTUBrBg6tepk@casper.infradead.org/T/
>>
>> Thank you David for putting this session together! And, thanks to everyone
>> who participated.
>>
>> Following up on linux-mm with most active participants on Cc (sorry if I
>> missed someone). If it makes more sense to continue the above thread,
>> please move there.
>>
>> Even though everyone knows that hugetlb is special cased throughout the
>> core mm, it came to a head with the proposed introduction of HGM. TBH,
>> few people in the core mm community paid much attention to HGM when first
>> introduced. A LSF/MM session was then dedicated to the discussion of
>> HGM with the outcome being the suggestion to create a new filesystem/driver
>> (hugetlb2 if you will) that would satisfy the use cases requiring HGM.
>> One thing that was not emphasized at LSF/MM is that there are existing
>> hugetlb users experiencing major issues that could be addressed with HGM:
>> specifically the issues of memory errors and live migration. That was
>> the starting point for recent discussion in the above thread.
>>
>> I may be wrong, but it appeared the direction of that thread was to
>> first try and unify some of the hugetlb and core mm code. Eliminate
>> some of the special casing. If hugetlb was less of a special case, then
>> perhaps HGM would be more acceptable. That is the impression I (perhaps
>> incorrectly) had going into today's session.
>
> My impression from the discussion yesterday was that the level of
> unification would need to be really large and time consuming in order to
> be useful for the HGM patchset to be in a more maintainable form. The
> final outcome is quite hard to predict at this stage.
>
>> During today's session, we often discussed what would/could be introduced
>> in a hugetlb v2. The idea is that this would be the ideal place for HGM.
>> However, people also made the comparisons to cgroup v1 - v2.
>> Such a
>> redesign provides the needed 'clean slate' to do things right, but it
>> does little for existing users who would be unwilling to quickly move off
>> existing hugetlb.
>>
>> We did spend a good chunk of time on hugetlb/core mm unification and
>> removing special casing. In some (most) of these cases, the benefit of
>> removing special cases from core mm would result in adding more code to
>> hugetlb. For example: proper type'ing so that hugetlb does not treat
>> all page table entries as PTEs. Again, I may be wrong but I think
>> people were OK with adding more code (and even complexity) to hugetlb
>> if it eliminated special casing in the core mm. But, there did not
>> seem to be a clear consensus especially with the thought that we may
>> need to double hugetlb code to get types right.
>
> This is primarily your call as a maintainer. If you ask me, hugetlb is
> over complicated in its current form already. Regressions are not really
> seldom when code is added which is a signal we are hitting maintenance
> cost walls. This doesn't mean further development is impossible of
> course but it is increasingly more costly AFAICS.
>
>> Unless I missed something, there was no clear direction at the end of this
>> session. I was hoping that we could come up with a plan to address the
>> issues facing today's hugetlb users. IMO, there seem to be two options:
>> 1) Start work on hugetlb v2 with the intention that customers will need
>>    to move to this to address their issues.
>> 2) Incorporate functionality like HGM into existing hugetlb.
>

I fully agree with all that Michal said.

I'm just going to add that I don't see why anyone would look into a
hugetlbv2 if we're going to use the motivation of "help existing users"
to make hugetlb ever more complicated and special. "Existing users" here
even meaning "people use hugetlb for backing VMs. Now they want to get
postcopy working with less latency." -- which I consider partially a new
use case.

So working on adding HGM and concurrently starting a hugetlbv2? I don't
think that will happen if we decide on adding HGM and proceeding with
that reasoning about existing users.

As expressed yesterday, I don't see a fast and clean way to make hugetlb
significantly less special (thanks Willy for the list of odd cases).
Sure, we can talk about adding pte_t safety, but I don't really see a
way forward to unify page table walking code that way -- there are still
the (PT) locking, PMD sharing, PTE-cont special cases ... but sure, if
anybody wants to work on that, why not.

That said, like Michal, I acknowledge that it is Mike's call regarding
the hugetlb code.

I, for my part, will push back on any added core-mm complexity that adds
more special casing for hugetlb. Maybe there are easy ways to integrate
it nicely and that is not really a concern. Note that while we've been
discussing how HGM would already interfere with core-mm, we've not even
started discussing how actual MADV_SPLIT/MADV_COLLAPSE/page poisoning ...
would affect core-mm and require special-casing for hugetlb.

I, for my part, will explore the mapcount topic a bit (as time permits)
and see if we can at least come up with a unified mapcount approach
(e.g., sub-page mapcount?). But I suspect even figuring that out will
take quite a while already ...

-- 
Cheers,

David / dhildenb