Date: Tue, 26 Sep 2023 15:26:30 -0700 (PDT)
From: Hugh Dickins
To: Andi Kleen
cc: Hugh Dickins, Andrew Morton, Christoph Lameter, Matthew Wilcox,
    Mike Kravetz, David Hildenbrand, Suren Baghdasaryan, Yang Shi,
    Sidhartha Kumar, Vishal Moola, Kefeng Wang, Greg Kroah-Hartman,
    Tejun Heo, Mel Gorman, Michal Hocko, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [PATCH 01/12] hugetlbfs: drop shared NUMA mempolicy pretence
Message-ID: <45aa39c0-9b14-3e5-a81a-70a6403a8432@google.com>
References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com> <47a562a-6998-4dc6-5df-3834d2f2f411@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Mon, 25 Sep 2023, Andi Kleen wrote:
> On Mon, Sep 25, 2023 at 01:21:10AM -0700, Hugh Dickins wrote:
> > hugetlbfs_fallocate() goes through the motions of pasting a shared NUMA
> > mempolicy onto its pseudo-vma, but how could there ever be a shared NUMA
> > mempolicy for this file?
> > hugetlb_vm_ops has never offered a set_policy
> > method, and hugetlbfs_parse_param() has never supported any mpol options
> > for a mount-wide default policy.
> >
> > It's just an illusion: clean it away so as not to confuse others, giving
> > us more freedom to adjust shmem's set_policy/get_policy implementation.
> > But hugetlbfs_inode_info is still required, just to accommodate seals.
> >
> > Yes, shared NUMA mempolicy support could be added to hugetlbfs, with a
> > set_policy method and/or mpol mount option (Andi's first posting did
> > include an admitted-unsatisfactory hugetlb_set_policy()); but it seems
> > that nobody has bothered to add that in the nineteen years since v2.6.7
> > made it possible, and there is at least one company that has invested
> > enough into hugetlbfs, that I guess they have learnt well enough how to
> > manage its NUMA, without needing shared mempolicy.
>
> TBH i'm not sure people in general rely on shared mempolicy. The
> original use case for it was to modify the numa policy of non anonymous
> shared memory files without modifying the program (e.g. Oracle
> database's shared memory segments)

Ah, "without modifying the program": that makes a lot of sense, but I had
never thought of it that way - I just saw it as the right way to manage
the shared object (though an outlier, since we have so many other
msyscall()s which do not manage the underlying shared object in this way).

>
> But I don't think that particular usage model ever got any real
> traction: at least I haven't seen any real usage of it outside my tests.

If the hugetlbfs support had actually gone in, I imagine Oracle would
have managed it that way; but they seem to have survived well without.

>
> I suspect people either are fine with just process policy or modify the
> program, in which case it's not a big burden to modify every user,
> so process policy or vma based mbind policy works fine.
>
> Maybe it would be an interesting experiment to disable it everywhere
> with some flag and see if anybody complains.
>
> On the other hand it might be Hyrum'ed by now.

This is interesting info, Andi, thank you for providing.

I'm torn.  shmem and mempolicy (and struct vm_operations_struct) would
certainly be simpler without shared mempolicy: but I frankly don't have
the time and courage to experiment with deprecating it now; and it is
fundamentally right that such policy should be kept with the object,
not with its mappings.  I've assumed for years that it has to stay.

Hugh