Date: Wed, 24 Nov 2021 19:11:33 -0400
From: Jason Gunthorpe <jgg@ziepe.ca>
To: David Hildenbrand
Cc: Vlastimil Babka, Jens Axboe, Andrew Dona-Couch, Andrew Morton,
 Drew DeVault, Ammar Faizi, linux-kernel@vger.kernel.org,
 linux-api@vger.kernel.org, io_uring Mailing List, Pavel Begunkov,
 linux-mm@kvack.org
Subject: Re: [PATCH] Increase default MLOCK_LIMIT to 8 MiB
Message-ID: <20211124231133.GM5112@ziepe.ca>
References: <20211124132353.GG5112@ziepe.ca>
 <20211124132842.GH5112@ziepe.ca>
 <20211124134812.GI5112@ziepe.ca>
 <2cdbebb9-4c57-7839-71ab-166cae168c74@redhat.com>
 <20211124153405.GJ5112@ziepe.ca>
 <63294e63-cf82-1f59-5ea8-e996662e6393@redhat.com>
 <20211124183544.GL5112@ziepe.ca>

On Wed, Nov 24, 2021 at 08:09:42PM +0100, David Hildenbrand wrote:
> That would be giving up on compound pages (hugetlbfs, THP, ...) on any
> current Linux system that does not use ZONE_MOVABLE -- which is not
> something I am willing to buy into, just like our customers ;)

So we have ZONE_MOVABLE but users won't use it? Then why is the
solution to push the same kinds of restrictions as ZONE_MOVABLE onto
ZONE_NORMAL?

> See my other mail, the upstream version of my reproducer essentially
> shows what FOLL_LONGTERM is currently doing wrong with pageblocks. And
> at least to me that's an interesting insight :)

Hmm. For your reproducer it would be nice if we could cgroup-control
the # of pageblocks a cgroup has pinned. Focusing on the # of pages
pinned is clearly the wrong metric. I suggested counting the whole
compound page earlier, but your point about the entire pageblock being
ruined makes sense too.

It means pinned pages will have to be migrated to already-ruined
pageblocks the cgroup owns, which is a more controlled version of the
FOLL_LONGTERM migration you have been thinking about.

This would effectively limit the fragmentation a hostile process group
can create. If we further treated unmovable cgroup-charged kernel
allocations as 'pinned' and routed them to the pinned pageblocks, it
starts to look really interesting. Kill the cgroup, get all your THPs
back? Fragmentation cannot extend past the cgroup?

ie there are lots of batch workloads that could be interesting there -
wrap the batch in a cgroup, run it, then kill everything. Since the
cgroup gives the allocator some lifetime clustering, there is a lot
less fragmentation when the batch is finished, so the next batch gets
more THPs, etc.

There is also sort of an interesting optimization opportunity - many
FOLL_LONGTERM users would be happy to spend more time pinning to get
nice contiguous memory ranges. Might help convince people that the
extra pin time for migrations is worthwhile.
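To make the pageblock accounting above a little more concrete, here is
a very rough, untested sketch of what the charge side could look like.
Nothing like this exists in the kernel today; struct pin_cgroup and the
pincg_*() helpers are invented names, purely to show the shape of the
idea (charge per pageblock on the first long-term pin, not per page):

/* Illustrative pseudo-code only -- no such cgroup controller exists. */
struct pin_cgroup {
	atomic_long_t	nr_pinned_blocks;	/* pageblocks "ruined" by this cgroup */
	long		max_pinned_blocks;	/* admin-set limit */
};

/* Would be called from the FOLL_LONGTERM pin path before taking the pin. */
static int pincg_charge_pageblock(struct pin_cgroup *pcg, unsigned long pfn)
{
	unsigned long block = pfn / pageblock_nr_pages;

	/*
	 * Only the first long-term pin in a pageblock consumes quota;
	 * later pins get steered (migrated) into blocks the cgroup
	 * already owns instead of ruining fresh ones.
	 */
	if (pincg_block_is_charged(pcg, block))		/* hypothetical helper */
		return 0;

	if (atomic_long_inc_return(&pcg->nr_pinned_blocks) >
	    pcg->max_pinned_blocks) {
		atomic_long_dec(&pcg->nr_pinned_blocks);
		return -ENOMEM;	/* or: migrate into an already charged block */
	}

	pincg_mark_block_charged(pcg, block);		/* hypothetical helper */
	return 0;
}

The nice property is that the charge naturally goes away when the
cgroup dies, which is what would make the "kill the cgroup, get your
THPs back" behaviour fall out of the design.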
> > Something like io_uring is registering a bulk amount of memory and
> > then doing some potentially long operations against it.
>
> The individual operations it performs are comparable to O_DIRECT I think

Yes, and O_DIRECT can take tens of seconds in troubled cases with IO
timeouts and things. Plus io_uring is worse, as the buffer is
potentially shared by many in-flight ops, and you'd have to block new
ops against the buffer and flush all running ops before any mapping
change could happen, all while holding up an mmu notifier.

Not only is that bad for mm subsystem operations, it would also
significantly harm io_uring performance if a migration hits.

So, I really don't like abusing mmu notifiers for stuff like this. I
didn't like it in virtio either :)

Jason
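For reference, the userspace side of the io_uring buffer registration
being discussed looks roughly like this (liburing; the queue depth and
buffer size are arbitrary). The pages backing the registered iovec stay
long-term pinned until they are unregistered and are charged against
RLIMIT_MEMLOCK, which is why migrating them underneath in-flight
fixed-buffer ops would be so painful:

#include <liburing.h>
#include <stdlib.h>
#include <sys/uio.h>

int main(void)
{
	struct io_uring ring;
	struct iovec iov;

	if (io_uring_queue_init(8, &ring, 0) < 0)
		return 1;

	/* 8 MiB buffer, page aligned */
	iov.iov_len = 8 * 1024 * 1024;
	if (posix_memalign(&iov.iov_base, 4096, iov.iov_len))
		return 1;

	/*
	 * Pins the backing pages (FOLL_LONGTERM) for the lifetime of the
	 * registration; accounted against RLIMIT_MEMLOCK.
	 */
	if (io_uring_register_buffers(&ring, &iov, 1) < 0)
		return 1;

	/* ... submit IORING_OP_READ_FIXED / IORING_OP_WRITE_FIXED ... */

	io_uring_unregister_buffers(&ring);
	io_uring_queue_exit(&ring);
	return 0;
}

A registration of this size is exactly the kind of thing that runs into
the default RLIMIT_MEMLOCK that the patch in the subject is trying to
raise to 8 MiB.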