Date: Sun, 1 Dec 2024 20:57:07 +0000
From: Matthew Wilcox
To: Dmitry Dolgov <9erthalion6@gmail.com>
Cc: linux-mm@kvack.org
Subject: Re: [QUESTION] Resizing shared mapping without clashing with others
References: <3kpxpd3dbjgg6epasi2554c4qyils4t3cm2pjnyzer7gkyoaxl@khhdxjiggyhp> <20241201184410.gl2huwqkbdwm6jvj@erthalion.local>
In-Reply-To: <20241201184410.gl2huwqkbdwm6jvj@erthalion.local>

On Sun, Dec 01, 2024 at 07:44:10PM +0100, Dmitry Dolgov wrote:
> > On Sun, Dec 01, 2024 at 11:55:37AM +0000, Matthew Wilcox wrote:
> > On Sat, Nov 30, 2024 at 05:24:13PM +0100, Dmitry Dolgov wrote:
> > > Hi,
> > >
> > > While working on PostgreSQL [1] we've stumbled upon a question regarding
> > > resizing of shared mappings without conflicting with any other possible
> > > mappings.
> > > Before making any wrong conclusions, I would love to get some
> > > consultation from kernel folks on that topic.
> > >
> > > To put it into a context, PostgreSQL uses anonymous shared memory
> > > mapping as a buffer cache for data. The mapping size is configured at
> > > the start, and could not be changed without a restart. Now, we would
> > > like to make it more flexible and allow to change it at runtime, ideally
> > > without changing already used addresses and copying stuff back and
> > > forth.
> > >
> > > The idea is to place the shared mapping at a specified address (with
> > > MAP_FIXED if needed) with a gap, then use mremap to resize it into the
> > > gap. This approach has an open question -- how to make sure there will
> > > be no other mapping created within the same address space, where we
> > > want to expand the shared mapping? E.g. the shared mapping was created,
> > > then large memory allocation caused another mapping to be created close
> > > to it, so that expanding is not possible.
> >
> > I think there's a very straightforward answer, which is to mmap() it to
> > the larger size to begin with. If, say, you create a file of 1GB, you
> > can mmap() the first 100GB of that file. If you access the last 99GB of
> > the mapping, you'll get SIGBUS, but you can truncate() the file larger
> > and gain access to the new memory that way. Does that work for you?
> >
> > Or if you're doing MAP_ANON | MAP_SHARED, just don't access the last
> > 99GB until your configuration changes. Memory is allocated on demand,
> > so you won't be charged for it until you use it.
>
> Right, mapping with the larger size than needed is one option we're
> considering. But there are few arguments against that:
>
> * Folks are wary of unnecessary large shared mappings, since in the past
>   there were issues with OOM killer making unfavorable to postgres
>   decisions because of that. It might have changed over time, but to
>   confirm that will require some investigation.
>
> * It can cause memory accounting problems. E.g. if we use hugetlb inside
>   a cgroup with reservation limits set (something like
>   hugetlb.2MB.rsvd.limit_in_bytes), then such mmap() will be counted
>   against the limit, even though the memory wasn't allocated -- meaning
>   that we claim some resource without using it.

If it does turn out to be a problem, you can use a similar trick to how
ld.so maps binaries:

mmap(NULL, 2055640, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f221a758000
mmap(0x7f221a780000, 1462272, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0x7f221a780000
mmap(0x7f221a8e5000, 352256, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18d000) = 0x7f221a8e5000
mmap(0x7f221a93b000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e2000) = 0x7f221a93b000
mmap(0x7f221a941000, 52696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f221a941000

Although you wouldn't want to do consecutive mmaps, you'd want to use
mremap() with MREMAP_FIXED -- not to change new_address, but to expand
length over the initial reserving-space mapping.