From: "Zach O'Keefe"
Date: Thu, 18 Jan 2024 06:58:42 -0800
Subject: Re: [PATCH v2 1/1] mm/madvise: add MADV_F_COLLAPSE_LIGHT to process_madvise()
To: Michal Hocko
Cc: Lance Yang, akpm@linux-foundation.org, david@redhat.com, songmuchun@bytedance.com, shy828301@gmail.com, peterx@redhat.com, mknyszek@google.com, minchan@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
References: <20240118120347.61817-1-ioworker0@gmail.com>

On Thu, Jan 18, 2024 at 5:43 AM Michal Hocko wrote:
>
> Dang, forgot to cc linux-api...
>
> On Thu 18-01-24 14:40:19, Michal Hocko wrote:
> > On Thu 18-01-24 20:03:46, Lance Yang wrote:
> > [...]
> >
> > before we discuss the semantic, let's focus on the usecase.
> >
> > > Use Cases
> > >
> > > An immediate user of this new functionality is the Go runtime heap allocator
> > > that manages memory in hugepage-sized chunks. In the past, whether it was a
> > > newly allocated chunk through mmap() or a reused chunk released by
> > > madvise(MADV_DONTNEED), the allocator attempted to eagerly back memory with
> > > huge pages using madvise(MADV_HUGEPAGE)[2] and madvise(MADV_COLLAPSE)[3]
> > > respectively. However, both approaches resulted in performance issues; for
> > > both scenarios, there could be entries into direct reclaim and/or compaction,
> > > leading to unpredictable stalls[4]. Now, the allocator can confidently use
> > > process_madvise(MADV_F_COLLAPSE_LIGHT) to attempt the allocation of huge pages.

Aside: The thought was a MADV_F_COLLAPSE_LIGHT _flag_; so it'd be
process_madvise(..., MADV_COLLAPSE, MADV_F_COLLAPSE_LIGHT)

> > IIUC the primary reason is the cost of the huge page allocation which
> > can be really high if the memory is heavily fragmented and it is called
> > synchronously from the process directly, correct? Can that be worked
> > around by process_madvise and performing the operation from a different
> > context? Are there any other reasons to have a different mode?
> >
> > I mean I can think of a more relaxed (opportunistic) MADV_COLLAPSE -
> > e.g. non blocking one to make sure that the caller doesn't really block
> > on resource contention (be it locks or memory availability) because that
> > matches our non-blocking interface in other areas but having a LIGHT
> > operation sounds really vague and the exact semantic would be
> > implementation specific and might change over time. Non-blocking has a
> > clear semantic but it is not really clear whether that is what you
> > really need/want.

IIUC, usecase from Go is unbounded latency due to sync compaction in a
context where the latency is unacceptable.
Working w/ them to understand how things can be improved -- it's
possible the changes can occur entirely on their side, w/o any
additional kernel support.

The non-blocking case awkwardly sits between MADV_COLLAPSE today, and
khugepaged; esp when common case is that the allocation can probably
be satisfied in fast path.

The suggestion for something like "LIGHT" was intentionally vague
because it could allow for other optimizations / changes down the
line, as you point out. I think that might be a win, vs tying to a
specific optimization (e.g. like a MADV_F_COLLAPSE_NODEFRAG). But I
could be alone on that front, given the design of
/sys/kernel/mm/transparent_hugepage.

But circling back, I agree w/ you that the first order of business is
to iron out a real usecase. As of right now, it's not clear something
like this is required or helpful.

Thanks,
Zach

> > > [1] https://github.com/torvalds/linux/commit/7d8faaf155454f8798ec56404faca29a82689c77
> > > [2] https://github.com/golang/go/commit/8fa9e3beee8b0e6baa7333740996181268b60a3a
> > > [3] https://github.com/golang/go/commit/9f9bb26880388c5bead158e9eca3be4b3a9bd2af
> > > [4] https://github.com/golang/go/issues/63334
> > >
> > > [v1] https://lore.kernel.org/lkml/20240117050217.43610-1-ioworker0@gmail.com/
> > --
> > Michal Hocko
> > SUSE Labs
>
> --
> Michal Hocko
> SUSE Labs
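
For reference, a minimal C sketch of the call shape from the aside,
process_madvise(..., MADV_COLLAPSE, MADV_F_COLLAPSE_LIGHT). The flag is
only a proposal in this thread: the name comes from the patch subject,
but the bit value used below and the helper names open_self_pidfd() and
collapse_range() are assumptions made for illustration. The
process_madvise() flags argument is documented as reserved and must be
0, so a kernel without such a patch is expected to reject the call
(typically with EINVAL).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* value in the Linux uapi headers (6.1+) */
#endif

/* HYPOTHETICAL: proposed in this thread, not a mainline flag.
 * The bit value is an assumption chosen only for this sketch. */
#define MADV_F_COLLAPSE_LIGHT (1U << 0)

/* Fallback syscall numbers for older libc headers (shared by most
 * architectures, including x86_64 and arm64). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif

/* Open a pidfd that refers to the calling process; -1 on error. */
static int open_self_pidfd(void)
{
    return (int)syscall(SYS_pidfd_open, getpid(), 0U);
}

/* Ask the kernel to collapse [addr, addr + len) into huge pages.
 * `flags` is passed straight through as the process_madvise() flags
 * argument. Returns the number of bytes advised, or -1 with errno set. */
static ssize_t collapse_range(int pidfd, void *addr, size_t len,
                              unsigned int flags)
{
    struct iovec iov = { .iov_base = addr, .iov_len = len };

    assert(addr != NULL && len > 0);
    return (ssize_t)syscall(SYS_process_madvise, pidfd, &iov, 1UL,
                            MADV_COLLAPSE, flags);
}
```

A caller on a latency-sensitive path would presumably pass
MADV_F_COLLAPSE_LIGHT as `flags` and treat any failure as "leave this
range to khugepaged", rather than retrying with a plain MADV_COLLAPSE
that may enter direct reclaim or compaction.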