From: Lance Yang
Date: Fri, 19 Jan 2024 09:46:31 +0800
Subject: Re: [PATCH v2 1/1] mm/madvise: add MADV_F_COLLAPSE_LIGHT to process_madvise()
To: "Zach O'Keefe"
Cc: Michal Hocko, akpm@linux-foundation.org, david@redhat.com, songmuchun@bytedance.com, shy828301@gmail.com, peterx@redhat.com, mknyszek@google.com, minchan@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
References: <20240118120347.61817-1-ioworker0@gmail.com>

On Thu, Jan 18, 2024 at 10:59 PM Zach O'Keefe wrote:
>
> On Thu, Jan 18, 2024 at 5:43 AM Michal Hocko wrote:
> >
> > Dang, forgot to cc linux-api...
> >
> > On Thu 18-01-24 14:40:19, Michal Hocko wrote:
> > > On Thu 18-01-24 20:03:46, Lance Yang wrote:
> > > [...]
> > >
> > > before we discuss the semantic, let's focus on the usecase.
> > >
> > > > Use Cases
> > > >
> > > > An immediate user of this new functionality is the Go runtime heap allocator
> > > > that manages memory in hugepage-sized chunks. In the past, whether it was a
> > > > newly allocated chunk through mmap() or a reused chunk released by
> > > > madvise(MADV_DONTNEED), the allocator attempted to eagerly back memory with
> > > > huge pages using madvise(MADV_HUGEPAGE)[2] and madvise(MADV_COLLAPSE)[3]
> > > > respectively. However, both approaches resulted in performance issues; for
> > > > both scenarios, there could be entries into direct reclaim and/or compaction,
> > > > leading to unpredictable stalls[4]. Now, the allocator can confidently use
> > > > process_madvise(MADV_F_COLLAPSE_LIGHT) to attempt the allocation of huge pages.
>
> Aside: The thought was a MADV_F_COLLAPSE_LIGHT _flag_; so it'd be
> process_madvise(..., MADV_COLLAPSE, MADV_F_COLLAPSE_LIGHT)

I apologize for the misunderstanding. I will provide the correct implementation
in version 3.

BR,
Lance

>
> > > IIUC the primary reason is the cost of the huge page allocation which
> > > can be really high if the memory is heavily fragmented and it is called
> > > synchronously from the process directly, correct? Can that be worked
> > > around by process_madvise and performing the operation from a different
> > > context? Are there any other reasons to have a different mode?
> > >
> > > I mean I can think of a more relaxed (opportunistic) MADV_COLLAPSE -
> > > e.g. non blocking one to make sure that the caller doesn't really block
> > > on resource contention (be it locks or memory availability) because that
> > > matches our non-blocking interface in other areas but having a LIGHT
> > > operation sounds really vague and the exact semantic would be
> > > implementation specific and might change over time. Non-blocking has a
> > > clear semantic but it is not really clear whether that is what you
> > > really need/want.
>
> IIUC, usecase from Go is unbounded latency due to sync compaction in a
> context where the latency is unacceptable. Working w/ them to
> understand how things can be improved -- it's possible the changes can
> occur entirely on their side, w/o any additional kernel support.
>
> The non-blocking case awkwardly sits between MADV_COLLAPSE today, and
> khugepaged; esp when common case is that the allocation can probably
> be satisfied in fast path.
>
> The suggestion for something like "LIGHT" was intentionally vague
> because it could allow for other optimizations / changes down the
> line, as you point out. I think that might be a win, vs tying to a
> specific optimization (e.g. like a MADV_F_COLLAPSE_NODEFRAG). But I
> could be alone on that front, given the design of
> /sys/kernel/mm/transparent_hugepage.
>
> But circling back, I agree w/ you that the first order of business is to
> iron out a real usecase. As of right now, it's not clear something
> like this is required or helpful.
>
> Thanks,
> Zach
>
> > >
> > > > [1] https://github.com/torvalds/linux/commit/7d8faaf155454f8798ec56404faca29a82689c77
> > > > [2] https://github.com/golang/go/commit/8fa9e3beee8b0e6baa7333740996181268b60a3a
> > > > [3] https://github.com/golang/go/commit/9f9bb26880388c5bead158e9eca3be4b3a9bd2af
> > > > [4] https://github.com/golang/go/issues/63334
> > > >
> > > > [v1] https://lore.kernel.org/lkml/20240117050217.43610-1-ioworker0@gmail.com/
> > > --
> > > Michal Hocko
> > > SUSE Labs
> >
> > --
> > Michal Hocko
> > SUSE Labs
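
For reference, the call shape Zach describes, with LIGHT passed as a flag
alongside MADV_COLLAPSE rather than as an advice value, might look roughly
like the C sketch below. Everything about MADV_F_COLLAPSE_LIGHT here is an
assumption: the flag is only a proposal in this thread, the value 0x1 is a
placeholder, and current kernels document the flags argument of
process_madvise() as reserved (it must be 0), so the call is expected to
fail with EINVAL today. The helper names collapse_range() and
collapse_request_rejected() are made up for illustration. The raw syscall
is used so the sketch does not depend on a libc wrapper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

/* Fallbacks for older libc headers; generic Linux syscall numbers/values. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif

/*
 * HYPOTHETICAL: proposed in this thread only, not part of any released
 * kernel. The value is a placeholder chosen for this sketch.
 */
#define MADV_F_COLLAPSE_LIGHT 0x1u

/*
 * Ask the kernel to collapse [addr, addr + len) of the calling process
 * into huge pages via process_madvise(pidfd, iov, 1, MADV_COLLAPSE, flags).
 * Returns the number of bytes advised, or a negative errno value.
 */
static long collapse_range(void *addr, size_t len, unsigned int flags)
{
    struct iovec iov = { .iov_base = addr, .iov_len = len };
    long ret;
    int saved_errno;
    int pidfd;

    assert(addr != NULL);

    /* process_madvise() takes a pidfd; open one for ourselves. */
    pidfd = (int)syscall(SYS_pidfd_open, getpid(), 0);
    if (pidfd < 0)
        return -errno;

    ret = syscall(SYS_process_madvise, pidfd, &iov, (size_t)1,
                  (long)MADV_COLLAPSE, (unsigned long)flags);
    saved_errno = errno;
    close(pidfd);

    return ret < 0 ? -saved_errno : ret;
}

/*
 * -EINVAL means the kernel rejected the shape of the request, for example
 * an unknown flag or advice value. It is a heuristic, not proof that the
 * flag specifically is what the kernel objected to.
 */
static int collapse_request_rejected(long ret)
{
    return ret == -EINVAL;
}
```

A latency-sensitive caller would presumably treat any negative return from
collapse_range(addr, len, MADV_F_COLLAPSE_LIGHT) as "not collapsed" and
leave the range to khugepaged, instead of retrying with flags set to 0,
since the blocking path is exactly what the use case is trying to avoid.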