Date: Mon, 22 Nov 2021 04:56:25 +0000
From: Matthew Wilcox
To: Shakeel Butt
Cc: David Hildenbrand, "Kirill A . Shutemov", Yang Shi, Zi Yan, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: split thp synchronously on MADV_DONTNEED
References: <20211120201230.920082-1-shakeelb@google.com>
In-Reply-To: <20211120201230.920082-1-shakeelb@google.com>

On Sat, Nov 20, 2021 at 12:12:30PM -0800, Shakeel Butt wrote:
> Many applications do sophisticated management of their heap memory for
> better performance but with low cost. We have a bunch of such
> applications running on our production and examples include caching and
> data storage services. These applications keep their hot data on the
> THPs for better performance and release the cold data through
> MADV_DONTNEED to keep the memory cost low.
>
> The kernel defers the split and release of THPs until there is memory
> pressure. This causes complicates the memory management of these
> sophisticated applications which then needs to look into low level
> kernel handling of THPs to better gauge their headroom for expansion.
> In addition these applications are very latency sensitive and would
> prefer to not face memory reclaim due to non-deterministic nature of
> reclaim.
>
> This patch let such applications not worry about the low level handling
> of THPs in the kernel and splits the THPs synchronously on
> MADV_DONTNEED.

I've been wondering about whether this is really the right strategy
(and this goes wider than just this one, new case)

We chose to use a 2MB page here, based on whatever heuristics are
currently in play. Now userspace is telling us we were wrong and should
have used smaller pages. 2MB pages are precious, and we currently have
one. Surely it is better to migrate the still-valid contents of this
2MB page to smaller pages, and then free the 2MB page as a single unit
than it is to fragment this 2MB page into smaller chunks, and keep using
some of it, virtually guaranteeing this particular 2MB page can't be
reassembled without significant work?
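
For illustration only, here is a minimal userspace sketch of the pattern
the patch description refers to: an application fills a 2MB-aligned
anonymous region and later drops part of it with MADV_DONTNEED. It is
not taken from the patch. The helper name release_cold_half() and the
2MB THP size are assumptions, and MADV_HUGEPAGE is only a hint, so the
region may or may not actually be backed by a THP.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: PMD-sized THPs are 2MB, as on x86-64. */
#define THP_SIZE (2UL << 20)

/*
 * Hypothetical helper, for illustration only.
 *
 * Maps a 2MB-aligned anonymous region, asks for THP backing, fills it,
 * then releases the second ("cold") half with MADV_DONTNEED.
 *
 * Returns 0 if the kept half still holds its data and the released half
 * reads back as zeroes, -1 otherwise.
 */
int release_cold_half(void)
{
	size_t len = 2 * THP_SIZE;	/* over-allocate so we can align */
	unsigned char *raw, *buf;
	size_t half = THP_SIZE / 2;
	int ret = -1;

	raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return -1;

	/* Round up to the next 2MB boundary inside the mapping. */
	buf = (unsigned char *)(((uintptr_t)raw + THP_SIZE - 1) &
				~(uintptr_t)(THP_SIZE - 1));
	assert(((uintptr_t)buf & (THP_SIZE - 1)) == 0);

	/* Hint only: the kernel is free to ignore it, so ignore errors. */
	(void)madvise(buf, THP_SIZE, MADV_HUGEPAGE);

	/* Touch the whole region so it is populated. */
	memset(buf, 0xaa, THP_SIZE);

	/* Drop the cold half; the hot half stays in use. */
	if (madvise(buf + half, half, MADV_DONTNEED) == 0) {
		if (buf[0] == 0xaa && buf[half - 1] == 0xaa &&
		    buf[half] == 0 && buf[THP_SIZE - 1] == 0)
			ret = 0;
	}

	munmap(raw, len);
	return ret;
}
```

The sketch only checks what userspace can see: the released range reads
back as zeroes. Whether the kernel split the huge page, deferred the
split, or never used a huge page at all is not visible here. Something
like the AnonHugePages field in /proc/self/smaps or the
thp_deferred_split_page counter in /proc/vmstat would be needed to
observe that.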