Date: Tue, 8 Apr 2014 15:53:02 -0400
Subject: Re: [PATCH 0/2] Disable zone_reclaim_mode by default
From: Robert Haas
To: Christoph Lameter
Cc: Vlastimil Babka, Mel Gorman, Andrew Morton, Josh Berkus, Andres Freund, Linux-MM, LKML, sivanich@sgi.com
References: <1396910068-11637-1-git-send-email-mgorman@suse.de> <5343A494.9070707@suse.cz>

On Tue, Apr 8, 2014 at 10:17 AM, Christoph Lameter wrote:
> Another solution here would be to increase the threshold so that
> 4 socket machines do not enable zone reclaim by default. The larger the
> NUMA system is the more memory is off node from the perspective of a
> processor and the larger the hit from remote memory.

Well, as Josh quite rightly said, the hit from accessing remote memory is never going to be as large as the hit from disk. If and when there is a machine where remote memory is more expensive to access than disk, that's a good argument for zone_reclaim_mode. But I don't believe that's anywhere close to being true today, even on an 8-socket machine with an SSD.
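To put rough numbers on the remote-memory-versus-disk comparison, here is a back-of-the-envelope sketch. The latencies are illustrative assumptions (100 ns local DRAM, 200 ns remote DRAM, 100 us per SSD read), not measurements from any real machine, and the constant and function names are hypothetical. The sketch computes how many remote accesses it takes before their summed extra cost equals one disk read, which is the price paid if reclaiming a local page forces it to be read back in later.

```python
# Back-of-the-envelope: remote NUMA access penalty vs. one disk read.
# All latencies are ILLUSTRATIVE ASSUMPTIONS, not measurements.

LOCAL_NS = 100          # assumed local DRAM access latency
REMOTE_NS = 200         # assumed remote-node DRAM access latency
SSD_READ_NS = 100_000   # assumed SSD read latency (100 microseconds)


def remote_penalty_ns(local_ns: int, remote_ns: int) -> int:
    """Extra cost of one access that lands on a remote node."""
    return remote_ns - local_ns


def break_even_accesses(local_ns: int, remote_ns: int, disk_ns: int) -> int:
    """Smallest N such that N remote-access penalties cost >= one disk read.

    Below this many accesses, leaving the page on a remote node is cheaper
    than reclaiming locally and paying for a disk read later.
    """
    penalty = remote_penalty_ns(local_ns, remote_ns)
    if penalty <= 0:
        raise ValueError("remote access must be slower than local access")
    # Ceiling division without floating point.
    return -(-disk_ns // penalty)


if __name__ == "__main__":
    print(f"remote penalty per access: {remote_penalty_ns(LOCAL_NS, REMOTE_NS)} ns")
    print(f"accesses to equal one SSD read: "
          f"{break_even_accesses(LOCAL_NS, REMOTE_NS, SSD_READ_NS)}")
```

With these assumed figures the penalty is 100 ns per access and the break-even point is 1000 accesses to the same page; different hardware will shift the numbers, but the gap between a memory access and a disk read is what drives the result.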

Now, perhaps the fear is that if we access that remote memory *repeatedly*, the aggregate cost will exceed what it would have cost to fault that page into the local node just once. But it takes a lot of accesses for that to be true, and most of the time you won't get them. Even if you do, I bet many workloads will prefer even performance across all the accesses over a very slow first access followed by slightly faster subsequent accesses.

In an ideal world, the kernel would put the hottest pages on the local node and the less-hot pages on remote nodes, moving pages around as the workload shifts. In practice, that's probably pretty hard. Fortunately, it's not nearly as important as making sure we don't unnecessarily hit the disk, which is orders of magnitude slower than any memory bank.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company