Date: Fri, 5 Nov 2021 09:07:02 -0700
From: Ira Weiny
To: Qu Wenruo
Cc: Linus Torvalds, David Sterba, Linux Kernel Mailing List, Linux-MM
Subject: Re: Kmap-related crashes and memory leaks on 32bit arch (5.15+)
Message-ID: <20211105160702.GY3538886@iweiny-DESK2.sc.intel.com>
References: <20211104115001.GU20319@twin.jikos.cz>

On Fri, Nov 05, 2021 at 08:01:13AM +0800, Qu Wenruo wrote:
> [snip]
>
> BTW, I also thought that part can be suspicious, as it keeps the page
> mapped (unlike all other call sites), thus I tried the following diff,
> but no difference for the leakage:

I just saw this thread and I was wondering why kmap_local_page() can't
be used here.  I know we are trying to remove highmem from the kernel,
but the DAX stray write protection I've been working on depends on the
kmap interface to ensure that DAX pages are properly accessed.[1]

Also there are a couple of new helpers which could be used instead of
the changes below.
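The helpers I have in mind are memcpy_to_page() and memzero_page().  As
a rough user-space model of their semantics (not the kernel code: struct
page, the mapping, and PAGE_SIZE are all faked with plain buffers here),
each call maps the page, touches it, and unmaps it again, so no mapping
is ever left held across the surrounding logic:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Model a "struct page" as a plain 4 KiB buffer for illustration. */
struct page { unsigned char data[PAGE_SIZE]; };

/*
 * Modeled on the kernel helpers: map, touch the page, unmap.  The
 * mapping never outlives the single call, which is the point of
 * memcpy_to_page()/memzero_page() over open-coded kmap()/kunmap()
 * pairs that must be kept balanced across loop iterations.
 */
static void memcpy_to_page(struct page *page, size_t offset,
                           const char *from, size_t len)
{
	unsigned char *kaddr = page->data;   /* stands in for kmap_local_page(page) */
	memcpy(kaddr + offset, from, len);
	/* stands in for kunmap_local(kaddr) */
}

static void memzero_page(struct page *page, size_t offset, size_t len)
{
	unsigned char *kaddr = page->data;   /* stands in for kmap_local_page(page) */
	memset(kaddr + offset, 0, len);
	/* stands in for kunmap_local(kaddr) */
}
```

With helpers like these the leak mode in the diff below (a kunmap()
skipped on some loop path) cannot happen, because the map/unmap pair is
inside one call.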
I know this does not solve the problem Linus is seeing, and DAX is not
yet supported on btrfs, but I know there has been some effort to get it
working, and things like commit 8c945d32e604 ("btrfs: compression: drop
kmap/kunmap from lzo") would break that support.

> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
> index 65cb0766e62d..0648acc48310 100644
> --- a/fs/btrfs/lzo.c
> +++ b/fs/btrfs/lzo.c
> @@ -151,6 +151,7 @@ static int copy_compressed_data_to_page(char *compressed_data,
>  	kaddr = kmap(cur_page);
>  	write_compress_length(kaddr + offset_in_page(*cur_out),
>  			      compressed_size);
> +	kunmap(cur_page);
>  	*cur_out += LZO_LEN;
>
>  	orig_out = *cur_out;
> @@ -160,7 +161,6 @@ static int copy_compressed_data_to_page(char *compressed_data,
>  		u32 copy_len = min_t(u32, sectorsize - *cur_out % sectorsize,
>  				     orig_out + compressed_size - *cur_out);
>
> -		kunmap(cur_page);
>  		cur_page = out_pages[*cur_out / PAGE_SIZE];
>  		/* Allocate a new page */
>  		if (!cur_page) {
> @@ -173,6 +173,7 @@ static int copy_compressed_data_to_page(char *compressed_data,
>
>  		memcpy(kaddr + offset_in_page(*cur_out),
>  		       compressed_data + *cur_out - orig_out, copy_len);
> +		kunmap(cur_page);

memcpy_to_page()?

>
>  		*cur_out += copy_len;
>  	}
> @@ -186,12 +187,15 @@ static int copy_compressed_data_to_page(char *compressed_data,
>  		goto out;
>
>  	/* The remaining size is not enough, pad it with zeros */
> +	cur_page = out_pages[*cur_out / PAGE_SIZE];
> +	ASSERT(cur_page);
> +	kmap(cur_page);
>  	memset(kaddr + offset_in_page(*cur_out), 0,
>  	       sector_bytes_left);
> +	kunmap(cur_page);

memzero_page()?

Just my $0.02 given I've been trying to remove kmap() uses and btrfs
remains one of the big users of kmap().

Thanks,
Ira

[1] https://lore.kernel.org/lkml/20210804043231.2655537-16-ira.weiny@intel.com/

>  	*cur_out += sector_bytes_left;
>
> out:
> -	kunmap(cur_page);
>  	return 0;
> }
>
> Thanks,
> Qu
>
> > In particular, what if "offset_in_page(*cur_out)" is very close to the
> > end of the page?
> >
> > That write_compress_length() always writes out a word-sized length (ie
> > LZO_LEN bytes), and the above pattern seems to have no model to handle
> > the "oh, this 4-byte write crosses a page boundary"
> >
> > The other writes in that function seem to do it properly, and you have
> >
> >         u32 copy_len = min_t(u32, sectorsize - *cur_out % sectorsize,
> >                              orig_out + compressed_size - *cur_out);
> >
> > so doing the memcpy() of size 'copy_len' should never cross a page
> > boundary as long as sectorsize is a power-of-2 smaller or equal to a
> > page size.  But those 4-byte length writes seem like they could be page
> > crossers.
> >
> > The same situation exists on the reading side, I think.
> >
> > Maybe there's some reason why the read/write_compress_length() can
> > never cross a page boundary, but it did strike me as odd.
> >
> >                 Linus
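For what it's worth, the boundary condition Linus describes is easy to
model in user space (PAGE_SIZE, LZO_LEN, and offset_in_page() below
mirror their kernel definitions; the predicate name is mine, not from
fs/btrfs/lzo.c): a 4-byte length write starting within LZO_LEN - 1
bytes of the end of a page must straddle into the next page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096U
#define LZO_LEN   4U   /* size of the on-disk length word, as in fs/btrfs/lzo.c */

/* offset_in_page() as in the kernel: the low bits of the offset. */
static inline uint32_t offset_in_page(uint32_t off)
{
	return off & (PAGE_SIZE - 1);
}

/*
 * True when an LZO_LEN-byte length write starting at cur_out would
 * straddle a page boundary -- exactly the case a single
 * write_compress_length() into one mapped page cannot handle.
 */
static bool length_write_crosses_page(uint32_t cur_out)
{
	return offset_in_page(cur_out) + LZO_LEN > PAGE_SIZE;
}
```

So a length word starting at offsets 4093..4095 within a page crosses
the boundary, while one starting at 4092 fits exactly; unless the
format guarantees such offsets never occur, the write/read paths would
need to split the 4-byte access across two mapped pages.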