On 4/28/21 10:03 PM, Theodore Ts'o wrote: > (Hmm, why did you cc linux-km on this report? I would have thought > dm-devel would have made more sense?) > > On Tue, Apr 27, 2021 at 04:15:39PM +0800, kernel test robot wrote: >> FYI, we noticed the following commit (built with gcc-9): >> >> commit: 21175ca434c5d49509b73cf473618b01b0b85437 ("ext4: make prefetch_block_bitmaps default") >> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master >> >> in testcase: mdadm-selftests >> version: mdadm-selftests-x86_64-5d518de-1_20201008 >> with following parameters: >> >> disk: 1HDD >> test_prefix: 01r1 >> ucode: 0x21 > So this failure makes no sense to me. Looking at the kmesg failure > logs, it's failing in the md layer: > > kern :info : [ 99.775514] md/raid1:md0: not clean -- starting background reconstruction > kern :info : [ 99.783372] md/raid1:md0: active with 3 out of 4 mirrors > kern :info : [ 99.789735] md0: detected capacity change from 0 to 37888 > kern :info : [ 99.796216] md: resync of RAID array md0 > kern :crit : [ 99.900450] md/raid1:md0: Disk failure on loop2, disabling device. > md/raid1:md0: Operation continuing on 2 devices. > kern :crit : [ 99.918281] md/raid1:md0: Disk failure on loop1, disabling device. > md/raid1:md0: Operation continuing on 1 devices. > kern :info : [ 100.835833] md: md0: resync interrupted. > kern :info : [ 101.852898] md: resync of RAID array md0 > kern :info : [ 101.858347] md: md0: resync done. > user :notice: [ 102.109684] /lkp/benchmarks/mdadm-selftests/tests/01r1fail... FAILED - see /var/tmp/01r1fail.log and /var/tmp/fail01r1fail.log for details > > The referenced commit just turns block bitmap prefetching in ext4. > This should not cause md to failure; if so, that's an md bug, not an > ext4 bug. There should not be anything that the file system is doing > that would cause the kernel to think there is a disk failure. > > By the way, the reproduction instructions aren't working currently: > >> To reproduce: >> >> git clone https://github.com/intel/lkp-tests.git >> cd lkp-tests >> bin/lkp install job.yaml # job file is attached in this email > This fails because lkp is trying to apply a patch which does not apply > with the current version of the md tools. Hi Ted, Thanks for the feedback, yes, there's patch already be merged into mdadm, we have removed it from our code. > >> bin/lkp split-job --compatible job.yaml >> bin/lkp run compatible-job.yaml > And the current versions lkp don't generate a compatible-job.yaml file > when you run "lkp split-job --compatable"; instead it generates a new > yaml file with a set of random characters to generate a unique name. > (What Multics parlance would be called a "shriek name"[1] :-) We have updated the steps to avoid misunderstanding. > > Since I was having trouble running the reproduction; could you send > the /var/tmp/*fail.logs so we could have a bit more insight what is > going on? I attached the log file for your reference, btw the test is from https://github.com/neilbrown/mdadm/blob/master/tests/01r1fail, you may want to run it directly. Best Regards, Rong Chen > > Thanks! > > - Ted > _______________________________________________ > LKP mailing list -- lkp@lists.01.org > To unsubscribe send an email to lkp-leave@lists.01.org