12-01-2015 02:12 AM
I am using PetaLinux 2014.4 on the KC705 board.
I found that the cp command silently corrupts data when the source file is very large (e.g. 256 MB).
Although cp appears to finish normally, the destination file actually differs from the source file.
The problem is easy to reproduce with the script below.
for VAL in `seq 1 100`
do
    echo "$VAL: creating"
    dd if=/dev/urandom of=/run/test_src.bin bs=$((4*1024**2)) count=64
    echo "$VAL: copying"
    cp /run/test_src.bin /var/volatile/test_dst.bin
    echo "$VAL: verifying"
    cmp -l /run/test_src.bin /var/volatile/test_dst.bin
    rm /run/test_src.bin /var/volatile/test_dst.bin
done
Strangely enough, cmp reports that test_dst.bin differs from test_src.bin.
The corrupted regions are scattered sparsely through the file and are aligned to 4 KB boundaries.
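To see the 4 KB pattern directly, the cmp -l output can be grouped by page. This is only a sketch: the function name damaged_pages is made up for illustration, and it assumes awk and sort are available on the target (they may not be in a minimal rootfs).

```shell
# Summarise `cmp -l` output as a list of damaged 4 KB pages.
# cmp -l prints the 1-based byte offset of each differing byte in column 1.
# damaged_pages is a hypothetical helper name, not part of any tool.
damaged_pages() {
    # $1, $2: the two files to compare
    cmp -l "$1" "$2" | awk '
        { count[int(($1 - 1) / 4096)]++ }      # 0-based 4 KB page index
        END {
            for (p in count)
                printf "page %d: %d differing bytes\n", p, count[p]
        }
    ' | sort -k2,2n
}
```

For example, damaged_pages /run/test_src.bin /var/volatile/test_dst.bin prints one line per affected page. If whole pages are being corrupted, most lines should show counts close to 4096.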
It can be reproduced not only on the real board but also in the QEMU environment.
If you don't have a real KC705, you can check it with QEMU as below.
# petalinux-create -t project -s Xilinx-KC705-v2014.4-final.bsp
# cd Xilinx-KC705-AXI-lite-2014.4
# petalinux-boot --qemu --prebuilt 3
You can also reproduce the problem with the open/read/write/close system calls.
But when the files are opened with the O_DIRECT flag, the problem disappears.
That is why I think it is a software issue (most likely something to do with the VFS disk cache).
Does anyone know how to solve or avoid this problem?
12-01-2015 07:10 AM
I think you are on the right path with the page cache. I did some quick testing as I was curious about it also. I can't get the overcommit stuff to work properly based on what I read. See if you can make any progress.
/proc/sys/vm/overcommit_ratio, overcommit_memory, and overcommit_kbytes
You can watch stuff in /proc/meminfo | grep Commit
12-01-2015 09:12 PM
Thank you for your advice. It is very helpful for me.
The board has 1 GB of RAM, and it is split as below.
MemTotal: 1034624 kB
HighTotal: 524284 kB
LowTotal: 510340 kB
I checked the overcommit settings. Here is the result.
# cat /proc/sys/vm/overcommit_ratio
50
# cat /proc/sys/vm/overcommit_kbytes
0
# cat /proc/sys/vm/overcommit_memory
0
I checked /proc/meminfo just after boot. The result is below.
MemTotal: 1034624 kB
MemFree: 994712 kB
HighTotal: 524284 kB
HighFree: 492912 kB
LowTotal: 510340 kB
LowFree: 501800 kB
CommitLimit: 517312 kB
Committed_AS: 32748 kB
After generating the 256 MB src file, the result was as below.
MemTotal: 1034624 kB
MemFree: 732072 kB
HighTotal: 524284 kB
HighFree: 492912 kB
LowTotal: 510340 kB
LowFree: 239160 kB
CommitLimit: 517312 kB
Committed_AS: 294516 kB
After copying the src file to the dst file, the result was as below.
MemTotal: 1034624 kB
MemFree: 469688 kB
HighTotal: 524284 kB
HighFree: 464268 kB
LowTotal: 510340 kB
LowFree: 5420 kB
CommitLimit: 517312 kB
Committed_AS: 556660 kB
As you mentioned, Committed_AS did become larger than CommitLimit.
But physical memory is still available, because with overcommit_ratio == 50 the CommitLimit is only half of RAM.
So I think this is not a problem. In fact, the exit status of the cp command was 0 (success).
As an experiment, I raised CommitLimit with the command below.
sysctl -w vm.overcommit_ratio=75
Then Committed_AS stayed below CommitLimit even after the copy.
But the wrong-copy problem still happens.
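For reference, the CommitLimit values above follow from simple arithmetic. The kernel computes CommitLimit as RAM * overcommit_ratio / 100 plus swap (huge pages are excluded). The sketch below assumes no swap and no huge pages, which matches the 517312 kB reported earlier; the function name commit_limit_kb is only illustrative. Note also that the kernel enforces CommitLimit only when overcommit_memory is 2, so exceeding it in mode 0 is not an error by itself.

```shell
# CommitLimit (kB) = MemTotal * overcommit_ratio / 100 + SwapTotal
# Assumption: no huge pages; swap defaults to 0 as on this board.
# commit_limit_kb is a hypothetical helper name.
commit_limit_kb() {
    # $1: MemTotal in kB, $2: overcommit_ratio in percent,
    # $3: SwapTotal in kB (optional, default 0)
    echo $(( $1 * $2 / 100 + ${3:-0} ))
}

commit_limit_kb 1034624 50   # ratio 50: prints 517312
commit_limit_kb 1034624 75   # ratio 75: prints 775968
```

With the ratio at 75 the limit is 775968 kB, above the 556660 kB of Committed_AS seen after the copy, which agrees with the observation that Committed_AS stayed below CommitLimit.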
After some more experiments, I got one more interesting result.
When the src and dst files use only LowMem, the problem doesn't happen.
But if they use both LowMem and HighMem, the problem happens.
My test flow is below.
Step1. Boot Linux.
Step2. Generate a 64 MB src file.
Step3. Copy the src file to a dst_N file (where N is the repeat count).
Step4. Check whether HighMem is consumed by checking /proc/meminfo.
Step5. Repeat Step3 and Step4 until HighMem is consumed.
Before HighMem starts to be consumed, the src file and the dst_N files are identical.
After that, the src file and a dst_N file may differ.
From this result, I think the problem happens when the disk cache uses both LowMem and HighMem (this is just my guess).
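The test flow above can be sketched as a small script. The function names and the fixed repeat count are assumptions made for illustration; instead of stopping when HighMem is first consumed, this version prints HighFree after every copy so the point where it starts to drop can be lined up with the first mismatch.

```shell
# Read HighFree (kB) from a meminfo-style file (default: /proc/meminfo).
# Prints nothing if the field is absent, e.g. on a kernel without highmem.
high_free_kb() {
    awk '$1 == "HighFree:" { print $2 }' "${1:-/proc/meminfo}" 2>/dev/null
}

# Copy $1 to $2/dst_1 .. $2/dst_N and compare after every copy.
# copy_and_verify is a hypothetical helper name.
copy_and_verify() {
    src=$1; dir=$2; max=${3:-16}
    n=1
    while [ "$n" -le "$max" ]; do
        cp "$src" "$dir/dst_$n"
        if cmp -s "$src" "$dir/dst_$n"; then
            result=same
        else
            result=DIFFERENT
        fi
        echo "dst_$n: $result, HighFree=$(high_free_kb) kB"
        n=$((n + 1))
    done
}
```

A run on the board might look like: dd if=/dev/urandom of=/run/src.bin bs=1M count=64, then copy_and_verify /run/src.bin /var/volatile 12. The paths are taken from the first post and the count of 12 is arbitrary.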
12-09-2015 09:01 AM
I got some more time to look at this. I did verify that I don't see this issue with the ARM 2014.4 kernel (Zynq in QEMU). That was more to verify that the kernel version (generically) did not exhibit the issue.
I'm not seeing this issue with PetaLinux 2015.2 (3.19 kernel) on MicroBlaze, so that's good news. I tried running the 3.19 kernel in 2014.4 but still saw the issue, which does not make sense to me yet. I'm still looking at the details there.
I do see some changes to MicroBlaze (arch/microblaze) between 2014.4 and 2015.2 that could have fixed an issue but I cannot identify the specific issue yet.
12-09-2015 09:05 AM
12-09-2015 01:08 PM
I have found a workaround if you can live with a bit less memory. When I turn off high memory support in the kernel, I don't see the issue across multiple kernel versions. I tested from 3.10 to 4.0 and always saw the issue with high memory on.
Memory: 769860K/786432K available (3572K kernel code, 153K rwdata, 1008K rodata, 3455K init, 536K bss, 16572K reserved)
If you can live with the amount of memory shown above, then disable high memory in the kernel configuration (kernel features->High memory support) and rebuild the project.
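After rebuilding, it may be worth confirming that the option really changed in the generated kernel configuration. The helper below is a sketch: the name kconfig_state is made up, and the location of the .config file inside a PetaLinux project is not stated in this thread, so the path has to be supplied by the reader.

```shell
# Report whether a boolean option is set to y in a kernel .config file.
# kconfig_state is a hypothetical helper name.
kconfig_state() {
    # $1: path to the kernel .config, $2: option name such as CONFIG_HIGHMEM
    if grep -q "^$2=y" "$1"; then
        echo enabled
    else
        echo disabled
    fi
}
```

For example, kconfig_state path/to/.config CONFIG_HIGHMEM should print disabled once the workaround is applied. On a running target whose kernel was built with CONFIG_IKCONFIG_PROC, the same check can be made on the output of zcat /proc/config.gz saved to a file.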
I'll continue to try to understand the root of the problem. Since I don't see this issue with ARM it seems like either a bug in the MicroBlaze kernel or a kernel configuration issue that is not yet obvious.
12-09-2015 01:40 PM
I've also found that a couple of other kernel configurations with more low memory seem to work. I'm not sure I completely understand the effects of this change, so use it with caution until I learn more.
I found that the amount of low memory can be increased in the kernel configuration: enable (kernel features->Prompt for advanced kernel configuration options), then select Set maximum low memory and enter a size. I'm testing with 0x38000000 and 0x3C000000 successfully. A test of 0x40000000 caused the kernel not to boot.
These options get most of the memory ( > 900 MB) back, so not having high memory in the kernel has less impact (as far as it is understood).
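For orientation, the sizes mentioned above convert to MiB as follows. This is plain arithmetic; the helper name to_mib is only illustrative.

```shell
# Convert a byte count (decimal or 0x-prefixed hex) to MiB.
# to_mib is a hypothetical helper name.
to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

to_mib 0x38000000   # prints 896
to_mib 0x3C000000   # prints 960
to_mib 0x40000000   # prints 1024
```

The 786432K total in the boot log with high memory disabled is 768 MiB, i.e. 0x30000000 bytes, so the two working values add 128 MiB and 192 MiB of low memory respectively. The failing value, 0x40000000, is a full 1 GiB.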
12-10-2015 02:09 AM
Thank you for your great effort.
After some experiments, I have a simple question.
The __kunmap_atomic() function in highmem.c has the lines below.
pte_clear(&init_mm, vaddr, kmap_pte-idx);
local_flush_tlb_page(NULL, vaddr);
The full code is available at the URL below. Please check it.
I found that these lines are compiled out when CONFIG_DEBUG_HIGHMEM is not set. Do you think that is correct?
I wonder why __kunmap_atomic() doesn't call pte_clear() even though kmap_atomic_prot() calls set_pte_at().
In fact, when I defined CONFIG_DEBUG_HIGHMEM, the copy failure did not happen.
Furthermore, some other architectures always clear the PTE in __kunmap_atomic().
For example, x86 calls kpte_clear_flush() in __kunmap_atomic(). Please check the URL below.
I hope this information helps your analysis.
12-11-2015 06:21 AM