bash & shell article series: http://www.cnblogs.com/f-ck-need-u/p/7048359.html#blogshell

To check whether two files have exactly the same contents, the simplest tool is the diff command. For example:

    diff file1 file2 &>/dev/null; echo $?

However, diff accepts only two file arguments, so it cannot compare many files at once (a directory is treated as a file too), and diff is very inefficient when comparing non-text files or very large files.
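
As a minimal sketch of how that exit status can be used in a script rather than printed, the snippet below branches on it. The helper name files_identical and the temporary files are made up for this illustration; the only behavior relied on is that diff exits with 0 when the files match and non-zero otherwise.

```bash
#!/bin/bash
# Sketch only: files_identical is a hypothetical helper, not part of md5.sh.
files_identical() {
    # diff exits 0 if the files match, 1 if they differ, 2 on trouble
    diff "$1" "$2" &>/dev/null
}

# Throwaway sample files for the demonstration
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/a"
printf 'hello\n' > "$tmpdir/b"
printf 'world\n' > "$tmpdir/c"

if files_identical "$tmpdir/a" "$tmpdir/b"; then
    echo "a and b are identical"
fi
if ! files_identical "$tmpdir/a" "$tmpdir/c"; then
    echo "a and c differ"
fi

rm -rf "$tmpdir"
```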

md5sum can be used instead. Compared with diff's line-by-line comparison, md5sum is much faster.

For md5sum usage, see: MD5 checksums of files in Linux <http://www.cnblogs.com/f-ck-need-u/p/7430264.html>.
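
For just two files, the md5sum-based check might look like the sketch below. The helper name same_md5 and the sample files are assumptions made for this example. Feeding each file to md5sum on standard input keeps the file name out of the output, so only the digests are compared.

```bash
#!/bin/bash
# Sketch only: same_md5 is a hypothetical helper, not part of md5.sh.
same_md5() {
    local sum1 sum2
    # Return 2 if either file cannot be read
    [[ -r $1 && -r $2 ]] || return 2
    # With input on stdin md5sum prints "<digest>  -"; keep the first field
    sum1=$(md5sum < "$1" | cut -d' ' -f1)
    sum2=$(md5sum < "$2" | cut -d' ' -f1)
    [[ $sum1 == "$sum2" ]]
}

# Throwaway sample files for the demonstration
tmpdir=$(mktemp -d)
printf 'same\n' > "$tmpdir/a"
cp "$tmpdir/a" "$tmpdir/b"
printf 'changed\n' > "$tmpdir/c"

same_md5 "$tmpdir/a" "$tmpdir/b" && echo "a and b have the same MD5"
same_md5 "$tmpdir/a" "$tmpdir/c" || echo "a and c have different MD5 values"

rm -rf "$tmpdir"
```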

However, md5sum only lets you compare files indirectly, by looking at their MD5 values. To compare files in bulk automatically, it has to be wrapped in a loop. The script is as follows:

    #!/bin/bash
    ###########################################################
    #  description: compare many files at one time            #
    #  author     : Golden Dragon                             #
    #  blog       : http://www.cnblogs.com/f-ck-need-u/       #
    ###########################################################

    # filename: md5.sh
    # Usage: $0 file1 file2 file3 ...

    IFS=$'\n'
    declare -A md5_array

    # If a "while read" loop fed by a pipe is used, the array assigned
    # inside the while statement is empty again after the loop, so a
    # for statement is used instead of while, and the variable IFS is
    # therefore set to $'\n'.

    # md5sum format: MD5  /path/to/file
    # such as: 80748c3a55b726226ad51a4bafa1c4aa  /etc/fstab
    for line in `md5sum "$@"`
    do
        index=${line%% *}
        file=${line##* }
        md5_array[$index]="$file ${md5_array[$index]}"
    done

    # Traverse the md5_array
    for i in ${!md5_array[@]}
    do
        echo -e "the same file with md5: $i\n--------------\n`echo ${md5_array[$i]}|tr ' ' '\n'`\n"
    done

To test the script, first make a few copies of a file and then modify the contents of some of them. For example:

    for i in `seq -s' ' 6`;do cp -a /etc/fstab /tmp/fs$i;done
    echo ha >>/tmp/fs4
    echo haha >>/tmp/fs5

Now /tmp contains 6 files: fs1, fs2, fs3, fs4, fs5 and fs6. Of these, fs4 and fs5 have been modified, and the remaining 4 files have identical contents.

    # ./md5.sh /tmp/fs[1-6]
    the same file with md5: a612cd5d162e4620b442b0ff3474bf98
    --------------------------
    /tmp/fs6
    /tmp/fs3
    /tmp/fs2
    /tmp/fs1

    the same file with md5: 80748c3a55b726226ad51a4bafa1c4aa
    --------------------------
    /tmp/fs4

    the same file with md5: 30dd43dba10521c1e94267bbd117877b
    --------------------------
    /tmp/fs5

A more general use: comparing files with the same name across multiple directories.

    # find /tmp -type f -name "fs[0-9]" -print0 | xargs -0 ./md5.sh
    the same file with md5: a612cd5d162e4620b442b0ff3474bf98
    --------------------------
    /tmp/fs6
    /tmp/fs3
    /tmp/fs2
    /tmp/fs1

    the same file with md5: 80748c3a55b726226ad51a4bafa1c4aa
    --------------------------
    /tmp/fs4

    the same file with md5: 30dd43dba10521c1e94267bbd117877b
    --------------------------
    /tmp/fs5

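
For comparison, a similar grouping can be sketched without md5.sh by letting sort and uniq do the work. This is an alternative technique, not the script's own method, and it assumes GNU findutils and coreutils (xargs -r, uniq -w and --all-repeated). The function name group_by_md5, its arguments and the sample directories are hypothetical. Unlike md5.sh, it lists only digests shared by at least two files.

```bash
#!/bin/bash
# Sketch only: group_by_md5 is a hypothetical helper; requires GNU tools.
group_by_md5() {
    local dir=$1 pattern=$2
    # An MD5 digest is 32 hex characters, so uniq -w32 compares digests only.
    # --all-repeated=separate prints every line whose digest occurs more
    # than once, with a blank line between groups.
    find "$dir" -type f -name "$pattern" -print0 |
        xargs -0 -r md5sum |
        LC_ALL=C sort |
        uniq -w32 --all-repeated=separate
}

# Throwaway directory tree for the demonstration
tmpdir=$(mktemp -d)
mkdir "$tmpdir/d1" "$tmpdir/d2" "$tmpdir/d3"
printf 'same\n' > "$tmpdir/d1/fs1"
printf 'same\n' > "$tmpdir/d2/fs1"
printf 'other\n' > "$tmpdir/d3/fs1"

# Prints the two identical copies (d1/fs1 and d2/fs1) with their digest
group_by_md5 "$tmpdir" 'fs[0-9]'

rm -rf "$tmpdir"
```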
Script description:

(1). md5sum prints its results in the format "MD5 /path/to/file". The script therefore needs to output both each MD5 value and the files that share that MD5, which makes an array a natural choice.

(2). At first I used a while loop that read the md5sum result for each file from standard input. The statement was:

    md5sum "$@" | while read index file;do
        md5_array[$index]="$file ${md5_array[$index]}"
    done

But because of the pipe, the while statement runs in a subshell, so the md5_array assignments made inside the loop are lost when the loop ends. It can therefore be rewritten as:

    while read index file;do
        md5_array[$index]="$file ${md5_array[$index]}"
    done <<<"$(md5sum "$@")"

In the end, though, I used the more cumbersome for loop:

    IFS=$'\n'
    for line in `md5sum "$@"`
    do
        index=${line%% *}
        file=${line##* }
        md5_array[$index]="$file ${md5_array[$index]}"
    done

However, each line of md5sum output has two columns, and with the default IFS a for loop would split them into two separate values. IFS is therefore set to $'\n', so that each whole line is assigned to the loop variable in one piece.

(3). The index and file variables split each line of md5sum output into two parts: the MD5 part becomes the array index, and the file part becomes part of the value of that array element. The array assignment statement is therefore:

    md5_array[$index]="$file ${md5_array[$index]}"

(4). Once the array has been filled, it is traversed. There are many ways to do this. Here I iterate over the list of array indexes, that is, over each MD5 value.

    # Traverse the md5_array
    for i in ${!md5_array[@]}
    do
        echo -e "the same file with md5: $i\n--------------\n`echo ${md5_array[$i]}|tr ' ' '\n'`\n"
    done
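
The subshell behavior described in note (2) can be checked with a small experiment. The sketch below uses made-up sample files and array names (piped, kept). For the second variant it uses process substitution rather than the here-string shown above; both keep the loop in the current shell. It assumes bash with default settings (the lastpipe option is off).

```bash
#!/bin/bash
# Sketch only: sample files and array names are invented for this demo.
tmpdir=$(mktemp -d)
printf 'one\n' > "$tmpdir/f1"
cp "$tmpdir/f1" "$tmpdir/f2"
printf 'two\n' > "$tmpdir/f3"

# Variant 1: the loop is a pipeline stage, so it runs in a subshell and
# its assignments never reach the parent shell.
declare -A piped
md5sum "$tmpdir"/f* | while read -r index file; do
    piped[$index]="$file ${piped[$index]}"
done
echo "after pipe: ${#piped[@]} keys"

# Variant 2: process substitution feeds the loop through a redirection,
# so the loop runs in the current shell and the array survives.
declare -A kept
while read -r index file; do
    kept[$index]="$file ${kept[$index]}"
done < <(md5sum "$tmpdir"/f*)
echo "after process substitution: ${#kept[@]} keys"

rm -rf "$tmpdir"
```

Here f1 and f2 share one digest and f3 has another, so the first echo reports 0 keys and the second reports 2.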