[mdlug] UNIX tips: Learn 10 good UNIX usage habits

Tue Mar 11 14:44:59 EDT 2008

2008-03-11, Mr. Robert Citek wrote:
> (
> mkdir -p tmp/a/
> tmpfile=tmp/a/longfile.txt
> for COUNT in 2 20000000; do
>  yes and | head -$COUNT > $tmpfile
>  time -p grep -c and $tmpfile
>  time -p < $tmpfile grep -c and
>  time -p cat $tmpfile | grep -c and
> done \
> >& output.txt
> )

Here are my results (this is on an AMD X2 3700+ 2.2GHz w/1GB RAM, in
Fedora 7 with a 2.6.24.2 kernel):

$ yes and | head -20000000 > tmp/a/longfile.txt

$ time grep -c and tmp/a/longfile.txt
20000000

real    0m1.848s
user    0m1.736s
sys     0m0.053s

$ time cat tmp/a/longfile.txt | grep -c and
20000000

real    0m1.977s
user    0m1.802s
sys     0m0.104s

So the single grep beats cat piped into grep: 1.848s versus 1.977s of
real time, a difference of about 7%.
Thank you for helping me to make my point, I appreciate it. :)
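
For anyone who wants to repeat the comparison, here is a minimal sketch
rather than a definitive benchmark. It assumes bash (where `time` is a
keyword that can time a shell function) plus the usual yes, head and
mktemp utilities. The function names (make_testfile, count_direct,
count_redirect, count_cat), the temp file and the smaller line count are
my own choices for illustration; they are not taken from Robert's script.

```shell
#!/bin/bash
# Sketch: three ways to feed the same file to "grep -c".
# All function names and the line count below are illustrative assumptions.

# Write $1 lines of "and" into the file named by $2.
make_testfile() {
    yes and | head -n "$1" > "$2"
}

# grep opens the file itself.
count_direct()   { grep -c and "$1"; }

# The shell opens the file and hands it to grep on stdin.
count_redirect() { grep -c and < "$1"; }

# An extra cat process copies the file through a pipe.
count_cat()      { cat "$1" | grep -c and; }

tmpfile=$(mktemp)
make_testfile 200000 "$tmpfile"

# Time each variant once; the timings go to stderr, the counts to stdout.
for variant in count_direct count_redirect count_cat; do
    echo "== $variant"
    time "$variant" "$tmpfile"
done

rm -f "$tmpfile"
```

A single run is noisy, so running the loop several times and comparing
the spread is probably more telling than any one number.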

> 1) For the 95% of the time where cat+pipe is inefficient, it doesn't
> matter.  So, don't worry about it.

I recall saying that it was *unnecessary* 95% of the time, but who
said that it was *inefficient* 95% of the time? There's a difference,
you know. ;)

> 2) Do the experiment yourself to verify the data.  Don't believe
> everything you read on the internet.

Absolutely, couldn't agree more.

> 3) 87.4% of all statistics are made up on the spot.

I'd agree with that about 79.6% of the time.

Out of curiosity, I appended the string "lametest" twice to that
longfile.txt file, then grepped for "me" both ways:

$ time grep -c me tmp/a/longfile.txt
2

real    0m0.243s
user    0m0.180s
sys     0m0.045s

$ time cat tmp/a/longfile.txt | grep -c me
2

real    0m0.316s
user    0m0.196s
sys     0m0.102s

The single grep wins again: 0.243s versus 0.316s of real time.
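
The second test can be sketched the same way. This assumes each
"lametest" went onto its own line, which is consistent with grep -c
reporting 2 (it counts matching lines, not matches). append_marker is a
hypothetical helper name, and the file here is a smaller temp file
rather than tmp/a/longfile.txt.

```shell
#!/bin/bash
# Sketch: append a marker line a few times, then count lines matching "me".
# append_marker is an illustrative helper, not part of the original test.

# $1 = file, $2 = marker text, $3 = number of times to append it
append_marker() {
    n=0
    while [ "$n" -lt "$3" ]; do
        printf '%s\n' "$2" >> "$1"
        n=$((n + 1))
    done
}

tmpfile=$(mktemp)
yes and | head -n 200000 > "$tmpfile"
append_marker "$tmpfile" lametest 2

# In bash, "time" is a keyword and covers the whole pipeline,
# so the second timing includes both cat and grep.
time grep -c me "$tmpfile"
time cat "$tmpfile" | grep -c me

rm -f "$tmpfile"
```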

Of course, real-world tests would be preferable.

Michael