Bracing against the wind
www.documentroot.com
Thursday, February 10, 2011
Approximate Line Count for Very Large Files
A good estimate (2 sig figs) is what I need 90% of the time. alc is my "approximate line count" tool. It counts the lines in a file, just like wc, except that it only samples the file in a series of segments. By seeking to a dozen places in the file and reading 200K from each, rather than reading the whole thing, I get a representative sample and an accurate-enough count.

UPDATE: The new version 1.2 uses a log-linear model to predict the line count of compressed (gzipped) files. For files ending in ".gz", the output is always the predicted uncompressed line count, not the total line count. It's off by no more than 10% for the vast majority of files tested, a huge improvement over the older version.
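
The sampling idea can be sketched roughly as below. This is not alc's actual source, which isn't shown here; it is a minimal Python illustration of the same approach. The function name `approx_line_count`, the evenly spaced offsets, and the reading of "a dozen" and "200K" as 12 segments of 200 * 1024 bytes are all assumptions made for the example. It handles plain (uncompressed) files only and does not attempt the log-linear model for ".gz" files.

```python
import os


def approx_line_count(path, segments=12, segment_bytes=200 * 1024):
    """Estimate the number of lines in a file by sampling a few segments.

    Hypothetical sketch: counts newlines in evenly spaced segments and
    scales the observed newline density up to the full file size.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0

    with open(path, "rb") as f:
        # Small file: sampling would cover most of it anyway, so count exactly.
        if size <= segments * segment_bytes:
            return sum(
                chunk.count(b"\n")
                for chunk in iter(lambda: f.read(1 << 20), b"")
            )

        newlines = 0
        sampled = 0
        # Spread the segment start offsets evenly from the beginning of the
        # file to the last position where a full segment still fits.
        stride = (size - segment_bytes) / max(segments - 1, 1)
        for i in range(segments):
            f.seek(int(i * stride))
            chunk = f.read(segment_bytes)
            newlines += chunk.count(b"\n")
            sampled += len(chunk)

    # Newlines per sampled byte, scaled to the whole file.
    return round(newlines / sampled * size)
```

Calling `approx_line_count("big.log")` reads about 2.4 MB however large the file is. The estimate is only as good as the assumption that line lengths in the sampled segments resemble those in the rest of the file, so a file whose line lengths vary sharply from one region to another will be estimated less accurately.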