File "ScanFilter_Guide.txt", dated 4 Nov. 2009,
for version 2009v1.2 of ScanFilter

View this ASCII file with a constant width screen font.


              *** The ScanFilter Guide ***

      -- program by Jens Dierks (= JD)
      -- documentation by Laurent Siebenmann (= LS)


INTRODUCTION

The program ScanFilter is designed to efficiently (albeit
incompletely) remove 'speckles' from bw (= black and white =
bilevel) scanned images of print. The term 'speckle' here refers to
a small collection of pixels with a common color (b or w) that the
program user considers inappropriate and whose color the user
therefore wants to switch.

Such "despeckling" is typically a preliminary step in preparing
scanned images of print for onscreen viewing. It reduces file bulk
in all compressed formats; it also increases image quality when done
with care. Despeckling is often urgent when the original scan files
were bw rather than grayscale, the degree of urgency depending on
the scanner and its settings.

ScanFilter currently operates under the Microsoft Windows operating
system only. It combines a neat interactive user interface with the
capacity to process many thousands of pages per hour. Its speed
comes from occasional use of assembly language, and also from some
64-bit vector processing features that are collectively called
"MMX". These vector features were introduced on Intel's Pentium
microprocessors, and now exist on many others, including AMD's K6
and later.

Furthermore, ScanFilter has good input-output (= i/o) capabilities
thanks to the free "LibTiffDelphi" code library, which JD accessed
from his Delphi Pascal programming environment; thus it can even
serve as a format converter.

A last-minute addition to this version is a bitmap inspection tool
intended to collaborate with other bw bitmap filtering utilities.
It is a setup for visual scrutiny of differences between any two bw
bitmaps on the same pixel grid. It is accessed by "drag-and-drop" of
the two bitmaps onto the icon of the ScanFilter program.

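To illustrate what a comparison of two bw bitmaps on the same pixel
grid amounts to, here is a minimal Python sketch that lists the
pixels at which two bitmaps differ. It is not ScanFilter's own code:
the representation of a bitmap as a list of rows of 0/1 values (with
1 = black, 0 = white) and the function name `diff_pixels` are
assumptions made only for this example.

```python
def diff_pixels(a, b):
    """Return the (row, col) positions where two bilevel bitmaps differ.

    Assumed representation: each bitmap is a list of rows, each row a
    list of 0/1 values (1 = black, 0 = white). Both bitmaps must lie on
    the same pixel grid, i.e. have identical dimensions.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must share the same pixel grid")
    return [(r, c)
            for r, (ra, rb) in enumerate(zip(a, b))
            for c, (pa, pb) in enumerate(zip(ra, rb))
            if pa != pb]


# A 3x3 bitmap with one isolated black pixel, and the same bitmap
# after that pixel has been switched to white.
before = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]
after = [[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]

print(diff_pixels(before, after))  # [(1, 1)]
```

The list of differing positions is what a viewer would highlight, or
what a user sees flicker when toggling between the two images.
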

OPERATION

ScanFilter's image formats for input and output (= i/o) can belong
to common bw subtypes of two major formats for rectangular bitmaps,
namely BMP and TIF. More precisely:

--- the uncompressed bw "BMP" or "Bitmap" format, whose standard
    interplatform filename extension is ".bmp". It was promulgated
    by Microsoft. (The RLE compressed variant of BMP is not
    supported; use TIF with RLE in its stead, as indicated next.)

--- the "TIF" (= Tagged Image File) format family, whose standard
    interplatform filename extension is ".tif". It was originally
    promulgated by Aldus Inc. and is now maintained by Adobe Inc.
    Where bw is concerned, "TIF" is a binary format with
    (optionally) one of several compression types, including "RLE"
    (= Run Length Encoding), which is usually the fastest, and "G4"
    (= Fax Group 4), which is usually the most compact. For TIF
    output, ScanFilter currently supports only these two compression
    types. It does *not* support multipage TIF files in any way,
    except to input the first page.

One can launch ScanFilter as a normal MS Windows program by clicking
on its icon. There is an old-fashioned command-line user interface
as well as a graphical interface. We first describe the syntax of
the commandline interface, since its commands are used (piecemeal)
in the graphical interface and again in certain log files.


COMMAND LINE MODE

This primitive mode of operation should be readily portable to the
Linux and Macintosh operating systems.

When calling the program from the command line, the core syntax for
an 'atomic' filter process is:

      ScanFilter f(o)(bw)(e)

(In actual commands, the input and output file specifications come
between the program name and this filter designator.) Further i/o
command line syntax, including an extra "-Tree" option, will be
elucidated later through command examples.

Parentheses indicate that the last three items may be absent (are
optional).

-- f is 1, 2, or 3, a filter type identifier that is always present.
-- (o) is an optional integer parameter specific to f.
-- (bw) is b or w, a color option requesting alteration of black
   pixels only or white pixels only.
-- (e) is an option commanding a certain bitmap enlargement for the
   duration of the processing; it is currently available for f = 2
   or 3 only; its value is the letter 'e' if present.

The first number f specifies one of these three well known filter
types:

   1 ==> "3x3 median", a 3-by-3 weighted median filter with just one
         variable weight, the "center weight" CW, which is specified
         as the option o
   2 ==> "Least N", suppression of the 8-way connectivity color
         components having pixelcount < N
   3 ==> "Least N", similar, but for 4-way connectivity

The second number o is an option depending on f, as follows:

   "3x3 median" : o = CW = -9...9, an integer, the "center weight".
                  If CW is not specified, it has default value 5.
   "Least N"    : o = N = 1..100, the least pixelcount of the
                  connected color components on which color will not
                  be changed. Default value is N=4.

(bw) the color option:

   b             ==> Change only black pixels
   w             ==> Change only white pixels
   not specified ==> Change both black and white pixels

For basic information on these filter types see the accompanying
files "3x3_Weighted_Median.txt" and "8-_and_4-connectivity.txt".
For further information on these (and many others) see the
references at the end of this file.


NOTES ON THE "3x3 MEDIAN" FILTER TYPE

(See also the file "3x3_Weighted_Median.txt" in the ScanFilter
distribution.)

-- center weight CW = 9 leaves the bitmap unchanged.

-- negative center weight CW = -n is permitted, which is not (yet)
   traditional. When CW = -9, one gets interchange of black and
   white at all pixels, which is a conversion frequently used.

-- the action of a '3x3 median filter' is in principle undefined for
   the pixels touching the sides of the bitmap's rectangle, since
   such pixels have no 3x3 neighborhood. Many implementors simply
   leave color unchanged there.
   However, ScanFilter always does act there -- by enlarging the
   bitmap to a neighborhood consisting of 9 grid rectangles, using
   the (imaginary) device of placing 4 mirrors along the 4 sides of
   the given bitmap rectangle.

-- if CW is even, there are cases when the color median determining
   the new color lies exactly between black and white. In these
   cases, ScanFilter paints the pixel in question black. (This is
   the only situation in which ScanFilter is biased toward a
   specific color without a visible option commanding the bias.)


NOTE ON THE 'e' OPTION

As stated above, this applies only to the "Least N" filter types
(f = 2 or 3). When the extension option 'e' is present, the bitmap
is temporarily extended during processing by the mirroring device
mentioned above, which is used (always) for the 3x3 median filter
type (f = 1). Thus, in effect, any color component touching the
rectangular grid perimeter has its pixel count doubled or
quadrupled, because it is viewed as part of a component extended by
reflections to intersect 2 or 4 neighbors of the bitmap's bounding
rectangle.


COMPOSITE FILTER PROCESSES

Given a finite (but nonempty) sequence of 'atomic' filter process
designators (each of the form of the last item of the command line
explained above), they can be placed together on the command line
separated by one or more spaces; the result will be the sequential
composition of these processes, one after the other, starting on the
left. Their order is in general important!


A COMMAND SUMMARY THROUGH EXAMPLES

      ScanFilter scan1.bmp scan1#.bmp 212b

will load scan1.bmp, apply the "Least N" filter for 8-connectivity
with N = 12 on black pixels only, and store the processed bitmap
under the name "scan1#.bmp".

      ScanFilter scan1.bmp scan1#.bmp 3w 13

will load scan1.bmp, apply the "Least N" filter for 4-connectivity
with N = 4 (the default) on white pixels only, then apply a "3x3
median" filter with center weight 3 on all pixels, and finally store
the processed bitmap in the file "scan1#.bmp".

      ScanFilter *.bmp *#.bmp 3w 13

differs in that all bw BMP files in the current folder
(= directory) will be processed and stored with the original
nameroot augmented by "#" placed after it.

      ScanFilter *.bmp *.tif 3w 13

differs in that all bw BMP files in the current folder will be
converted to TIF files of the run-length encoded (= RLE) variety.
To get the TIF fax group 4 variety instead, the command is:

      ScanFilter *.bmp *.tif 3w 13 -outTIFFG4

Some variations on the file specifications:

-- "Filtered\*.bmp" as the output specification will place the
   output in a (possibly new) subfolder "Filtered" of the current
   folder.
-- "d:\*" as output specification will store the files on drive d at
   top level.
-- "s*" as input specification will process all files having a
   filename starting with the letter "s" or "S".

      ScanFilter * Filtered\* 2w 13 -Tree

acts as above, but all recognized bw bitmap files in subfolders are
also processed. The whole 'subfolder-tree' containing those bw files
will be reproduced in processed form within the (possibly new)
"Filtered" folder.

      ScanFilter * ..\Filtered\* 2w 13 -tree

differs in that the (possibly new) "Filtered" folder will lie
alongside the input folder.

      Scanfilter c:\* d:\* 26 -Tree

will process all recognized input files in drive c and store them in
drive d with strictly parallel (sub)folder structure.

CAVEAT. There may be no 'overwrite prompt', so be careful in
specifying the output! For example, just "*" for output will
overwrite all input files!

There are three command keywords designating possible output
formats:

      -outBMP   -outTIFFRLE   -outTIFFG4

but reasonable default conventions often make them unnecessary. They
should come after the filter process designation. The -Tree keyword
already mentioned also comes somewhere after it.


FORMAT CONVERSIONS

The process 21 (the "Least 1" process with 8-connectivity) leaves
bitmaps unchanged. And it has been optimized to waste no time doing
so! It is thus suitable for performing file format conversions.

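The filter designator syntax f(o)(bw)(e) described under COMMAND
LINE MODE can be sketched in Python as follows. This is only an
illustrative reading of the syntax, not ScanFilter's own parser
(which is written in Delphi Pascal). The names `AtomicFilter`,
`parse_designator` and `parse_composite` are hypothetical, and the
spelling of a negative center weight as a minus sign directly after
f (for example "1-9") is an assumption that this guide does not
confirm. The defaults (CW = 5, N = 4) and the ranges are those
stated above.

```python
import re
from typing import List, NamedTuple, Optional

DEFAULT_CW = 5   # default center weight for the "3x3 median" filter
DEFAULT_N = 4    # default threshold for the "Least N" filters

# f, then optional integer o, optional color b/w, optional 'e'.
# ASSUMPTION: a negative o is written with a leading minus sign.
_PATTERN = re.compile(r"([123])(-?\d+)?([bw])?(e)?")


class AtomicFilter(NamedTuple):
    f: int                 # 1 = 3x3 median, 2 = Least N (8-way), 3 = Least N (4-way)
    o: int                 # CW when f = 1, N when f = 2 or 3
    color: Optional[str]   # 'b', 'w', or None meaning both colors
    enlarge: bool          # True when the 'e' option is present


def parse_designator(text: str) -> AtomicFilter:
    """Parse one 'atomic' designator such as '212b', '3w' or '13'."""
    m = _PATTERN.fullmatch(text)
    if m is None:
        raise ValueError(f"not a filter designator: {text!r}")
    f = int(m.group(1))
    o_text, color, e = m.group(2), m.group(3), m.group(4)
    if f == 1:
        o = DEFAULT_CW if o_text is None else int(o_text)
        if not -9 <= o <= 9:
            raise ValueError("center weight CW must lie in -9..9")
        if e is not None:
            raise ValueError("'e' is available for f = 2 or 3 only")
    else:
        o = DEFAULT_N if o_text is None else int(o_text)
        if not 1 <= o <= 100:
            raise ValueError("N must lie in 1..100")
    return AtomicFilter(f, o, color, e is not None)


def parse_composite(text: str) -> List[AtomicFilter]:
    """Parse a space-separated composite process, kept in left-to-right order."""
    return [parse_designator(token) for token in text.split()]


for step in parse_composite("3w 13"):
    print(step)
```

Applied to "3w 13", the sketch yields a 4-connectivity "Least N"
step with the default N = 4 on white pixels, followed by a "3x3
median" step with CW = 3 on all pixels, which matches the reading
given in the command examples.
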

BATCH FILE MODE

One makes a 'batch file' with suffix ".bat" using a text editor,
putting the commands as described above into this file as a sequence
of lines. Then put the ScanFilter program and the ".bat" file into
the folder relative to which you will specify the files to process,
and start processing by clicking on the ".bat" file.


THE INTERACTIVE USER INTERFACE

It hopefully has enough familiar and visible features to be almost
self-explanatory.

ScanFilter's interactive user interface has two halves called "Image
Mode" and "Folder Mode" respectively, each with its own window and a
transfer button to toggle to the other mode.

"Image Mode" is normally entered on launching ScanFilter. It is
devoted to interactively processing an individual bitmap file. It is
designed for careful construction of a composite filter process to
later despeckle a big collection of related scan files, for example,
one arising from scanning a complete printed volume. Some notable
features:

-- One can open any admissible bitmap file by dropping its icon onto
   the icon of the ScanFilter executable. It immediately appears in
   the Image Mode window.

-- One can navigate through the bitmap and zoom in to parts of it.
   The zooming is always toward (or away from) the center of the
   window displaying the bitmap or part thereof. Maximum zoom is
   'x 4', which is sufficient to make each pixel of the image a
   distinctly visible and readily countable square.

-- One can observe any process as it happens, and even toggle
   rapidly between the image before and after the filter process
   being studied, be that process 'composite' or 'atomic'.

-- One can save files in intermediate forms. (Use a title somehow
   identifying the process involved.)

-- One can quickly revert to the bitmap file originally opened. Or
   choose a new one.

-- Having built a suitable composite process, one can move on to
   "Folder Mode", via a transfer button.

-- To permit ScanFilter to serve as an inspection tool for *any* bw
   bitmap filter program (local or remote!), there is a hidden
   feature permitting one to compare any two admissible bw bitmaps
   on the same pixel rectangle. Just drop the two bitmap icons
   (together, not separately) onto the ScanFilter icon. Both images
   will be loaded into the Image Mode viewer; the toggle button
   carrying the labels "undo last function"/"redo last function"
   will now toggle between the two bitmaps just loaded, and the
   filenames will be toggled correspondingly. One can also navigate
   and zoom as already described. Beware however that any filter
   process then applied to one of the bitmaps will flush out the
   other, in effect replacing it by the freshly processed file.

"Folder Mode" is the second half of the interactive interface and
has its own window. Its purpose is to apply a user designated filter
process (possibly 'composite') to a user designated class of bitmap
files in a user designated folder, outputting into a user designated
folder, using a user designated output file format. Its syntax
borrows from the commandline syntax. In other respects, it is rather
conventional.


LOG FILES

Each process of the sort described for the interactive "Folder Mode"
gives rise to a log file in ASCII text format that appears alongside
ScanFilter's executable file. This log records in outline what
ScanFilter has done, much as does the window of "Folder Mode".

In production work, it is advisable to preserve this log along with
the original and the filtered bitmaps. If, at some later time,
damage or avoidable speckling is discovered in the filtered bitmaps,
the log will help you discover the cause and hopefully suggest
remedies. Modern hard disk capacities and highspeed filtering
together make it feasible to maintain original scan files
indefinitely, and to refilter them whenever your filtering
technology significantly improves.


LONGTERM DEVELOPMENT

It seems foolish to predict longterm development in any detail.
Instead, here are four circumstances that will probably drive
further development:

-- an evident need to improve the quality of freely available
   scanned scientific works currently found on the Internet, in
   particular the substantial legacy literature in the public
   domain. Where mathematics is concerned, the best access portal is
   Ulf Rehmann's:
      http://www.math.uni-bielefeld.de/~rehmann/DML/dml_links.html

-- a scarcity of similar programs for improving bw bitmaps,
   especially ones fast enough for production work.

-- a rich literature on manipulating bw bitmaps (see references).
   The known techniques go far beyond the classical 'weighted
   median' and 'connected component' filtering implemented in the
   present ScanFilter version. Those of so-called 'morphological
   image processing' seem particularly relevant; see the reference
   volumes.

-- an extraordinarily effective system called "DjVu" for
   electronically publishing bw scans of print. It involves formats,
   viewers, and compilers. And all have adequate versions that are
   freely available. See
      http://djvu.org/
      http://any2djvu.djvuzone.org/
      http://yann.lecun.com/ex/downloads/index.html
   DjVu delivers bw bitmaps to readers worldwide with an unrivalled
   combination of efficiency and viewing quality. Modest Internet
   bandwidth and modest computers suffice. Good despeckling improves
   both density and quality of the DjVu files.


POSSIBLE SHORTTERM DEVELOPMENTS

There will be an accompanying list of possible improvements to the
current "ScanFilter", some of which reflect obvious shortcomings.
However, no improvements are currently scheduled; indeed JD is
already 100% occupied with other programming projects.


PAST DEVELOPMENTS

This program arose from discussions in the Internet newsgroup
"sci.image.processing" that were initiated by LS in June 2009. This
newsgroup incidentally provides an open forum suitable for further
discussion of ScanFilter.

Several persons made helpful contributions, many of which are not
mentioned in the ScanFilter documentation; they can all be located
on the Internet. JD was the only contributor who provided an
autonomous executable binary (in July 2009); it was a fast one, with
source.

Development toward the present ScanFilter version continued in
correspondence between JD and LS throughout August, September, and
October 2009. JD did all the programming, using his preferred
algorithms, and working in the Delphi Pascal environment under MS
Windows; LS offered ideas, did much testing, and finally put
together this documentation.


DISTRIBUTION CONDITIONS

The use of this program is unrestricted. Use it at your own risk;
there are no warranties! Free distribution on the Internet of
(complete!) copies of any version of ScanFilter is permitted.
Commercial distribution of ScanFilter is forbidden.

Here is the current ScanFilter distribution site:

      http://lcs98.free.fr/soft/scanfilter/

It is managed by LS and will serve for updates, and for postings of
and/or references to related software.


AUTHORS' CONTACT COORDINATES

-- program author Jens Dierks,
      jdierks dot fw at freenet dot de
-- documentation author Laurent Siebenmann,
      laurent at math dot sunysb dot edu


SOURCECODE AND COMPILATION for ScanFilter2009

As a complement to the standalone binary "ScanFilter.exe" in Version
1.2 of ScanFilter2009, there is its source code, which JD compiled
using the commercial Delphi6 Pascal compiler (see URL reference).
The external modules for Delphi that the compilation required are
two well-supported and freely available ones: LibTiffDelphi, and
DirectoryEdit from RxLib (see URL references below).

If you are actively interested in further development of ScanFilter,
please contact JD and LS.


REFERENCES

--- Other Documentation Files ---
--- in the ScanFilter distribution ---

** "3x3_Weighted_Median.txt"

** "8-_and_4-connectivity.txt"

--- Internet Documentation ---

** the file "meanmed.pdf" at
      http://www.cs.tau.ac.il/~turkel/notes/
   includes a helpful introduction to 'weighted median' filtering.
   See also the file "3x3_Weighted_Median.txt".

** http://www.visionbib.com/bibliography/twod268.html
   is an immense bibliography for 'weighted median' filtering.

--- Three General reference volumes ---

** W.K. Pratt, "Digital Image Processing", John Wiley, third
   edition 2001.

** K.R. Castleman, "Digital Image Processing", Prentice Hall, 1996.

** J.C. Russ, "The Image Processing Handbook", CRC Press and IEEE
   Press, ISBN 0-8493-2532, in many editions since the 1990s.

--- Programmer resources for ScanFilter ---

** Delphi 2007, 2009 and 2010 of Embarcadero Corp. are the currently
   sold Pascal development environments for MS Windows that are
   direct successors to the Delphi6 environment used by JD to
   compile ScanFilter2009. URL:
      http://www.embarcadero.com/products/delphi/
   ScanFilter has yet to be compiled in these currently sold
   environments.

** LibTiffDelphi is needed in the compilation of ScanFilter2009.
   It is a pre-compiled version of LibTiff for use in the Delphi
   programming environment; it is a sequence of ".obj" files. URL:
      http://www.awaresystems.be/imaging/tiff/delphi.html
   This URL is incidentally an excellent portal to information on
   the TIF format, and more.

** RxLib is a library for Delphi Pascal, having URL:
      http://www.micrel.cz/RxLib/index.html
   The component called DirectoryEdit (used by JD for directory path
   selection) lies in a category called RxTools inside RxLib. For
   the compilation of ScanFilter2009, it suffices to install RxTools
   into Delphi6.