Chapter 6. Scanning

Table of Contents

1. Scanning profiles
2. Attributes generation
2.1. Fixed
2.2. DocumentType
2.3. Incremented
3. Empty page detection
3.1. Attributes

Tahiti has broad support for scanning and works with desktop, mid-range, and high-speed scanners. The scanning process should be prepared carefully so that it is cost- and time-effective. There are many different scenarios, and Tahiti can easily be adapted to them. Scanning is often directly connected not only with document digitization but also with document identification: attaching a document type, filling in attributes, and sending the document to the document store.

Some of the possible scanning scenarios:

The scenario used depends on the type of scanned documents, the number of documents, and other conditions. It is also possible to combine these methods.

An important part of an effective scanning process is the use of predefined scanning profiles (see Section 1, “Scanning profiles”). These profiles are stored in a stand-alone XML file that can be distributed to users. Profiles can also be used to further automate document processing, set barcode recognition options, and generate attributes.

1. Scanning profiles

Scanning profiles are defined in an XML file. This file is part of the configuration and is called scans.xml. The file has to be in one of the following locations:

  • <tahiti-dir>/locale/<locale>/scans.xml

  • <tahiti-dir>/configs/<domain>/scans.xml

  • <repository>/<domain>/scans.xml

scans.xml can contain a list of definitions common to all scanners as well as individual lists of definitions for specific scanners. The following example contains two profiles common to all scanners. The first profile is called "ADF-gray-150" and the second "Flat-Photo 9x13". A profile name should be short and easy to understand. For example, the first part of each name in the example indicates whether the automatic document feeder (ADF) or the flatbed scanner (Flat) is used.

Example 6.1. scans.xml

<?xml version="1.0"?>
<scan_formats>
 <scanner ProductName="*">
  <format id="1" name="ADF-gray-150" 
          resolution="150" depth="8" 
          feeder="1" autofeed="1"
          duplex="0" 
          size_x="8.268" size_y="11.692"
          option="compression=30;" /> 
  <format id="2" name="Flat-Photo 9x13"
          resolution="300" depth="24"
          feeder="0" autofeed="0"
          duplex="0" 
          size_y="4.2" size_x="5.8" /> 
 </scanner>
</scan_formats>

The root tag must be <scan_formats>. It contains one or more <scanner> tags, each describing the configuration for one scanner. The configuration common to all scanners is <scanner ProductName="*">, and every configuration file has to contain such an entry. For an individual scanner configuration, use the name of the scanner instead of the asterisk.
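
A file that combines the common entry with a scanner-specific one might look like the following sketch. The product name "Example Scanner 1000" and the profile "ADF-color-300" are hypothetical placeholders. The real name has to match what the scanner reports. How Tahiti combines the scanner-specific list with the common one, and whether id values must be unique across <scanner> tags, is not described here, so the ids below are kept distinct as a precaution.

```xml
<?xml version="1.0"?>
<scan_formats>
 <!-- Mandatory entry: profiles common to all scanners -->
 <scanner ProductName="*">
  <format id="1" name="ADF-gray-150"
          resolution="150" depth="8"
          feeder="1" autofeed="1"
          duplex="0"
          size_x="8.268" size_y="11.692"
          option="compression=30;" />
 </scanner>
 <!-- Hypothetical scanner name, for illustration only -->
 <scanner ProductName="Example Scanner 1000">
  <format id="2" name="ADF-color-300"
          resolution="300" depth="24"
          feeder="1" autofeed="1"
          duplex="1"
          size_x="8.268" size_y="11.692" />
 </scanner>
</scan_formats>
```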

The <scanner> tag contains a list of predefined scanning profiles. Each profile is defined in a separate <format> tag.

Table 6.1. scans.xml, tag format

Attribute       Mandatory  Description
id              Y          Identifier of the entry.
name            Y          Name of the profile (visible to the user).
resolution      Y          Resolution in DPI.
depth           Y          Bit depth; can be 1, 8, or 24.
threshold       N          Threshold (0-255); valid only for depth="1".
option          N          String describing options for the codec used.
xfer            N          Transfer protocol used for communication between Tahiti and the scanner. Only TWAIN experts should change this value. Possible values: NATIVE, FILE, MEMORY.
xferFormat      N          Transfer format if xfer="FILE". Possible values: TIFF, PICT, BMP, XBM, JFIF, FPX, TIFFMULTI, PNG, SPIFF, EXIF.
size_x          N          Page size in inches (width).
size_y          N          Page size in inches (height).
feeder          N          Whether the feeder should be used. 0 - not used, 1 - use the automatic feeder.
duplex          N          Whether duplex scanning should be used. 0 - not used, 1 - use duplex scanning.
pageFormat      N          Format of the scanned page; can be used instead of size_x and size_y. Available values: A3, A4, A5, B3, B4, B5, C3, C4, C5, LETTER, USLEGAL. Some scanners do not allow size_x and size_y to be set, so only the page format can be specified.
transformation  N          Transformation function.
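
As an illustration of the optional attributes, a black-and-white duplex profile could be sketched as follows. The profile name and the threshold value 128 are arbitrary choices for this example, not recommended settings. pageFormat is used in place of size_x and size_y.

```xml
<?xml version="1.0"?>
<scan_formats>
 <scanner ProductName="*">
  <!-- threshold is meaningful here only because depth="1" -->
  <format id="1" name="ADF-bw-200-A4"
          resolution="200" depth="1"
          threshold="128"
          feeder="1" duplex="1"
          pageFormat="A4" />
 </scanner>
</scan_formats>
```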

2. Attributes generation

During the scanning process, various attributes can be generated and used in the created documents. Attribute generation is driven by the configuration file scanattrs.xml.

Example 6.2. scanattrs.xml

<ScanAttributes>
 <Profile name="All">
  <Fixed id="Scan.Agency" name="Agentura" value="TA"/>
  <Fixed id="Scan.separator" name="separator" value=""/>
  <DocumentType id="Document.type" name="Dokument" shared="0"/>
  <Incremented id="CISLO_JEDNACI" name="CISLO_JEDNACI" 
             prefix="" length="4" shared="0" incrOnValue="1"/>
  <Fixed id="HOSP_ROK" name="HOSP_ROK" value=""/>  
  <Fixed id="OBDOBI" name="OBDOBI" value=""/>  
 </Profile>
</ScanAttributes>

Attributes are generated in groups (all attributes from the active group are inserted into each newly created document). A group is defined in a <Profile> section. Every group has its own name and a set of attribute generators. During scanning, at most one group can be active (the user selects the active group by its name in Tahiti). A group can contain attribute generators of the following types:

  • Fixed

  • DocumentType

  • Incremented

2.1. Fixed

The Fixed attribute generator produces a constant value for every new document.

Table 6.2. Attributes

Attribute Name  Description
id              An attribute with this id will be added to the created document.
name            Name of the attribute. This name is displayed in Tahiti.
value           Value of the attribute. Can be changed in Tahiti.


2.2. DocumentType

The DocumentType attribute generator produces an attribute that contains the name of the document type for every new document. The value can be set in Tahiti, where the user can select a document type from all document types supported in Tahiti.

Table 6.3. Attributes

Attribute Name  Description
id              An attribute with this id will be added to the created document.
name            Name of the attribute. This name is displayed in Tahiti.
shared          1 - the value of this attribute is shared across all groups. 0 - the value is local to this group.
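
The following sketch shows two groups that both declare the document type as shared. The profile names "Invoices" and "Contracts" and the value "TB" are hypothetical. It is also an assumption that sharing is matched by the attribute id, which this chapter does not state explicitly.

```xml
<ScanAttributes>
 <!-- Profile names are hypothetical, for illustration only -->
 <Profile name="Invoices">
  <Fixed id="Scan.Agency" name="Agentura" value="TA"/>
  <!-- shared="1": value is shared across all groups -->
  <DocumentType id="Document.type" name="Dokument" shared="1"/>
 </Profile>
 <Profile name="Contracts">
  <Fixed id="Scan.Agency" name="Agentura" value="TB"/>
  <DocumentType id="Document.type" name="Dokument" shared="1"/>
 </Profile>
</ScanAttributes>
```

With shared="0" in both groups, each group would instead keep its own document type value.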

2.3. Incremented

The Incremented attribute generator produces, for every new document, an attribute whose value consists of a prefix and a numerical part. The numerical part is incremented on a given event type. The value is incremented before it is inserted into the document. The last used value is stored on disk for the next use.

Table 6.4. Attributes

Attribute Name  Description
id              An attribute with this id will be added to the created document.
name            Name of the attribute. This name is displayed in Tahiti.
prefix          Prefix of the value.
length          Length of the value.
incrOnValue     Event type that triggers incrementing the numerical part of the value. The numerical part is incremented when the value of the attribute Scan.separator equals the given value. Tahiti internally generates the following values of the attribute Scan.separator: 1 - separation page type 1, 2 - separation page type 2.
shared          1 - the value of this attribute is shared across all groups. 0 - the value is local to this group.

It is possible to generate the current date or the user name as part of the prefix. Variables usable in the prefix:

  %y  year (last 2 digits)
  %m  month (2 digits)
  %d  day (2 digits)
  %H  hours
  %M  minutes
  %u  username

For example, prefix="cp-%y%m%d" generates a string consisting of the prefix and the current date, e.g. cp-080517.
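
Placed in scanattrs.xml, a date-based prefix could be configured as in the sketch below, which reuses the attribute ids from Example 6.2. The trailing hyphen in the prefix is only an illustrative choice. No resulting value is shown, because this chapter does not specify how the numerical part is padded or whether length covers the whole value or only the numerical part.

```xml
<ScanAttributes>
 <Profile name="All">
  <!-- Scan.separator declared as in Example 6.2 -->
  <Fixed id="Scan.separator" name="separator" value=""/>
  <!-- Numerical part is incremented when Scan.separator equals "1"
       (separation page type 1); %y%m%d expands to the current date -->
  <Incremented id="CISLO_JEDNACI" name="CISLO_JEDNACI"
               prefix="cp-%y%m%d-" length="4" shared="0" incrOnValue="1"/>
 </Profile>
</ScanAttributes>
```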

3. Empty page detection

During the scanning process it is possible to detect and reject empty pages. This is very useful when duplex scan mode is used.

3.1. Attributes

Parameters are set in tahiti.xml and can be overridden in domain.xml.

Table 6.5. Parameters driving detection of empty page.

Parameter                  Description
Scan.EmptyPage.Soil.Level  Detection of "interesting" pixels (pixels carrying information). A pixel is interesting when its intensity differs from the average value by more than Scan.EmptyPage.Soil.Level. Possible values: <0,255>. Default value: 15.
Scan.EmptyPage.Soil.Ratio  Fill factor of the page, in the range <0,10000>; 0 means no data on the page. A page is not empty when the detected fill is greater than Scan.EmptyPage.Soil.Ratio. Default value: 90 (at least 0.9 % of the page is filled).
Scan.EmptyPage.Side        Size of the strip of the image that is ignored, <0,1000> per mille. Default value: 30 (ignore 3 % from each margin).
Scan.EmptyPage.Type        Type of detection algorithm: 2 - old, 3 - new (recommended).