Skyhigh Security

Prepare the IDM (Enhanced) Fingerprint File

Fingerprint files are created when you train the data source file with the IDMTrain tool, which is included in the DLP Integrator. Training the data source file to generate the .db file is a prerequisite for index document matching.

All values in the data source file are normalized and hashed in the fingerprint file, regardless of the definition you use in classifications.

Install the DLP Integrator

To use IDM, you need to install DLP Integrator v.6.4.0, which includes the IDMTrain tool and is supported on both Windows and Linux platforms.

For more information, see:

Generate the database (.db) files using the IDMTrain Tool

To train the data source file, run the idmtrain command with the following options from the command line interface (CLI) or from any third-party tool, such as PuTTY.

Command Line Options
Short form  Full form                Description
-?          --help                   Shows the IDMTrain tool help and exits
-v          --verbose                Shows verbose output
-V          --version                Displays version information and exits
-q          --quantity               Performs an extra initial dummy training pass to determine the quantity of files
-A          --all-files              Processes all files, including hidden files
-E          --no-errors              Does not generate error messages or enforce thresholds
-W          --no-warnings            Does not generate warning messages
-j          --json file              Writes progress and exit status to file as JSON
-r          --report file            Writes training information to file as JSON
-o          --output file            Specifies the resultant database to create
-e          --errors % (=5)          Specifies the error threshold as a percentage (default 5)
-D          --db-name name           Specifies the database name (default is based on the output file name)
-x          --exclude pat ...        Excludes files that match a case-insensitive MS-DOS pattern
-p          --progress [=secs(=2)]   Shows progress at the specified interval in seconds (default 2)
            --options file ...       Reads position-dependent options from an options file (see Options File below)
-a          --age-whole days         Includes files that are at least this many days old
-s          --size-whole sigs        Includes files with at least this many signatures
-t          --type-whole pat ...     Includes files that match an MS-DOS pattern

Options File

For example, running the training tool with the command:

idmtrain -p -q -r c:\idm\out\traininfo.json -o c:\idm\out\fingerprint.db -C ee9a4c72-81ff-479d-a493-1104d37100ea -R c:\idm\small_samples\

is equivalent to:

idmtrain -p -q -r c:\idm\out\traininfo.json -o c:\idm\out\fingerprint.db -@ c:\idm\folders.txt 

where c:\idm\folders.txt contains:

-C ee9a4c72-81ff-479d-a493-1104d37100ea -R c:\idm\small_samples\

The options file can contain many lines, up to a total size of 16 MB. When you use an options file, the wildcard expansion that the shell would normally perform does not take place. To use UTF-8 rather than the native character set, save the file with a UTF-8 BOM (byte order mark).
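The constraints above can be sketched in a small Python helper that generates an options file. This is only an illustrative sketch, not part of the DLP Integrator: the function name write_options_file is hypothetical, the 16 MB limit is assumed to mean 16 * 1024 * 1024 bytes, and it assumes that one "-C guid -R path" group per line is accepted and that paths contain no spaces (quoting rules for the options file are not described here). Because the shell does not expand wildcards for an options file, the helper expands them before writing.

```python
import glob
from pathlib import Path

# Assumption: the documented 16 MB limit is interpreted as 16 * 1024 * 1024 bytes.
MAX_OPTIONS_FILE_BYTES = 16 * 1024 * 1024


def write_options_file(options_path, class_guid, patterns):
    """Write one '-C <guid> -R <path>' line per matched path.

    Wildcards are expanded here because no shell expansion is performed
    on the contents of an options file. Returns the number of lines written.
    """
    lines = []
    for pattern in patterns:
        # Fall back to the literal pattern when nothing matches.
        matches = sorted(glob.glob(pattern)) or [pattern]
        for path in matches:
            lines.append(f"-C {class_guid} -R {path}")

    # "utf-8-sig" prepends the UTF-8 byte order mark (EF BB BF).
    data = ("\n".join(lines) + "\n").encode("utf-8-sig")
    if len(data) > MAX_OPTIONS_FILE_BYTES:
        raise ValueError("options file would exceed the 16 MB limit")

    Path(options_path).write_bytes(data)
    return len(lines)
```

The resulting file can then be passed to idmtrain in place of the position-dependent options, as in the -@ example above.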

Whole File Match

If your fingerprint directories contain older files that cannot be excluded based on their file type using the -x option, you can use the -a option to exclude them based on their age. For example, the following command excludes files older than 30 days from the training process:

idmtrain -p -q -r c:\idm\out\traininfo.json -o c:\idm\out\fingerprint.db -C ee9a4c72-81ff-479d-a493-1104d37100ea -R c:\idm\small_samples\ -a 30
Position Dependent Options
Short form  Full form          Description
-I          --ignore           Trains the following paths to be ignored rather than classified
-C          --class guid ...   Trains the following paths with these classifications
-P          --path path ...    Trains files directly under these paths
-R          --rpath path ...   Trains files recursively under these paths
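Because these options apply to the paths that follow them, their order on the command line matters. The Python sketch below assembles an idmtrain argument list in the same order as the example commands on this page. The helper name build_idmtrain_args and its parameters are hypothetical, and the sketch assumes that a later -C replaces the classification used for the paths after it; only options listed on this page are used.

```python
def build_idmtrain_args(output_db, report_json, groups,
                        progress=True, quantity=True):
    """Return an idmtrain argument list.

    groups is a list of (class_guid, [recursive_paths]) tuples. Each GUID is
    emitted with -C immediately before the -R paths it should classify,
    because position-dependent options apply to the paths that follow them.
    """
    args = ["idmtrain"]
    if progress:
        args.append("-p")   # show progress
    if quantity:
        args.append("-q")   # needed for percentage progress
    args += ["-r", report_json, "-o", output_db]

    for class_guid, paths in groups:
        args += ["-C", class_guid, "-R", *paths]
    return args


if __name__ == "__main__":
    command = build_idmtrain_args(
        "c:\\idm\\out\\fingerprint.db",
        "c:\\idm\\out\\traininfo.json",
        [("ee9a4c72-81ff-479d-a493-1104d37100ea", ["c:\\idm\\small_samples\\"])],
    )
    print(" ".join(command))
```

On a machine where the DLP Integrator is installed, the returned list could be passed to subprocess.run(command, check=True) from the Python standard library.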

Note the following points about a standard run:

  • Hidden files are not trained. On Windows, a file is hidden if it has the hidden attribute set or its name starts with "."; on Linux, files whose names start with "." are hidden. On all operating systems, __MACOSX folders are also not trained. Use the -A (--all-files) command line option to train hidden files.
  • Directory symbolic links are not followed.
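To preview which files a standard run is likely to pick up, the rules above can be approximated in Python. This is a rough sketch, not the tool's actual implementation: the function names are hypothetical, and it assumes that hidden directories are skipped in the same way as hidden files and that __MACOSX folders are skipped even when hidden files are included.

```python
import os
import stat

# Windows "hidden" attribute bit; defaults to 2 if the constant is unavailable.
HIDDEN_ATTRIBUTE = getattr(stat, "FILE_ATTRIBUTE_HIDDEN", 2)


def is_hidden(path):
    """Approximate the hidden-file rule: a leading '.' or the Windows hidden attribute."""
    if os.path.basename(path).startswith("."):
        return True
    # st_file_attributes exists only on Windows; treat it as 0 elsewhere.
    attrs = getattr(os.lstat(path), "st_file_attributes", 0)
    return bool(attrs & HIDDEN_ATTRIBUTE)


def files_in_standard_run(root, all_files=False):
    """List files under root, applying the skip rules described above.

    all_files=True mimics the effect of -A (hidden files are included).
    """
    selected = []
    # followlinks=False: directory symbolic links are not followed.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        kept = []
        for name in dirnames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or name == "__MACOSX":
                continue
            if not all_files and is_hidden(full):
                continue  # assumption: hidden directories are skipped too
            kept.append(name)
        dirnames[:] = kept  # prune the walk in place

        for name in filenames:
            full = os.path.join(dirpath, name)
            if all_files or not is_hidden(full):
                selected.append(full)
    return sorted(selected)
```

Comparing this list with the training report produced by -r is one way to check that no expected files were skipped.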

NOTE: If -r is not specified, warnings and errors go to stderr and a completion message is shown. If -p is specified, progress output is shown; when -j is also used, the progress output goes to the JSON file. The -q option is required for percentage progress.
