Parse-O-Matic

;===============================================================================
;
; Text File Successive Duplicate Line Remover
;
; This script copies a file. As it does so it will delete duplicate lines.
; It can make this decision based on the entire line, or it can do so based
; on a column (character position) range, so a line need not be entirely
; identical to the preceding one to be considered a duplicate. For example,
; you could compare lines based only on positions 5 to 10.
;
; Deletion is done only on "successive" duplicates. This means that the
; lines must follow one another (or be part of a series of duplicates).
; Thus, the following would be cleaned up:
;
;   AAAAAAAAAAAAAA
;   AAAAAAAAAAAAAA
;   BBBBBBBBBBBBBB
;
; ... yielding one line of A's and one line of B's. The following example
; would NOT be altered:
;
;   AAAAAAAAAAAAAA
;   BBBBBBBBBBBBBB
;   AAAAAAAAAAAAAA
;
; There is indeed duplication here, but it is not successive duplication.
; This script could be modified to handle non-successive duplication, but
; that would require saving the entire input file in an array, which is
; beyond the scope of this demonstration.
;
; This script works only on CRLF-terminated (Windows/DOS) text files,
; though modifying it to work on Unix/Linux and Mac files would be very
; easy.
;
; This script is designed for use with the Parse-O-Matic Power Tool,
; which is available from www.parse-o-matic.com.
;
;===============================================================================
; Config Section
;===============================================================================

Config
  $CfgEnableOptionX = 'Y'
  $CfgCaptionX = '&FromColumn'
  $CfgHintX = 'Starting column number (blank = start of line)'
  $CfgEnableOptionY = 'Y'
  $CfgCaptionY = '&ToColumn'
  $CfgHintY = 'Ending column number (blank = use entire line)'
  $CfgEnableOptionZ = 'Y'
  $CfgCaptionZ = '&Log?'
  $CfgHintZ = 'Enter Y to copy deleted lines to log; N otherwise'
  $CfgCopyright = 'Copyright 2005-2009 by Pyroto, Inc.'
  $CfgVersion = '1.00.00'
  $CfgProgrammer = 'Kevin Beck'
  $CfgEmail = 'info' $40 'parse-o-matic.com' ; Note anti-spam tactic
  $CfgLicense = 'This script may be used by anyone who has a valid '
  >> 'Advanced Scripting License from Pyroto, Inc.'
  >> ', or is evaluating one of our '
  >> 'Parse-O-Matic products (for up to 30 days).'
End

;===============================================================================
; TaskInit Step
;===============================================================================

TaskInit
  ;-----------------------------------------------------------------------------
  ; Check options
  ;-----------------------------------------------------------------------------
  Call CheckOption $OptionX '/FromColumn'
  FromCol = CheckOption
  Call CheckOption $OptionY '/ToColumn'
  ToCol = CheckOption
  If ToCol #< FromCol Stop '"FromColumn" must be less than "ToColumn"'
  If $OptionZ = '' $OptionZ = 'N'
  If 'YN' ~ $OptionZ Stop 'Please set the Log? option to Y or N'
  >> $0A$0D$0A$0D
  >> 'Y saves all deleted lines to the log file' $0A$0D
  >> 'N does not do this'
  ;-----------------------------------------------------------------------------
  ; Handy constants
  ;-----------------------------------------------------------------------------
  NoMatch = $0A$0D ; This cannot be a line in a CRLF-delimited text file
End

;===============================================================================
; FileInit Step
;===============================================================================

FileInit
  ;-----------------------------------------------------------------------------
  ; Are we logging deletions?
  ;-----------------------------------------------------------------------------
  Begin $OptionZ = 'Y'
    LogMsgLF
    LogMsg '-------------'
    LogMsg 'Deleted Lines'
    LogMsg '-------------'
    NumDeleted = 0
  End
  ;-----------------------------------------------------------------------------
  ; Set LastFragment in case multiple files are being copied (using wildcards).
  ; Note that this script sends all lines to the same output file. This could
  ; be easily changed, using $CfgDefaultOFN = '' and the OutFile command.
  ;-----------------------------------------------------------------------------
  LastFragment = NoMatch
End

;===============================================================================
; FileDone Step
;===============================================================================

FileDone
  ;-----------------------------------------------------------------------------
  ; Are we logging deletions?
  ;-----------------------------------------------------------------------------
  Begin $OptionZ = 'Y'
    If NumDeleted = 0 LogMsg 'No deletions'
  End
  ;-----------------------------------------------------------------------------
  ; Set LastFragment in case multiple files are being copied (using wildcards).
  ; Note that this script sends all lines to the same output file. This could
  ; be easily changed, using $CfgDefaultOFN = '' and the OutFile command.
  ;-----------------------------------------------------------------------------
  LastFragment = NoMatch
End

;===============================================================================
; Main Step
;===============================================================================

; If $Data = '' Done ; Uncomment this line to ignore null lines

;-------------------------------------------------------------------------------
; Assess starting column
;-------------------------------------------------------------------------------
If FromCol <> 0 PosnFrom = FromCol ; A start column was specified
Otherwise PosnFrom = 0 ; We use zero so null lines are also seen
If $Data Len< PosnFrom Call OutDone ; FromCol exceeds length of input line

;-------------------------------------------------------------------------------
; Assess ending column
;-------------------------------------------------------------------------------
Begin ToCol <> 0
  PosnTo = ToCol ; An end column was specified
  If $Data Len< PosnTo Call OutDone ; ToCol exceeds length of input line
Else
  PosnTo = Len $Data ; No end column was specified
End

;-------------------------------------------------------------------------------
; Compare with the previous line
;-------------------------------------------------------------------------------
TestFragment = Cols $Data PosnFrom PosnTo ; Get the fragment
Begin TestFragment = LastFragment
  ;-----------------------------------------------------------------------------
  ; It's the same as the last fragment; log it if we're doing that
  ;-----------------------------------------------------------------------------
  Begin $OptionZ = 'Y'
    LogMsg $Data
    Inc NumDeleted
  End
  Done
End

;-------------------------------------------------------------------------------
; This is different, so output it and remember it
;-------------------------------------------------------------------------------
LastFragment = TestFragment
OutEnd $Data
Done

;===============================================================================
; Subroutines
;===============================================================================

Procedure CheckOption
  MyOption = Parse CheckOption '>*/' '' 'Cut'
  TrimChar CheckOption
  Begin CheckOption = ''
    CheckOption = 0
    Exit
  End
  TestNum = Numeric CheckOption
  If TestNum = 'N' Stop '"' MyOption '" must be blank (i.e. empty) or a number'
  If CheckOption #< 0 Stop '"' MyOption '" may not be a negative number'
End

Procedure OutDone
  OutEnd $Data
  LastFragment = NoMatch
  Done
End
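
For readers who want to reason about the script's behavior without running the Power Tool, the same logic can be sketched in Python. This is an illustration only, not part of the Parse-O-Matic script: the function name remove_successive_duplicates and its parameters are hypothetical, and it assumes that a starting column of 0 selects from the first character, as the script's comments suggest. As in the script, a line that is shorter than the requested column range is always copied, and it resets the comparison so the next line cannot count as a duplicate.

```python
def remove_successive_duplicates(lines, from_col=0, to_col=0):
    """Return (kept, deleted) lists, comparing each line to the previous one.

    from_col and to_col are 1-based, inclusive column numbers; 0 means
    "not specified" (hypothetical parameters mirroring FromCol and ToCol).
    """
    no_match = object()   # sentinel that can never equal a text fragment
    last_fragment = no_match
    kept, deleted = [], []

    for line in lines:
        # A line too short for the column range is copied unchanged and
        # resets the comparison, like the script's OutDone procedure.
        if len(line) < from_col or (to_col and len(line) < to_col):
            kept.append(line)
            last_fragment = no_match
            continue

        start = max(from_col, 1) - 1          # assumed: 0 means column 1
        end = to_col if to_col else len(line)
        fragment = line[start:end]

        if fragment == last_fragment:
            deleted.append(line)              # successive duplicate
        else:
            last_fragment = fragment
            kept.append(line)

    return kept, deleted


if __name__ == "__main__":
    sample = ["AAAAAAAAAAAAAA", "AAAAAAAAAAAAAA", "BBBBBBBBBBBBBB"]
    kept, deleted = remove_successive_duplicates(sample)
    print(kept)
    print(deleted)
```

The deleted list plays the role of the log file that the Log? option enables in the script.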

Here is some good data for testing this script (copy only the actual lines of text, not the blank lines).

AAAAAAAAAAAAA
AAAAAAAAAAAAA
AAAAAAAAAAAAAAA
BBBBAAAAAAAAA
BBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCC
BBBBBBBBB
BBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDD
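
Because the script expects CRLF-terminated input, the test data has to be saved with Windows/DOS line endings. One way to produce such a file is the short Python sketch below. It is not part of the Parse-O-Matic distribution; the helper name write_crlf_file and the file name dedupe_test.txt are made up for illustration.

```python
from pathlib import Path

# The test lines listed above, one string per line of text.
TEST_LINES = [
    "AAAAAAAAAAAAA",
    "AAAAAAAAAAAAA",
    "AAAAAAAAAAAAAAA",
    "BBBBAAAAAAAAA",
    "BBBBBBBBBBBBBBBBBB",
    "BBBBBBBBBBBBBBBBBB",
    "CCCCCCCCCCCCCCCCCC",
    "BBBBBBBBB",
    "BBBBBBBBBBBBBBBBBB",
    "CCCCCCCCCCCCCCCCCC",
    "CCCCCCCCCCCCCCCCCC",
    "DDDDDDDDDDDDDDDDDD",
    "DDDDDDDDDDDDDDDDDD",
    "DDDDDDDDDDDDDDDDDD",
    "DDDDDDDDDDDDDDDDDD",
]


def write_crlf_file(path, lines):
    """Write each line followed by CR LF, in binary mode so that the
    platform cannot translate the line endings."""
    data = "".join(line + "\r\n" for line in lines).encode("ascii")
    Path(path).write_bytes(data)


if __name__ == "__main__":
    write_crlf_file("dedupe_test.txt", TEST_LINES)  # hypothetical file name
```

Writing bytes directly avoids the newline translation that text mode applies on some platforms, which could otherwise produce doubled carriage returns.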




 


Copyright © 1986-2010 Pyroto, Inc. All rights reserved.