Parse-O-Matic

;===============================================================================
;
; ASCII Clean-Up Converter
;
; - Translate ISO-8859-1 ASCII to 7-bit ASCII, or delete such characters
; - Strip control characters (except Tab, CR, LF, FF), or leave them as-is
;
; This script works on all text files: Windows, DOS, Unix/Linux, Mac.
; It is designed for use with the Parse-O-Matic Power Tool, available
; from www.parse-o-matic.com.
;
;-------------------------------------------------------------------------------
;
; High-Bit Character Cleanup ("Edit" Option)
; ------------------------------------------
;
; This script translates "diacritical" characters to their unaccented
; equivalent. Special symbols are translated to a very rough visual
; approximation. As an alternative, this script can remove the
; characters altogether.
;
; The characters affected all have an ASCII value above 127 ($7F). Thus,
; all characters in the output will have an ASCII value below 128 ($80).
;
; While fixing accented characters is simple enough, some characters simply
; cannot be translated. The "Plus/Minus" symbol ($B1), for example, is
; translated into a percent sign, while the "Thorn" symbols (e.g. $DE) are
; translated to a lowercase d. In some cases this translation can render
; the translated text meaningless, as when the symbol for "one quarter"
; ($BC) is translated to a percent sign.
;
; Fortunately, it is easy to modify the translation table (which is named
; Translate_Table). You could even modify it to flag all suspect characters
; with the same seldom-used character (such as the vertical bar character,
; which is $7C).
;
; This script could further be enhanced to replace single characters with
; multiple characters. For example, the symbol for "three quarters" ($BE)
; could be replaced with the string '3/4'. However, this could change the
; length of the lines of text.
;
;-------------------------------------------------------------------------------
;
; Control-Character Cleanup (CtrlChars Option)
; --------------------------------------------
;
; Control characters are those whose ASCII value is below that of the space
; character, as shown on any ASCII chart. Most of these characters are not
; very useful. This script can delete the ones that are little-used, while
; retaining those that do get used:
;
;   $09 = Tab
;   $0A = LF (Line Feed)
;   $0C = FF (Form Feed)
;   $0D = CR (Carriage Return)
;
;===============================================================================

;===============================================================================
; Config Section
;===============================================================================

Config
  $CfgEnableOptionX = 'Y'
  $CfgCaptionX = 'E&dit'
  $CfgHintX = 'High-bit chars: ' >>
              'T = Translate; D = Delete; K = Keep as-is'
  $CfgEnableOptionY = 'Y'
  $CfgCaptionY = '&CtrlChars'
  $CfgHintY = 'Control chars: ' >>
              'D = Delete; K = keep as-is'
  $CfgEnableOptionZ = 'N'
  $CfgInpFileType = 'Binary'
  $CfgRecLen = 100 ; This number is fairly arbitrary
  $CfgCopyright = 'Copyright 2005-2009 by Pyroto, Inc.'
  $CfgVersion = '1.00.00'
  $CfgProgrammer = 'Timothy Campbell'
  $CfgEmail = 'info' $40 'parse-o-matic.com' ; Note anti-spam tactic
  $CfgLicense = 'This script may be used by anyone who has a valid ' >>
                'Advanced Scripting License from Pyroto, Inc.' >>
                ', or is evaluating one of our ' >>
                'Parse-O-Matic products (for up to 30 days).'
End

;===============================================================================
; TaskInit Step
;===============================================================================

TaskInit

  ;-----------------------------------------------------------------------------
  ; Control characters
  ;-----------------------------------------------------------------------------

  Control_Chars = >>
    $00 $01 $02 $03 $04 $05 $06 $07 $08 $0B $0E $0F >>
    $10 $11 $12 $13 $14 $15 $16 $17 $18 $19 $1A $1B $1C $1D $1E $1F

  ;-----------------------------------------------------------------------------
  ; High-bit characters
  ;-----------------------------------------------------------------------------

  High_Bit_Chars = >>
    $80 $81 $82 $83 $84 $85 $86 $87 $88 $89 $8A $8B $8C $8D $8E $8F >> ; $80 to $8F
    $90 $91 $92 $93 $94 $95 $96 $97 $98 $99 $9A $9B $9C $9D $9E $9F >> ; $90 to $9F
    $A0 $A1 $A2 $A3 $A4 $A5 $A6 $A7 $A8 $A9 $AA $AB $AC $AD $AE $AF >> ; $A0 to $AF
    $B0 $B1 $B2 $B3 $B4 $B5 $B6 $B7 $B8 $B9 $BA $BB $BC $BD $BE $BF >> ; $B0 to $BF
    $C0 $C1 $C2 $C3 $C4 $C5 $C6 $C7 $C8 $C9 $CA $CB $CC $CD $CE $CF >> ; $C0 to $CF
    $D0 $D1 $D2 $D3 $D4 $D5 $D6 $D7 $D8 $D9 $DA $DB $DC $DD $DE $DF >> ; $D0 to $DF
    $E0 $E1 $E2 $E3 $E4 $E5 $E6 $E7 $E8 $E9 $EA $EB $EC $ED $EE $EF >> ; $E0 to $EF
    $F0 $F1 $F2 $F3 $F4 $F5 $F6 $F7 $F8 $F9 $FA $FB $FC $FD $FE $FF    ; $F0 to $FF

  ;-----------------------------------------------------------------------------
  ; Translation table to convert ISO-8859-1 to 7-bit ASCII
  ;-----------------------------------------------------------------------------

  Translate_Table = >>
    'EX,f,,tt^%S>' >>         ; $80 to $8F
    'X''''""---"Ts>oXzY' >>   ; $90 to $9F
    ' !cLcY|S:Ca<~-R-' >>     ; $A0 to $AF
    '^%23''uP-,1^>%%%?' >>    ; $B0 to $BF
    'AAAAAAECEEEEIIII' >>     ; $C0 to $CF
    'DNOOOOOxOUUUUYdB' >>     ; $D0 to $DF
    'aaaaaaeceeeeiiii' >>     ; $E0 to $EF
    'onooooo/ouuuuydy'        ; $F0 to $FF

  ;-----------------------------------------------------------------------------
  ; Check options
  ;-----------------------------------------------------------------------------

  If 'TDK' ~ $OptionX Stop 'Please set the Edit option to T or D or K' >>
                           $0A$0D$0A$0D >>
                           'T translates characters above ASCII 127' $0A$0D >>
                           'D deletes such characters' $0A$0D >>
                           'K keeps all high-bit characters as-is'

  If 'DK' ~ $OptionY Stop 'Please set the CtrlChars option to D or K' >>
                          $0A$0D$0A$0D >>
                          'D deletes control characters (except Tab, CR, LF, FF)' $0A$0D >>
                          'K keeps control characters (leaving them as-is)'

End

;===============================================================================
; Main Step
;===============================================================================

Cntr = 0
Begin $Data Len> Cntr

  ;-----------------------------------------------------------------------------
  ; Get the next character
  ;-----------------------------------------------------------------------------

  Inc Cntr
  Char = Cols $Data Cntr

  ;-----------------------------------------------------------------------------
  ; Do a quick check to see if we need to do anything
  ;-----------------------------------------------------------------------------

  Check = ''
  If Char <= $1F Check = 'C'
  If Char >= $80 Check = 'H'
  If Check = '' Continue
  CtrlTest = $OptionY Check

  ;-----------------------------------------------------------------------------
  ; Process the character
  ;-----------------------------------------------------------------------------

  FindChar = '1*' Char

  Begin CtrlTest = 'DC'

    ;---------------------------------------------------------------------------
    ; Remove control characters
    ;---------------------------------------------------------------------------

    CharPosn = FindPosn Control_Chars FindChar 'MatchCase'
    Begin CharPosn <> 0
      $Ignore = Parse $Data Cntr Cntr 'Cut'
      Dec Cntr
      Continue
    End
  End

  Begin $OptionX <> 'K'
    If Check <> 'H' Continue

    ;---------------------------------------------------------------------------
    ; Look up the character
    ;---------------------------------------------------------------------------

    CharPosn = FindPosn High_Bit_Chars FindChar 'MatchCase'
    If CharPosn = 0 Continue

    Begin $OptionX = 'T'

      ;-------------------------------------------------------------------------
      ; Translate the character
      ;-------------------------------------------------------------------------

      NewChar = Cols Translate_Table CharPosn
      Overlay $Data Cntr NewChar
    Else

      ;-------------------------------------------------------------------------
      ; Delete the character
      ;-------------------------------------------------------------------------

      $Ignore = Parse $Data Cntr Cntr 'Cut'
      Dec Cntr
    End
  End
Again

;-------------------------------------------------------------------------------
; Output
;-------------------------------------------------------------------------------

Output $Data
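
For readers who want to check the script's logic outside Parse-O-Matic, the per-character decisions can be sketched in Python. This is only an illustration under stated assumptions, not part of the script: the function name `clean` and its `edit` and `ctrl` parameters are hypothetical stand-ins for the Edit and CtrlChars options, the replacement characters for $A0 to $FF are copied from Translate_Table, and bytes $80 to $9F are mapped to a placeholder '?' rather than to the script's own replacements.

```python
# Hypothetical Python rendering of the script's per-byte logic.
# Nothing here is Parse-O-Matic code; all names are illustrative.

# Control characters kept by the script: Tab, LF, FF, CR.
KEPT_CONTROLS = {0x09, 0x0A, 0x0C, 0x0D}
# Everything else below $20 is deleted when CtrlChars = D.
CONTROL_CHARS = bytes(b for b in range(0x20) if b not in KEPT_CONTROLS)

# Replacements for $A0..$FF, copied from the script's Translate_Table.
TABLE_A0_FF = (
    " !cLcY|S:Ca<~-R-"   # $A0 to $AF
    "^%23'uP-,1^>%%%?"   # $B0 to $BF
    "AAAAAAECEEEEIIII"   # $C0 to $CF
    "DNOOOOOxOUUUUYdB"   # $D0 to $DF
    "aaaaaaeceeeeiiii"   # $E0 to $EF
    "onooooo/ouuuuydy"   # $F0 to $FF
)


def clean(data: bytes, edit: str = "T", ctrl: str = "D") -> bytes:
    """edit: T = translate, D = delete, K = keep; ctrl: D = delete, K = keep."""
    if edit not in ("T", "D", "K") or ctrl not in ("D", "K"):
        raise ValueError("edit must be T, D or K; ctrl must be D or K")
    out = bytearray()
    for b in data:
        if b < 0x20:
            # Control character: drop it only if deletion is on and it is
            # not one of Tab, LF, FF, CR.
            if ctrl == "D" and b in CONTROL_CHARS:
                continue
            out.append(b)
        elif b >= 0x80:
            # High-bit character: keep, translate, or delete.
            if edit == "K":
                out.append(b)
            elif edit == "T":
                if b >= 0xA0:
                    out.append(ord(TABLE_A0_FF[b - 0xA0]))
                else:
                    out.append(ord("?"))  # assumption: placeholder for $80..$9F
            # edit == "D": the byte is simply not copied
        else:
            out.append(b)  # plain 7-bit printable character (and $7F)
    return bytes(out)


if __name__ == "__main__":
    print(clean(b"caf\xe9 \xb11\x07\r\n"))
```

Unlike the script, which edits $Data in place and steps the counter back after each deletion, the sketch builds a new buffer, so no index adjustment is needed.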

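
The header comment notes that the script could be enhanced to replace single characters with multiple characters, at the cost of changing line lengths. A minimal Python sketch of that idea follows. The `MULTI` mapping and the function name `expand_fractions` are hypothetical; only the $BE to '3/4' replacement comes from the script's comments, and the other two entries are added for illustration.

```python
# Hypothetical illustration of one-to-many replacement; not Parse-O-Matic code.

# ISO-8859-1 fraction symbols and the ASCII strings that could stand in for them.
MULTI = {
    0xBC: b"1/4",  # one quarter (illustrative addition)
    0xBD: b"1/2",  # one half (illustrative addition)
    0xBE: b"3/4",  # three quarters (the example given in the script header)
}


def expand_fractions(data: bytes) -> bytes:
    """Replace each mapped byte with its multi-character ASCII string."""
    out = bytearray()
    for b in data:
        # Bytes without an entry are copied through unchanged.
        out += MULTI.get(b, bytes([b]))
    return bytes(out)


if __name__ == "__main__":
    before = b"\xbe cup"
    after = expand_fractions(before)
    # The output is longer than the input, which matters for fixed-width data.
    print(after, len(before), len(after))
```

Because each replaced byte grows to three bytes, fixed-length records or column-aligned reports would need to be re-padded after a pass like this.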



 

Parse-O-Matic Free, Basic, Business and Enterprise are data conversion tools that allow you to parse, convert, mine, import and export data files, reports, web capture, logs, legacy databases, text, CSV (comma separated; comma delimited), ASCII, EBCDIC, and almost any data format that you may have.

Copyright © 1986-2010 Pyroto, Inc. All rights reserved.