WinEdt Macro Library  | macro previous macro  | next macro

tokenizer

Description

This macro separates a list of tokens into its elements. It's especially useful if called by another macro.

The package consists of the file tokenizer.edt, which includes the documentation. It is a general-purpose macro.

Usage

The macro expects the named register tkStr to hold the string to work on. It stores the result in the named registers tk0-tk[n] and tkCnt.

A token that represents a sublist is excluded from parsing when it is enclosed in square brackets (the default sublist wrapper). You can change the separator string, as well as the strings that enclose a sublist, by defining them before you call the macro.
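The splitting rule described above can be modelled outside WinEdt. The following Python sketch only illustrates the documented behaviour; it is not the code in tokenizer.edt. The function name `tokenize` and its parameters are hypothetical stand-ins for tkStr, tkSep, tkSubOpen and tkSubClose, and the sketch assumes that sublists are not nested.

```python
def tokenize(text, sep=";", sub_open="[", sub_close="]"):
    """Split text on sep, but leave anything inside a sublist untouched.

    Hypothetical model of the documented behaviour (assumes no nested sublists).
    """
    if not sep:
        raise ValueError("separator must not be empty")
    if not text:
        return []  # corresponds to tkCnt=0

    tokens = []
    current = []      # characters of the token being collected
    inside = False    # True while we are between sub_open and sub_close
    i = 0
    while i < len(text):
        if not inside and text.startswith(sub_open, i):
            inside = True
            current.append(sub_open)
            i += len(sub_open)
        elif inside and text.startswith(sub_close, i):
            inside = False
            current.append(sub_close)
            i += len(sub_close)
        elif not inside and text.startswith(sep, i):
            # A separator outside a sublist ends the current token.
            tokens.append("".join(current))
            current = []
            i += len(sep)
        else:
            current.append(text[i])
            i += 1
    tokens.append("".join(current))
    return tokens


tokens = tokenize("one;two;[three.a;three.b;three.c];four")
print(len(tokens), tokens)
```

In this model `tokens[i]` plays the role of tk[i] and `len(tokens)` that of tkCnt; the sublist keeps its brackets, so it can be passed through a second run after the caller strips them.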

Used Named Registers (local variables)

tkStr

Contains the string to be tokenized. If %$('tkStr'); is empty, the macro returns tkCnt=0.

tkSep

The separator string. The default is ";", which can be overridden with Assign("tkSep",";");. The separator string may consist of more than one character.

tkSepRegEx

When set to "1", this register instructs the macro to treat %$('tkSep'); as a regular expression. This way you can, for example, tokenize with Assign('tkSep','{>>>}|{>>}');, which retrieves the text parts separated by single or double empty lines.
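To illustrate what a regular-expression separator does, here is a small Python sketch. It is an analogy only: Python's `re` syntax differs from WinEdt's, and the pattern `\n{3}|\n{2}` is an assumed equivalent of '{>>>}|{>>}' under the assumption that `>` stands for a line end. The function name `tokenize_regex` is hypothetical.

```python
import re


def tokenize_regex(text, sep_pattern):
    """Split text wherever the regular expression sep_pattern matches."""
    if not text:
        return []
    return re.split(sep_pattern, text)


# Three line ends (two empty lines) are tried before two line ends
# (one empty line), mirroring the order of the alternatives above.
text = "first part\n\nsecond part\n\n\nthird part"
parts = tokenize_regex(text, r"\n{3}|\n{2}")
print(parts)
```

Listing the longer alternative first matters in this sketch: otherwise a run of three line ends would be consumed as two line ends plus a leftover one.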

tkSubOpen, tkSubClose

The opening and closing strings for a potential sublist. The defaults are "[" and "]", which can be overridden with Assign("tkSubOpen","["); or Assign("tkSubClose","]");.

tk[number]

The tokens found are stored in numbered named registers: %$('tk0');, %$('tk1');, ..., %$('tkn');.

tkCnt

The number of tokens found.

tkMatchStr

You can instruct the macro to search the token list for the first token that matches the content of tkMatchStr. If no tkMatchStr is specified, the search is skipped by assigning "0" to tkMatchPos (which is not the same as leaving it empty; see below).

tkMatchPos

The result of the tkMatchStr search, that is, the position of the matching token in the list, is stored in tkMatchPos. The macro starts with an empty tkMatchPos and keeps searching as long as no value has been assigned to it. Once a value is assigned, there is no reason to continue.
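The interplay between tkMatchStr and tkMatchPos can be sketched as follows. This is a hypothetical Python model, not the macro's code: the function name `find_match` is made up, and treating "matching" as "the token starts with the search string" is an assumption, since the documentation does not define the comparison. The sketch also does not reproduce the known bug listed further down.

```python
def find_match(tokens, match_str=""):
    """Return a model of tkMatchPos for the given token list.

    ""  -> a search was requested but nothing matched (still empty)
    "0" -> no search string was given, so the search was skipped
    "n" -> position of the first matching token, as a string
    """
    if not match_str:
        return "0"  # skip the search by pre-assigning a value

    match_pos = ""  # empty means: keep searching
    for position, token in enumerate(tokens):
        if token.startswith(match_str):  # assumed matching rule
            match_pos = str(position)
            break  # a value is assigned, no reason to continue
    return match_pos


tokens = ["one", "two", "[three.a;three.b;three.c]", "four"]
print(repr(find_match(tokens, "tw")))
print(repr(find_match(tokens)))
print(repr(find_match(tokens, "xyz")))
```

Note that, in this model as in the description, a skipped search and a match on the first token both yield "0", so a caller that needs to tell them apart has to check whether a search string was set.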

tkResultPrefix

Can be used to override the default prefix "tk" in the result named registers. For example, with Assign("tkResultPrefix","list"); the registers tk1, tk2, tk3, ..., tkCnt, tkMatchPos become list1, list2, list3, ..., listCnt, listMatchPos.

This comes in handy if you have to hold the results of several tokenizing runs in parallel.
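The effect of the prefix on register names can be pictured with a dictionary. This Python sketch is an illustration under assumptions: the helper `store_tokens` and the sample token lists are hypothetical, and a plain dictionary stands in for WinEdt's named registers.

```python
def store_tokens(registers, tokens, prefix="tk"):
    """Store tokens as <prefix>0, <prefix>1, ... plus <prefix>Cnt."""
    for position, token in enumerate(tokens):
        registers[f"{prefix}{position}"] = token
    registers[f"{prefix}Cnt"] = len(tokens)
    return registers


registers = {}
# Two independent results kept side by side thanks to different prefixes.
store_tokens(registers, ["chapter1", "chapter2"], prefix="list")
store_tokens(registers, ["pdf", "dvi", "ps"], prefix="fmt")
print(registers)
```

Without distinct prefixes, the second run would overwrite the first result, which is the situation the prefix is meant to avoid.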

Example

Suppose you have a list in the named register

    %$('tkStr'); : "one;two;[three.a;three.b;three.c];four"
The macro will store the separated items in the registers
    %$('tk0'); one
    %$('tk1'); two
    %$('tk2'); [three.a;three.b;three.c]
    %$('tk3'); four
and the number of tokens (in this case 4) in
    %$('tkCnt');
%$('tk2'); could be tokenized in a second run. If you set the named register %$('tkMatchStr'); to, e.g., "fo", then %$('tkMatchPos'); holds "3" as the result.

Known Bug

%$('tkMatchStr');: the last token in a list is not detected.

See also

text/getRelativeFilePath.edt as an example of how to use this macro.

Installation Instructions

Put the macro in the folder %b\Macros\macro\.

Download

macro/tokenizer.edt

Macro contributed by Georges Schmitz <georges.schmitzheitec.de>
