
tokenizer
Description
This macro separates a list of tokens into its elements. It's especially useful if called by another macro.
The package consists of the file tokenizer.edt, which includes the documentation. The macro is general-purpose.
Usage
The macro expects the named register tkStr to hold the string to work on. It stores the result in the named registers tk0-tk[n] and tkCnt.
A token that represents a sublist is excluded from parsing when it is enclosed in square brackets (the default sublist wrapper). You can change the separator string, as well as the strings that enclose a sublist, by defining them before you call the macro.
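The splitting rule described above can be modelled outside WinEdt. The following Python sketch is only an illustration of the documented behaviour, not the code in tokenizer.edt: the function name `tokenize` and its parameters are hypothetical stand-ins for tkStr, tkSep, tkSubOpen and tkSubClose, and the handling of nested sublists is an assumption, since this page does not describe it.

```python
def tokenize(text, sep=";", sub_open="[", sub_close="]"):
    """Split text on sep, leaving anything inside a sublist wrapper untouched.

    Hypothetical model of the documented behaviour; nested sublists are
    tracked with a depth counter, which is an assumption.
    """
    if not (sep and sub_open and sub_close):
        raise ValueError("separator and sublist wrappers must be non-empty")
    if not text:
        return []  # corresponds to tkCnt=0 for an empty tkStr

    tokens, current, depth, i = [], [], 0, 0
    while i < len(text):
        if text.startswith(sub_open, i):
            # Entering a sublist: keep the wrapper as part of the token.
            depth += 1
            current.append(sub_open)
            i += len(sub_open)
        elif depth > 0 and text.startswith(sub_close, i):
            # Leaving a sublist.
            depth -= 1
            current.append(sub_close)
            i += len(sub_close)
        elif depth == 0 and text.startswith(sep, i):
            # A separator outside any sublist ends the current token.
            tokens.append("".join(current))
            current = []
            i += len(sep)
        else:
            current.append(text[i])
            i += 1
    tokens.append("".join(current))
    return tokens


tokens = tokenize("one;two;[three.a;three.b;three.c];four")
print(tokens)       # ['one', 'two', '[three.a;three.b;three.c]', 'four']
print(len(tokens))  # 4
```

Because the separator is compared with `startswith`, a separator of more than one character works the same way as the single-character default.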
Used Named Registers (local variables)
- tkStr
  Contains the string to be tokenized. If %$('tkStr'); is empty, the macro returns tkCnt=0 as the result.
- tkSep
  The separator string. The default is ";", which can be overridden by Assign("tkSep",";");. The separator string may consist of more than one character.
- tkSepRegEx
  When set to "1", the macro treats %$('tkSep'); as a regular expression. For example, Assign('tkSep','{>>>}|{>>}'); retrieves the text parts separated by single or double empty lines.
- tkSubOpen, tkSubClose
  The opening and closing strings for a potential sublist. The defaults are "[" and "]", which can be overridden by Assign("tkSubOpen","["); or Assign("tkSubClose","]");
- tk[number]
  The tokens found are stored in numbered named registers: %$('tk0');, %$('tk1');, ..., %$('tkn');
- tkCnt
  The number of tokens found.
- tkMatchStr
  You can instruct the macro to search the token list for the first token matching the content of tkMatchStr. When no tkMatchStr is specified, the search is skipped by assigning "0" to tkMatchPos (which is not the same as an empty tkMatchPos; see below).
- tkMatchPos
  The position in the list of the token matching tkMatchStr is stored in tkMatchPos. The macro starts with an empty tkMatchPos, and the search continues only as long as no value has been assigned to tkMatchPos. Once a value is assigned, there is no reason to continue.
- tkResultPrefix
  Can be used to override the default prefix "tk" in the result named registers. E.g. with Assign("tkResultPrefix","list"); tk1,tk2,tk3,..., tkCnt, tkMatchPos become list1,list2,list3,..., listCnt, listMatchPos.
  This comes in handy if you have to hold the results of several tokenizing processes in parallel.
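Two of the options above, the regular-expression separator and the result prefix, can be illustrated with a short Python sketch. It is an approximation under stated assumptions: the helper names `tokenize_regex` and `as_registers` are hypothetical, Python's `re` syntax differs from WinEdt's (the pattern `\n{2,3}` stands in for '{>>>}|{>>}'), and sublist wrappers are ignored here for brevity.

```python
import re


def tokenize_regex(text, pattern):
    """Split text wherever the regular expression pattern matches.

    Hypothetical stand-in for tkSepRegEx set to "1". Use non-capturing
    groups in the pattern, because re.split also returns captured groups.
    """
    if not text:
        return []
    return re.split(pattern, text)


def as_registers(tokens, prefix="tk"):
    """Map tokens onto register-style names such as tk0, tk1, ..., tkCnt."""
    registers = {f"{prefix}{i}": tok for i, tok in enumerate(tokens)}
    # Registers hold strings, so the count is stored as text in this model.
    registers[f"{prefix}Cnt"] = str(len(tokens))
    return registers


# Text parts separated by one empty line ("\n\n") or two ("\n\n\n").
parts = tokenize_regex("first\n\nsecond\n\n\nthird", r"\n{2,3}")
regs = as_registers(parts, prefix="list")
print(regs)
```

With the prefix set to "list", the sketch produces the keys list0, list1, list2 and listCnt, so a second run with a different prefix would not overwrite them.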
Example
Suppose you have a list in the named register
%$('tkStr'); : "one;two;[three.a;three.b;three.c];four"
The macro will store the separated items in registers:
%$('tk0'); one
%$('tk1'); two
%$('tk2'); [three.a;three.b;three.c]
%$('tk3'); four
and the number of tokens (in this case 4) in
%$('tkCnt');
%$('tk2'); could be tokenized in a second run. If you set the named register %$('tkMatchStr'); to e.g. " fo", then %$('tkMatchPos'); holds " 3" as the result.
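The second run on the sublist can be pictured as follows. This Python sketch assumes that the caller removes the enclosing brackets before tokenizing again; this page does not say how the macro itself treats them, and the helper `strip_wrapper` is hypothetical.

```python
def strip_wrapper(token, sub_open="[", sub_close="]"):
    """Remove one pair of enclosing sublist wrappers, if present."""
    wrapped = (
        len(token) >= len(sub_open) + len(sub_close)
        and token.startswith(sub_open)
        and token.endswith(sub_close)
    )
    if wrapped:
        return token[len(sub_open):len(token) - len(sub_close)]
    return token  # not a sublist: leave the token unchanged


tk2 = "[three.a;three.b;three.c]"
inner = strip_wrapper(tk2)

# The inner list has no further sublists, so a plain split on the
# default separator is enough for this illustration.
sub_tokens = inner.split(";")
print(sub_tokens)  # ['three.a', 'three.b', 'three.c']
```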
Known Bug
%$('tkMatchStr');: a match on the last token in a list is not detected.
See also
text/getRelativeFilePath.edt as an example of how to use this macro.
Installation Instructions
Put the macro in the folder %b\Macros\macro\.
Download
macro/tokenizer.edt
Macro contributed by Georges Schmitz <georges.schmitz heitec.de>