
3. Rules

In Epos, nearly all of the TTS processing is controlled by a rule file; there is one rule file per language and it usually has the .rul suffix. The rule file for the German language, for instance, resides by default in lng/german/german.rul. The rules may also vary slightly between individual voices by means of the soft options.

3.1 Text Structure Representation Overview

The text being processed by Epos is internally stored in a multi-level data structure suitable for the application of transformational rules. Every phonetic unit (or an approximation of one) is represented by a single node in the structure. The nodes are organized into layers corresponding to linguistic levels of description, such that a unit of level n can list its immediate constituents, that is, units of level n-1. Every layer also has a symbolic name, which is used to refer to it in the rules.

The number and symbolic names of the individual levels can be specified with the unit_levels option before the languages are defined. An example is given in the following table.

Level name   Written TSR semantics     Spoken TSR semantics
text         the whole text            the whole text
sent         sentence construction     terminated utterance
colon        sentence/clause/colon     intonational unit
word         word                      stress unit
syll         word                      syllable
phone        letter                    sound
segment      -                         segment

Available Text Structure Representation layers (an example)

Every unit, whether at the segmental level or not, may contain a character. The TSR, as generated by the text parser, contains the appropriate punctuation at suprasegmental levels (that is, levels above the phone level): spaces at the word level, commas at the intonational unit level, and periods, question marks and the like at the sentence (terminated utterance) level. Some suprasegmental units have no content, because they have been delimited only implicitly; for example, a colon-final word is delimited by a comma, but the comma is actually a colon-level symbol, so the last word has no content. This content may be modified by the rules, and often is. This allows marking up a unit for later use: change its content into an arbitrary character, such as a digit, and then apply some rules only within units having this content, using a rule of type inside.

3.2 Rule File Syntax Overview

The rules are applied sequentially, unless stated otherwise. Each rule operates on units of a certain level within a unit of some other level; for instance, a rule may assimilate phones within a word, and another rule may change the syllabic prosody within a colon. The smaller units being manipulated are called target units and the larger unit is called the scope unit; the respective levels are called the target and the scope. Each scope unit is always processed separately from any other scope units, as if no other text existed. For example, if the scope of some assimilation is "word", the rule is applied to every word in isolation: the assimilation will never apply across a word boundary, nor will it be able to distinguish a word boundary from a sentence boundary.

Any line of the rules file may contain at most one rule and possibly some comment. The rule begins with an operation code specifier (what to do), followed by the parameter (one word, opcode specific), and possibly by scope and target specification, if the defaults (usually word and phone, respectively) are not suitable.

The scope and the target can be one of the available levels of linguistic description as defined with the unit_levels option. If target or even scope for a rule is not specified, the default_target or default_scope option value, respectively, will be used. The typical defaults are phone and word, respectively.

Every rule is evaluated within a certain unit, and the scope specifies what kind of unit it should be. The meaning of the target is somewhat opcode specific, but generally it is the level affected by the rule, or the lowest level affected by the rule within the scope. See the individual rule descriptions in this section, together with real-world rule files, for the exact interpretation of the target level.

The operation code, scope and target identifiers are not case sensitive, but the parameter usually is.

3.3 Character Encoding

As you sometimes need different character encodings for different languages, Epos provides a mechanism for switching character encodings in text files, including rule files and dictionaries.

3.4 Escaping Special Characters

You can use the backslash to escape any special character including the backslash itself anywhere in the rules just as in the configuration files. See the corresponding section for details.

Notice especially the possibility of referring to several internal pseudocharacters with the nice property that they can never be found in the input text and therefore are suitable for temporary markers of all kinds in the rules. See the raise rule example.

In addition, special characters listed in the table of escape sequences can be inserted using the same mechanism.

3.5 Using Comments and the @include Directive

Any text from a semicolon or # (unless it occurs in the middle of a word) up to the end of the line is a comment and is ignored. A line containing nothing but whitespace and/or a comment is also ignored. The @include directive can be used to nest rule files. The same rules apply within .ini files; for more details, see the @include directive in configuration files.

3.6 Macros

A line which doesn't contain a rule may contain a macro definition instead. It is specified as identifier = replacement, for example,

$vowel = aeiouy
Alternatively, the keyword external may follow an identifier instead of the equality sign and the replacement:
$some_pathname  external

This way the macro identifier is assigned the value of its corresponding configuration parameter (for the current language if possible).

The macros are expanded wherever they occur except at their own point of definition. Therefore, $vowel = $short$long is a valid macro definition, provided that $short and $long have already been defined. The expansion is performed at definition time and is not iterated, because the replacement is not expected to contain the dollar sign.

Macros can later be redefined if you wish and they can be local to a block of rules as described below.

If there is any uncertainty concerning the exact length of the identifier, you can use braces to delimit it: ${name} is usually equal to $name, but $nameaeiou is not equal to ${name}aeiou. It is also possible to use a colon or an ampersand as a delimiter: $name&aeiou.
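The expansion rules above (expansion at definition time, no iteration, optional braces, and & or : as a terminating delimiter) can be modelled in a few lines of Python. This is a sketch, not Epos code: define_macro and expand_macros are hypothetical names, identifiers are assumed to consist of letters, digits and underscores, a trailing & or : is assumed to be consumed as the delimiter, and the external form is not handled.

```python
import re

# Hypothetical model of rule-file macro handling (not the Epos implementation).
# Assumption: identifiers match \w+; "&" or ":" after $name only ends the
# identifier and is dropped.
_REFERENCE = re.compile(r"\$(?:\{(\w+)\}|(\w+)[&:]?)")
_DEFINITION = re.compile(r"^\$(\w+)\s*=\s*(.*?)\s*$")


def expand_macros(text, macros):
    """Replace every macro reference in text by its current value, in one pass."""
    def replace(match):
        name = match.group(1) or match.group(2)
        if name not in macros:
            raise KeyError(f"undefined macro ${name}")
        return macros[name]
    # re.sub does not rescan the inserted text, so expansion is not iterated.
    return _REFERENCE.sub(replace, text)


def define_macro(line, macros):
    """Handle a '$name = replacement' line, expanding the replacement immediately."""
    match = _DEFINITION.match(line.strip())
    if match is None:
        raise ValueError(f"not a macro definition: {line!r}")
    name, replacement = match.groups()
    # The identifier on the left is not expanded; the replacement is.
    macros[name] = expand_macros(replacement, macros)
```

With this model, $shorty is looked up as one (undefined) identifier, which is exactly why braces or a delimiter are needed to append literal characters.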

It is a good practice to use macros extensively for classes of symbols so that the same sets and subsets of characters are listed only once in the rules and therefore are kept consistent throughout. The exact values of the macros are however always language specific and so Epos doesn't specify any built-in macros. If any macros are used in the examples for specific rules below, reasonable definitions of the macros are assumed to precede the rule.

For an abundance of examples see existing rule files.

3.7 The "Except" (!) Operator

Whenever an unordered list of tokens is to be specified within the parameter of some rule (use common sense and/or the individual rule descriptions below), you can also make negative specifications, such as "all consonants except l and r". To do this, use the exclamation mark as an "except" operator: $consonants!lr (the right operand is subtracted from the left one). If there is no left operand, as in !x, the semantics is "all but x". A consequence is that ! alone means "everything".

The operator is right-associative; !$vowels!ou means "all excluding vowels, but o and u don't count as vowels just now". Therefore, o and u are included in this unordered list.
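The set semantics of the except operator, including right associativity and the empty left operand, can be checked with a short Python sketch. except_set and its universe argument are hypothetical: this text does not say what "everything" ranges over, so the caller supplies it, and macros are assumed to have been expanded already.

```python
import string


def except_set(expression, universe):
    """Evaluate an unordered token list that may use the right-associative ! operator.

    An empty operand to the left of ! stands for the whole universe, so "!x"
    means "all but x" and "!" alone means everything.
    """
    operands = expression.split("!")
    result = set(operands[-1])
    # Fold from the right: a!b!c == a - (b - c)
    for operand in reversed(operands[:-1]):
        left = universe if operand == "" else set(operand)
        result = left - result
    return result


# Assumed universe for the usage example: lowercase ASCII letters.
ALPHABET = set(string.ascii_lowercase)
```

For instance, except_set("!aeiouy!ou", ALPHABET) first removes o and u from the vowels and then takes the complement, so o and u end up in the result, as the paragraph above describes.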

This operator never works for ordered lists, not even for the syll rule sonority groups. But there is a similar usage associated with rule types if, with, prep and postp, where the exclamation mark can be used to negate the condition; see the respective rule types.

3.8 Dictionary-oriented Rules

The rule types described in this subsection operate in some way on a list of words (or other strings), which can range from a few items up to machine-generated megabytes of data. These strings are usually listed in a separate file, and the parameter of such a rule is then the file name. Alternatively, the strings can be quoted inside the rule file, especially if only a few are listed. Such a collection of strings is called a dictionary and obeys the same format for any rule type which needs external data (except for the neural networks).

The dictionary consists of multiple lines, each of which contains a single dictionary item. An item consists of two whitespace separated words, the former being the item itself, the latter being some string associated with the item. Often, the second string is used to replace every occurrence of the first string in the text being processed. That's why the strings are called replacee and replacer, respectively. The order of dictionary items is not significant.

We use adaptive hash tables, with balanced, optionally depth-bounded AVL trees for collisions, to represent the dictionary in memory, so that any item can be looked up quickly even in a huge dictionary.

The replacee cannot contain whitespace (unless escaped with a backslash), but the replacer can. That is, if more than two words are found on a line, the first one is the replacee and the rest of the line, except for any post-replacee and/or trailing whitespace, becomes the replacer. However, some rule types may not allow multiple word replacers.

The dictionaries follow the same conventions for character encoding, escaping special characters, inclusion directives and comments as the rule files and other text files.

Instead of a file name reference, it is possible to quote the contents of the dictionary directly by enclosing them in double quotes. In this case, dictionary items are whitespace separated, and each replacee is separated from its replacer by a comma.
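The two dictionary formats described above can be sketched in Python as follows. Both functions are hypothetical illustrations: a backslash is simply taken to make the next character literal (the special escape sequences, comments and @include are not handled), and a quoted item without a comma is given the replacer None, because the text does not say how such items are treated.

```python
def parse_dictionary_line(line):
    """Split a dictionary-file line into (replacee, replacer).

    The replacee ends at the first unescaped whitespace; the rest of the line,
    without surrounding whitespace, is the replacer (it may contain spaces).
    """
    i, n = 0, len(line)
    while i < n and line[i].isspace():
        i += 1
    replacee = []
    while i < n and not line[i].isspace():
        if line[i] == "\\" and i + 1 < n:
            replacee.append(line[i + 1])  # escaped character, e.g. a space
            i += 2
        else:
            replacee.append(line[i])
            i += 1
    return "".join(replacee), line[i:].strip()


def parse_quoted_dictionary(contents):
    """Parse the quoted form: whitespace-separated 'replacee,replacer' items."""
    items = {}
    for item in contents.split():
        replacee, comma, replacer = item.partition(",")
        # Assumption: an item without a comma has no replacer (None).
        items[replacee] = replacer if comma else None
    return items
```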

The dictionary may either be parsed and loaded into memory at Epos startup or at the moment of the first use. The former option's advantage is early error reporting, while the latter can sometimes completely avoid loading a huge unused dictionary. Use the option paranoid to choose your preference.

Type subst

Substring substitution. The replacers replace every occurrence of their respective replacees; longer replacees are matched first, and the process is iterated until no replacee occurs in the string. If there is a tie between several matches of equal length, the rightmost match is chosen.

It is required either to have a phone target, or to keep all the replacers and replacees the same length (because of the descendants of the units affected). Note also that to be considered a match in the former case (target phone), all characters other than phones also have to match (must be found or not found on the same positions in both the replacee and the occurrence in question). The only exception is the terminating scope-level separator (if any), which is ignored and preserved.

Any replacee may begin with a ^ or end with a $. That forces the substring being replaced to be at the beginning or the end of the scope unit, respectively. This ^ or $ also counts as a character when determining the longest match.

The replacer should not contain units of the scope level or higher. Unless the paranoid option is set, this is tolerated, but the replacer is truncated at the first of such characters.

With the phone target, this rule type will drop the internal structure of the replaced text as soon as a match is found. In other words: an affected scope unit with a replacer is re-parsed as any other plain text. With any other target the original structure is always kept.

Infinitely looping substitutions are currently reported as an error condition.

As this rule type should not be used for trivial tasks with short and often matching dictionaries, the example we shall now give is somewhat involved:

<   word   syll
        regress   \ >m(!_!)   word  word
        regress   \ >d(!_!)   word  word
        regress   \ >t(!_!)   word  word
        regress   \ >q(!_!)   word  word
        regress   \ >p(!_!)   word  word
subst   "^mmmmm,mmmxm  ^mmmmmm,mmmxmm  ^mmmmmmm,mmmmxmm \
         pmm,pxm  qmmm,qxmm  pmmm,pxmm  tmmmm,tmmxm  qmmmm,qxmxm \
         dmmmmm,dmmxmm  tmmmmm,tmmxmm  qmmmmm,qxmxmm  pmmmmm,pxmmxm \
         dmmmmmm,dmmxmmm  tmmmmmm,tmmxmmm  qmmmmmm,qxmmxmm \
         pmmmmmm,pxmmxmm  dmmmmmmm,dmmxmmxm  tmmmmmmm,tmmxmmxm \
         qmmmmmmm,qxmmmxmm  pmmmmmmm,pxmmmxmm \
         mmmmmmmm,mmmmxmmm"                                 colon  word
postp   "m"               word  word

The purpose of this example sequence of rules is to form stress units out of graphical words, based on the following assumptions for the given language: polysyllables are retained as stress units, but following monosyllables may be merged to them; monosyllables which are colon-final should be retained; other monosyllables may merge with each other and/or with the preceding polysyllable; the merges should not produce overly long stress units.

The first part of the example marks all non-colon-final (more exactly: space-delimited) words with the letters m, d, t, q, p according to the number of syllables; note that p is used not only for pentasyllables, but also for all words of more than five syllables. Then the substitution rule relabels some monosyllables (destined as heads of stress units consisting solely of monosyllables) with x. Finally, the postp rule merges every monosyllable that has not been relabeled to x with the preceding stress unit, if there is one.

The substitution rule in the example has 21 dictionary items, the first three being applicable only at the colon-initial position. Mostly it directly lists the resulting labeling for the whole colon, but with extremely long sequences of monosyllables it relies on the facts that the longest matching replacee (and the rightmost one if there are multiple) is chosen and that the substitution process is iterated. For example, pmmmmmmmmm would be first relabeled to pmmmmmxmmm using the last item as listed in the dictionary and then once more using a different item to pxmmxmxmmm.
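A string-level Python sketch of the matching strategy can help when designing such dictionaries. It models a scope unit as a plain string of labels and ignores the unit structure, the length restriction for non-phone targets and the handling of the scope separator; the function name subst and the max_steps guard are illustrative assumptions, and the repeated-state check merely imitates the error reported for infinitely looping substitutions.

```python
def subst(text, dictionary, max_steps=10_000):
    """Iterated substring substitution in the manner of the subst rule.

    The longest replacee wins (a ^ or $ anchor counts towards its length);
    ties go to the rightmost match. Stops when no replacee matches.
    """
    seen = {text}
    for _ in range(max_steps):
        best = None  # (length incl. anchors, start, end, replacer)
        for replacee, replacer in dictionary.items():
            lo = 1 if replacee.startswith("^") else 0
            hi = len(replacee) - 1 if replacee.endswith("$") else len(replacee)
            core = replacee[lo:hi]
            start = text.find(core)
            while start != -1:
                end = start + len(core)
                if (lo == 0 or start == 0) and (hi == len(replacee) or end == len(text)):
                    candidate = (len(replacee), start, end, replacer)
                    if best is None or candidate[:2] > best[:2]:
                        best = candidate
                start = text.find(core, start + 1)  # overlapping occurrences too
        if best is None:
            return text
        _, start, end, replacer = best
        text = text[:start] + replacer + text[end:]
        if text in seen:
            raise ValueError("infinitely looping substitution")
        seen.add(text)
    raise ValueError("substitution did not terminate")


# The 21 items of the stress-unit example above, in quoted-dictionary form.
EXAMPLE = dict(item.split(",") for item in """
    ^mmmmm,mmmxm ^mmmmmm,mmmxmm ^mmmmmmm,mmmmxmm
    pmm,pxm qmmm,qxmm pmmm,pxmm tmmmm,tmmxm qmmmm,qxmxm
    dmmmmm,dmmxmm tmmmmm,tmmxmm qmmmmm,qxmxmm pmmmmm,pxmmxm
    dmmmmmm,dmmxmmm tmmmmmm,tmmxmmm qmmmmmm,qxmmxmm
    pmmmmmm,pxmmxmm dmmmmmmm,dmmxmmxm tmmmmmmm,tmmxmmxm
    qmmmmmmm,qxmmmxmm pmmmmmmm,pxmmmxmm
    mmmmmmmm,mmmmxmmm""".split())
```

Run on pmmmmmmmmm, the sketch goes through pmmmmmxmmm to pxmmxmxmmm, matching the two steps described in the paragraph above.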

Type prep

Preposition. If the scope unit is identical to some replacee, it is replaced with its respective replacer and merged with its right-hand neighbor. If there is no such neighbor, nothing happens. As with the subst rule, the target must currently be phone, or all the replacers must have the same length as their respective replacees.

Let us take a typical example:

prep   preps.dic

where the referenced file contains a list of prepositions for the language, e.g. for Czech:

pRes  pRez

You can see that most of the prepositions have replacers identical to their replacees, so that such a preposition does not change, except for being merged with its right-hand neighbor if found. There is, however, one irregular monosyllabic preposition in Czech which behaves differently with regard to voicing assimilation, and this can be handled too, as shown. Notice also that the unit (here: the word) must match the dictionary item exactly, as opposed to the mere substring matching used by the subst rule.

As a special case, if the parameter begins with an exclamation mark, then the rest of the parameter is parsed as usual and any substitutions are performed exactly as usual, but the scope units which get finally merged to their respective right hand neighbors are exactly those which are not found in the dictionary.

A typical example is the following rule, whose purpose is to abolish all syllable boundaries (within each word). The rule defines an empty dictionary and then merges each syllable which is not found in the dictionary (that is, every syllable) and which has a right-hand neighbor to be merged with.

prep   !""        syll

Type postp

Postposition. See rule type prep for the description and examples, but the resultant unit is merged to its left-hand neighbor instead of the right-hand neighbor.
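The merging behaviour of prep and postp can be modelled on a list of unit contents. In this Python sketch (merge_by_dictionary is a hypothetical name), a run of consecutive merging units is assumed to end up in one group together with the next non-merging unit, and a matching unit without a neighbor is left completely unchanged, which is our reading of "nothing happens". Each returned group stands for one merged unit.

```python
def merge_by_dictionary(units, dictionary, negate=False, to_left=False):
    """Model prep (merge rightwards) or postp (to_left=True) on unit contents.

    Units found in the dictionary are replaced by their replacers. With
    negate=True (a leading "!"), the units NOT found in the dictionary merge.
    """
    if to_left:  # postp: mirror the prep logic
        groups = merge_by_dictionary(units[::-1], dictionary, negate)
        return [group[::-1] for group in groups[::-1]]
    groups, pending = [], []
    for index, unit in enumerate(units):
        found = unit in dictionary
        merges = found != negate
        has_neighbour = index + 1 < len(units)
        if merges and has_neighbour:
            pending.append(dictionary[unit] if found else unit)
            continue
        if merges:
            content = unit  # no neighbour to merge with: nothing happens
        else:
            content = dictionary[unit] if found else unit
        groups.append(pending + [content])
        pending = []
    return groups
```

With negate=True and an empty dictionary, every unit that has a right-hand neighbor merges, so a whole sequence collapses into one group, which mirrors the prep !"" example above.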

Type analyze

This rule type analyzes a unit of the level immediately below the scope level into a sequence of units, based on a dictionary of known contents of the new units at the target level and the priorities assigned to them. We will explain the operation of this rule in terms of morphematic analysis, its most common use: the scope level is the word, the result of the analysis is the morpheme level (just below the word level), and the target is phones.

Each item of the dictionary corresponds to a single morpheme (some linguists would prefer to say "morph" here). The replacee is the form of the morpheme expected within the word; the replacer is a numeric value which expresses the "badness" of this particular string.

In addition to these values, the dictionary must also include two additional items, !META_unanal_unit_penalty and !META_unanal_part_penalty. These serve as global parameters of the analysis process.

The rule splits every affected word into morphemes so as to minimize the sum of the badnesses of the new morphemes. Each possible analysis may contain parts (morphemes) which have been found in the dictionary and incur the badness specified there, and also parts which were not found in the dictionary. For each part not found, both a per-letter penalty (as specified with !META_unanal_unit_penalty) and a per-part penalty (as specified with !META_unanal_part_penalty) are added to the total badness.

If two alternative analyses have the same total badness, the first (leftmost) part whose length differs between them is considered, and the analysis in which this part is longer is chosen.

Usually the global parameters are set so high that the algorithm resorts to an analysis into morphemes found in the dictionary whenever at least one such analysis is possible; it also prefers analyses consisting of fewer morphemes, and analyses which avoid morphemes that are known to the dictionary but repelled by larger badness values.

An example fragment of the dictionary file may look like this:

!META_unanal_unit_penalty       100
!META_unanal_part_penalty       150
cow     5
cows    5
s       5
slip    5
lip     5

In this case, the word cowslip will not be analysed as cow-s-lip, because that would yield badness 15, while there are two analyses with badness 10, namely cows-lip and cow-slip. Here the first (and wrong) one will be taken due to the tie-breaking rule, which will probably bring about a voiced pronunciation of the fricative. Several alternative changes to the dictionary can be proposed to avoid the misguided morpheme boundary inside slip: removing cows from the dictionary, adding an explicit cowslip item with badness less than 10, or decreasing the badness of slip.

In the unchanged case, the word gas will be analysed as ga-s with total badness 355, as the only alternative analysis gas has total badness 450: that is, one unanalysed part consisting of three unanalysed units (letters). A solution would be, for example, to increase the badness of the miniature morpheme s to a value somewhere between 101 and 249; with this adjustment it will still be cheaper than an isolated unanalysed letter s in an otherwise known context, but it will not be recognized at the border of an unanalysed context.
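The search for the cheapest analysis, including the tie-breaking rule, can be expressed as dynamic programming over suffixes of the word. The Python sketch below is not the Epos implementation: analyze is a hypothetical name, the two !META_ penalties are passed as plain arguments instead of being read from the dictionary, and a part found in the dictionary is assumed always to cost its listed badness.

```python
def analyze(word, badness, unit_penalty, part_penalty):
    """Split word into parts with minimal total badness.

    badness maps known morphs to their badness; any other part costs
    unit_penalty per letter plus part_penalty. Returns (parts, total).
    """
    n = len(word)
    # best[i] = (total badness, tuple of part lengths) for the suffix word[i:]
    best = [None] * n + [(0, ())]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            part = word[i:j]
            cost = badness.get(part, unit_penalty * len(part) + part_penalty)
            rest_cost, rest_lengths = best[j]
            candidate = (cost + rest_cost, (j - i,) + rest_lengths)
            if best[i] is None or _better(candidate, best[i]):
                best[i] = candidate
    total, lengths = best[0]
    parts, start = [], 0
    for length in lengths:
        parts.append(word[start:start + length])
        start += length
    return parts, total


def _better(a, b):
    """Lower badness wins; on a tie, the longer part at the first difference wins."""
    return a[0] < b[0] or (a[0] == b[0] and a[1] > b[1])


# The dictionary fragment from the example above (without the !META_ items).
MORPHS = {"cow": 5, "cows": 5, "s": 5, "slip": 5, "lip": 5}
```

Comparing the tuples of part lengths lexicographically is equivalent to the tie-breaking rule, because two analyses of the same word have lengths with the same sum, so the first differing length decides.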

Type prosody

This rule type is a prosody modeling rule which uses a dictionary of prosodic adjustments to be applied. More details below.

Type segments

You don't want to read about this rule type, unless you are preparing a new voice for a synthesizer with the traditional segment-level interface based on a newly structured speech segment inventory.

Sets up the segment layer below the phone layer. The parameter names a file which contains phone-to-segment mappings, again in the dictionary format. Each replacee represents a three-character segment identifier and each replacer is the respective (decimal) segment code. It is possible, and indeed typical, to include multiple identifiers for the same segment number.

The middle character denotes the phone the resulting segment will be assigned to. The left-hand and right-hand characters are each either a question mark, or a specific character which the left-hand or right-hand neighbor, respectively, must match. The question mark is therefore a kind of wildcard.

If both fully specified and partly specified segments exist for a given triplet of phones, they will be placed from left to right in this order: lt?, ?t?, ?tr, ltr.

A sentence may contain these segments with the Czech diphone inventory by Tomas Dubeda:

    p       l       o       u       t       e       f
  0p?   pl? ?lo   ?o? ou?  ?u?    ut? ?te  ?e?    ef? ?f0

or, with the traditional Czech segment inventory:

    p       l       o       u       t       e       f
 0p? ?pl  pl? ?lo   ?o? ou?  ?u?    ut? ?te  ?e?    ef? ?f0

(In this second example, for instance the diphones ?pl and ?pt would actually share the segment number and would correspond to the p-any consonant diphone.)

There are several possible ways to represent a segment inventory; for each of the major diphone types, it is necessary to decide whether it should be assigned to its initial or its final sound. That is unfortunate, but it is the way it is.

It is possible to repeat a segment a few times. This effect can be controlled by adding 10000 times the number of extra repetitions to the segment number. Therefore,

?e?     20241
generates three identical segments number 241 for the stationary part of the specified vowel.
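The triplet lookup and the repetition encoding can be sketched in Python. The function names, the use of 0 for the position beyond the word edge (borrowed from the 0p? and ?f0 identifiers in the examples above) and the assumption that every matching identifier contributes its segment(s) in the order lt?, ?t?, ?tr, ltr are our reading of the text, not Epos internals.

```python
def decode_segment_code(code):
    """Return (segment number, how many times to emit it) for a segments-file code."""
    extra_repetitions, number = divmod(code, 10000)
    return number, extra_repetitions + 1


def segments_for(phones, inventory, boundary="0"):
    """List (identifier, segment number) pairs for a string of one-character phones.

    inventory maps three-character identifiers such as 'pl?' or '?lo' to codes.
    """
    padded = boundary + phones + boundary
    result = []
    for i in range(1, len(padded) - 1):
        left, phone, right = padded[i - 1], padded[i], padded[i + 1]
        # Placement order given in the text: lt?, ?t?, ?tr, ltr
        keys = (left + phone + "?", "?" + phone + "?",
                "?" + phone + right, left + phone + right)
        for key in keys:
            if key in inventory:
                number, count = decode_segment_code(inventory[key])
                result.extend([(key, number)] * count)
    return result
```

Fed with the identifiers of the Dubeda example, the sketch reproduces the sequence 0p? pl? ?lo ?o? ou? ?u? ut? ?te ?e? ef? ?f0 for plotef... more precisely for the phones p l o u t e f, and a code of 20241 yields segment 241 three times.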

Type with

This is actually a conditional rule, though it also uses a dictionary. It applies an arbitrary rule to the units (words) listed in the dictionary. More details below.

3.9 Contentual Rules

The contentual rules manipulate unit contents. That is, they are suitable for implementing the more regular letter-to-sound rules, character replacement and other transformations. They are an order of magnitude faster than, for example, the more general (and more heavyweight) subst rule, so they should be used whenever possible.

Type regress

Assimilation, elision or other mutation of phones or other units depending on their immediate environment. The parameter is of the form o>n(l_r), where o, n, l and r are arbitrary strings. The semantics is "change tokens in o to their corresponding tokens in n whenever the left neighbor is in l and the right one is in r". The first two strings should therefore either be of equal length, or n should be a single character, with the obvious interpretations of "corresponding".

The zero character (0) may be included in any of the strings; it means "no element", and it can be used to insert new units, to delete old ones, and to limit the change to the beginning or the end of the scope unit. On the other hand, if the content of some unit is a literal 0 before the application of this rule, it will stay untouched. This special meaning of 0 with this rule type can be suppressed by escaping.


        regress  0>'(0_aeiou)  word  phone
inserts the apostrophe before the vowels listed at the beginning of a word.
        regress  $voiceless>$voiced(!_$voiced)  word  phone
assimilates voiceless consonants to their voiced counterparts (assuming $voiced and $voiceless have been defined previously) when they are followed by a voiced consonant. The change proceeds from right to left, therefore ppb will change to bbb. The lone exclamation mark is the except operator described above, meaning "everything" (here: any left neighbor).

Type progress

As above, but the change proceeds from left to right. In the second example for the regress rule, the result would be pbb if progress were employed.
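The two directions differ because each change is already visible to the context test of the next token processed. The Python sketch below models this for single-character tokens; it treats 0 in the contexts as the scope boundary and does not model insertion or deletion through 0 in o or n. rewrite_in_context and its progress flag are hypothetical names.

```python
def rewrite_in_context(tokens, old, new, left, right, progress=False):
    """Simplified regress (right to left) / progress (left to right).

    Tokens in `old` change to the corresponding token in `new` (or to `new`
    itself if it is a single character) when the current left neighbour is
    in `left` and the current right neighbour is in `right`. "0" stands for
    the scope boundary in the contexts.
    """
    tokens = list(tokens)
    order = range(len(tokens)) if progress else range(len(tokens) - 1, -1, -1)
    for i in order:
        if tokens[i] not in old:
            continue
        left_neighbour = tokens[i - 1] if i > 0 else "0"
        right_neighbour = tokens[i + 1] if i + 1 < len(tokens) else "0"
        if left_neighbour in left and right_neighbour in right:
            tokens[i] = new if len(new) == 1 else new[old.index(tokens[i])]
    return tokens


# Assumed sets for the example: a toy consonant system and "!" as any token.
VOICELESS, VOICED = "ptkfs", "bdgvz"
ANYTHING = set("abcdefghijklmnopqrstuvwxyz") | {"0"}
```

Running it on ppb in both directions reproduces bbb for regress and pbb for progress.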

3.10 Structural Rules

The structural rules can be used to restructure the text. They usually interact with multiple levels of description simultaneously.

Type raise

Move a unit to a higher level of description, e.g. when a segment-level unit should directly affect the prosody. The parameter is of the form from:to, where from and to are arbitrary strings which may employ the except operator (exclamation mark). The tokens in from, if found at the target level, are copied to the scope level, provided the original scope token is listed in to. It is also possible to omit the colon and the to string; the default interpretation is "everywhere".

This rule is usually found as a link between rules operating on different levels. For example, suppose we want to split every colon before any occurrence of one of the words nebo and anebo:

with  "nebo  anebo"  word
        regress 0>\X(0_!)
raise   \X:!            word    phone
syll    \X<\ _               colon   word
regress \X>\ (!_!)      colon   word
regress \X>0(!_!)

Having inserted an internal pseudocharacter \X at the phone level at the beginning of each of the words listed in the dictionary used by the with rule, we raise this pseudocharacter to the word level and treat it as the least "sonorous" element in the following "syllabification" (splitting) rule. The last two rules perform a simple clean-up: they change all word-level occurrences of the pseudocharacter to a space and delete all phone-level occurrences thereof.

Type syll

Roughly speaking, this rule type can be used to split words into syllables according to the theory of sonority, i.e. at the least sonorous phones.

More generally, it can be used to insert unit boundaries of any kind depending on local extremes of a simple metric defined on target units. A split occurs at the scope level and, whenever necessary, at all levels between the scope and the target.

The parameter is an ordering of the target units (typically, phones), starting from the extremal (least sonorous) ones, with groups of equal status (equal sonority) delimited by <. For example,


        syll  0<ptkf<bdgv<mnN<lry<aeiou  syll  phone
inserts the following (and other) syllable boundaries:
  a|pa  ap|pa  ap|ppppa  arp|pa  ar|pra  a|pr|pa

Tokens not listed are considered least sonorous, order of tokens within the same sonority group (see the example) is irrelevant. It is not possible to use the except operator with this rule type.

As you can see from the example, the syllable boundaries are inserted exactly once per every sequence of equivalent target units (e.g. equisonorous phones) such that both preceding and following target units of the group have higher sonority, and they're inserted either between the first and second element of the group, or, if the group consists of a single unit, before that unit.
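The boundary placement just described can be reproduced with a short Python sketch. syllabify is a hypothetical helper working on a plain string, not on the TSR; tokens not listed in the ordering get the lowest rank, and the scope boundary is assumed to count as lower than anything, so edge groups never split.

```python
def syllabify(word, ordering):
    """Insert '|' at local sonority minima, following the syll rule semantics.

    ordering is a syll parameter such as '0<ptkf<bdgv<mnN<lry<aeiou'.
    """
    rank = {ch: level for level, group in enumerate(ordering.split("<")) for ch in group}
    sonority = [rank.get(ch, 0) for ch in word]  # unlisted: least sonorous
    n = len(word)
    cuts, i = [], 0
    while i < n:
        j = i
        while j + 1 < n and sonority[j + 1] == sonority[i]:
            j += 1  # extend the run of equally sonorous tokens word[i..j]
        left = sonority[i - 1] if i > 0 else -1       # boundary: lowest of all
        right = sonority[j + 1] if j + 1 < n else -1
        if left > sonority[i] and right > sonority[i]:
            # one boundary per run: after its first token, or before a lone token
            cuts.append(i + 1 if j > i else i)
        i = j + 1
    pieces, start = [], 0
    for cut in cuts:
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return "|".join(pieces)


ORDER = "0<ptkf<bdgv<mnN<lry<aeiou"
```

Applied to apa, appa, apppppa, arppa, arpra and aprpa, it yields the boundaries listed in the example above.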

This semantics is suitable for the syllabification task in all languages known to us where syllabification is not primarily morphologically based, but this rule type can also be used for other tasks involving a unit split at some point defined by its contents, e.g. splitting a higher-level prosodic unit before or after certain words, as shown in the example for the raise rule. The authors are eager to hear from you if you'd prefer an extension or simplification of this rule type or if you can comment on automated syllabification issues over a wide range of languages.

3.11 Prosody Modeling Rules

The utterance prosody is modeled in Epos by assigning values of the following prosodic quantities to individual text structure units (possibly at multiple levels of description): frequency (F), intensity (I) and duration (T).

Currently, these values are given in per cent, 100 being the neutral value.

Epos doesn't currently provide sets of segment inventories for multiple pitch ranges; therefore, extreme values such as 15 or 1500 may sound very unnatural.

The prosody adjustments (deviations from the neutral value) at different levels are added up to obtain the actual values assigned to the generated segments. For example, a phone with the frequency (pitch) value of 130 in a word with the value of 120 will contain segments (after the segments rule is applied) with a frequency of 150. Alternatively, the values for pitch, volume and duration can be multiplied instead, by setting the pros_eff_multiply_f, pros_eff_multiply_i and pros_eff_multiply_t options, respectively. It is also possible to change the neutral value of 100 to a different base value with the f_neutral, i_neutral and t_neutral options.
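The arithmetic of the example can be checked with a small Python function. Adding up the deviations from the neutral value reproduces 130 and 120 giving 150; the multiplicative branch is only our guess at what the pros_eff_multiply_* options compute (the product of the ratios to the neutral value), and effective_value is a hypothetical name.

```python
def effective_value(values, neutral=100, multiply=False):
    """Combine the per-cent values of a unit and of all units containing it.

    Additive mode sums the deviations from `neutral`; multiplicative mode
    (an assumption about pros_eff_multiply_*) multiplies the ratios instead.
    """
    if multiply:
        result = neutral
        for value in values:
            result = result * value / neutral
        return result
    return neutral + sum(value - neutral for value in values)
```

For the phone-in-word example, effective_value([130, 120]) gives 150, while the multiplicative reading would give 156.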

Type contour

This rule assigns a specified prosody contour to units at some level of description within a unit which consists of them. For example, the rule can be used to assign pitch contours to stress units; individual values will probably be assigned to syllables.

The parameter describes a single prosody contour. The first letter denotes the prosodic quantity (frequency, intensity or duration) to be specified; the second is a slash; the adjustments follow as colon-separated decimal integers. For example,

        contour   f/+2:+0:-2   word   syll
assigns a falling pitch contour to a trisyllabic word. The number of syllables in a word, or, more generally, of the target units in a scope unit, must match the number of adjustments specified in a contour rule, otherwise an error occurs; consider the length-based selection of rules to ensure that. As an exception to that, it is possible to specify padding in the contour. At most one adjustment may be immediately followed by an asterisk. This adjustment will be used for zero or more consecutive target units as necessary to stretch the contour over the scope unit.
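Stretching a contour over a scope unit, with or without padding, can be sketched as follows. expand_contour is a hypothetical helper; raising ValueError stands in for the error Epos reports when the lengths do not match.

```python
def expand_contour(parameter, count):
    """Turn e.g. 'f/+2:+0*:-2' into (quantity, one adjustment per target unit)."""
    quantity, slash, body = parameter.partition("/")
    if not slash or quantity.lower() not in ("f", "i", "t"):
        raise ValueError(f"bad contour parameter: {parameter!r}")
    items = body.split(":")
    padded = [k for k, item in enumerate(items) if item.endswith("*")]
    if len(padded) > 1:
        raise ValueError("at most one adjustment may be followed by an asterisk")
    values = [int(item.rstrip("*")) for item in items]
    if not padded:
        if len(values) != count:
            raise ValueError(f"{len(values)} adjustments for {count} target units")
        return quantity, values
    k = padded[0]
    repeat = count - (len(values) - 1)  # the padded adjustment fills the gap
    if repeat < 0:
        raise ValueError("scope unit too short for this contour")
    return quantity, values[:k] + [values[k]] * repeat + values[k + 1:]
```

So f/+2:+0*:-2 fits words of any length from two syllables up: the middle value is used zero or more times.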

Type prosody

Individual prosodic feature generation. (See also the contour rule for assigning whole contours more conveniently.)

Typically, there will be many instances of this rule in the rules file, each of which uses a different configuration file for a different purpose (e.g. one may handle word stress, another the sentence-final melody of wh- questions, and another the semantic emphasis corresponding to an exclamation mark). The parameter of this rule is the name of a file formatted as a dictionary (see dictionary-oriented rules), whose contents are further specified here.

Each prosodic adjustment occupies one line; it affects exactly one of frequency, intensity and duration (F, I or T, respectively) of the units at the specified positions. Their ordering is insignificant, because each of them affects different units or a different quantity of them.

The structure of an adjustment is very simple, so let's just pick an example: i/3:4 -20. The first letter must be one of T, I, F and specifies the quantity to be adjusted; the first number denotes the position within a unit whose length must equal the second number: here, the rule applies to the third syllable of every tetrasyllabic word, provided that the target of the rule is syllable and the scope is word (this is specified in the rules file as usual, not in the prosody file). The last number, separated by whitespace, is the intensity adjustment to be added wherever this specification applies. It is an integer value.

It is also possible to have an adjustment applied for any length of the scope unit (in the example above, for words of any number of syllables). To do this, use "*" as the second number of the adjustment. It may also make sense to count the target units from the end of the scope unit; in this case, append the word "last" to the first number. An example could be f/1last:* -30, or "drop the pitch by 30 for the last syllable of every word". Consequently, up to three distinct adjustments may match a unit; in that case only one is chosen: the more specific one (with an explicit length) or, if the candidates both contain the asterisk, the one counting from the beginning. An example, in order of decreasing precedence:

   t/1:2     +30
   t/1:*     +20
   t/2last:*  +5

You can therefore override general adjustments with exceptions for some lengths which have to be handled separately.
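The precedence rules can be made concrete with a Python sketch of a prosody-file lookup. parse_prosody_lines and adjustment_for are hypothetical names; the sketch ignores comments, converts a "last" position with an explicit length into an ordinary position (the two are equivalent once the length is fixed), and returns 0 when no adjustment applies, which we assume means "no change".

```python
import re

# Hypothetical parser for lines such as "i/3:4 -20" or "f/1last:* -30".
_ADJUSTMENT = re.compile(r"^([fit])/(\d+)(last)?:(\d+|\*)$", re.IGNORECASE)


def parse_prosody_lines(lines):
    """Map (quantity, position, counted_from_end, length or None) to an adjustment."""
    table = {}
    for line in lines:
        if not line.strip():
            continue
        spec, value = line.split()
        match = _ADJUSTMENT.match(spec)
        if match is None:
            raise ValueError(f"bad prosodic adjustment: {spec!r}")
        quantity, position, last, length = match.groups()
        position = int(position)
        if length == "*":
            key = (quantity.lower(), position, last is not None, None)
        else:
            length = int(length)
            if last:  # with a fixed length, "last" is just another position
                position = length - position + 1
            key = (quantity.lower(), position, False, length)
        table[key] = int(value)
    return table


def adjustment_for(table, quantity, position, length):
    """Return the one applicable adjustment for the unit at 1-based position."""
    q = quantity.lower()
    candidates = (
        (q, position, False, length),            # explicit length: most specific
        (q, position, False, None),              # any length, from the beginning
        (q, length - position + 1, True, None),  # any length, from the end
    )
    for key in candidates:
        if key in table:
            return table[key]
    return 0
```

With the three t/ lines from the example, the first syllable of a disyllable gets +30, the first syllable of a trisyllable gets +20, and its second syllable (second from the end) gets +5.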

If multiple prosodic rules (using their own files) supply adjustments for a certain unit, the adjustments are summed.

It is important to understand the difference between e.g. a syllable and its phones: the syllable can have an entirely different prosodic value than its phones; for every given segment, the value of any prosodic quantity is obtained by totalling the values of all the higher-level units it is contained in. This independence of the levels of description might theoretically be useful for modelling tone languages.
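A small sketch of this bookkeeping, assuming a simplified node that stores one integer per quantity. The Unit class and its methods are hypothetical and not Epos's own unit class; in this sketch, the total also includes the unit's own value.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Unit:
    """Hypothetical TSR node; only the parent link and prosodic adjustments are modelled."""
    name: str
    parent: Optional["Unit"] = None
    adj: Dict[str, int] = field(default_factory=lambda: {"f": 0, "i": 0, "t": 0})

    def add(self, quantity: str, delta: int) -> None:
        # Adjustments supplied by several prosody rules are simply summed.
        self.adj[quantity] += delta

    def total(self, quantity: str) -> int:
        # Effective value: this unit's own adjustment plus those of all enclosing units.
        value, unit = 0, self
        while unit is not None:
            value += unit.adj[quantity]
            unit = unit.parent
        return value


word = Unit("word")
syll = Unit("ne", parent=word)
phone = Unit("e", parent=syll)
word.add("f", 10)        # e.g. from a word-level prosody rule
syll.add("f", -30)       # from one syllable-level rule
syll.add("f", 5)         # from another syllable-level rule
print(phone.total("f"))  # -15
```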

Type smooth

Smoothing of one of the F, I, T quantities. The parameter has the form

        quantity/left_weights/own_weight\right_weights

where quantity is one of f, i, t; the left_weights, if there are multiple ones, are slash-separated, and the right_weights are backslash-separated. The new value of the quantity for any target is computed as a weighted average of the values of the target itself and of the surrounding units at the same level. If the target is too near the scope boundary to have enough neighbors in some direction, the value of the last unit in that direction is used instead.

For example, the rule

        smooth  i/10/20/40\20\10  word  syll

applied to the word un-ne-ce-ssa-ry will adjust the intensity values of all its syllables. E.g. the second syllable will be computed as 0.3 x i("un") + 0.4 x i("ne") + 0.2 x i("ce") + 0.1 x i("ssa"); "un" receives both left weights (20 + 10), because there is no second unit to its left.

The computations for different units do not interfere. The weights can also be specified as negative quantities and/or as sums of several values. This permits linear parameterization of the rules.
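The smoothing computation for one scope unit can be sketched as follows. This is an illustration under stated assumptions, not Epos code: the weights are taken as percentages (divided by 100, which reproduces the coefficients of the example above), they are assumed to be written in positional order, and the function names are hypothetical. Each new value is computed from the original values only.

```python
from typing import List, Tuple


def parse_smooth(param: str) -> Tuple[str, List[int], int, List[int]]:
    r"""Split a parameter such as i/10/20/40\20\10 into its parts.
    Assumption: weights are written in positional order, so the last
    slash-separated number is the target's own weight and the first
    backslash-separated one belongs to the nearest right neighbour."""
    head, *right = param.split("\\")
    quantity, *numbers = head.split("/")
    *left, own = (int(w) for w in numbers)
    return quantity, left, own, [int(w) for w in right]


def smooth(values: List[float], left: List[int], own: int, right: List[int]) -> List[float]:
    """Weighted average over neighbours within one scope unit (weights in percent).
    Indices are clamped to the scope boundary, i.e. a missing neighbour is
    replaced by the last unit available in that direction."""
    n = len(values)
    out = []
    for i in range(n):
        acc = own * values[i]
        for dist, w in enumerate(reversed(left), start=1):   # nearest left neighbour first
            acc += w * values[max(i - dist, 0)]
        for dist, w in enumerate(right, start=1):           # nearest right neighbour first
            acc += w * values[min(i + dist, n - 1)]
        out.append(acc / 100)
    return out


q, left, own, right = parse_smooth("i/10/20/40\\20\\10")
# Hypothetical intensities of un, ne, ce, ssa, ry:
print(smooth([10, 20, 30, 40, 50], left, own, right)[1])  # 21.0 = 0.3*10 + 0.4*20 + 0.2*30 + 0.1*40
```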

The smooth rule also has an unavoidable side effect. If (some of) the prosodic adjustments are assigned at the word level, for example, and smoothing should take place at the syllable level, the prosodic information must first be moved down to the syllable level. This is done by adding the quantity found at the word level to every syllable the word contains and removing it from the word level entirely. The unit::project method is responsible for this; it is called before the actual smoothing. Prosodic adjustments at levels lower than the one being smoothed are ignored by the smooth rule.
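The effect of this projection on a single word can be illustrated with plain integers. The function below is hypothetical and is not the actual unit::project method; it only shows that the word-level value becomes zero while the effective value of every syllable (its own adjustment plus the word's) stays the same.

```python
from typing import List, Tuple


def project(word_adj: int, syll_adjs: List[int]) -> Tuple[int, List[int]]:
    """Move a word-level adjustment one level down: add it to every contained
    syllable and clear it at the word level."""
    return 0, [s + word_adj for s in syll_adjs]


word_adj, sylls = project(10, [0, -5, 3])
print(word_adj, sylls)  # 0 [10, 5, 13]
```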

3.12 Composite Rules

Occasionally, multiple rules are needed where the syntax provides a placeholder for a single rule only. Or, several rules have to be grouped in a certain way, for example when one rule has to be chosen nondeterministically out of a set of rules. To satisfy these needs, Epos rules include three types of composite rules with different semantics. A composite rule is syntactically treated as a single rule.

Blocks of Rules

A block is a sequence of rules enclosed within braces ("{" and "}"). Both the opening and the closing brace follow the rule syntax, but they take no parameters except for an optional scope specification. The block is treated as a single rule, which is useful especially with conditional rules:

if   condition
{
        do   this
        do   that
}

The rules are applied sequentially, as you would expect, for every unit of the proper size as given by the scope of the opening brace. This means that every word (if the scope is word) is processed separately throughout all the rules in the block. This involves some splitting of execution on entering the block. By default, no such splitting is done and the block inherits its scope from its master rule (a conditional rule, a block it is encapsulated in, or the global implicit block which covers all the rules altogether). Consequently, the scope of any enclosed rule may not be larger than the scope of the block.
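The order of execution in a block whose scope is word can be modelled as a nested loop: the outer loop runs over the words, the inner one over the rules of the block. The rules here are hypothetical string functions standing in for Epos rules; the sketch only illustrates the ordering, not how Epos splits execution internally.

```python
from typing import Callable, List, Tuple

Rule = Callable[[str], str]


def apply_block(scope_units: List[str], rules: List[Rule]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Run every rule of a block, in order, on each scope unit separately.
    Returns the processed units and a trace of (rule name, unit after the rule)."""
    results, trace = [], []
    for unit in scope_units:      # execution is split per scope unit ...
        for rule in rules:        # ... and each unit passes through all the rules in sequence
            unit = rule(unit)
            trace.append((rule.__name__, unit))
        results.append(unit)
    return results, trace


def uppercase(s: str) -> str:     # stand-ins for real rules
    return s.upper()


def mark(s: str) -> str:
    return s + "#"


words, trace = apply_block(["ab", "cd"], [uppercase, mark])
print(words)                        # ['AB#', 'CD#']
print([name for name, _ in trace])  # ['uppercase', 'mark', 'uppercase', 'mark']
```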

Any macros defined in the block are local to the block. The semantic details are C-like and are by no means important.

Choices of Rules

A choice is a sequence of rules enclosed within brackets ("[" and "]"). Both the opening and the closing bracket follow the rule syntax, but they take no parameters except for an optional scope specification. The choice is treated as a single rule.

Whenever the choice is applied, one of its subordinate rules is chosen at random for every unit of the proper size, as given by the scope of the opening bracket, and only this rule is applied.

Generally, choices behave like blocks; the main difference is that with blocks, all of the rules are applied, whereas with choices, exactly one of them gets applied (possibly different rules for different pieces of the text processed).

Empty choices (with no rules within) are not tolerated, unlike empty blocks.

Length-based Selection of Rules

A (length-based) switch is a sequence of rules enclosed within angle brackets ("<" and ">"). Both the opening and the closing bracket follow the rule syntax, but they take no parameters except for optional scope and target specifications. The switch is treated as a single rule.

Whenever the switch is applied to a scope unit, the target units contained within it are counted. If n units are found, the n-th of the subordinate rules is applied.

If fewer than n rules are available, the last one is used. You can avoid this behavior by specifying "nothing" after the last rule.

An example is given as part of the example for the subst rule.
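The selection performed by a switch can be sketched as follows. The function name is hypothetical; what happens when a scope unit contains no targets at all is not described above, so returning None there (no rule applied) is an assumption, as is rejecting an empty switch.

```python
from typing import List, Optional


def switch_select(rules: List[str], n: int) -> Optional[str]:
    """Rule applied to a scope unit containing n target units: the n-th rule
    (1-based), or the last one if fewer than n rules are listed."""
    if not rules:
        raise ValueError("empty switch")   # this sketch rejects an empty switch
    if n < 1:
        return None                        # assumption: nothing is applied without targets
    return rules[min(n, len(rules)) - 1]


rules = ["rule1", "rule2", "rule3"]
print(switch_select(rules, 2))                # rule2
print(switch_select(rules, 7))                # rule3 (falls back to the last rule)
print(switch_select(rules + ["nothing"], 7))  # nothing
```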

Repeated Rules and Choice Probabilities

Write "3x" before a rule to repeat it three times (in a block) or to make it three times more probable (in a choice):

        3x prosody              typical.dic
           prosody              variant.dic

(The first alternative now has a 75% chance of being chosen, while the other one gets the remaining 25%.)
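The arithmetic behind these percentages can be reproduced by expanding each rule into as many slots as its repeat count and choosing uniformly among the slots. This mirrors the remark below that each repetition costs one pointer, but the functions are a hypothetical sketch, not Epos code.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Choice = List[Tuple[int, str]]   # (repeat count, rule), e.g. (3, "prosody typical.dic")


def expand(choice: Choice) -> List[str]:
    """A rule prefixed by "Nx" occupies N slots (one reference per repetition)."""
    slots: List[str] = []
    for count, rule in choice:
        if count < 1:
            raise ValueError("repeat count must be a positive integer")
        slots.extend([rule] * count)
    return slots


def probabilities(choice: Choice) -> Dict[str, float]:
    slots = expand(choice)
    return {rule: n / len(slots) for rule, n in Counter(slots).items()}


def pick(choice: Choice, rng: random.Random) -> str:
    slots = expand(choice)
    if not slots:
        raise ValueError("empty choice")   # empty choices are not tolerated
    return rng.choice(slots)


choice = [(3, "prosody typical.dic"), (1, "prosody variant.dic")]
print(probabilities(choice))  # {'prosody typical.dic': 0.75, 'prosody variant.dic': 0.25}
```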

The repeat count must be a positive integer. You cannot use this feature immediately after a conditional rule, because repeated rules are not counted as a single rule for syntactic purposes:

        if  $something
                2x   regress   0>x(!_!)   #...wrong!

You should wrap the repeated rule in a block instead:

        if  $something
        {
                2x   regress   0>x(!_!)
        }

Huge repeat counts (like one million) are disallowed, because the current implementation needs a few bytes of memory (one pointer) per repetition.

3.13 Conditional Rules

The conditional rules execute the following rule if and only if a condition is met. The condition is specified as the parameter, the following (conditioned) rule is given on a separate line (or lines, if e.g. a composite rule follows). (Comments, whitespace and empty lines may intervene as usual.) It is not syntactically necessary to indent the conditioned rules with whitespace, but it is strongly recommended for readability.

The conditioned rule is syntactically considered to be a part of the conditional rule.

Type inside

Apply a rule or a block of rules within certain units only. The parameter is a list of values at the scope level, wherein the following rule should be applied; the except operator may be used.

Every unit (a sentence, for example), which fulfills the criterion, is processed separately, therefore the scope of the following rule may be at most that of the inside rule itself.


if    phr_break
        regress   0>\#(!_0)     colon
        inside  \#      phone
                contour   t/-65   phone

This example takes action only if the phr_break variable is set. The action is to insert a hash character (representing a pause) at the phone level at the end of every colon, and to adjust the prosodic values of the new character so that the pause is sufficiently short. Notice the necessary escaping of the hash character, so that it is not confused with the comment character.
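The selection made by inside can be sketched as a filter over the contents of the scope-level units. The parsing below is a simplifying assumption (single characters only, "!" as the except operator, a backslash escaping the next character as in \# above, no named classes), and the function names are hypothetical.

```python
from typing import List, Set, Tuple


def parse_values(param: str) -> Tuple[bool, Set[str]]:
    r"""Assumptions: "!" at the start is the except operator, and a backslash
    escapes the following character (as in \# above)."""
    negated = param.startswith("!")
    if negated:
        param = param[1:]
    values, i = set(), 0
    while i < len(param):
        if param[i] == "\\" and i + 1 < len(param):
            i += 1                # skip the backslash, keep the escaped character
        values.add(param[i])
        i += 1
    return negated, values


def inside(contents: List[str], param: str) -> List[int]:
    """Indices of the scope-level units in which the following rule runs."""
    negated, values = parse_values(param)
    return [i for i, c in enumerate(contents) if (c in values) != negated]


phones = ["a", "#", "b", "#"]
print(inside(phones, "\\#"))   # [1, 3]
print(inside(phones, "!\\#"))  # [0, 2]
```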

Type near

Apply a rule or a block of rules within units which contain at least one of the specified units. The parameter is a list of values at the target level, which are looked up in a unit of the scope level; the except operator may be used. If an occurrence is found, the following rule is applied to the scope level unit.

If the parameter begins with an asterisk, the asterisk is treated as an except operator and the test is negated. In other words, the following rule is applied if every target level unit contained meets the set description with the leading asterisk ignored. You can combine the asterisk with an extra except operator to get tests of the "contains no characters of this class" type.


near    *!$vowel
        regress $lowercase>$uppercase(!_!)
        subst   spellout.dic

A fragment of this kind can be used to spell out all words which contain no vowels (and are thus supposedly unpronounceable). The referenced dictionary spellout.dic should contain the spelled-out equivalent of each upper case letter. The conversion of the word to upper case may look puzzling, but it is only a technical trick to prevent the spell-out phrases (which are supposedly listed in lower case) from being spelled out themselves.


near    *$vowel
operates only on words consisting solely of vowels;
near    $vowel
operates on words which contain at least one vowel and
near    !$vowel
operates on words which contain at least one non-vowel.
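The four variants above can be checked against a sketch of the test that near performs on one scope unit: without the asterisk, at least one target must meet the description; with it, every target must. The function is hypothetical, and the treatment of "!" and of named classes such as $vowel (looked up in a dictionary supplied by the caller) are assumptions.

```python
from typing import Dict, Iterable, Set


def near(param: str, targets: Iterable[str], classes: Dict[str, Set[str]]) -> bool:
    """Decide whether the rule following `near param` applies to one scope unit.
    Assumptions: "!" is the except operator, names starting with "$" are looked
    up in `classes`, and any other parameter is a literal list of characters."""
    every = param.startswith("*")        # leading asterisk: test every target
    if every:
        param = param[1:]
    negated = param.startswith("!")      # except operator
    if negated:
        param = param[1:]
    members = classes[param] if param.startswith("$") else set(param)

    def meets(t: str) -> bool:
        return (t not in members) if negated else (t in members)

    units = list(targets)
    return all(meets(t) for t in units) if every else any(meets(t) for t in units)


classes = {"$vowel": set("aeiou")}
for p in ("*!$vowel", "*$vowel", "$vowel", "!$vowel"):
    print(p, [w for w in ("tsk", "aia", "cat") if near(p, w, classes)])
# *!$vowel ['tsk']
# *$vowel ['aia']
# $vowel ['aia', 'cat']
# !$vowel ['tsk', 'cat']
```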

Type with

Apply a rule or a block of rules for listed units. In contrast with the preceding rule type, this refers not only to the token at the scope level (such as a space), but to the whole structure (such as the string of phones delimited by the space).

The parameter is a dictionary filename or a quoted dictionary; it should list the strings subject to the following rule, such as special words. All the details concerning the syntax of the parameter are exactly the same as with other dictionary oriented rules and a simple example is given at the raise rule.

(Advanced users: replacers can be specified in the dictionary and they will be used to replace the replacee as with any other dictionary-oriented rule, but the replacement process will not be iterated.)

The parameter can optionally be prefixed by an exclamation mark, in which case the subordinate rule will be applied exactly to those units which did not match instead of those which did.

An example of how to apply a block of rules to all words except the words "exception" and "resistant":

with    !"exception  resistant"     word
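Which words such a rule selects can be sketched in a few lines. The sketch is hypothetical: it treats a quoted dictionary as whitespace-separated entries only, and it handles neither dictionary files nor replacers.

```python
from typing import List, Set, Tuple


def parse_with(param: str) -> Tuple[bool, Set[str]]:
    """Split a with parameter into (negated, entries).
    Assumption: a quoted dictionary is just whitespace-separated entries."""
    negated = param.startswith("!")
    if negated:
        param = param[1:]
    if not (len(param) >= 2 and param[0] == param[-1] == '"'):
        raise NotImplementedError("only quoted dictionaries are handled in this sketch")
    return negated, set(param[1:-1].split())


def with_targets(words: List[str], param: str) -> List[str]:
    """Words that the subordinate rule would be applied to."""
    negated, entries = parse_with(param)
    return [w for w in words if (w in entries) != negated]


print(with_targets(["no", "exception", "is", "resistant"], '!"exception  resistant"'))
# ['no', 'is']
```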

Type if

Apply a rule or a block of rules only if a condition (given by the parameter) is met. The condition must currently be specified as a boolean voice configuration option (possibly a soft option) or its negation (i.e. prefixed with an exclamation mark).


if   !colloquial

The rule (or block of rules) following this line will be applied only if the colloquial option is not set.

The if rule inherits its scope from its parent rule if not specified explicitly.

Again, the scope of a subordinate rule may not be larger than that of the if rule itself.

3.14 Special Rules

Type regex

Regular expression substitution. The parameter is of the form /regular_expression/replacement/. This rule type is similar to subst with only one dictionary item, but it is far more powerful and more arcane; it is intended neither for end users nor for trivial tasks. For an overview of regular expressions, UNIX users can consult e.g. the grep manual page, whereas Windows users can telnet to a nearby UNIX machine and type man grep there.

Epos uses the extended regular expression syntax with the following difference: in "regular" regular expressions, parentheses match themselves, while the open group and close group operators are \( and \), respectively. As we use groups heavily and hardly any literal parentheses, we decided to do it the other way round. Also, sed users may be surprised by the iterative behavior of the regex rule type in Epos.

The replacement may contain escape sequences referring to the match of the n-th group within the regular expression: \1 to \9. \0 represents the entire match, but it is probably unusable under the current design, as it would cause an infinite substitution loop.
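To see why iteration matters, the following sketch re-applies a substitution until the text stops changing, using Python's re module, which happens to share Epos's convention of ( ) for groups and \( \) for literal parentheses. That Epos replaces one match per round, and the max_rounds guard, are assumptions of this sketch; the guard stands in for whatever would stop a replacement that keeps recreating a match (such as turning "a" into "aa").

```python
import re


def iterative_sub(text: str, pattern: str, replacement: str, max_rounds: int = 1000) -> str:
    r"""Re-apply a substitution until the text stops changing.
    Assumptions: one match is replaced per round, and max_rounds guards against
    runaway loops. Like Epos, Python's re uses ( ) for groups and \( \) for
    literal parentheses; \1 to \9 refer to groups in the replacement."""
    rx = re.compile(pattern)
    for _ in range(max_rounds):
        new = rx.sub(replacement, text, count=1)
        if new == text:
            return text
        text = new
    raise RuntimeError("substitution does not converge")


print(re.sub("ab", "b", "aab"))                    # ab   (sed-like single pass)
print(iterative_sub("aab", "ab", "b"))             # b    (iterated until no match is left)
print(iterative_sub("f(x)", r"\((x)\)", r"[\1]"))  # f[x]
```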

In order to use this type of rule, you need a regex library (rx or regex) and WANT_REGEX enabled in common.h. This is because we don't actually implement the regex parsing ourselves; we leave it to your OS libraries. If you don't have such a library installed, the glibc implementation included in the Epos distribution (rx.c) is used.

Note that if your system neither supports locale setting nor provides a usable regex library, you can't use named character classes such as [:upper:] in your regular expressions. This is the case on Windows CE.

Type debug

Prints debugging information during the application of the rules. Scope and target are ignored; the parameter is parsed lazily.

Parameter "elem": dump the current state of the text being processed.
Parameter "pause": wait until a key is pressed.
