Automated Abstracts
Chapter 12-13
The C API routine Abstract allows a C API program to use the automated abstract engine used by the @Abstract function. The automated abstract engine can select the important pieces of a message, based on a number of different criteria; abbreviate words to reduce the length of a piece of text; and remove unnecessary components of messages, such as mail headers or unneeded spaces. See the @Abstract documentation in the HCL Domino Designer Help database for additional requirements, such as using abbreviation text files. These requirements apply to both the @Abstract function and the Abstract C API routine.
The Abstract routine is given the text to abstract, a maximum length for the abstract, and a set of commands that determine how to process the text. Abstract can process only single-byte characters; multi-byte characters give unpredictable results.
"Chunks"
The text to be abstracted is treated as a series of "chunks" of text. There are three types of chunks: text, mail headers, and punctuation. Text chunks are basically sentences: character strings that end with a punctuation character. The punctuation character is not included; it becomes a separate punctuation chunk. Mail header chunks are identified by mail header keywords, followed by a colon, followed by a white space character (for example, "Subject: ", "From: ", or "To: ").
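The chunk rules above can be illustrated with a minimal C sketch. It is not the engine's actual algorithm. The punctuation set (".", "!", "?"), the three header keywords (taken from the examples above), the trimming of leading white space, and the names Chunk, split_chunks, and starts_with_mail_header are all assumptions made for illustration. Like Abstract itself, the sketch handles single-byte characters only.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum { CHUNK_TEXT, CHUNK_HEADER, CHUNK_PUNCT } ChunkType;

typedef struct {
    ChunkType   type;
    const char *start;  /* points into the caller's text; not NUL-terminated */
    size_t      len;
} Chunk;

/* Assumed keyword list; the engine's real list is not given in this section. */
static const char *const header_keywords[] = { "Subject", "From", "To" };

/* Returns 1 if s begins with a keyword, a colon, and a white space character. */
static int starts_with_mail_header(const char *s)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; i < sizeof header_keywords / sizeof header_keywords[0]; i++) {
        size_t n = strlen(header_keywords[i]);
        if (strncmp(s, header_keywords[i], n) == 0 &&
            s[n] == ':' && isspace((unsigned char)s[n + 1]))
            return 1;
    }
    return 0;
}

/* Stores one chunk if there is room and returns the updated count. */
static size_t store_chunk(Chunk *out, size_t n, size_t max,
                          ChunkType type, const char *start, size_t len)
{
    if (n < max) {
        out[n].type = type;
        out[n].start = start;
        out[n].len = len;
        n++;
    }
    return n;
}

/* Splits s into text chunks and one-character punctuation chunks.
   Returns the number of chunks stored in out (at most max). */
static size_t split_chunks(const char *s, Chunk *out, size_t max)
{
    size_t n = 0;
    const char *start;
    const char *p;

    assert(s != NULL && out != NULL);
    start = s;
    for (p = s; ; p++) {
        int at_end = (*p == '\0');
        int at_punct = !at_end && strchr(".!?", *p) != NULL;

        if (at_end || at_punct) {
            /* Skip leading white space, then store the pending text chunk. */
            while (start < p && isspace((unsigned char)*start))
                start++;
            if (start < p)
                n = store_chunk(out, n, max, CHUNK_TEXT, start, (size_t)(p - start));
            /* The punctuation character becomes a chunk of its own. */
            if (at_punct)
                n = store_chunk(out, n, max, CHUNK_PUNCT, p, 1);
            if (at_end)
                break;
            start = p + 1;
        }
    }
    return n;
}
```

With these assumptions, "Hello world. Call me!" yields four chunks: two text chunks and two punctuation chunks. How far a mail header chunk extends is not described here, so the sketch only detects where one begins.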
Parameters
A number of internal parameters control the processing and output format of Abstract. You can set these parameters by including a parameter assignment in the command string. Parameter assignments consist of a parameter keyword, an equals sign, and the parameter value. Parameter names are not case-sensitive, but case is preserved in string parameter values.
You can use "false", "no", or "0" to set Boolean parameters to False. To set these parameters to True, use "true", "yes", or "1", or simply include the parameter name with no value. The Boolean parameters are:
- ab-usedict=
- Use the abbreviations dictionary to identify abbreviations. The default is True.
- Remove vowels from words when abbreviating the text. The default is False.
- Remove the first vowel from each word (even if it's the first letter) when abbreviating the text. The default is False.
- Trim all white space from around punctuation when abbreviating the text. The default is False.
- Trim white space to a single space when abbreviating the text. The default is True.
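The Boolean rules above can be sketched as a small token parser. This is an illustration under stated assumptions, not the engine's code: the function names are hypothetical, the values are matched exactly as spelled above (whether the engine accepts other capitalizations of the values is not stated here), and a token such as "name=" with an empty value is treated as unrecognized because this section does not describe that case.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Compares the first n characters of a with all of b, ignoring case. */
static int name_equals(const char *a, size_t n, const char *b)
{
    size_t i;

    if (strlen(b) != n)
        return 0;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Interprets one whitespace-delimited token as a setting of the Boolean
   parameter `name` (given without the equals sign).
   Returns 1 for True, 0 for False, and -1 if the token names a different
   parameter or carries a value this sketch does not recognize. */
static int parse_bool_token(const char *token, const char *name)
{
    const char *eq;
    const char *value;
    size_t name_len;

    assert(token != NULL && name != NULL);
    eq = strchr(token, '=');
    name_len = (eq != NULL) ? (size_t)(eq - token) : strlen(token);

    /* Parameter names are not case-sensitive. */
    if (!name_equals(token, name_len, name))
        return -1;

    /* A bare parameter name with no value means True. */
    if (eq == NULL)
        return 1;

    value = eq + 1;
    if (strcmp(value, "true") == 0 || strcmp(value, "yes") == 0 ||
        strcmp(value, "1") == 0)
        return 1;
    if (strcmp(value, "false") == 0 || strcmp(value, "no") == 0 ||
        strcmp(value, "0") == 0)
        return 0;
    return -1;
}
```

For example, the tokens "ab-usedict=no" and "AB-UseDict=0" both turn the abbreviations dictionary off under this sketch, while "ab-usedict" alone turns it on.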
You can set string parameters to any character string that does not include white space, although ChunkSep is a special case. The string parameters are:
- ChunkBegin=
- This string will be written at the beginning of the output buffer. The default is "" (an empty string).
- ChunkSep=
- This string will be written at the end of each chunk, to delimit the chunks. The default is " " (a single space). Three special values are supported:
- space - Use a single space
- lf - Use a linefeed character
- crlf - Use a carriage return/linefeed pair
- This string will be written after all chunks have been output. The default is "" (an empty string).
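To make the effect of the string parameters concrete, the following C sketch assembles an output buffer from a list of chunks. It is a hedged illustration, not the engine's implementation: the function and argument names (resolve_chunk_sep, format_chunks, trailer) are hypothetical, the special values are matched exactly as spelled above, and the separator is written after every chunk, including the last, which is a literal reading of "at the end of each chunk".

```c
#include <assert.h>
#include <string.h>

/* Maps the three special separator values to the characters they stand for;
   any other value is used as the separator string itself. */
static const char *resolve_chunk_sep(const char *value)
{
    assert(value != NULL);
    if (strcmp(value, "space") == 0)
        return " ";
    if (strcmp(value, "lf") == 0)
        return "\n";
    if (strcmp(value, "crlf") == 0)
        return "\r\n";
    return value;
}

/* Appends src to the NUL-terminated string in dst (capacity cap bytes).
   Returns 0 on success, -1 if src plus the terminator does not fit. */
static int append(char *dst, size_t cap, const char *src)
{
    size_t used = strlen(dst);
    size_t need = strlen(src);

    if (need >= cap - used)
        return -1;
    memcpy(dst + used, src, need + 1);
    return 0;
}

/* Writes: begin, then each chunk followed by the separator, then trailer.
   Returns 0 on success, -1 if the result does not fit in out. */
static int format_chunks(const char *begin, const char *sep_value,
                         const char *trailer,
                         const char *const *chunks, size_t count,
                         char *out, size_t cap)
{
    size_t i;
    const char *sep;

    assert(begin != NULL && trailer != NULL && chunks != NULL);
    assert(out != NULL && cap > 0);
    sep = resolve_chunk_sep(sep_value);
    out[0] = '\0';

    if (append(out, cap, begin) != 0)
        return -1;
    for (i = 0; i < count; i++) {
        if (append(out, cap, chunks[i]) != 0 || append(out, cap, sep) != 0)
            return -1;
    }
    return append(out, cap, trailer);
}
```

Under these assumptions, the chunks "First" and "Second" with a separator value of crlf and empty begin and end strings produce "First\r\nSecond\r\n".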
Commands
You can pass any number of commands (along with parameter settings) to Abstract in the null-terminated string argument szKeywords. Commands and parameter settings are delimited by whitespace characters. The supported commands are:
textonly
- Delete all mail header and punctuation chunks. Subsequent commands will have only text chunks on which to operate.
countwords
- The words in the text are tallied and a "significance" value is computed for each word.
save
- Save the current state of the abstraction engine on an internal stack, including the current state of the text. You can use the restore command to restore these saved states.
restore
- Restore the most recently saved state of the abstraction engine. If no states have been stored, this command has no effect.
tryfit
- Determine whether the abstract will fit into the output buffer space provided. If it fits, the current state of the document is written to the output buffer and Abstract returns. Any remaining commands are ignored.
abbrev
- Apply the abbreviation rules to the text.
sortchunks
- Sort the chunks according to a "significance value." The significance of a chunk is a function of the number of words in the chunk, the significance values of the words in the chunk, and the type and position of the chunk.
If you used the countwords command to compute word significance values, the significance values for the words are added and the total is used for the significance of the chunk.
nostoplist
- Disable use of the "stop list," which is a list of words that are normally insignificant, such as "the," "of," and "and."
nosiglist
- Disable use of the significant word list, which is a list of words that are normally significant, such as "urgent," "important," or "priority."
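The interaction between countwords and sortchunks can be sketched in C as follows. This is a simplified illustration, not the engine's formula: it covers only the summation of word significance values described for sortchunks, and it ignores chunk type and position. The word scores, the zero score for unknown words, the most-significant-first ordering, and all type and function names are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *word;
    int         significance;  /* assumed to come from a countwords-like pass */
} WordScore;

typedef struct {
    const char *text;
    int         significance;  /* filled in by sort_chunks */
} ScoredChunk;

/* Looks up a word of length len; unknown words score 0 (an assumption). */
static int word_significance(const WordScore *table, size_t n,
                             const char *w, size_t len)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strlen(table[i].word) == len &&
            strncmp(table[i].word, w, len) == 0)
            return table[i].significance;
    }
    return 0;
}

/* Adds the significance values of the space-separated words in text. */
static int chunk_significance(const WordScore *table, size_t n,
                              const char *text)
{
    int total = 0;
    const char *p = text;

    assert(table != NULL && text != NULL);
    while (*p != '\0') {
        size_t len;

        p += strspn(p, " ");     /* skip spaces between words */
        len = strcspn(p, " ");   /* length of the next word */
        if (len > 0)
            total += word_significance(table, n, p, len);
        p += len;
    }
    return total;
}

/* qsort comparator: higher significance sorts first. */
static int by_significance_desc(const void *a, const void *b)
{
    const ScoredChunk *x = a;
    const ScoredChunk *y = b;

    return (y->significance > x->significance) -
           (y->significance < x->significance);
}

/* Scores every chunk, then sorts the array by descending significance. */
static void sort_chunks(ScoredChunk *chunks, size_t count,
                        const WordScore *table, size_t n)
{
    size_t i;

    assert(chunks != NULL);
    for (i = 0; i < count; i++)
        chunks[i].significance = chunk_significance(table, n, chunks[i].text);
    qsort(chunks, count, sizeof chunks[0], by_significance_desc);
}
```

With hypothetical scores of 5 for "urgent" and 3 for "server", the chunk "urgent server outage" totals 8 and sorts ahead of "the server", which totals 3.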