Automated Abstracts

Chapter 12-13

The C API routine Abstract gives a C API program access to the automated abstract engine used by the @Abstract function. The engine can select the important pieces of a message, based on a number of different criteria; abbreviate words to reduce the length of a piece of text; and remove unnecessary components of messages, such as mail headers or unneeded spaces. See the @Abstract documentation in the HCL Domino Designer Help database for additional requirements, such as the use of abbreviation text files. These requirements apply to both the @Abstract function and the Abstract C API routine.

The Abstract routine is given the text to abstract, a maximum length for the abstract, and a set of commands that determine how to process the text. Abstract can process only single-byte characters; multi-byte characters give unpredictable results.
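Because multi-byte input gives unpredictable results, a caller may want to screen text before handing it to Abstract. The following is a minimal sketch, not part of the C API: the helper name is_seven_bit_ascii is hypothetical, and the check assumes that every byte in the range 0x00 to 0x7F is a complete single-byte character in the caller's character set. It is deliberately stricter than the stated requirement, since a single-byte character set can also use byte values above 0x7F.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical pre-check (not a C API routine).
 * Returns 1 if every byte of the NUL-terminated text is 7-bit ASCII,
 * 0 as soon as a byte above 0x7F is found.
 */
int is_seven_bit_ascii(const char *text)
{
    const unsigned char *p = (const unsigned char *)text;

    assert(text != NULL);

    for (; *p != '\0'; p++) {
        if (*p > 0x7F)
            return 0;   /* may belong to a multi-byte sequence */
    }
    return 1;
}
```

A caller could skip the abstract step, or convert the text first, whenever this check returns 0.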

"Chunks"

The text to be abstracted is treated as a series of "chunks" of text. There are three types of chunks: text, mail headers, and punctuation. Text chunks are basically sentences: character strings that end with a punctuation character. The punctuation character is not included; it becomes a separate punctuation chunk. Mail header chunks are identified by mail header keywords, followed by a colon, followed by a white space character (for example, "Subject: ", "From: ", or "To: ").
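The chunk model above can be illustrated with a small splitter. This is a sketch under stated assumptions, not the engine's actual implementation: the punctuation set ('.', '!', '?'), the keyword list (only the three examples given above), the rule that a mail header chunk runs to the end of its line, and all type and function names are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { CHUNK_TEXT, CHUNK_MAIL_HEADER, CHUNK_PUNCT } ChunkType;

typedef struct {
    ChunkType   type;
    const char *start;   /* points into the caller's text; not NUL-terminated */
    size_t      len;
} Chunk;

/* Assumption: the set of sentence-ending punctuation characters. */
static int is_sentence_punct(char c)
{
    return c == '.' || c == '!' || c == '?';
}

/* Returns nonzero if p starts with a keyword, a colon, and a white space. */
static int starts_mail_header(const char *p)
{
    static const char *const keywords[] = { "Subject", "From", "To" };
    size_t i;

    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        size_t n = strlen(keywords[i]);
        if (strncmp(p, keywords[i], n) == 0 && p[n] == ':' &&
            isspace((unsigned char)p[n + 1]))
            return 1;
    }
    return 0;
}

/* Splits text into at most max_chunks chunks; returns the number written. */
size_t split_chunks(const char *text, Chunk *out, size_t max_chunks)
{
    size_t count = 0;
    const char *p = text;

    assert(text != NULL && out != NULL);

    while (count < max_chunks) {
        while (isspace((unsigned char)*p))   /* skip space between chunks */
            p++;
        if (*p == '\0')
            break;

        out[count].start = p;
        if (is_sentence_punct(*p)) {
            /* Punctuation becomes its own one-character chunk. */
            out[count].type = CHUNK_PUNCT;
            p++;
        } else if (starts_mail_header(p)) {
            /* Assumption: a header chunk runs to the end of its line. */
            out[count].type = CHUNK_MAIL_HEADER;
            while (*p != '\0' && *p != '\n')
                p++;
        } else {
            /* Text chunk: everything up to, not including, punctuation. */
            out[count].type = CHUNK_TEXT;
            while (*p != '\0' && !is_sentence_punct(*p))
                p++;
        }
        out[count].len = (size_t)(p - out[count].start);
        count++;
    }
    return count;
}
```

With these assumptions, the input "Subject: Hello", a newline, and "The server is down. Restart it!" yields one mail header chunk, two text chunks, and two punctuation chunks. A period inside a number such as 3.5 would also split the text here, which the real engine may well handle differently.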

Parameters

A number of internal parameters control the processing and output format of Abstract. You can set these parameters by including a parameter assignment in the command string. Parameter assignments consist of a parameter keyword, an equals sign, and the parameter value. Parameter names are not case-sensitive, but case is preserved in string parameter values.
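The shape of a parameter assignment can be sketched as follows. This is an illustration only, not the engine's parser: the function name parse_assignment and the parameter name ExampleParam used in the note below are hypothetical, and lowercasing the keyword is just one way to make name matching case-insensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Splits one token of the form "Name=value" (or a bare "Name").
 * name receives the keyword folded to lowercase; value receives the
 * text after '=' with its case preserved, or "" when there is no '='.
 * Returns 1 if an '=' was found, 0 for a bare name, and -1 if either
 * output buffer is too small.
 */
int parse_assignment(const char *token,
                     char *name, size_t name_size,
                     char *value, size_t value_size)
{
    const char *eq;
    size_t name_len, value_len, i;

    assert(token != NULL && name != NULL && value != NULL);
    assert(name_size > 0 && value_size > 0);

    eq = strchr(token, '=');
    name_len  = eq ? (size_t)(eq - token) : strlen(token);
    value_len = eq ? strlen(eq + 1) : 0;
    if (name_len >= name_size || value_len >= value_size)
        return -1;

    /* Parameter names are not case-sensitive, so fold them. */
    for (i = 0; i < name_len; i++)
        name[i] = (char)tolower((unsigned char)token[i]);
    name[name_len] = '\0';

    /* Case is preserved in the value. */
    if (eq)
        memcpy(value, eq + 1, value_len);
    value[value_len] = '\0';

    return eq ? 1 : 0;
}
```

For the token "ExampleParam=MixedCase", this yields the name "exampleparam" and the value "MixedCase".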

You can use "false", "no", or "0" to set Boolean parameters to False. To set these parameters to True, use "true", "yes", or "1", or simply include the parameter name with no value. The Boolean parameters are:

Trim white space to a single space when abbreviating the text. The default is True.
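The Boolean value rules described above can be summarized in a few lines of C. This is a sketch, not the engine's code: the function name parse_bool_value is hypothetical, and the sketch assumes the value words are matched exactly as quoted (lowercase), since the text states case-insensitivity only for parameter names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Interprets the value part of a Boolean parameter assignment.
 * value may be NULL or "" when the parameter name appears with no value.
 * Stores 1 (True) or 0 (False) in *out and returns 0 on success;
 * returns -1 and leaves *out unchanged for an unrecognized value.
 */
int parse_bool_value(const char *value, int *out)
{
    static const char *const true_words[]  = { "true", "yes", "1" };
    static const char *const false_words[] = { "false", "no", "0" };
    size_t i;

    assert(out != NULL);

    /* A bare parameter name sets the parameter to True. */
    if (value == NULL || value[0] == '\0') {
        *out = 1;
        return 0;
    }

    for (i = 0; i < 3; i++) {
        if (strcmp(value, true_words[i]) == 0) {
            *out = 1;
            return 0;
        }
        if (strcmp(value, false_words[i]) == 0) {
            *out = 0;
            return 0;
        }
    }
    return -1;
}
```

How the real engine treats an unrecognized value is not stated here; the -1 result simply makes that case visible to the caller.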

You can set string parameters to any character string that does not include white space, although ChunkSep is a special case. The string parameters are:

This string will be written after all chunks have been output. The default is "" (an empty string).

Commands

You can pass any number of commands (along with parameter settings) to Abstract in the null-terminated string argument szKeywords. Commands and parameter settings are delimited by white space characters. The supported commands are:

The words in the text are tallied and a "significance" value is computed for each word.

Save the current state of the abstraction engine on an internal stack, including the current state of the text. You can use the restore command to restore these saved states.

Restore the most recently saved state of the abstraction engine. If no states have been stored, this command has no effect.

Determine whether the abstract will fit into the output buffer space provided. If it fits, the current state of the document is written to the output buffer and Abstract returns. Any remaining commands are ignored.

Apply the abbreviation rules to the text.

Disable use of the "stop list," which is a list of words that are normally insignificant, such as "the," "of," and "and."

Disable use of the significant word list, which is a list of words that are normally significant, such as "urgent," "important," or "priority."
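Since commands and parameter settings in szKeywords are delimited by white space, a string of this kind can be split into tokens as sketched below. This shows only the delimiting rule, not the engine's own parser: the Token type and the function name split_keywords are hypothetical, and the sketch ignores the ChunkSep special case mentioned earlier.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef struct {
    const char *start;   /* points into szKeywords; not NUL-terminated */
    size_t      len;
} Token;

/*
 * Splits a null-terminated keyword string on runs of white space.
 * Writes at most max_tokens tokens to out and returns the number written.
 */
size_t split_keywords(const char *szKeywords, Token *out, size_t max_tokens)
{
    size_t count = 0;
    const char *p = szKeywords;

    assert(szKeywords != NULL && out != NULL);

    while (count < max_tokens) {
        while (isspace((unsigned char)*p))   /* skip leading delimiters */
            p++;
        if (*p == '\0')
            break;

        out[count].start = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        out[count].len = (size_t)(p - out[count].start);
        count++;
    }
    return count;
}
```

Each resulting token is either a command or a parameter setting. Leading, trailing, and repeated white space produce no empty tokens.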