Dee Reference Manual
#include <dee.h>

struct DeeAnalyzer;
struct DeeAnalyzerClass;
gchar * (*DeeCollatorFunc) (const gchar *input, gpointer data);
void (*DeeTermFilterFunc) (DeeTermList *terms_in, DeeTermList *terms_out, gpointer filter_data);
void dee_analyzer_add_term_filter (DeeAnalyzer *self, DeeTermFilterFunc filter_func, gpointer filter_data, GDestroyNotify filter_destroy);
void dee_analyzer_analyze (DeeAnalyzer *self, const gchar *data, DeeTermList *terms_out, DeeTermList *colkeys_out);
gint dee_analyzer_collate_cmp (DeeAnalyzer *self, const gchar *key1, const gchar *key2);
gint dee_analyzer_collate_cmp_func (const gchar *key1, const gchar *key2, gpointer analyzer);
gchar * dee_analyzer_collate_key (DeeAnalyzer *self, const gchar *data);
DeeAnalyzer * dee_analyzer_new (void);
void dee_analyzer_tokenize (DeeAnalyzer *self, const gchar *data, DeeTermList *terms_out);
A DeeAnalyzer takes a text stream, splits it into tokens, and runs the tokens through a series of filtering steps. It can optionally output collation keys for the terms.
One important use case for analyzers in Dee is to act as a vessel for the indexing logic used to create a DeeIndex from a DeeModel.
The recommended ways to implement a custom analyzer are to add term filters to a DeeAnalyzer or DeeTextAnalyzer instance with dee_analyzer_add_term_filter(), to derive your own subclass that overrides the dee_analyzer_tokenize() method, or both.
If you have very special requirements, it is also possible to reimplement every aspect of the analyzer class.
struct DeeAnalyzer;
All fields in the DeeAnalyzer structure are private and should never be accessed directly.
gchar * (*DeeCollatorFunc) (const gchar *input, gpointer data);
A collator takes an input string, most often a term produced from a DeeAnalyzer, and outputs a collation key.
input : | The string to produce a collation key for |
data : | User data set when registering the collator. [closure] |
Returns : | The collation key. Free with g_free() when done using it. [transfer full] |
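As a rough illustration of the contract described above (string in, newly allocated key out), here is a minimal sketch of a collator that lowercases ASCII input. It is a stand-in, not Dee code: `lowercase_collator` is a hypothetical name, and plain `char` and `void *` replace `gchar` and `gpointer` so the block builds without GLib. A collator for real UTF-8 text would more likely build on GLib helpers such as g_utf8_casefold() or g_utf8_collate_key().

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical collator shaped like DeeCollatorFunc (libc types instead of
 * GLib types). Returns a newly allocated, lowercased copy of the input. */
char *
lowercase_collator (const char *input, void *data)
{
  (void) data; /* user data is unused in this sketch */

  size_t len = strlen (input);
  char *key = malloc (len + 1);
  if (key == NULL)
    return NULL;

  /* ASCII-only case folding; multi-byte UTF-8 sequences pass through as-is */
  for (size_t i = 0; i < len; i++)
    key[i] = (char) tolower ((unsigned char) input[i]);
  key[len] = '\0';

  /* The caller owns the key. This stand-in pairs malloc() with free();
   * a real DeeCollatorFunc result is released with g_free(). */
  return key;
}
```

The caller is responsible for releasing the returned key, which mirrors the [transfer full] annotation on the return value.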
void (*DeeTermFilterFunc) (DeeTermList *terms_in, DeeTermList *terms_out, gpointer filter_data);
A term filter takes a list of terms, runs it through a filtering step and/or a set of transformations, and stores the output in a DeeTermList.
You can register term filters on a DeeAnalyzer with dee_analyzer_add_term_filter().
terms_in : | A DeeTermList with the terms to filter |
terms_out : | A DeeTermList to write the filtered terms to |
filter_data : | User data set when registering the filter. [closure] |
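To make the in/out shape of a term filter concrete, the sketch below drops every term equal to a stop word passed as the filter data. It is a plain-C model rather than Dee code: `BorrowedTermList`, `borrowed_term_list_add` and `stop_word_filter` are hypothetical names, and the fixed-capacity array only stands in for a DeeTermList.

```c
#include <stddef.h>
#include <string.h>

#define MAX_BORROWED_TERMS 16

/* Hypothetical stand-in for DeeTermList: a fixed-capacity array of
 * borrowed (not owned) strings. */
typedef struct {
  const char *terms[MAX_BORROWED_TERMS];
  size_t      n_terms;
} BorrowedTermList;

/* Append a term, silently ignoring it once the list is full */
void
borrowed_term_list_add (BorrowedTermList *list, const char *term)
{
  if (list->n_terms < MAX_BORROWED_TERMS)
    list->terms[list->n_terms++] = term;
}

/* Shaped like a DeeTermFilterFunc: copy every term from terms_in to
 * terms_out except those equal to the stop word given as filter_data. */
void
stop_word_filter (BorrowedTermList *terms_in,
                  BorrowedTermList *terms_out,
                  void             *filter_data)
{
  const char *stop_word = filter_data;

  for (size_t i = 0; i < terms_in->n_terms; i++)
    if (strcmp (terms_in->terms[i], stop_word) != 0)
      borrowed_term_list_add (terms_out, terms_in->terms[i]);
}
```

The filter never modifies its input list; it only writes to the output list, which is the same division of roles that the two DeeTermList arguments have.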
void dee_analyzer_add_term_filter (DeeAnalyzer *self, DeeTermFilterFunc filter_func, gpointer filter_data, GDestroyNotify filter_destroy);
Register a DeeTermFilterFunc to be called whenever dee_analyzer_analyze() is called.
Term filters can be used to normalize, add, or remove terms from an input data stream.
self : | The analyzer to add a term filter to |
filter_func : | Function to call. [scope notified] |
filter_data : | Data to pass to filter_func when it is invoked. [closure] |
filter_destroy : | Called on filter_data when the DeeAnalyzer owning the filter is destroyed. [allow-none] |
void dee_analyzer_analyze (DeeAnalyzer *self, const gchar *data, DeeTermList *terms_out, DeeTermList *colkeys_out);
Extract terms and/or collation keys from some input data (which is normally, but not necessarily, a UTF-8 string).
The terms and corresponding collation keys will be written in order to the provided DeeTermLists.
Implementation notes for subclasses:
The analysis process must call dee_analyzer_tokenize() and run the tokens through all term filters added with dee_analyzer_add_term_filter(). Collation keys must be generated with dee_analyzer_collate_key().
self : | The analyzer to use |
data : | The input data to analyze |
terms_out : | A DeeTermList to place the generated terms in. If NULL, no terms are generated. [allow-none] |
colkeys_out : | A DeeTermList to place generated collation keys in. If NULL, no collation keys are generated. [allow-none] |
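The order of operations described in the implementation notes (tokenize, run every filter, then derive collation keys) can be modelled in plain C as follows. This is only a sketch under stated assumptions: `TermList`, `TermFilter`, `tokenize`, `lowercase_filter` and `analyze` are hypothetical stand-ins for the Dee types and methods, tokenization is a simple split on spaces, and the collation key is a plain copy of the term, in line with the documented default of dee_analyzer_collate_key().

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TERMS 8
#define TERM_LEN  32

/* Hypothetical stand-in for DeeTermList: fixed-size buffers, no heap use */
typedef struct {
  char   terms[MAX_TERMS][TERM_LEN];
  size_t n_terms;
} TermList;

/* Hypothetical stand-in for a registered term filter */
typedef void (*TermFilter) (const TermList *in, TermList *out);

/* Append a (possibly truncated) copy of term; ignored once the list is full */
void
term_list_add (TermList *list, const char *term)
{
  if (list->n_terms == MAX_TERMS)
    return;
  strncpy (list->terms[list->n_terms], term, TERM_LEN - 1);
  list->terms[list->n_terms][TERM_LEN - 1] = '\0';
  list->n_terms++;
}

/* Step 1: split the input on spaces, without any normalization */
void
tokenize (const char *data, TermList *out)
{
  char buf[MAX_TERMS * TERM_LEN];
  strncpy (buf, data, sizeof buf - 1);
  buf[sizeof buf - 1] = '\0';
  for (char *tok = strtok (buf, " "); tok != NULL; tok = strtok (NULL, " "))
    term_list_add (out, tok);
}

/* Example filter: ASCII lowercase every term */
void
lowercase_filter (const TermList *in, TermList *out)
{
  for (size_t i = 0; i < in->n_terms; i++)
    {
      char tmp[TERM_LEN];
      size_t j = 0;
      for (; in->terms[i][j] != '\0'; j++)
        tmp[j] = (char) tolower ((unsigned char) in->terms[i][j]);
      tmp[j] = '\0';
      term_list_add (out, tmp);
    }
}

/* Steps 1 to 3: tokenize, apply each filter in order, then emit terms
 * and/or collation keys. Either output list may be NULL. */
void
analyze (const char *data, TermFilter *filters, size_t n_filters,
         TermList *terms_out, TermList *colkeys_out)
{
  TermList current = { .n_terms = 0 };
  tokenize (data, &current);

  for (size_t f = 0; f < n_filters; f++)
    {
      TermList next = { .n_terms = 0 };
      filters[f] (&current, &next);
      current = next;
    }

  for (size_t i = 0; i < current.n_terms; i++)
    {
      if (terms_out != NULL)
        term_list_add (terms_out, current.terms[i]);
      if (colkeys_out != NULL)
        term_list_add (colkeys_out, current.terms[i]); /* key = copy of term */
    }
}
```

Passing NULL for either output list skips that output, matching the [allow-none] annotations on terms_out and colkeys_out.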
gint dee_analyzer_collate_cmp (DeeAnalyzer *self, const gchar *key1, const gchar *key2);
Compare collation keys generated by dee_analyzer_collate_key() with semantics similar to strcmp(). See also dee_analyzer_collate_cmp_func() if you need a version of this function that works as a GCompareDataFunc.
The default implementation in DeeAnalyzer just uses strcmp().
self : | The analyzer to use when comparing collation keys |
key1 : | The first collation key to compare |
key2 : | The second collation key to compare |
Returns : | -1, 0 or 1 if key1 is less than, equal to, or greater than key2. |
gint dee_analyzer_collate_cmp_func (const gchar *key1, const gchar *key2, gpointer analyzer);
A GCompareDataFunc using a DeeAnalyzer to compare the keys. This is just a convenience wrapper around dee_analyzer_collate_cmp().
key1 : | The first key to compare |
key2 : | The second key to compare |
analyzer : | The DeeAnalyzer to use for the comparison |
Returns : | -1, 0 or 1 if key1 is less than, equal to, or greater than key2. |
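Note that strcmp() itself only guarantees the sign of its result, not the values -1 and 1. The sketch below shows one way a strcmp()-based comparison can be clamped to exactly -1, 0 or 1, together with a wrapper whose argument order follows GCompareDataFunc. Both function names are hypothetical, the analyzer argument is an unused placeholder, and whether Dee performs this clamping internally is not stated here.

```c
#include <string.h>

/* Hypothetical comparison: strcmp() clamped to exactly -1, 0 or 1 */
int
collate_cmp_clamped (const char *key1, const char *key2)
{
  int r = strcmp (key1, key2);
  return (r > 0) - (r < 0);
}

/* Hypothetical wrapper with the GCompareDataFunc argument order: two
 * untyped keys followed by user data (the analyzer, unused here). */
int
collate_cmp_clamped_func (const void *key1, const void *key2, void *analyzer)
{
  (void) analyzer;
  return collate_cmp_clamped (key1, key2);
}
```

The wrapper form is the one that fits sorting and lookup helpers taking a comparison callback plus a user data pointer.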
gchar * dee_analyzer_collate_key (DeeAnalyzer *self, const gchar *data);
Generate a collation key for a set of input data (usually a UTF-8 string passed through tokenization and term filters of the analyzer).
The default implementation just calls g_strdup().
self : | The analyzer to generate a collation key with |
data : | The input data to generate a collation key for |
Returns : | A newly allocated collation key. Use dee_analyzer_collate_cmp() or dee_analyzer_collate_cmp_func() to compare collation keys. Free with g_free(). |
void dee_analyzer_tokenize (DeeAnalyzer *self, const gchar *data, DeeTermList *terms_out);
Tokenize some input data (which is normally, but not necessarily, a UTF-8 string).
Tokenization splits the input data into constituents (in most cases words), but does not run them through any of the term filters set for the analyzer. Whether the tokenization process itself performs any normalization is undefined.
self : | The analyzer to use |
data : | The input data to analyze |
terms_out : | A DeeTermList to place the generated tokens in. |
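For illustration, a tokenizer that splits on ASCII whitespace and applies no normalization could look like the sketch below. It is a plain-C model, not the tokenizer Dee ships: `TokenList`, `tokenize_whitespace` and `token_list_clear` are hypothetical names, and the fixed-capacity list only stands in for a DeeTermList.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOKENS 32

/* Hypothetical stand-in for DeeTermList: owns heap copies of its tokens */
typedef struct {
  char  *tokens[MAX_TOKENS];
  size_t n_tokens;
} TokenList;

/* Split data on ASCII whitespace and append each word to tokens_out.
 * Case and punctuation are left untouched (no normalization). */
void
tokenize_whitespace (const char *data, TokenList *tokens_out)
{
  const char *p = data;

  while (*p != '\0' && tokens_out->n_tokens < MAX_TOKENS)
    {
      /* skip the whitespace run before the next word */
      while (isspace ((unsigned char) *p))
        p++;
      if (*p == '\0')
        break;

      /* find the end of the word */
      const char *start = p;
      while (*p != '\0' && !isspace ((unsigned char) *p))
        p++;

      /* store a NUL-terminated copy of the word */
      size_t len = (size_t) (p - start);
      char *token = malloc (len + 1);
      if (token == NULL)
        return;
      memcpy (token, start, len);
      token[len] = '\0';
      tokens_out->tokens[tokens_out->n_tokens++] = token;
    }
}

/* Free every stored token and reset the list */
void
token_list_clear (TokenList *list)
{
  for (size_t i = 0; i < list->n_tokens; i++)
    free (list->tokens[i]);
  list->n_tokens = 0;
}
```

Because this tokenizer leaves case untouched, any case folding would have to happen in a term filter, which is consistent with the statement that tokenization does not run the term filters.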