public class DefaultAnalyzer extends AbstractAnalyzer
| Modifier and Type | Field and Description |
|---|---|
| protected static java.lang.String | ANALYZER_TYPE |
| protected static java.lang.String | ATTRIBUTE_EXCLUDE_STEMS: The attribute indicating whether stemming exclusion words are used. |
| protected static java.lang.String | ATTRIBUTE_USE_STOP_WORDS: The attribute indicating whether stop words are used. |
| static java.lang.String[] | DEFAULT_STOP_WORDS: An array containing some common English words that are not usually useful for searching. |
| protected java.lang.String | EXCLUDE_STEM_ELEMENT: String representation of the element name in the analyzer config file. |
| protected java.lang.String | EXCLUDE_STEMS_ELEMENT: String representation of the element name in the analyzer config file. |
| protected java.util.Set | excludeTable: The table for stemming exclusions. |
| protected java.util.Set | stopTable: The list of stop words used. |

Inherited fields: logger

| Constructor and Description |
|---|
| DefaultAnalyzer(): Builds a default analyzer. |

| Modifier and Type | Method and Description |
|---|---|
| protected java.util.Set | buildExcludeTable(org.apache.avalon.framework.configuration.Configuration conf): Builds a stemming exclusion table from a configuration. |
| protected java.util.Set | buildStopTable(org.apache.avalon.framework.configuration.Configuration conf): Builds a stop word table from a configuration. |
| void | configure(org.apache.avalon.framework.configuration.Configuration configuration): Configures this analyzer. |
| protected java.lang.String | getAnalyzerType() |
| protected java.lang.String[] | getDefaultStopWords(): Returns a default list of stop words. |
| org.apache.lucene.analysis.TokenStream | tokenStream(java.io.Reader reader): Deprecated. Use tokenStream(String, Reader) instead. |
| org.apache.lucene.analysis.TokenStream | tokenStream(java.lang.String fieldName, java.io.Reader reader): Filters LowerCaseTokenizer with StopFilter. |

Inherited methods: enableLogging, toSAX

protected static final java.lang.String ATTRIBUTE_USE_STOP_WORDS
protected static final java.lang.String ATTRIBUTE_EXCLUDE_STEMS
protected static final java.lang.String ANALYZER_TYPE
protected java.util.Set stopTable
protected java.util.Set excludeTable
protected final java.lang.String EXCLUDE_STEMS_ELEMENT
protected final java.lang.String EXCLUDE_STEM_ELEMENT
public static final java.lang.String[] DEFAULT_STOP_WORDS
public DefaultAnalyzer()
This analyzer will use Lucene's StopAnalyzer.
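
As a rough illustration of what a stop word analyzer of this kind does (lowercase letter tokenization followed by stop word removal), the sketch below reimplements the idea in plain Java. It does not call Lucene and is not this class's implementation; the class name StopPipelineSketch, the method analyze, and the sample stop list are assumptions made for illustration, and the real contents of DEFAULT_STOP_WORDS are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopPipelineSketch {

    // Hypothetical stop list for illustration only; not the contents of DEFAULT_STOP_WORDS.
    static final Set<String> SAMPLE_STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "of", "and"));

    // Splits the text at every non-letter character, lowercases each token,
    // and keeps only the tokens that are absent from the stop set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else {
                flush(current, stopWords, tokens);
            }
        }
        flush(current, stopWords, tokens); // emit the last token, if any
        return tokens;
    }

    // Emits the pending token unless it is empty or a stop word, then resets the buffer.
    private static void flush(StringBuilder current, Set<String> stopWords, List<String> out) {
        if (current.length() > 0) {
            String token = current.toString();
            if (!stopWords.contains(token)) {
                out.add(token);
            }
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        // prints [history, paris]
        System.out.println(analyze("The History of Paris, 1900", SAMPLE_STOP_WORDS));
    }
}
```

Digits and punctuation act as separators in this sketch, which is why "1900" does not appear in the output.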

protected java.lang.String getAnalyzerType()
getAnalyzerType in class AbstractAnalyzer
See also: fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer#getAnalyserType()

public void configure(org.apache.avalon.framework.configuration.Configuration configuration)
throws org.apache.avalon.framework.configuration.ConfigurationException
The class searches for <stopWord> elements and uses them as the stop word list. If none are found, or if the configuration object is null, the default list is used.
If the top-level configuration element has a false value for its useStopWords attribute, no stop words are used.
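
The selection rules described for configure can be sketched in plain Java as follows. This is only an illustration of the decision order stated above, not the class's actual code: it does not use the Avalon Configuration API, and the names StopTableSketch, resolveStopTable, and SAMPLE_DEFAULTS are hypothetical. The caller is assumed to have already read the useStopWords attribute and the text of each <stopWord> element.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopTableSketch {

    // Hypothetical defaults for illustration; not the real DEFAULT_STOP_WORDS.
    static final String[] SAMPLE_DEFAULTS = {"a", "and", "the"};

    /**
     * @param useStopWords    value of the useStopWords attribute, as read by the caller
     * @param configuredWords text of each stopWord element, or null when no configuration exists
     */
    static Set<String> resolveStopTable(boolean useStopWords, List<String> configuredWords) {
        if (!useStopWords) {
            // useStopWords is false: no stop words at all.
            return Collections.emptySet();
        }
        if (configuredWords == null || configuredWords.isEmpty()) {
            // No configuration, or no stopWord elements: fall back to the default list.
            return new HashSet<>(Arrays.asList(SAMPLE_DEFAULTS));
        }
        // Otherwise the configured words form the stop word list.
        return new HashSet<>(configuredWords);
    }

    public static void main(String[] args) {
        System.out.println(resolveStopTable(false, Arrays.asList("le", "la")).isEmpty()); // prints true
        System.out.println(resolveStopTable(true, null).size());                          // prints 3
        System.out.println(resolveStopTable(true, Arrays.asList("le", "la")).size());     // prints 2
    }
}
```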
configure in interface org.apache.avalon.framework.configuration.Configurable
configure in class AbstractAnalyzer
Throws: org.apache.avalon.framework.configuration.ConfigurationException

public org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
tokenStream in interface Analyzer
tokenStream in class org.apache.lucene.analysis.Analyzer

protected java.util.Set buildStopTable(org.apache.avalon.framework.configuration.Configuration conf)
throws SDXException, org.apache.avalon.framework.configuration.ConfigurationException
Parameters: conf - The configuration to use.
Throws: SDXException, org.apache.avalon.framework.configuration.ConfigurationException

protected java.lang.String[] getDefaultStopWords()

protected java.util.Set buildExcludeTable(org.apache.avalon.framework.configuration.Configuration conf)
throws org.apache.avalon.framework.configuration.ConfigurationException
Parameters: conf - The configuration to use.
Throws: org.apache.avalon.framework.configuration.ConfigurationException

public org.apache.lucene.analysis.TokenStream tokenStream(java.io.Reader reader)
See also: Analyzer.tokenStream(java.io.Reader)

Copyright © 2000-2010 Ministere de la culture et de la communication / AJLSM. All Rights Reserved.