Interface Summary | |
---|---|
ClassifiableContentIF | INTERNAL: Interface that holds the identifier and the actual content of a classifiable resource. |
ClassifyPluginIF | INTERNAL: Interface implemented by code that is able to locate classifiable content for topics. |
DelimiterTrimmerIF | INTERNAL: |
DocumentAnalyzerIF | INTERNAL: |
FormatModuleIF | INTERNAL: Interface that encapsulates the support for a given document format. |
HttpServletRequestAwareIF | INTERNAL: Interface implemented by ClassifyPluginIFs that want access to the current HTTP request in a servlet environment. |
TermAnalyzerIF | INTERNAL: |
TermNormalizerIF | INTERNAL: |
TermStemmerIF | INTERNAL: A stemmer produces the stem of a word from a form of the word. |
TextHandlerIF | INTERNAL: Callback interface used by format modules to tell the classification framework about the structure of classifiable content. |
TokenizerIF | INTERNAL: |
Class Summary | |
---|---|
AbstractDocumentAnalyzer | INTERNAL: |
BlackList | INTERNAL: |
CharacterAnalyzer | INTERNAL: |
Chew | PUBLIC: Command-line tool for extracting keywords from a document. |
ClassifiableContent | INTERNAL: |
ClassifyUtils | INTERNAL: |
CompoundAnalyzer | INTERNAL: |
ConferencePlugin | INTERNAL: |
DefaultPlugin | INTERNAL: |
DefaultTokenizer | INTERNAL: |
DistanceAnalyzer | INTERNAL: |
Document | INTERNAL: |
DocumentClassifier | INTERNAL: |
DocumentTokenizer | INTERNAL: |
DowncaseNormalizer | INTERNAL: |
FormatModule | INTERNAL: |
FrequencyAnalyzer | INTERNAL: A frequency table giving the frequency with which a particular word is used in a particular language. |
HTMLFormatModule | INTERNAL: |
JunkNormalizer | INTERNAL: |
Language | INTERNAL: Object representing a particular language. |
OOXMLPowerpointFormatModule | INTERNAL: A format module for the OOXML PresentationML format. |
OOXMLWordFormatModule | INTERNAL: A format module for the OOXML WordProcessingML format. |
PDFFormatModule | INTERNAL: |
PlainTextFormatModule | INTERNAL: |
PowerPointFormatModule | INTERNAL: |
RegexpTermAnalyzer | INTERNAL: A term analyzer which recognizes certain kinds of terms using regexps and adjusts their scores accordingly. |
Region | INTERNAL: |
RegionBooster | INTERNAL: |
RelativeScore | INTERNAL: |
SimpleClassifier | PUBLIC: A simple top-level API for classifying content. |
SnowballStemmer | INTERNAL: |
SpecialCharNormalizer | INTERNAL: |
StopList | INTERNAL: A set of words considered "stop words" in a particular language. |
Term | PUBLIC: Represents a concept which occurs in the classified content. |
TermDatabase | PUBLIC: A collection of terms representing the result of classifying a piece of content. |
TextBlock | INTERNAL: |
Token | INTERNAL: |
TokenVisitor | INTERNAL: |
TologRulePlugin | INTERNAL: |
TopicContentPlugin | INTERNAL: Classifier plugin which produces content from the topic itself. |
TopicContentPlugin.TopicAsContent | |
TopicMapAnalyzer | INTERNAL: |
TopicMapAnalyzer.AssociationType | |
TopicMapClassification | INTERNAL: |
Variant | PUBLIC: Represents a form of a term as it occurred in classified content. |
WebChew | INTERNAL: |
WordFormatModule | INTERNAL: A format module for the old binary Word format. |
XMLFormatModule | INTERNAL: |
To classify content, use the SimpleClassifier class. Most of the other APIs in this package are INTERNAL and may change at any time.
If you need more flexibility, you can use the INTERNAL APIs directly. The example below prints a ranked list of the terms found in a document.
    // load the topic map
    TopicMapIF topicmap = ImportExportUtils.getReader(args[0]).read();

    // create classifier
    TopicMapClassification tcl = new TopicMapClassification(topicmap);

    // read document
    ClassifiableContentIF cc = ClassifyUtils.getClassifiableContent(args[1]);

    // classify document
    tcl.classify(cc);

    // dump the ranked terms
    TermDatabase tdb = tcl.getTermDatabase();
    tdb.dump(50);
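To give a feel for what a "ranked list of terms" means, here is a minimal, self-contained sketch in plain Java. It is not the package's implementation and uses none of its classes: the class name `RankedTermsSketch`, the methods `countTerms` and `rank`, and the tiny stop-word set are all hypothetical. It loosely mirrors the roles the tables above give to a tokenizer, a downcasing normalizer, and a stop list, and it scores terms by raw frequency only. Stemming, compound detection, and topic-map-aware scoring are deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class RankedTermsSketch {

  // Hypothetical stop list for illustration; not the data used by StopList.
  static final Set<String> STOP_WORDS =
      Set.of("the", "a", "of", "and", "is", "in", "to");

  /** Splits on non-letters, lower-cases each token, drops stop words, counts the rest. */
  static Map<String, Integer> countTerms(String text) {
    Map<String, Integer> counts = new HashMap<>();
    for (String token : text.split("[^\\p{L}]+")) {
      String term = token.toLowerCase(Locale.ROOT);
      if (term.isEmpty() || STOP_WORDS.contains(term)) {
        continue; // skip empty fragments and stop words
      }
      counts.merge(term, 1, Integer::sum);
    }
    return counts;
  }

  /** Returns at most max terms, highest count first; ties are broken alphabetically. */
  static List<String> rank(Map<String, Integer> counts, int max) {
    List<String> terms = new ArrayList<>(counts.keySet());
    terms.sort((a, b) -> {
      int byCount = Integer.compare(counts.get(b), counts.get(a));
      return byCount != 0 ? byCount : a.compareTo(b);
    });
    return terms.subList(0, Math.min(max, terms.size()));
  }

  public static void main(String[] args) {
    String text =
        "Topic maps describe topics. A topic map is a map of topics and associations.";
    Map<String, Integer> counts = countTerms(text);
    // Print the three highest-ranked terms with their counts.
    for (String term : rank(counts, 3)) {
      System.out.println(term + " (" + counts.get(term) + ")");
    }
  }
}
```

Because the sketch does no stemming, "topic" and "topics" are counted as separate terms. The package lists a stemmer interface (TermStemmerIF) for producing the stem of a word from one of its forms.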