Package net.ontopia.topicmaps.classify

To classify content, use the SimpleClassifier class.


Interface Summary
ClassifiableContentIF INTERNAL: Interface that holds the identifier and the actual content of a classifiable resource.
ClassifyPluginIF INTERNAL: Interface implemented by code that is able to locate classifiable content for topics.
DelimiterTrimmerIF INTERNAL:
DocumentAnalyzerIF INTERNAL:
FormatModuleIF INTERNAL: Interface that encapsulates the support for a given document format.
HttpServletRequestAwareIF INTERNAL: Interface implemented by ClassifyPluginIFs that want access to the current HTTP request in a servlet environment.
TermAnalyzerIF INTERNAL:
TermNormalizerIF INTERNAL:
TermStemmerIF INTERNAL: A stemmer produces the stem of a word from a form of the word.
TextHandlerIF INTERNAL: Callback interface used by format modules to tell the classification framework about the structure of classifiable content.
TokenizerIF INTERNAL:

Class Summary
AbstractDocumentAnalyzer INTERNAL:
BlackList INTERNAL:
CharacterAnalyzer INTERNAL:
Chew PUBLIC: Command-line tool for extracting keywords from a document.
ClassifiableContent INTERNAL:
ClassifyUtils INTERNAL:
CompoundAnalyzer INTERNAL:
ConferencePlugin INTERNAL:
DefaultPlugin INTERNAL:
DefaultTokenizer INTERNAL:
DistanceAnalyzer INTERNAL:
Document INTERNAL:
DocumentClassifier INTERNAL:
DocumentTokenizer INTERNAL:
DowncaseNormalizer INTERNAL:
FormatModule INTERNAL:
FrequencyAnalyzer INTERNAL: A frequency table giving the frequency with which a particular word is used in a particular language.
HTMLFormatModule INTERNAL:
JunkNormalizer INTERNAL:
Language INTERNAL: Object representing a particular language.
OOXMLPowerpointFormatModule INTERNAL: A format module for the OOXML PresentationML format.
OOXMLWordFormatModule INTERNAL: A format module for the OOXML WordProcessingML format.
PDFFormatModule INTERNAL:
PlainTextFormatModule INTERNAL:
PowerPointFormatModule INTERNAL:
RegexpTermAnalyzer INTERNAL: A term analyzer which recognizes certain kinds of terms using regexps and adjusts their scores accordingly.
Region INTERNAL:
RegionBooster INTERNAL:
RelativeScore INTERNAL:
SimpleClassifier PUBLIC: A simple top-level API for classifying content.
SnowballStemmer INTERNAL:
SpecialCharNormalizer INTERNAL:
StopList INTERNAL: A set of words considered "stop words" in a particular language.
Term PUBLIC: Represents a concept which occurs in the classified content.
TermDatabase PUBLIC: A collection of terms representing the result of classifying a piece of content.
TextBlock INTERNAL:
Token INTERNAL:
TokenVisitor INTERNAL:
TologRulePlugin INTERNAL:
TopicContentPlugin INTERNAL: Classifier plugin which produces content from the topic itself.
TopicContentPlugin.TopicAsContent  
TopicMapAnalyzer INTERNAL:
TopicMapAnalyzer.AssociationType  
TopicMapClassification INTERNAL:
Variant PUBLIC: Represents a form of a term as it occurred in classified content.
WebChew INTERNAL:
WordFormatModule INTERNAL: A format module for the old binary Word format.
XMLFormatModule INTERNAL:

Package net.ontopia.topicmaps.classify Description

To classify content, use the SimpleClassifier class. Only Chew, SimpleClassifier, Term, TermDatabase, and Variant are marked PUBLIC; most of the remaining APIs are INTERNAL and may change at any time.

If you need more flexibility, you can use the INTERNAL APIs directly. The example code below outputs a ranked list of the terms found in a particular document. It takes the topic map as its first command-line argument and the document to classify as its second.

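    import net.ontopia.topicmaps.core.TopicMapIF;
    import net.ontopia.topicmaps.utils.ImportExportUtils;
    import net.ontopia.topicmaps.classify.*;

    // load the topic map (args[0] is the topic map to read)
    TopicMapIF topicmap = ImportExportUtils.getReader(args[0]).read();

    // create a classifier backed by the topic map
    TopicMapClassification tcl = new TopicMapClassification(topicmap);

    // read the document to classify (args[1])
    ClassifiableContentIF cc = ClassifyUtils.getClassifiableContent(args[1]);

    // classify the document
    tcl.classify(cc);

    // dump the ranked terms
    TermDatabase tdb = tcl.getTermDatabase();
    tdb.dump(50);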
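
To make the idea of a ranked term list more concrete, here is a minimal, self-contained sketch that ranks terms by raw frequency after lowercasing them and dropping stop words. It is an illustration only and does not reflect how the analyzers in this package actually score terms. The class name TermRankingSketch, the method rankTerms, and the stop-word set are all hypothetical and are not part of Ontopia.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Conceptual sketch only: NOT Ontopia's implementation or API.
class TermRankingSketch {

  // Returns up to 'limit' distinct terms from 'text', most frequent first.
  static List<String> rankTerms(String text, Set<String> stopWords, int limit) {
    // LinkedHashMap remembers the order in which terms first occur
    Map<String, Integer> counts = new LinkedHashMap<>();

    // tokenize: split on any run of characters that are not letters or digits
    for (String token : text.split("[^\\p{L}\\p{N}]+")) {
      String term = token.toLowerCase(Locale.ROOT); // normalize case
      if (term.isEmpty() || stopWords.contains(term)) {
        continue; // skip empty tokens and stop words
      }
      counts.merge(term, 1, Integer::sum);
    }

    // sort by descending count; the sort is stable, so ties keep
    // their first-occurrence order
    List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
    entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

    List<String> ranked = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : entries) {
      if (ranked.size() == limit) {
        break;
      }
      ranked.add(entry.getKey());
    }
    return ranked;
  }

  public static void main(String[] args) {
    // hypothetical stop list, chosen just for this example
    Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of", "is", "and"));
    String text = "The topic map is a map of topics and the topic names.";
    System.out.println(rankTerms(text, stopWords, 3)); // prints [topic, map, topics]
  }
}
```

Note that "topic" and "topics" are counted as separate terms here because the sketch does no stemming.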



Copyright © 2000-2012 Ontopia.