Class TextReader

java.lang.Object
java.io.Reader
com.norconex.commons.lang.io.TextReader
All Implemented Interfaces:
Closeable, AutoCloseable, Readable

public class TextReader extends Reader
Reads text from an input stream, splitting it at a natural boundary whenever the text is too large. It first tries to split after the last paragraph. If there are no paragraphs, it tries to split after the last sentence. If no sentence can be detected, it splits on the last word. If no words are found, it returns all it could read, up to the maximum read size in characters. The default maximum number of characters to read before splitting is 10 million. Passing -1 as the maxReadSize disables batch reading and reads the entire text at once.
Since:
1.6.0
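
The fallback order described above (paragraph, then sentence, then word, then the whole buffer) can be sketched as follows. This is a minimal illustration, not the library's implementation: the class name SplitPointSketch, the method names, and the three regular expressions are all hypothetical, and the delimiters TextReader actually recognizes may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitPointSketch {

    // Hypothetical delimiter patterns, chosen only for illustration.
    private static final Pattern PARAGRAPH = Pattern.compile("\\n\\s*\\n");
    private static final Pattern SENTENCE = Pattern.compile("[.!?]\\s+");
    private static final Pattern WORD = Pattern.compile("\\s+");

    /** Returns the end index of the last match of the pattern, or -1 if there is none. */
    public static int lastMatchEnd(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int end = -1;
        while (m.find()) {
            end = m.end();
        }
        return end;
    }

    /**
     * Picks where to cut a full buffer: after the last paragraph break,
     * else after the last sentence end, else after the last whitespace run,
     * else at the end of the buffer (nothing to split on).
     */
    public static int splitIndex(String buffer) {
        for (Pattern p : new Pattern[] {PARAGRAPH, SENTENCE, WORD}) {
            int end = lastMatchEnd(p, buffer);
            if (end > 0) {
                return end;
            }
        }
        return buffer.length();
    }

    public static void main(String[] args) {
        String buffer = "First paragraph.\n\nSecond one. It has two sent";
        int cut = splitIndex(buffer);
        // Everything up to the cut would be returned now;
        // the rest would be kept for the next read.
        System.out.println("returned: [" + buffer.substring(0, cut) + "]");
        System.out.println("kept:     [" + buffer.substring(cut) + "]");
    }
}
```

In this sketch the delimiter stays attached to the returned part. Whether the real class keeps or drops it is governed by the removeTrailingDelimiter constructor argument, which is not modeled here.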
  • Field Details

  • Constructor Details

    • TextReader

      public TextReader(Reader reader)
      Creates a new text reader that reads a maximum of 10 million characters at a time when readText() is called.
      Parameters:
      reader - a Reader
    • TextReader

      public TextReader(Reader reader, int maxReadSize)
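      Creates a new text reader that reads at most maxReadSize characters at a time when readText() is called.
      Parameters:
      reader - a Reader
      maxReadSize - maximum number of characters to read at once with readText(); -1 reads the entire text at once.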
    • TextReader

      public TextReader(Reader reader, int maxReadSize, boolean removeTrailingDelimiter)
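      Creates a new text reader that reads at most maxReadSize characters at a time when readText() is called, optionally removing the trailing delimiter.
      Parameters:
      reader - a Reader
      maxReadSize - maximum number of characters to read at once with readText(); -1 reads the entire text at once.
      removeTrailingDelimiter - whether to remove the trailing delimiter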
  • Method Details

    • read

      public int read(char[] cbuf, int off, int len) throws IOException
      Specified by:
      read in class Reader
      Throws:
      IOException
    • readText

      public String readText() throws IOException
      Reads the next chunk of text, up to the maximum read size specified. It tries as much as possible to break long text after a paragraph, a sentence, or a word before returning. See the class documentation.
      Returns:
      text read
      Throws:
      IOException - problem reading text.
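
To make the chunk-by-chunk behaviour concrete, here is a self-contained sketch of a reader that returns text in batches and carries the unfinished tail over to the next call. It is an illustration under stated assumptions, not the TextReader source: the class ChunkedReadSketch and its methods nextChunk() and readAll() are hypothetical, it only splits on whitespace (no paragraph or sentence detection), it does not model maxReadSize = -1, and returning null once the input is exhausted is a convention of this sketch, since the documentation above does not say what readText() returns at the end of the input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ChunkedReadSketch {

    private final Reader reader;
    private final int maxReadSize; // must be positive in this sketch
    // Text read past the last split point, carried over to the next call.
    private final StringBuilder carry = new StringBuilder();

    public ChunkedReadSketch(Reader reader, int maxReadSize) {
        this.reader = reader;
        this.maxReadSize = maxReadSize;
    }

    /** Returns the next chunk, or null once the input is exhausted (sketch convention). */
    public String nextChunk() throws IOException {
        char[] buf = new char[maxReadSize];
        boolean eof = false;
        // Top up the carried-over text until the buffer is full or the input ends.
        while (carry.length() < maxReadSize && !eof) {
            int n = reader.read(buf, 0, maxReadSize - carry.length());
            if (n == -1) {
                eof = true;
            } else {
                carry.append(buf, 0, n);
            }
        }
        if (carry.length() == 0) {
            return null;
        }
        // At end of input, return whatever is left. Otherwise cut just
        // after the last whitespace character, if there is one.
        int cut = carry.length();
        if (!eof) {
            for (int i = carry.length(); i >= 1; i--) {
                if (Character.isWhitespace(carry.charAt(i - 1))) {
                    cut = i;
                    break;
                }
            }
        }
        String chunk = carry.substring(0, cut);
        carry.delete(0, cut);
        return chunk;
    }

    /** Convenience helper: reads a whole string as a list of chunks. */
    public static List<String> readAll(String text, int maxReadSize) throws IOException {
        List<String> chunks = new ArrayList<>();
        try (Reader in = new StringReader(text)) {
            ChunkedReadSketch sketch = new ChunkedReadSketch(in, maxReadSize);
            String chunk;
            while ((chunk = sketch.nextChunk()) != null) {
                chunks.add(chunk);
            }
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        for (String chunk : readAll("alpha beta gamma delta", 12)) {
            System.out.println("[" + chunk + "]");
        }
    }
}
```

A caller of the real class would follow the same shape: construct the reader over a Reader, call readText() repeatedly, and close it when done. The exact end-of-input signal should be checked against the library itself.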
    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in class Reader
      Throws:
      IOException