Class DataUnitParser

java.lang.Object
com.norconex.commons.lang.unit.DataUnitParser

public final class DataUnitParser extends Object

Parses a textual representation of a data unit and converts it into a BigDecimal representing the quantity for a given unit (e.g., bytes).

If the string is made of digits only, it is assumed to be a number of bytes and the value is returned unchanged.

The data unit can be written in its prefix form or in full, using either binary or decimal notation (e.g., "kB", "kilobyte", "kilobytes").

Languages supported are English (default) and French (since 2.0.0). The following are acceptable symbols for each data unit, for bytes and bits. The symbols are case-insensitive and accent-insensitive.

English decimal notation:

  • kB,kilobyte,kilobytes, kbit,kilobit,kilobits
  • MB,megabyte,megabytes, Mbit,megabit,megabits
  • GB,gigabyte,gigabytes, Gbit,gigabit,gigabits
  • TB,terabyte,terabytes, Tbit,terabit,terabits
  • PB,petabyte,petabytes, Pbit,petabit,petabits
  • EB,exabyte,exabytes, Ebit,exabit,exabits
  • ZB,zettabyte,zettabytes, Zbit,zettabit,zettabits
  • YB,yottabyte,yottabytes, Ybit,yottabit,yottabits

English binary notation:

  • KiB,kibibyte,kibibytes, kibit,kibibit,kibibits
  • MiB,mebibyte,mebibytes, Mibit,mebibit,mebibits
  • GiB,gibibyte,gibibytes, Gibit,gibibit,gibibits
  • TiB,tebibyte,tebibytes, Tibit,tebibit,tebibits
  • PiB,pebibyte,pebibytes, Pibit,pebibit,pebibits
  • EiB,exbibyte,exbibytes, Eibit,exbibit,exbibits
  • ZiB,zebibyte,zebibytes, Zibit,zebibit,zebibits
  • YiB,yobibyte,yobibytes, Yibit,yobibit,yobibits
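The practical difference between the two notations is the multiplier: decimal prefixes scale by powers of 1000, binary prefixes by powers of 1024. The following is a minimal sketch of that arithmetic for byte units, using only the JDK. The class and method names (UnitMultiplierSketch, bytesPerUnit, toBytes) are hypothetical and this is not the DataUnitParser implementation.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class UnitMultiplierSketch {

    // Bytes in one unit: base^exponent, e.g. 1000^1 for kB, 1024^2 for MiB.
    static BigDecimal bytesPerUnit(int base, int exponent) {
        return new BigDecimal(BigInteger.valueOf(base).pow(exponent));
    }

    // Converts a quantity expressed in a unit into a number of bytes.
    static BigDecimal toBytes(String quantity, int base, int exponent) {
        return new BigDecimal(quantity).multiply(bytesPerUnit(base, exponent));
    }

    public static void main(String[] args) {
        // 2.5 kB uses the decimal multiplier 1000^1.
        System.out.println(toBytes("2.5", 1000, 1).toPlainString());
        // 2.5 KiB uses the binary multiplier 1024^1.
        System.out.println(toBytes("2.5", 1024, 1).toPlainString());
        // 1 YiB is 1024^8 bytes, too large for a long, hence BigDecimal.
        System.out.println(toBytes("1", 1024, 8).toPlainString());
    }
}
```

The last call hints at why a BigDecimal return type is a sensible fit here: the largest units exceed the range of a long.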

French notations:

French uses the same symbols. Accents aside, the names are also the same, except for replacing "byte" with "octet" (e.g., "gigabyte" becomes "gigaoctet").

French speakers typically write the following prefixes with an "é": méga, mébi, téra, tébi, péta, pébi. Both variations are supported (with or without accents).
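One common way to make unit names case-insensitive and accent-insensitive is to normalize them before lookup. The sketch below shows that general technique with the JDK's java.text.Normalizer. It is an illustration only: the class and method names are hypothetical, and the source does not say how DataUnitParser achieves this internally.

```java
import java.text.Normalizer;
import java.util.Locale;

final class UnitNameNormalizerSketch {

    // Decomposes accented letters (NFD), removes the combining marks,
    // then lower-cases, so that "Méga" and "mega" compare equal.
    static String normalize(String unitName) {
        String decomposed = Normalizer.normalize(unitName, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // \u00e9 is "é"; escapes keep the example independent of file encoding.
        System.out.println(normalize("M\u00e9gaoctets"));
        System.out.println(normalize("megaoctets"));
    }
}
```

Both calls print the same normalized name, which could then serve as a single lookup key for either spelling.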

Refer to DataUnit for more information on what these values represent.

No distinction is made between plural and singular. Numeric values can be integers or decimal numbers (e.g., 2.5kB). A numeric value must be followed by a data unit. Other terms or characters are ignored.

Examples:

All of the following will be parsed properly:

  • 2 gigabytes, 530 megabytes, and 2 kilobytes
  • 6GB10MB23kB
  • 2.5MiB
  • 10PiB9 gibibytes, 8 MB, and 5.5 kibibytes
  • 2 mégaoctets et 3 kilooctet
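To illustrate how strings like those above can be read as a sum of quantities, here is a minimal, JDK-only sketch that finds each number-plus-unit pair with a regular expression and adds up the byte values. It is an assumption-laden illustration, not the DataUnitParser implementation: the class name DataSizeSumSketch and its methods are hypothetical, only English byte units from kilo to peta are registered, and digits-only input, bit units and French names are not handled.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DataSizeSumSketch {

    // A number (integer or decimal) followed by optional spaces and a unit.
    private static final Pattern QUANTITY =
            Pattern.compile("(\\d+(?:\\.\\d+)?)\\s*([A-Za-z]+)");

    private static final Map<String, BigDecimal> BYTES_PER_UNIT = new HashMap<>();
    static {
        // Each prefix appears as a symbol and as a full name; pairs share an exponent.
        String[] decimal = {"k", "kilo", "m", "mega", "g", "giga", "t", "tera", "p", "peta"};
        String[] binary = {"ki", "kibi", "mi", "mebi", "gi", "gibi", "ti", "tebi", "pi", "pebi"};
        for (int i = 0; i < decimal.length; i++) {
            int exponent = i / 2 + 1;
            register(decimal[i], 1000, exponent);
            register(binary[i], 1024, exponent);
        }
    }

    private static void register(String prefix, int base, int exponent) {
        BigDecimal size = new BigDecimal(BigInteger.valueOf(base).pow(exponent));
        if (prefix.length() <= 2) {
            // Symbol form, stored lower-cased: kb, mib, ...
            BYTES_PER_UNIT.put(prefix + "b", size);
        } else {
            // Full names, singular and plural.
            BYTES_PER_UNIT.put(prefix + "byte", size);
            BYTES_PER_UNIT.put(prefix + "bytes", size);
        }
    }

    // Sums every "<number><unit>" pair found in the text, in bytes.
    // Pairs whose unit is not registered are skipped in this sketch.
    static BigDecimal sumBytes(String text) {
        BigDecimal total = BigDecimal.ZERO;
        Matcher m = QUANTITY.matcher(text);
        while (m.find()) {
            BigDecimal size = BYTES_PER_UNIT.get(m.group(2).toLowerCase(Locale.ROOT));
            if (size != null) {
                total = total.add(new BigDecimal(m.group(1)).multiply(size));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumBytes("6GB10MB23kB").toPlainString());
        System.out.println(sumBytes("2 gigabytes, 530 megabytes, and 2 kilobytes").toPlainString());
        System.out.println(sumBytes("2.5MiB").toPlainString());
    }
}
```

Words such as "and" are ignored simply because they are not preceded by a number, which mirrors the statement that other terms or characters are ignored.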

This class is thread-safe and immutable.

Since:
2.0.0
  • Method Details

    • parse

      public static BigDecimal parse(String text)
      Parses a text representation of a data measurement and returns the number of bytes it represents. If the value cannot be parsed, a DataUnitParserException is thrown. Default value is zero byte.
      Parameters:
      text - the data measurement text to parse
      Returns:
      data measurement
    • parse

      public static BigDecimal parse(String text, BigDecimal defaultValue)
      Parses a text representation of a data measurement. If the value cannot be parsed, the default value is returned (no exception is thrown).
      Parameters:
      text - the data measurement text to parse
      defaultValue - default value
      Returns:
      data measurement
    • parse

      public static BigDecimal parse(String text, DataUnit targetUnit)
      Parses a text representation of a data measurement. If the value cannot be parsed, a DataUnitParserException is thrown. Default value is zero byte.
      Parameters:
      text - the data measurement text to parse
      targetUnit - desired target unit for the returned amount
      Returns:
      amount for unit
    • parse

      public static BigDecimal parse(String text, DataUnit targetUnit, BigDecimal defaultValue)
      Parses a text representation of a data measurement. If the value cannot be parsed, the default value is returned (no exception is thrown).
      Parameters:
      text - the data measurement text to parse
      targetUnit - desired target unit for the returned amount
      defaultValue - default value
      Returns:
      amount for unit
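Conceptually, expressing a parsed amount in a target unit is a division of the byte count by the size of that unit. The sketch below shows that step in isolation with the JDK only. The class and method names are hypothetical, and the scale and rounding mode are arbitrary choices made for the sketch; the source does not state how DataUnitParser rounds.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class TargetUnitSketch {

    // Expresses a byte count in a unit of the given size (in bytes).
    // Six decimal places with HALF_UP rounding is an assumption of this sketch.
    static BigDecimal fromBytes(BigDecimal bytes, BigDecimal bytesPerUnit) {
        return bytes.divide(bytesPerUnit, 6, RoundingMode.HALF_UP)
                .stripTrailingZeros();
    }

    public static void main(String[] args) {
        BigDecimal kib = BigDecimal.valueOf(1024);
        // 1536 bytes expressed in KiB.
        System.out.println(fromBytes(new BigDecimal("1536"), kib).toPlainString());
    }
}
```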