[/
Copyright 2006-2007 John Maddock.
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt).
]

[section:unicode Unicode and Boost.Regex]

There are two ways to use Boost.Regex with Unicode strings:

[h4 Rely on wchar_t]
If your platform's `wchar_t` type can hold Unicode strings, and your
platform's C/C++ runtime correctly handles wide character constants
(when passed to `std::iswspace`, `std::iswlower`, etc.), then you can use
`boost::wregex` to process Unicode. However, there are several
disadvantages to this approach:

* It's not portable: there's no guarantee on the width of `wchar_t`, or
  even whether the runtime treats wide characters as Unicode at all.
  Most Windows compilers do so, but many Unix systems do not.
* There's no support for Unicode-specific character classes: `[[:Nd:]]`, `[[:Po:]]`, etc.
* You can only search strings that are encoded as sequences of wide
  characters; it is not possible to search UTF-8, or even UTF-16 on many platforms.

[h4 Use a Unicode Aware Regular Expression Type.]

If you have the
[@http://www.ibm.com/software/globalization/icu/ ICU library], then
Boost.Regex can be
[link boost_regex.install.building_with_unicode_and_icu_su
configured to make use
of it], and then provides a distinct regular expression type (`boost::u32regex`)
that supports both Unicode-specific character properties and the searching
of text that is encoded in UTF-8, UTF-16, or UTF-32. See:
[link boost_regex.ref.non_std_strings.icu
ICU string class support].

[endsect]