Programming language: Rust

License: Mozilla Public License 2.0

Tags: Commandline Application Count Applications written in Rust System tools Text processing Utilities Unicode Wc Word

Latest version: v0.2.0


README

uwc

Like wc, but Unicode-aware, and with a line mode.

uwc can count:

Lines
Words
Bytes
Grapheme clusters
Unicode code points

Additionally, it can operate in line mode, which reports these counts separately for each line of the input instead of once per file.
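To make the difference between these counters concrete, here is a minimal sketch using only the Rust standard library. It is not uwc's implementation: the `Counts` struct and `count` function are hypothetical names, words are approximated by splitting on whitespace rather than by the Unicode word boundary rules, and grapheme clusters are left out because counting them needs a segmentation library such as unicode-segmentation.

```rust
/// Counts that need nothing beyond the standard library.
/// (Hypothetical type for illustration; not part of uwc.)
#[derive(Debug, PartialEq)]
struct Counts {
    bytes: usize,
    codepoints: usize,
    words: usize,
}

fn count(text: &str) -> Counts {
    Counts {
        // Length of the UTF-8 encoding.
        bytes: text.len(),
        // Each `char` is one Unicode scalar value (code point).
        codepoints: text.chars().count(),
        // Assumption: whitespace splitting only approximates word boundaries.
        words: text.split_whitespace().count(),
    }
}

fn main() {
    // "naïve café" written with precomposed letters: 10 code points, 12 bytes.
    println!("{:?}", count("na\u{00EF}ve caf\u{00E9}"));
    // "e" followed by a combining acute accent: one visible character,
    // but 2 code points and 3 bytes.
    println!("{:?}", count("e\u{0301}"));
}
```

The second input is the kind of case where a grapheme count (1) differs from the code point count (2).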

Usage example

By default, uwc will count lines, words, and bytes. You can specify the counters you'd like, or ask for all counters with the -a flag.

$ uwc tests/fixtures/**/input
lines  words  bytes  filename
8      5      29     tests/fixtures/all_newlines/input
0      0      0      tests/fixtures/empty/input
0      0      0      tests/fixtures/empty_line_mode/input
1      9      97     tests/fixtures/flags_bp/input
1      9      97     tests/fixtures/flags_cl/input
1      9      97     tests/fixtures/flags_w/input
0      1      5      tests/fixtures/hello/input
1      9      97     tests/fixtures/i_can_eat_glass/input
8      8      29     tests/fixtures/line_mode/input
7      8      28     tests/fixtures/line_mode_no_trailing_newline/input
7      8      28     tests/fixtures/line_mode_no_trailing_newline_count_newlines/input
34     66     507    total

$ uwc -a tests/fixtures/**/input
lines  words  bytes  graphemes  codepoints  filename
8      5      29     23         24          tests/fixtures/all_newlines/input
0      0      0      0          0           tests/fixtures/empty/input
0      0      0      0          0           tests/fixtures/empty_line_mode/input
1      9      97     51         51          tests/fixtures/flags_bp/input
1      9      97     51         51          tests/fixtures/flags_cl/input
1      9      97     51         51          tests/fixtures/flags_w/input
0      1      5      5          5           tests/fixtures/hello/input
1      9      97     51         51          tests/fixtures/i_can_eat_glass/input
8      8      29     28         28          tests/fixtures/line_mode/input
7      8      28     27         27          tests/fixtures/line_mode_no_trailing_newline/input
7      8      28     27         27          tests/fixtures/line_mode_no_trailing_newline_count_newlines/input
34     66     507    314        315         total

You can also switch into line mode with the --mode flag:

$ uwc -a --mode line tests/fixtures/line_mode/input
lines  words  bytes  graphemes  codepoints  filename
0      1      1      1          1           tests/fixtures/line_mode/input:1
0      1      2      2          2           tests/fixtures/line_mode/input:2
0      1      3      3          3           tests/fixtures/line_mode/input:3
0      1      5      4          4           tests/fixtures/line_mode/input:4
0      1      1      1          1           tests/fixtures/line_mode/input:5
0      1      4      4          4           tests/fixtures/line_mode/input:6
0      1      2      2          2           tests/fixtures/line_mode/input:7
0      1      3      3          3           tests/fixtures/line_mode/input:8
0      8      21     20         20          tests/fixtures/line_mode/input:total
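The idea behind line mode can be sketched in a few lines of standard-library Rust. This is an illustration under assumptions, not uwc's code: `line_counts` is a hypothetical helper, it reports only bytes and code points, and `str::lines` splits on `\n` and `\r\n` only, whereas uwc also recognizes other Unicode line breaks.

```rust
/// Returns (line number, bytes, code points) for each line.
/// Line terminators are excluded from the per-line counts.
fn line_counts(text: &str) -> Vec<(usize, usize, usize)> {
    text.lines()
        .enumerate()
        .map(|(i, line)| (i + 1, line.len(), line.chars().count()))
        .collect()
}

fn main() {
    // Third line is a single two-byte character (U+00E9).
    for (n, bytes, codepoints) in line_counts("a\nbb\n\u{00E9}\n") {
        println!("{}  {}  input:{}", bytes, codepoints, n);
    }
}
```

Because terminators are excluded here, the per-line byte totals come out lower than the whole-file byte count, which matches the pattern in the output above (21 bytes in line mode versus 29 for the same file).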

Why?

The goal of this project is to apply Unicode rules correctly when counting things. Specifically, it should:

Count all newline characters correctly. This includes lesser-known line breaks, like NEL (U+0085), FF (U+000C), LS (U+2028), and PS (U+2029).
Count all words using the Unicode standard's word boundary rules.
Count all complete grapheme clusters correctly, so that even edge cases like Z҉͈͓͈͎a̘͈̠̭l̨̯g̶̬͇̭o̝̹̗͎̙ ͟t͖̙̟̹͇̥̝͡e̥͘x͚̺̭̻͘t͉͔̩̲̘, for example, are counted correctly.
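As a rough illustration of the first goal, the sketch below counts line breaks with the standard library only. The function names are hypothetical, and two choices are assumptions rather than documented uwc behavior: the set of break characters is LF and CR plus the four listed above, and a CR LF pair is counted as a single break.

```rust
/// Characters treated as line breaks in this sketch:
/// LF, FF, CR, NEL, LS, PS.
fn is_line_break(c: char) -> bool {
    matches!(
        c,
        '\n' | '\u{000C}' | '\r' | '\u{0085}' | '\u{2028}' | '\u{2029}'
    )
}

fn count_line_breaks(text: &str) -> usize {
    let mut count = 0;
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        // Assumption: CR immediately followed by LF is one break, not two.
        if c == '\r' && chars.peek() == Some(&'\n') {
            chars.next();
        }
        if is_line_break(c) {
            count += 1;
        }
    }
    count
}

fn main() {
    let text = "a\nb\r\nc\u{0085}d\u{000C}e\u{2028}f\u{2029}";
    println!("{}", count_line_breaks(text));
}
```

A byte-oriented counter that looks only for `\n` would report 2 for this input, since it misses NEL, FF, LS, and PS.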

It does not aim to implement these Unicode algorithms itself, however, so it relies on the unicode-segmentation library for most of the heavy lifting. And since Unicode support in the Rust ecosystem is not quite mature yet, that has some consequences for this project. See the caveats below.

It is primarily a fun side project for me, and an excuse to learn more about Rust and Unicode.

Installation

It is published on crates.io, so you can install it with cargo:

$ cargo install uwc

Caveats

UTF-8

It only supports UTF-8 files. UTF-16 can go on my to-do list if there is demand. For now, you can use iconv to convert non-UTF-8 files first.

Speed

It is slower than wc. My analysis hasn't been extensive, but as far as I can tell, the reasons are:

It uses Unicode algorithms, which are just going to be slower than ASCII no matter what.
I am not that experienced with Rust, so it's quite possible I'm not doing something as efficiently as possible.
My free time is limited, and I am prioritizing correctness over speed (though speed is good).

With that said, parallelization helps. In tests on my laptop with larger data sets, the speed is within an order of magnitude of wc: I measured uwc at about 3x slower than wc on a collection of 18 MiB of text files.
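The README does not say how uwc parallelizes its work, so the following is only a sketch of one way to count several inputs concurrently, using scoped threads from the standard library. `count_all` is a hypothetical function, it spawns one thread per input (a real tool would more likely use a bounded pool), and it takes in-memory strings instead of reading files.

```rust
use std::thread;

/// Counts (bytes, code points) for each input on its own thread.
/// Results are returned in the same order as the inputs.
fn count_all(inputs: &[&str]) -> Vec<(usize, usize)> {
    thread::scope(|s| {
        // Spawn every worker first so they run concurrently.
        let handles: Vec<_> = inputs
            .iter()
            .map(|text| s.spawn(move || (text.len(), text.chars().count())))
            .collect();

        // Join in input order to keep the output deterministic.
        let results: Vec<(usize, usize)> = handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect();
        results
    })
}

fn main() {
    for (bytes, codepoints) in count_all(&["abc", "h\u{00E9}"]) {
        println!("{}  {}", bytes, codepoints);
    }
}
```

Scoped threads let the workers borrow the input slices directly, so nothing needs to be cloned or wrapped in `Arc`.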

Localization

Rust, as yet, has no localization libraries, and this has some consequences. Some counts will just be wrong, such as those for hyphenated words, whose handling is locale-specific and requires language dictionary lookups to be correct. Also, some languages, such as Japanese, have no syntactic word separators, so e.g.

私はガラスを食べられます。

should be 5 words, but without localization, we cannot determine that.
