
Programming language: Rust
License: MIT License
Tags: Text, Text processing, language, Nlp, Lang, Whatlang
Latest version: v0.12.0


Whatlang

Natural language detection for Rust with a focus on simplicity and performance. Try the online demo.


Features

  • Supports 78 languages
  • 100% written in Rust
  • Lightweight, fast and simple
  • Recognizes not only a language, but also a script (Latin, Cyrillic, etc.)
  • Provides reliability information
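To illustrate what recognizing a script can involve, here is a simplified, std-only sketch that counts characters falling into a few Unicode ranges and returns the most frequent script. The SimpleScript enum, the three rough ranges and detect_script are hypothetical names made up for this example; whatlang's own Script type covers many more scripts and its detection logic may differ.

```rust
/// Hypothetical, simplified script enum (whatlang's own Script type is much larger).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimpleScript {
    Latin,
    Cyrillic,
    Greek,
}

// Rough Unicode ranges; an assumption for this sketch, real detectors use many more.
fn is_latin(c: char) -> bool {
    matches!(c, 'a'..='z' | 'A'..='Z' | '\u{00C0}'..='\u{024F}')
}

fn is_cyrillic(c: char) -> bool {
    matches!(c, '\u{0400}'..='\u{04FF}')
}

fn is_greek(c: char) -> bool {
    matches!(c, '\u{0370}'..='\u{03FF}')
}

/// Count characters per script and return the script with the highest count,
/// or None if no character matched any of the known scripts.
fn detect_script(text: &str) -> Option<SimpleScript> {
    let candidates: [(SimpleScript, fn(char) -> bool); 3] = [
        (SimpleScript::Latin, is_latin),
        (SimpleScript::Cyrillic, is_cyrillic),
        (SimpleScript::Greek, is_greek),
    ];
    candidates
        .iter()
        .map(|&(script, pred)| (script, text.chars().filter(|&c| pred(c)).count()))
        .filter(|&(_, count)| count > 0)
        .max_by_key(|&(_, count)| count)
        .map(|(script, _)| script)
}

fn main() {
    println!("{:?}", detect_script("Привет, мир"));
}
```

Digits and punctuation belong to no script here, so a text made only of them yields None instead of a guess.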

Get started

Add to your Cargo.toml:

[dependencies]
whatlang = "0.12.0"

Example:

use whatlang::{detect, Lang, Script};

fn main() {
    let text = "Ĉu vi ne volas eklerni Esperanton? Bonvolu! Estas unu de la plej bonaj aferoj!";

    // detect() returns None when the language cannot be determined
    let info = detect(text).unwrap();
    assert_eq!(info.lang(), Lang::Epo); // Esperanto
    assert_eq!(info.script(), Script::Latin);
    assert_eq!(info.confidence(), 1.0);
    assert!(info.is_reliable());
}

For more details (e.g. how to blacklist some languages), please check the documentation.

Feature toggles

| Feature | Description |
|---|---|
| enum-map | Lang and Script implement the Enum trait from enum-map |

How does it work?

How does the language recognition work?

The algorithm is based on trigram language models, a particular case of n-gram models. To understand the idea, please check the original paper by Cavnar and Trenkle (1994), "N-Gram-Based Text Categorization".
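The idea from the Cavnar and Trenkle paper can be sketched as follows: build a ranked trigram profile for each language from sample text, build the same kind of profile for the input, and pick the language with the smallest "out-of-place" distance (the sum of rank differences, with a maximum penalty for trigrams the language profile lacks). This is a minimal, std-only illustration of that technique, not whatlang's implementation; the helper names (trigram_counts, profile, out_of_place, classify), the profile length of 300 and the tiny training samples are assumptions made for this example.

```rust
use std::collections::HashMap;

/// Lowercase the text, split it into words, pad each word with spaces and
/// count character trigrams (e.g. "ab" -> " ab", "ab ").
fn trigram_counts(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    let lower = text.to_lowercase();
    for word in lower.split(|c: char| !c.is_alphabetic()).filter(|w| !w.is_empty()) {
        let padded: Vec<char> = std::iter::once(' ')
            .chain(word.chars())
            .chain(std::iter::once(' '))
            .collect();
        for window in padded.windows(3) {
            *counts.entry(window.iter().collect::<String>()).or_insert(0) += 1;
        }
    }
    counts
}

/// Rank trigrams by descending frequency (ties broken alphabetically so the
/// result is deterministic) and keep at most `max_len` of them.
fn profile(text: &str, max_len: usize) -> Vec<String> {
    let mut ranked: Vec<(String, usize)> = trigram_counts(text).into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.into_iter().take(max_len).map(|(t, _)| t).collect()
}

/// Out-of-place distance: sum of rank differences; a trigram missing from
/// the language profile costs the maximum penalty (the profile length).
fn out_of_place(doc: &[String], lang: &[String]) -> usize {
    let lang_rank: HashMap<&str, usize> =
        lang.iter().enumerate().map(|(i, t)| (t.as_str(), i)).collect();
    doc.iter()
        .enumerate()
        .map(|(i, t)| match lang_rank.get(t.as_str()) {
            Some(&j) => i.abs_diff(j),
            None => lang.len(),
        })
        .sum()
}

/// Pick the language whose profile is closest to the profile of `text`.
fn classify<'a>(text: &str, langs: &'a [(&'a str, Vec<String>)], max_len: usize) -> Option<&'a str> {
    let doc = profile(text, max_len);
    langs
        .iter()
        .min_by_key(|(_, p)| out_of_place(&doc, p))
        .map(|(name, _)| *name)
}

fn main() {
    // Tiny hypothetical training samples; real models are built from large corpora.
    let samples = [
        ("eng", "the quick brown fox jumps over the lazy dog and then the dog sleeps"),
        ("epo", "la rapida bruna vulpo saltas super la maldiligenta hundo kaj la hundo dormas"),
    ];
    let max_len = 300;
    let langs: Vec<(&str, Vec<String>)> = samples
        .iter()
        .map(|&(name, text)| (name, profile(text, max_len)))
        .collect();
    println!("{:?}", classify("the dog and the fox", &langs, max_len));
}
```

Ranking by position rather than raw counts is what makes the measure tolerant of texts of different lengths: only the relative order of trigrams matters.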

How is is_reliable calculated?

It is based on the following factors:

  • How many unique trigrams the given text contains
  • How big the difference is between the first and the second (not returned) detected languages. This metric is called rate in the code base.

Therefore, the two factors can be represented as a 2D space with a threshold function that splits it into "Reliable" and "Not reliable" areas. This function is a hyperbola.
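A minimal sketch of such a check, under stated assumptions: the threshold is assumed to have the hyperbola shape threshold(count) = A / count + B, the constants A and B are illustrative values chosen for this example (whatlang's own constants may differ), and rate is assumed here to be the relative gap between the best and second-best distance scores. The function names unique_trigrams, relative_rate, threshold and is_reliable are hypothetical.

```rust
use std::collections::HashSet;

// Illustrative constants for the hyperbola threshold(count) = A / count + B.
// These are assumptions for this sketch; whatlang's own values may differ.
const A: f64 = 10.0;
const B: f64 = 0.1;

/// Number of distinct character trigrams (words lowercased and padded with spaces).
fn unique_trigrams(text: &str) -> usize {
    let lower = text.to_lowercase();
    let mut seen = HashSet::new();
    for word in lower.split(|c: char| !c.is_alphabetic()).filter(|w| !w.is_empty()) {
        let padded: Vec<char> = std::iter::once(' ')
            .chain(word.chars())
            .chain(std::iter::once(' '))
            .collect();
        for window in padded.windows(3) {
            seen.insert(window.iter().collect::<String>());
        }
    }
    seen.len()
}

/// Hypothetical "rate": relative gap between the best and second-best
/// distance scores (lower distance = better match); 0.0 means a tie.
fn relative_rate(best: f64, second: f64) -> f64 {
    if second <= 0.0 {
        0.0
    } else {
        (second - best) / second
    }
}

/// Minimum rate required for a given number of unique trigrams. The hyperbola
/// falls as the count grows, so longer texts need a smaller gap to be trusted.
fn threshold(unique_count: usize) -> f64 {
    A / unique_count as f64 + B
}

/// Points above the hyperbola are "Reliable", points on or below it are not.
fn is_reliable(unique_count: usize, rate: f64) -> bool {
    unique_count > 0 && rate > threshold(unique_count)
}

fn main() {
    let text = "Ĉu vi ne volas eklerni Esperanton?";
    let count = unique_trigrams(text);
    // Hypothetical distance scores for the best and second-best languages.
    let rate = relative_rate(120.0, 400.0);
    println!(
        "unique trigrams: {}, rate: {:.2}, reliable: {}",
        count,
        rate,
        is_reliable(count, rate)
    );
}
```

With these illustrative constants, a text with 100 unique trigrams needs a rate above 0.2, while a text with only 10 would need a rate above 1.1, which a relative gap can never exceed, so very short texts are never marked reliable.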

For more details, please check the blog article "Introduction to Rust Whatlang Library and Natural Language Identification Algorithms".

Running benchmarks

This is mostly useful to test performance optimizations.

cargo bench

Comparison with alternatives

| | Whatlang | CLD2 | CLD3 |
|---|---|---|---|
| Implementation language | Rust | C++ | C++ |
| Languages | 87 | 83 | 107 |
| Algorithm | trigrams | quadgrams | neural network |
| Supported encoding | UTF-8 | UTF-8 | ? |
| HTML support | no | yes | ? |

Ports and clones

Donations

You can support the project by donating NEAR tokens.

Our NEAR wallet address is whatlang.near.

Derivation

Whatlang is a derivative work of Franc (JavaScript, MIT) by Titus Wormer.

License

MIT © Sergey Potapov
