Description
A configurable parallel web crawler, designed to crawl a website for URLs and file content.
Programming language: Rust
License: MIT License
Latest version: v0.3.0
url-crawler alternatives and similar packages
Based on the "Web programming" category.
- actix-web: Actix Web is a powerful, pragmatic, and extremely fast web framework for Rust.
- gutenberg: A fast static site generator in a single binary with everything built-in. https://www.getzola.org
- burn: A flexible and comprehensive deep learning framework in Rust.
- Gotham: A flexible web framework that promotes stability, safety, security and speed.
- Percy: Build frontend browser apps with Rust + WebAssembly. Supports server-side rendering.
- tract: Tiny, no-nonsense, self-contained, TensorFlow and ONNX inference. [Moved to: https://github.com/sonos/tract]
- rust-musl-builder: Docker images for compiling static Rust binaries using musl-libc and musl-gcc, with static versions of useful C libraries. Supports openssl and diesel crates.
- tungstenite-rs: Lightweight stream-based WebSocket implementation for Rust.
- Rouille: Web framework in Rust.
- heroku-buildpack-rust: A buildpack for Rust applications on Heroku, with full support for Rustup, cargo, and build caching.
- Sapper: A lightweight web framework built on hyper, implemented in Rust.
- rust-http: Completely obsolete Rust HTTP library (server and client).
- urlshortener-rs: A very simple URL shortener (client) for Rust.
- The FastCGI Rust implementation: a native Rust library for FastCGI.
README
url-crawler
A configurable parallel web crawler, designed to crawl a website for content.
- [Changelog](./CHANGELOG.md)
- Docs.rs
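To run the example below, the crate first needs to be added as a dependency. A minimal sketch, assuming the v0.3.0 release listed above is published on crates.io under the name `url-crawler`:

```toml
[dependencies]
url-crawler = "0.3"
```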
Example
```rust
extern crate url_crawler;

use std::sync::Arc;
use url_crawler::*;

/// Function for filtering content in the crawler before a HEAD request.
///
/// Only allow directory entries, and files that have the `deb` extension.
fn apt_filter(url: &Url) -> bool {
    let url = url.as_str();
    url.ends_with("/") || url.ends_with(".deb")
}

pub fn main() {
    // Create a crawler designed to crawl the given website.
    let crawler = Crawler::new("http://apt.pop-os.org/".to_owned())
        // Use four threads for fetching.
        .threads(4)
        // Check if a URL matches this filter before performing a HEAD request on it.
        .pre_fetch(Arc::new(apt_filter))
        // Initialize the crawler and begin crawling. This returns immediately.
        .crawl();

    // Process URL entries as they become available.
    for file in crawler {
        println!("{:#?}", file);
    }
}
```
Output
The following includes two snippets from the combined output.
```
...
Html {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/source/s/system76-cudnn-9.2/"
}
Html {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/source/t/tensorflow-1.9-cuda-9.2/"
}
Html {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/source/t/tensorflow-1.9-cpu/"
}
...
File {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/binary-amd64/a/atom/atom_1.30.0_amd64.deb",
    content_type: "application/octet-stream",
    length: 87689398,
    modified: Some(
        2018-09-25T17:54:39+00:00
    )
}
File {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/binary-amd64/a/atom/atom_1.31.1_amd64.deb",
    content_type: "application/octet-stream",
    length: 90108020,
    modified: Some(
        2018-10-03T22:29:15+00:00
    )
}
...
```
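As the output suggests, the crawler yields two kinds of entries: `Html` for pages whose links were followed, and `File` for matching files along with metadata gathered by the HEAD request. A minimal sketch of handling the two cases separately, assuming the iterator yields a `UrlEntry` enum with the variants and fields shown above:

```rust
extern crate url_crawler;

use url_crawler::*;

fn main() {
    // Same target site as the example above.
    let crawler = Crawler::new("http://apt.pop-os.org/".to_owned())
        .threads(4)
        .crawl();

    for entry in crawler {
        match entry {
            // An HTML page that was scraped for further links.
            UrlEntry::Html { url, .. } => println!("page: {}", url),
            // A file that passed the filters, with HEAD-request metadata.
            UrlEntry::File { url, content_type, length, .. } => {
                println!("file: {} ({}, {} bytes)", url, content_type, length);
            }
        }
    }
}
```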