Description: A configurable parallel web crawler, designed to crawl a website for URLs and file content.
Programming language: Rust
License: MIT License
Latest version: v0.3.0
url-crawler alternatives and similar packages
Based on the "Web programming" category.
Alternatively, view url-crawler alternatives based on common mentions on social networks and blogs.
- gutenberg: A fast static site generator in a single binary with everything built-in. https://www.getzola.org
- burn: DISCONTINUED. Burn is a new comprehensive dynamic Deep Learning Framework built using Rust, with extreme flexibility, compute efficiency, and portability as its primary goals. [Moved to: https://github.com/Tracel-AI/burn]
- tract: DISCONTINUED. Tiny, no-nonsense, self-contained TensorFlow and ONNX inference. [Moved to: https://github.com/sonos/tract]
- rust-musl-builder: Docker images for compiling static Rust binaries using musl-libc and musl-gcc, with static versions of useful C libraries. Supports the openssl and diesel crates.
- heroku-buildpack-rust: A buildpack for Rust applications on Heroku, with full support for Rustup, cargo, and build caching.
README
url-crawler
A configurable parallel web crawler, designed to crawl a website for content.
- [Changelog](./CHANGELOG.md)
- Docs.rs
Example
```rust
extern crate url_crawler;

use std::sync::Arc;
use url_crawler::*;

/// Function for filtering content in the crawler before a HEAD request.
///
/// Only allow directory entries, and files that have the `deb` extension.
fn apt_filter(url: &Url) -> bool {
    let url = url.as_str();
    url.ends_with("/") || url.ends_with(".deb")
}

pub fn main() {
    // Create a crawler designed to crawl the given website.
    let crawler = Crawler::new("http://apt.pop-os.org/".to_owned())
        // Use four threads for fetching.
        .threads(4)
        // Check if a URL matches this filter before performing a HEAD request on it.
        .pre_fetch(Arc::new(apt_filter))
        // Initialize the crawler and begin crawling. This returns immediately.
        .crawl();

    // Process URL entries as they become available.
    for file in crawler {
        println!("{:#?}", file);
    }
}
```
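The `pre_fetch` hook can restrict a crawl in other ways as well. Below is a minimal sketch of an alternative filter that stays on the seed host and skips URLs carrying query strings; `same_host_filter` is a hypothetical name (not part of the crate), and it assumes the `Url` type re-exported by `url_crawler::*` is the standard `url` crate type used in the example above.

```rust
/// Hypothetical alternative filter: only follow URLs on the seed host
/// and ignore anything with a query string.
fn same_host_filter(url: &Url) -> bool {
    url.host_str() == Some("apt.pop-os.org") && url.query().is_none()
}
```

It would be passed to the builder the same way as `apt_filter`, i.e. `.pre_fetch(Arc::new(same_host_filter))`.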
Output
The following includes two snippets from the combined output.
```
...
Html {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/source/s/system76-cudnn-9.2/"
}
Html {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/source/t/tensorflow-1.9-cuda-9.2/"
}
Html {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/source/t/tensorflow-1.9-cpu/"
}
...
File {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/binary-amd64/a/atom/atom_1.30.0_amd64.deb",
    content_type: "application/octet-stream",
    length: 87689398,
    modified: Some(
        2018-09-25T17:54:39+00:00
    )
}
File {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/binary-amd64/a/atom/atom_1.31.1_amd64.deb",
    content_type: "application/octet-stream",
    length: 90108020,
    modified: Some(
        2018-10-03T22:29:15+00:00
    )
}
...
```
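As the debug output shows, each item the crawler iterator yields is either a crawled HTML page or a fetched file with metadata from the HEAD request. The sketch below replaces the `println!` loop from the example and branches on the two cases; it assumes the yielded enum is named `UrlEntry` with `Html` and `File` variants carrying the fields printed above (check Docs.rs for the exact type name and field types).

```rust
// Hedged sketch: drop-in replacement for the loop in the example above,
// assuming the iterator yields a `UrlEntry` enum as suggested by the output.
for entry in crawler {
    match entry {
        UrlEntry::Html { url } => {
            println!("page: {}", url);
        }
        UrlEntry::File { url, content_type, length, modified } => {
            println!("file: {} ({} bytes, {})", url, length, content_type);
            if let Some(time) = modified {
                println!("  last modified: {}", time);
            }
        }
    }
}
```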