Description
A configurable parallel web crawler, designed to crawl a website for URLs and file content.
Programming language: Rust
License: MIT License
Latest version: v0.3.0
url-crawler alternatives and similar packages
Based on the "Web programming" category.
- actix-web: Actix Web is a powerful, pragmatic, and extremely fast web framework for Rust.
- gutenberg: A fast static site generator in a single binary with everything built-in. https://www.getzola.org
- burn: A flexible and comprehensive deep learning framework in Rust.
- Gotham: A flexible web framework that promotes stability, safety, security and speed.
- Percy: Build frontend browser apps with Rust + WebAssembly. Supports server-side rendering.
- tract: Tiny, no-nonsense, self-contained, TensorFlow and ONNX inference. [Moved to: https://github.com/sonos/tract]
- rust-musl-builder: Docker images for compiling static Rust binaries using musl-libc and musl-gcc, with static versions of useful C libraries. Supports openssl and diesel crates.
- tungstenite-rs: Lightweight stream-based WebSocket implementation for Rust.
- Rouille: Web framework in Rust.
- heroku-buildpack-rust: A buildpack for Rust applications on Heroku, with full support for Rustup, cargo, and build caching.
- Sapper: A lightweight web framework built on hyper, implemented in Rust.
- rust-http: Completely obsolete Rust HTTP library (server and client).
- urlshortener-rs: A very simple URL shortener (client) for Rust.
- The FastCGI Rust implementation: a native Rust library for FastCGI.
README
url-crawler
A configurable parallel web crawler, designed to crawl a website for content.
- [Changelog](./CHANGELOG.md)
- Docs.rs
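To run the example below, the crate first needs to be added as a dependency. A minimal sketch, assuming the v0.3.0 release listed above is published on crates.io under the name `url-crawler`:

```toml
[dependencies]
url-crawler = "0.3"
```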
Example
```rust
extern crate url_crawler;

use std::sync::Arc;
use url_crawler::*;

/// Function for filtering content in the crawler before a HEAD request.
///
/// Only allow directory entries, and files that have the `deb` extension.
fn apt_filter(url: &Url) -> bool {
    let url = url.as_str();
    url.ends_with("/") || url.ends_with(".deb")
}

pub fn main() {
    // Create a crawler designed to crawl the given website.
    let crawler = Crawler::new("http://apt.pop-os.org/".to_owned())
        // Use four threads for fetching.
        .threads(4)
        // Check if a URL matches this filter before performing a HEAD request on it.
        .pre_fetch(Arc::new(apt_filter))
        // Initialize the crawler and begin crawling. This returns immediately.
        .crawl();

    // Process URL entries as they become available.
    for file in crawler {
        println!("{:#?}", file);
    }
}
```
Output
The following includes two snippets from the combined output.
```
...
Html {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/source/s/system76-cudnn-9.2/"
}
Html {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/source/t/tensorflow-1.9-cuda-9.2/"
}
Html {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/source/t/tensorflow-1.9-cpu/"
}
...
File {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/binary-amd64/a/atom/atom_1.30.0_amd64.deb",
    content_type: "application/octet-stream",
    length: 87689398,
    modified: Some(
        2018-09-25T17:54:39+00:00
    )
}
File {
    url: "http://apt.pop-os.org/proprietary/pool/bionic/main/binary-amd64/a/atom/atom_1.31.1_amd64.deb",
    content_type: "application/octet-stream",
    length: 90108020,
    modified: Some(
        2018-10-03T22:29:15+00:00
    )
}
...
```
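As the output suggests, the crawler yields two kinds of entries: `Html` for pages whose links were followed, and `File` for matching files along with metadata gathered by the HEAD request. A minimal sketch of handling the two cases separately, assuming the iterator yields a `UrlEntry` enum with the variants and fields shown above:

```rust
extern crate url_crawler;

use url_crawler::*;

fn main() {
    // Same target site as the example above.
    let crawler = Crawler::new("http://apt.pop-os.org/".to_owned())
        .threads(4)
        .crawl();

    for entry in crawler {
        match entry {
            // An HTML page that was scraped for further links.
            UrlEntry::Html { url, .. } => println!("page: {}", url),
            // A file that passed the filters, with HEAD-request metadata.
            UrlEntry::File { url, content_type, length, .. } => {
                println!("file: {} ({}, {} bytes)", url, content_type, length);
            }
        }
    }
}
```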