During my internship, my supervisor wanted a program that downloads an .xls file from a French government website. He wanted to run it every month with the task scheduler built into Windows Server.
I wrote this small application, which takes two parameters: the first is the name of the file, and the second is the folder where the file should be saved.
Here is a demo:
https://i.imgur.com/d9yWKVC.gif
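The post does not show how the two parameters are read, so here is a minimal sketch of one way to do it with the standard library only. The name `parse_args`, the fallback program name, and the usage message are all assumptions made for illustration; they are not taken from the repository.

```rust
/// Hypothetical helper (not from the original project).
/// Expects the program name first, then the file name, then the folder.
pub fn parse_args<I>(mut args: I) -> Result<(String, String), String>
where
    I: Iterator<Item = String>,
{
    // The first item is normally the executable name; the fallback is made up.
    let program = args.next().unwrap_or_else(|| "link_extracter".to_string());
    // Tuple fields are evaluated left to right: file name first, folder second.
    match (args.next(), args.next()) {
        (Some(file_name), Some(dir_name)) => Ok((file_name, dir_name)),
        _ => Err(format!("usage: {} <file_name> <dir_name>", program)),
    }
}
```

In a real binary this would be called as `parse_args(std::env::args())`, printing the error and exiting when it returns `Err`.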
To find the download link, I start from this URL: https://communaute.chorus-pro.gouv.fr/annuaire-cpro
In Rust, I use the select crate to target the HTML element I need. Here is an example:
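use select::document::Document;
use select::predicate::Name;

pub fn get_download_link(url: &str) -> Result<String> {
    // Blocking GET request for the page, then parse the body as HTML
    let res = reqwest::get(url).expect("request failed");
    let document = Document::from_read(res)?;
    // Take the onclick attribute of the first <input> that has one
    let web_url = document
        .find(Name("input"))
        .filter_map(|n| n.attr("onclick"))
        .next()
        .expect("no input with an onclick attribute");
    // Split on "href=" and strip the single quotes around the link
    let url_splited: Vec<String> = web_url.split("href=").map(|c| c.replace("'", "")).collect();
    // Index 1 is the part after "href="; this panics if "href=" is absent
    Ok(url_splited[1].to_owned())
}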
As you can see, I take the first input that has an onclick attribute and read the link from that attribute.
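The string handling on the onclick attribute can be tried on its own, without any network access. The sketch below uses only the standard library and keeps the same split on `href=`, but returns `None` instead of panicking when the marker is missing. The function name `extract_href` and the sample attribute value are assumptions; the real page may format its onclick differently.

```rust
/// Hypothetical helper (not from the original project).
/// Pulls the link out of an onclick value such as
/// `window.location.href='https://example.com/file.xls'` (a made-up sample).
pub fn extract_href(onclick: &str) -> Option<String> {
    // Same split as above: the second piece is what follows the first "href=".
    let rest = onclick.split("href=").nth(1)?;
    // Remove the single quotes that wrap the link.
    let link = rest.replace('\'', "");
    // Treat an empty result as "no link found".
    if link.is_empty() {
        None
    } else {
        Some(link)
    }
}
```

Returning an `Option` lets the caller decide what to do when the page layout changes, rather than crashing inside a scheduled task.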
After that, all I have to do is download the file with the reqwest crate.
Example:
use std::fs::File;
use std::io;

pub fn download_link(url: &str, file_name: &str, dir_name: &str) {
    let mut resp = reqwest::get(url).expect("request failed");
    // Create (or overwrite) the destination file inside dir_name
    let mut out = File::create(format!("{}/{}", dir_name, file_name)).expect("failed to create file");
    // Stream the response body to disk
    io::copy(&mut resp, &mut out).expect("failed to copy content");
}
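The file-writing part can be separated from the HTTP request, which makes it testable offline. This is a minimal sketch under that assumption: `save_stream` is a hypothetical name, and the generic reader stands in for the response body. It returns an `io::Result` so the caller can report a failure instead of panicking.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Hypothetical helper (not from the original project).
/// Copies everything from `source` into `<dir_name>/<file_name>` and
/// returns the number of bytes written.
pub fn save_stream<R: Read>(source: &mut R, dir_name: &str, file_name: &str) -> io::Result<u64> {
    // Same path construction as download_link above.
    let mut out = File::create(format!("{}/{}", dir_name, file_name))?;
    // io::copy reads until end of input and writes each chunk to the file.
    io::copy(source, &mut out)
}
```

Any type that implements `Read` can be passed in, so an in-memory byte slice works for a quick check.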
Note that when I read the arguments, I check the parameter holding the path of the folder where the file will be saved: if it ends with a / or a \, I remove that character.
Example:
// Remove a trailing / or \ character, if present
if dir_name.ends_with("/") || dir_name.ends_with("\\") {
    dir_name.pop();
}
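The same check can be written as a small function that borrows the path instead of mutating it. This is only a sketch: `strip_trailing_separator` is a hypothetical name, and like the snippet above it removes at most one trailing character.

```rust
/// Hypothetical helper (not from the original project).
/// Returns `dir` without one trailing `/` or `\`, or `dir` unchanged.
pub fn strip_trailing_separator(dir: &str) -> &str {
    dir.strip_suffix('/')
        .or_else(|| dir.strip_suffix('\\'))
        .unwrap_or(dir)
}
```

Because it returns a slice of the input, no new string is allocated.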
This makes it easy to build the path of the downloaded file:
let mut out = File::create(format!("{}/{}", dir_name, file_name)).expect("failed to create file");
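As an alternative to formatting the path by hand, the standard library's `Path::join` could build it. This is a sketch of a different technique from the one used in the project, and `output_path` is a hypothetical name. Keep in mind that a backslash only counts as a separator on Windows, and that `join` replaces the folder entirely if the file name is an absolute path.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not from the original project).
/// Builds the destination path from the folder and the file name.
pub fn output_path(dir_name: &str, file_name: &str) -> PathBuf {
    // `join` inserts a separator only when `dir_name` does not already end with one.
    Path::new(dir_name).join(file_name)
}
```

`File::create` accepts any `AsRef<Path>`, so the result can be passed to it directly.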
Here is the GitHub repository: https://github.com/loicngr/LinkExtracter
Feel free to make your improvements :)
Peace =)
Top comments (2)
Here’s a more complete project using select and scraper: github.com/alfinsuryaS/reason-rust...
Oh, that's nice! Thanks.
I really appreciate it.