{"id":4129,"date":"2025-02-17T12:01:55","date_gmt":"2025-02-17T12:01:55","guid":{"rendered":"https:\/\/www.gubatron.com\/blog\/?p=4129"},"modified":"2025-02-17T12:04:23","modified_gmt":"2025-02-17T12:04:23","slug":"introducing-uninews-a-universal-news-scraper-in-rust","status":"publish","type":"post","link":"https:\/\/www.gubatron.com\/blog\/introducing-uninews-a-universal-news-scraper-in-rust\/","title":{"rendered":"Introducing Uninews: A Universal News Scraper in Rust"},"content":{"rendered":"<p data-pm-slice=\"1 1 []\">The internet is overflowing with news, but extracting clean, readable content from articles can be a tedious task. Whether you&#8217;re aggregating news for personal consumption, research, or AI training, automating this process is a must. Enter <a href=\"https:\/\/crates.io\/crates\/uninews\"><strong>Uninews<\/strong><\/a>, a powerful, lightweight, and efficient <strong>Rust-based<\/strong> news scraper that simplifies content extraction and conversion into Markdown format.<\/p>\n<p data-pm-slice=\"1 1 []\"><strong><a href=\"https:\/\/crates.io\/crates\/uninews\">Uninews on crates.io<\/a><\/strong><\/p>\n<p data-pm-slice=\"1 1 []\"><strong><a href=\"https:\/\/github.com\/gubatron\/uninews\">Uninews repo on github.com<\/a><\/strong><\/p>\n<h2>What is Uninews?<\/h2>\n<p>Uninews is a universal news scraper that <strong>downloads an article from a given URL<\/strong>, <strong>cleans up the HTML<\/strong>, and <strong>formats the content into Markdown<\/strong> using OpenAI\u2019s GPT-4o via <a href=\"https:\/\/github.com\/CloudLLM-ai\/cloudllm\/tree\/main\"><strong>CloudLLM<\/strong><\/a>. 
The final output is a structured JSON response containing:<\/p>\n<ul data-spread=\"false\">\n<li><strong>Title<\/strong> of the article<\/li>\n<li><strong>Markdown-formatted content<\/strong><\/li>\n<li><strong>Featured image URL<\/strong><\/li>\n<\/ul>\n<p>When used as a command-line tool, Uninews simply outputs the extracted Markdown, making it easy to read or integrate into your workflow.<\/p>\n<h2>Key Features<\/h2>\n<p>\u2705 <strong>Smart Content Extraction<\/strong>: Targets the <code>&lt;article&gt;<\/code> element to isolate the main content, falling back to <code>&lt;body&gt;<\/code> when no <code>&lt;article&gt;<\/code> is present.<\/p>\n<p>\u2705 <strong>Clean Markdown Conversion<\/strong>: Uses GPT-4o (via CloudLLM) to generate clean, structured Markdown from raw HTML.<\/p>\n<p>\u2705 <strong>Reusable Rust Library<\/strong>: The <code>universal_scrape<\/code> function can be integrated into any Rust project.<\/p>\n<p>\u2705 <strong>Multilingual Support<\/strong>: Specify a language for the output, defaulting to English.<\/p>\n<h2>Installation<\/h2>\n<p>You need <strong><a href=\"https:\/\/www.rust-lang.org\/\">Rust<\/a> and Cargo<\/strong> installed to get started.<\/p>\n<h3>Install via Cargo<\/h3>\n<pre><code>cargo install uninews<\/code><\/pre>\n<h3>Or Build from Source<\/h3>\n<pre><code>git clone https:\/\/github.com\/gubatron\/uninews.git\ncd uninews\nmake build\nmake install<\/code><\/pre>\n<h2>Running Uninews<\/h2>\n<p>Before running Uninews, set your <strong>OpenAI API key<\/strong>:<\/p>\n<pre><code>export OPEN_AI_SECRET=sk-xxxxxxxxxxxxxxxxxxxxxxxxxx<\/code><\/pre>\n<p>Then, scrape a news article:<\/p>\n<pre><code>uninews https:\/\/example.com\/news-article<\/code><\/pre>\n<p>You can also specify the output language:<\/p>\n<pre><code>uninews -l spanish https:\/\/example.com\/news-article<\/code><\/pre>\n<h2>Command-line Options<\/h2>\n<pre><code>Usage: uninews [OPTIONS] &lt;URL&gt;\n\nArguments:\n  &lt;URL&gt;  The URL of the news article to scrape\n\nOptions:\n  -l, --language &lt;LANGUAGE&gt;  Output language (default: English)\n  -h, --help                 Print help\n  -V, --version              Print version<\/code><\/pre>\n<h2>Integrating Uninews in Your Rust Project<\/h2>\n<p>Uninews can also be used as a library to <strong>scrape news articles programmatically<\/strong>. Add it to your project with <code>cargo add uninews<\/code> and call <code>universal_scrape<\/code> from async code:<\/p>\n<pre><code>use uninews::universal_scrape;\n\n\/\/ Assumes the tokio crate (features \"macros\" and \"rt-multi-thread\") as the async runtime\n#[tokio::main]\nasync fn main() {\n    \/\/ Scrape the article and convert it into Markdown\n    let post = universal_scrape(\"https:\/\/example.com\/news\", \"english\").await;\n\n    \/\/ Failures are reported through the post's error field\n    if !post.error.is_empty() {\n        eprintln!(\"Error: {}\", post.error);\n        return;\n    }\n\n    println!(\"{}\\n\\n{}\", post.title, post.content);\n}<\/code><\/pre>\n<p>Make sure to <strong>set the OpenAI API key<\/strong> before calling <code>universal_scrape<\/code>. Exporting <code>OPEN_AI_SECRET<\/code> in your shell, as shown above, is the simplest option. To set it from code instead:<\/p>\n<pre><code>\/\/ On the Rust 2024 edition set_var is unsafe; call it before other threads are spawned\nunsafe { std::env::set_var(\"OPEN_AI_SECRET\", my_open_ai_secret) };<\/code><\/pre>\n<h2>Why Use Uninews?<\/h2>\n<p>\ud83d\ude80 <strong>Fast<\/strong>: Written in Rust, so local HTML processing adds little overhead; total run time is dominated by the download and the GPT-4o request.<\/p>\n<p>\ud83d\udee0 <strong>Easy to Use<\/strong>: Simple CLI and library interface.<\/p>\n<p>\ud83d\udcd6 <strong>Readable Output<\/strong>: Well-formatted Markdown conversion.<\/p>\n<p>\ud83d\udd04 <strong>Reusable<\/strong>: Works as both a command-line tool and a Rust library.<\/p>\n<h2>License<\/h2>\n<p>Uninews is open-source and licensed under the <strong>MIT License<\/strong>.<\/p>\n<p>Copyright (c) 2025 <strong>\u00c1ngel Le\u00f3n<\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The internet is overflowing with news, but extracting clean, readable content from articles can be a tedious task. Whether you&#8217;re aggregating news for personal consumption, research, or AI training, automating this process is a must. Enter Uninews, a powerful, lightweight, and efficient Rust-based news scraper that simplifies content extraction and conversion into Markdown format. 
Uninews [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4130,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[15],"tags":[],"class_list":["post-4129","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-code"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/02\/gubatron_Logo_for_Uninews_a_universal_news_scraper_command_li_4f376071-18e3-400e-9644-8efc878465e4_3.png?fit=1024%2C1024&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p5Unzf-14B","jetpack-related-posts":[{"id":4184,"url":"https:\/\/www.gubatron.com\/blog\/agentes-automatizacion-y-el-futuro-de-los-medios\/","url_meta":{"origin":4129,"position":0},"title":"Agentes, Automatizaci\u00f3n y el Futuro de los Medios","author":"gubatron","date":"March 9, 2026","format":false,"excerpt":"Explico c\u00f3mo converti una publicaci\u00f3n sobre Bitcoin en una operaci\u00f3n editorial fuertemente automatizada con modelos de IA, flujos agenticos y herramientas desarrolladas en Rust. 
Su relato no solo describe un salto de 20 a 80 art\u00edculos diarios, sino que tambi\u00e9n plantea una idea m\u00e1s ambiciosa: usar blockchain como memoria persistente\u2026","rel":"","context":"In &quot;AI&quot;","block_context":{"text":"AI","link":"https:\/\/www.gubatron.com\/blog\/category\/ai\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2026\/03\/canuto-imagine-1773077597.jpg?fit=1200%2C720&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2026\/03\/canuto-imagine-1773077597.jpg?fit=1200%2C720&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2026\/03\/canuto-imagine-1773077597.jpg?fit=1200%2C720&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2026\/03\/canuto-imagine-1773077597.jpg?fit=1200%2C720&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2026\/03\/canuto-imagine-1773077597.jpg?fit=1200%2C720&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":4119,"url":"https:\/\/www.gubatron.com\/blog\/makefile-for-rust-projects\/","url_meta":{"origin":4129,"position":1},"title":"Makefile for rust projects","author":"gubatron","date":"January 30, 2025","format":false,"excerpt":"","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/www.gubatron.com\/blog\/category\/code\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-02-17-at-10.35.44%E2%80%AFAM.png?fit=1200%2C726&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-02-17-at-10.35.44%E2%80%AFAM.png?fit=1200%2C726&ssl=1&resize=350%2C200 1x, 
https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-02-17-at-10.35.44%E2%80%AFAM.png?fit=1200%2C726&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-02-17-at-10.35.44%E2%80%AFAM.png?fit=1200%2C726&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-02-17-at-10.35.44%E2%80%AFAM.png?fit=1200%2C726&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":3996,"url":"https:\/\/www.gubatron.com\/blog\/the-difference-between-a-slice-and-an-array-in-rust\/","url_meta":{"origin":4129,"position":2},"title":"The difference between a Slice and an Array in Rust","author":"gubatron","date":"December 21, 2022","format":false,"excerpt":"In Rust, a slice is a reference to a contiguous section of a larger data structure, such as an array or a vector. It is represented using the syntax &[T], where T is the type of the elements in the slice. A slice does not own the data it refers\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/www.gubatron.com\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":4000,"url":"https:\/\/www.gubatron.com\/blog\/what-is-the-rust-equivalent-to-javas-printwriter\/","url_meta":{"origin":4129,"position":3},"title":"What is the Rust equivalent to Java&#8217;s PrintWriter?","author":"gubatron","date":"December 21, 2022","format":false,"excerpt":"In Rust, the equivalent of Java's PrintWriter is the std::io::Write trait, which is implemented by a number of types that can be used to write data to an output stream, such as a file or a network socket. 
To use Write to write text to an output stream, you can\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/www.gubatron.com\/blog\/category\/code\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2022\/12\/progress_image_100_7fdc7b72-6c19-42f5-affe-d055d02d6f8e.webp?fit=1024%2C1024&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2022\/12\/progress_image_100_7fdc7b72-6c19-42f5-affe-d055d02d6f8e.webp?fit=1024%2C1024&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2022\/12\/progress_image_100_7fdc7b72-6c19-42f5-affe-d055d02d6f8e.webp?fit=1024%2C1024&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2022\/12\/progress_image_100_7fdc7b72-6c19-42f5-affe-d055d02d6f8e.webp?fit=1024%2C1024&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":4135,"url":"https:\/\/www.gubatron.com\/blog\/the-curious-case-of-inconsistent-cargo-fmt-formatting-and-how-to-fix-it\/","url_meta":{"origin":4129,"position":4},"title":"The Curious Case of Inconsistent cargo fmt Formatting (and How to Fix It)","author":"gubatron","date":"February 17, 2025","format":false,"excerpt":"Have you ever run into a situation where\u00a0cargo fmt, Rust's code formatter, produces different output on different machines, even though you're working on the same project? This can be incredibly frustrating, especially when you're trying to maintain consistent code style across a team or between your own development environments. 
I\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/www.gubatron.com\/blog\/category\/code\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/02\/gubatron_Abstract_digital_art_fragmented_code_snippets_floati_e3f2c486-78ad-4b2a-b86a-e792a1970064_2.png?fit=1024%2C1024&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/02\/gubatron_Abstract_digital_art_fragmented_code_snippets_floati_e3f2c486-78ad-4b2a-b86a-e792a1970064_2.png?fit=1024%2C1024&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/02\/gubatron_Abstract_digital_art_fragmented_code_snippets_floati_e3f2c486-78ad-4b2a-b86a-e792a1970064_2.png?fit=1024%2C1024&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/02\/gubatron_Abstract_digital_art_fragmented_code_snippets_floati_e3f2c486-78ad-4b2a-b86a-e792a1970064_2.png?fit=1024%2C1024&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":4154,"url":"https:\/\/www.gubatron.com\/blog\/the-last-nation\/","url_meta":{"origin":4129,"position":5},"title":"The Last Nation","author":"gubatron","date":"June 2, 2025","format":false,"excerpt":"How an AI Rewrote Civilization https:\/\/www.youtube.com\/watch?v=Cd5DElWgDE8 In the late 2020s, the collapse began. It started subtly at first, white-collar workers phased out by smart automation, call centers replaced with LLMs, design agencies gutted by generative tools. But within five years, the shift was absolute. 
Hundreds of millions around the globe\u2026","rel":"","context":"In &quot;Historias&quot;","block_context":{"text":"Historias","link":"https:\/\/www.gubatron.com\/blog\/category\/historias\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/06\/1747626439809.png?fit=1200%2C904&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/06\/1747626439809.png?fit=1200%2C904&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/06\/1747626439809.png?fit=1200%2C904&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/06\/1747626439809.png?fit=1200%2C904&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2025\/06\/1747626439809.png?fit=1200%2C904&ssl=1&resize=1050%2C600 3x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/posts\/4129","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/comments?post=4129"}],"version-history":[{"count":2,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/posts\/4129\/revisions"}],"predecessor-version":[{"id":4132,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/posts\/4129\/revisions\/4132"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/media\/4130"}],"wp:attachment":[{"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/media?parent=4129"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gubatron.com\/blog\/
wp-json\/wp\/v2\/categories?post=4129"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/tags?post=4129"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}