Script to automate building an adblocking hosts file
This topic is closed.
This is a sticky topic.
-
I don't want to take over Steve's thread, so I created a new one here for the router project to avoid cluttering this one with stuff not specific to Kubuntu.
-
Google Analytics for WordPress works by inserting the analytics code into the header of each page. This is the code:
Code:
<script type="text/javascript">//<![CDATA[
// Google Analytics for WordPress by Yoast v4.3.3 | http://yoast.com/wordpress/google-analytics/
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'XXXXXXXXX']);
_gaq.push(['_setCustomVar',2,'post_type','page',3],['_setCustomVar',4,'year','2013',3],['_trackPageview']);
(function () {
    var ga = document.createElement('script');
    ga.type = 'text/javascript';
    ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(ga, s);
})();
//]]></script>
Feathers
Last edited by Feathers McGraw; Oct 28, 2013, 12:11 PM.
-
Plus, if either of us has done something stupid (even if it does work), people can suggest an alternative, and we all learn. Now all I need to do is work out which Google things to enable, which may turn out to be the most difficult bit!
-
Wow! This is a PRIME example of how Open Source is supposed to work. Steve creates a very neat Bash script, his first, to create a specialized /etc/hosts file, and folks jump in and add mods for their special purposes. Everyone benefits! Now, suppose he had created a binary to sell as shareware? Only he could have made changes, depriving himself and others of the improvements, changes, and bug fixes that other, more experienced Bash script writers could have contributed. Everyone benefits from Steve, Feathers and the other contributors.
Last edited by GreyGeek; Oct 28, 2013, 05:22 AM.
-
Didn't fancy trawling through 33,000 lines for certain things so I thought I'd automate it. Was a good learning experience.
Code:
#!/bin/bash
# Before calling this script, create a whitelist file containing phrases to allow, one phrase per line
if [ $# -ne 1 ]; then
    echo "Usage: $0 whitelist_file_location"
    exit 1
fi

INPUT_FILE=~/hosts-block
OUTPUT_FILE=~/hosts-block-less-whitelist

# First, remove empty lines from whitelist_file (or the next step will throw an error)
sed '/^$/d' "$1" > tt
mv tt "$1"
echo 'Removed empty lines from whitelist_file'

cp "$INPUT_FILE" "$OUTPUT_FILE"

# Now, read lines from the whitelist file and remove entries with matching content from OUTPUT_FILE
# (each phrase is used as a sed regular expression, so it must not contain "/")
while read -r line; do
    sed -e "/$line/d" "$OUTPUT_FILE" > tt
    mv tt "$OUTPUT_FILE"
    echo "Removed any lines containing $line"
done < "$1"
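For comparison, the same filtering can probably be done in one pass with grep instead of one sed run per phrase. This is only a sketch, not the script from the post: the function name `filter_whitelist` is made up for illustration, and it assumes the whitelist phrases should be matched as plain substrings (`-F`) rather than as regular expressions. Blank lines are dropped first because an empty pattern would match, and so delete, every line.

```shell
#!/bin/sh
# filter_whitelist WHITELIST_FILE HOSTS_FILE   (hypothetical helper name)
# Prints HOSTS_FILE without any line that contains a whitelisted phrase.
filter_whitelist() {
    whitelist=$1
    input=$2
    patterns=$(mktemp) || return 1

    # Drop blank lines: an empty pattern would match every line.
    grep -v '^[[:space:]]*$' "$whitelist" > "$patterns"

    # -F: treat phrases as fixed strings, -f: read them from a file,
    # -v: print only the lines that match none of them.
    grep -v -F -f "$patterns" "$input"
    status=$?

    rm -f "$patterns"
    # grep exits with 1 when it prints nothing; only 2 or more is a real error.
    [ "$status" -le 1 ]
}
```

Usage would look something like `filter_whitelist ~/whitelist ~/hosts-block > ~/hosts-block-less-whitelist`, where `~/whitelist` is an assumed file name. Unlike the original, this does not modify the whitelist file in place.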
-
You can edit the output of my script and remove any references to Google Analytics before you copy the file to your router.
-
Thanks, that's really interesting!
Originally posted by SteveRiley:
You will see that it contains a number of links to sites my script does block (google-analytics.com, quantcast.com)
Blocking google-analytics at the router would break the connection from the Pi to the Google server. Finding a local equivalent would be ideal; I've tried a couple of WordPress plugins, but unfortunately the counts were pretty wild.
Feathers
-
Originally posted by Feathers McGraw:
Tested using the site below, and some adverts still showed, but they're not real adverts, so I'm not sure what that means! Haven't had any real ones get through. Browsing seems snappier with AdBlock turned off.
Code:
<img src="http://img236.echo.cx/img236/5108/adbannersportedtop9tr.gif" alt="Ad banner should be blocked" title=" Ad banner should be blocked">
<h3>[^ You should NOT be seeing this image above Ad banner was here ^]</h3>
<br>
<img src="http://img145.echo.cx/img145/3690/atribalfushionsported3ti.gif" alt="Ad should be blocked" title="Ad should be blocked">
<h3>[^ You should NOT be seeing this image above Ad image was here ^]</h3>
<br>
<img src="http://img207.echo.cx/img207/1241/realmedia6iw.gif" alt="Ad should be blocked" title="Ad should be blocked">
<h3>[^ You should NOT be seeing this image above Ad image was here ^]</h3>
<br>
<img src="http://img61.echo.cx/img61/2681/adtrackingpromo1gl.gif" alt=" Ad should be blocked" title="Ad should be blocked">
<h3>[^ You should NOT be seeing this image above Ad image was here ^]</h3>
<br>
<img src="http://img104.echo.cx/img104/9528/friendsaffiliatessported8zx.gif" alt="Ad should be blocked" title="This is NOT an AD">
<h3>[^ You SHOULD be seeing this image above ^]</h3>
<br>
<img src="http://img64.echo.cx/img64/6751/doubleclickaffsportedbottom6cf.gif" alt="Ad should be blocked" title="Ad should be blocked">
<h3>[^ You should NOT be seeing this image above Affilates was here ^]</h3>
Upon first glance, then, my script shouldn't block any of the images, because it has no entries for hosts in the echo.cx domain. However, the snippet of HTML above deserves a bit more investigation. The first, fourth, fifth, and sixth image links point to true images. But the second and third do not: instead, they point to HTML files! Let's download the second:
Code:
steve@t520:~/junk$ wget -S http://img145.echo.cx/img145/3690/atribalfushionsported3ti.gif
--2013-10-27 14:32:06--  http://img145.echo.cx/img145/3690/atribalfushionsported3ti.gif
Resolving img145.echo.cx (img145.echo.cx)... 208.94.1.239
Connecting to img145.echo.cx (img145.echo.cx)|208.94.1.239|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Server: nginx/1.0.4
  Date: Sun, 27 Oct 2013 21:32:07 GMT
  Content-Type: text/html
  Transfer-Encoding: chunked
  Connection: close
  X-Powered-By: PHP/5.2.9
  X-Server-Name-And-Port: _:14000
  Expires: Sun, 27 Oct 2013 21:32:06 GMT
  Cache-Control: no-cache
  X-Server-Name-And-Port: _:14000
Length: unspecified [text/html]
Saving to: ‘atribalfushionsported3ti.gif’

    [ <=> ] 19,145      --.-K/s   in 0.09s

2013-10-27 14:32:07 (207 KB/s) - ‘atribalfushionsported3ti.gif’ saved [19145]
Note the Content-Type header: text/html, not an image. Open the downloaded file and you will see that it contains a number of links to sites my script does block (google-analytics.com, quantcast.com) and also tries to open a popup. My script plus your browser's pop-up blocker prevent the second "image" from loading. (The third "image" grabs exactly the same HTML as the second.)
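One way to repeat this kind of check on any downloaded page is to pull out the host names it links to and look each one up in the blocking hosts file. The sketch below is only an illustration: the function name `list_blocked_hosts` is invented here, and it assumes the hosts file uses the usual "address hostname..." layout. Matching is exact, because a hosts file has no wildcards: an entry for www.google-analytics.com does not cover ssl.google-analytics.com.

```shell
#!/bin/sh
# list_blocked_hosts PAGE HOSTS_FILE   (hypothetical helper name)
# For every http(s) host referenced in PAGE, report whether HOSTS_FILE
# has an entry with exactly that name.
list_blocked_hosts() {
    page=$1
    hosts_file=$2

    # Extract scheme://host from every URL, strip the scheme, de-duplicate.
    grep -oE "https?://[^/\"' >]+" "$page" |
        sed -E 's#^https?://##' |
        sort -u |
        while read -r host; do
            # Skip comment lines; compare every name after the address field.
            if awk -v h="$host" '
                    $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                    END { exit !found }' "$hosts_file"; then
                echo "blocked: $host"
            else
                echo "allowed: $host"
            fi
        done
}
```

For the file saved above, this could be run as `list_blocked_hosts atribalfushionsported3ti.gif ~/hosts-block`, assuming `~/hosts-block` is where the generated hosts file lives.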
Browser-based ad blockers will likely catch more ads, but they are slower and they work only in browsers. DNS blocking catches fewer ads but is faster and will work for every application that makes an Internet connection, including email clients, RSS readers, and more. It's up to each individual to determine which set of tradeoffs matters most.
Originally posted by Feathers McGraw:
Is there any reason why it would be a bad idea to do this on a router? Would it filter ads for every device connected?
-
Trying this now, so far so good!
Tested using the site below, and some adverts still showed, but they're not real adverts, so I'm not sure what that means! Haven't had any real ones get through. Browsing seems snappier with AdBlock turned off.
http://www.angelfire.com/alt2/entert...lock_test.html
Is there any reason why it would be a bad idea to do this on a router? Would it filter ads for every device connected?
Feathers
-
Originally posted by SteveRiley:
"Regular" Linux always consults /etc/hosts before DNS every time an application performs a host name lookup.
That order is configured in /etc/nsswitch.conf (and the older /etc/host.conf); the man pages give the details.
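To see what a given machine is actually set to, the "hosts:" line of nsswitch.conf can be inspected; "files" on that line stands for /etc/hosts. The two helpers below are a rough sketch with made-up names (`hosts_lookup_order`, `files_before_dns`), assuming the standard one-line "hosts: source source ..." layout; bracketed actions such as [NOTFOUND=return] are skipped.

```shell
#!/bin/sh
# hosts_lookup_order [FILE]   (hypothetical helper name)
# Prints the lookup sources from the "hosts:" line, one per line, in order.
hosts_lookup_order() {
    conf=${1:-/etc/nsswitch.conf}
    awk '$1 == "hosts:" {
             # Tokens containing "=" belong to [STATUS=action] items.
             for (i = 2; i <= NF; i++) if ($i !~ /=/) print $i
             exit
         }' "$conf"
}

# files_before_dns [FILE]   (hypothetical helper name)
# Succeeds if "files" is listed, and listed before "dns" when "dns" is present.
files_before_dns() {
    hosts_lookup_order "$1" | awk '
        $0 == "files" && !f { f = NR }
        $0 == "dns"   && !d { d = NR }
        END { exit !(f && (!d || f < d)) }'
}
```

Running `hosts_lookup_order` with no argument reads /etc/nsswitch.conf, and `files_before_dns && echo "hosts file wins"` gives a quick yes or no.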
-
Originally posted by jlittle:
If you have a blank entry in your $PATH, including starting or ending with the separator colon, that means the cwd. I've done that for three decades. It's only an issue if the cwd is writable by people (or bots or software) you don't trust; we don't do that in a typical linux install.
I'm not terribly fond of relative path elements, especially if those are before the absolute path elements...as one might run something malicious by accident (like a browser plugin that places a modified sudo executable in your $HOME).
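If you want to check whether your own $PATH contains such entries, something along these lines should do it. It is only a sketch: the function name `risky_path_entries` is invented for illustration, and it simply flags empty entries (which mean the cwd) and any entry that does not start with "/".

```shell
#!/bin/sh
# risky_path_entries [PATH_STRING]   (hypothetical helper name)
# Lists empty and relative entries in a colon-separated path string.
# Defaults to the current $PATH when no argument is given.
risky_path_entries() {
    path_value=${1-$PATH}
    printf '%s\n' "$path_value" | awk -F: '{
        for (i = 1; i <= NF; i++) {
            if ($i == "")
                print "empty entry at position " i " (means current directory)"
            else if ($i !~ /^\//)
                print "relative entry at position " i ": " $i
        }
    }'
}
```

Calling `risky_path_entries` with no argument checks the live $PATH; no output means every entry is an absolute path.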
It is usually much more convenient to place your executables in $PATH, but I do understand the reason why someone might wish to add cwd as a fallback.
Last edited by kubicle; Oct 27, 2013, 12:29 AM.