Script to automate building an adblocking hosts file

This topic is closed.
This is a sticky topic.

  • SteveRiley
    replied

    Originally posted by GreyGeek
    Did you ever get around to making that script smarter?

    Not yet... I got distracted by other things, mostly my KDE-from-scratch experiment on my Mini. I'm still planning to extend the script's capabilities, though. Besides the VM integration, is there anything else you'd like to see?


  • GreyGeek
    replied

    Originally posted by SteveRiley
    ...
    Over the long weekend I plan to make this script smarter. I hope to figure out a way to incorporate it into VMs, too... if you're using Windows on VirtualBox, say, then when you update your host's hosts (haha), you can pull that into your VM's hosts file, too.
    ...

    Steve,
    Did you ever get around to making that script smarter?
    If so, I'd like to mooch it!
    GG


  • SteveRiley
    replied

    Already discovered a minor inconvenient side effect... if you receive marketing email and want to click the opt-out link, many of the hostnames in those URLs are included in the block lists. You'll have to temporarily disable the blocking hosts file (sudo mv, then sudo mv back) for the opt-out to work.
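
    A minimal sketch of that toggle, assuming the ~/hosts-system and ~/hosts-block files produced by the gethosts script from the first post already exist. Instead of moving /etc/hosts out of the way, it copies the saved original back in, so the localhost entries stay in place. hosts_toggle is a hypothetical helper name, and the optional second argument is there only so you can try it on a scratch file before touching /etc/hosts.

```shell
#!/bin/bash

# hosts_toggle: hypothetical helper, not part of the original script.
# Usage: hosts_toggle off|on [target]
#   off  installs the saved original hosts file (blocking disabled)
#   on   installs the generated ad-blocking hosts file
hosts_toggle() {
    local mode=$1
    local target=${2:-/etc/hosts}
    local source_file

    case $mode in
        off) source_file=$HOME/hosts-system ;;
        on)  source_file=$HOME/hosts-block ;;
        *)   echo "usage: hosts_toggle off|on [target]" >&2; return 2 ;;
    esac

    # Never replace the target with a missing or empty file
    if [ ! -s "$source_file" ]; then
        echo "missing or empty: $source_file" >&2
        return 1
    fi

    # Use plain cp when we can write the target ourselves, sudo otherwise
    if [ -w "$target" ] || { [ ! -e "$target" ] && [ -w "$(dirname "$target")" ]; }; then
        cp "$source_file" "$target"
    else
        sudo cp "$source_file" "$target"
    fi
}
```

    After sourcing this in your shell, run hosts_toggle off, click the opt-out link, then run hosts_toggle on.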

    Over the long weekend I plan to make this script smarter. I hope to figure out a way to incorporate it into VMs, too... if you're using Windows on VirtualBox, say, then when you update your host's hosts (haha), you can pull that into your VM's hosts file, too.

    I will likely start a new thread and move this over there, to improve discoverability.


  • dibl
    replied

    Wow -- verrrrry cool, Steve! I wish I could script like that.

    I might add that chromium-browser has the Ghostery plugin, which does a nice job of blocking trackers (and lets you see what it blocked).


  • ScottyK
    replied

    Originally posted by SteveRiley
    After comparing the performance of browser-based ad blockers to custom-crafted hosts files, I've concluded that the latter is better. I've found four reasonably updated sources -- the winhelp one is the largest and probably most familiar, but it seems to be updated less frequently than some of the others.

    I second this observation, although I don't have any technical data to back it up. I just notice that the browser seems to be a bit faster.

    The only ads that sneak through, which the browser ad blocker used to catch, are the occasional Flash-based ones. I'm looking for ways to block those, but at this point I like the performance of the hosts file.


  • Script to automate building an adblocking hosts file

    After comparing the performance of browser-based ad blockers to custom-crafted hosts files, I've concluded that the latter is better. I've found four reasonably updated sources -- the winhelp one is the largest and probably most familiar, but it seems to be updated less frequently than some of the others.

    So I've spent the last couple hours teaching myself bash scripts and especially the handy little sed utility. I've built a script that downloads the files, cleans out all their comments, de-duplicates entries, and merges the result with your system's original hosts file.

    Code:
    #!/bin/bash
    
    # If this is our first run, save a copy of the system's original hosts file and set to read-only for safety
    if [ ! -f ~/hosts-system ]
    then
        echo "Saving copy of system's original hosts file..."
        cp /etc/hosts ~/hosts-system
        chmod 444 ~/hosts-system
    fi
    
    # Perform work in temporary files
    temphosts1=$(mktemp)
    temphosts2=$(mktemp)
    
    # Obtain various hosts files and merge into one
    echo "Downloading ad-blocking hosts files..."
    wget -nv -O - http://winhelp2002.mvps.org/hosts.txt >> "$temphosts1"
    wget -nv -O - http://hosts-file.net/ad_servers.asp >> "$temphosts1"
    wget -nv -O - http://someonewhocares.org/hosts/hosts >> "$temphosts1"
    wget -nv -O - "http://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&showintro=0&mimetype=plaintext" >> "$temphosts1"
    
    # Do some work on the file (the \r, \t and \+ escapes need GNU sed):
    # 1. Remove MS-DOS carriage returns
    # 2. Delete all lines that don't begin with 127.0.0.1
    # 3. Delete any lines containing the word localhost because we'll obtain that from the original hosts file
    # 4. Replace 127.0.0.1 with 0.0.0.0 because then we don't have to wait for the resolver to fail
    # 5. Scrunch extraneous spaces or tabs separating address from name into a single tab
    # 6. Delete any comments on lines
    # 7. Clean up leftover trailing blanks
    # Pass all this through sort with the unique flag to remove duplicates and save the result
    echo "Parsing, cleaning, de-duplicating, sorting..."
    sed -e 's/\r//' -e '/^127\.0\.0\.1/!d' -e '/localhost/d' -e 's/^127\.0\.0\.1/0.0.0.0/' -e 's/[ \t]\+/\t/' -e 's/#.*$//' -e 's/[ \t]*$//' < "$temphosts1" | sort -u > "$temphosts2"
    
    # Combine system hosts with adblocks
    echo "Merging with original system hosts..."
    printf '\n# Ad blocking hosts generated %s\n' "$(date)" | cat ~/hosts-system - "$temphosts2" > ~/hosts-block
    
    # Clean up temp files and remind user to copy new file
    echo "Cleaning up..."
    rm -f "$temphosts1" "$temphosts2"
    echo "Done."
    echo
    echo "Copy ad-blocking hosts file with this command:"
    echo " sudo cp ~/hosts-block /etc/hosts"
    echo
    echo "You can always restore your original hosts file with this command:"
    echo " sudo cp ~/hosts-system /etc/hosts"
    echo "so don't delete that file! (It's saved read-only for your protection.)"
    echo
    Save the text above into a file called ~/gethosts. Make it executable with this command:
    Code:
    chmod +x ~/gethosts
    To run the script, simply:
    Code:
    ~/gethosts
    The first time you run the script, it saves your existing /etc/hosts to ~/hosts-system because it will reuse this each time you run it. You can re-run the script whenever you feel like updating your ad blocking hosts.
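
    To see what the cleaning step does without downloading anything, here is a small sketch that runs the seven sed steps listed in the script's comments, followed by sort -u, on a few made-up lines. clean_hosts is a hypothetical wrapper name, the hostnames are invented, and the expressions assume GNU sed.

```shell
#!/bin/bash

# clean_hosts: hypothetical wrapper around the script's cleaning pipeline.
# Reads raw block-list lines on stdin, writes cleaned unique entries to stdout.
clean_hosts() {
    sed -e 's/\r//' \
        -e '/^127\.0\.0\.1/!d' \
        -e '/localhost/d' \
        -e 's/^127\.0\.0\.1/0.0.0.0/' \
        -e 's/[ \t]\+/\t/' \
        -e 's/#.*$//' \
        -e 's/[ \t]*$//' | sort -u
}

# Made-up sample input: DOS line endings, a comment line, a localhost
# entry, a trailing comment, and a duplicate host
printf '%s\r\n' \
    '# sample block list' \
    '127.0.0.1  localhost' \
    '127.0.0.1  ads.example.com  # banner ads' \
    '127.0.0.1 ads.example.com' \
    '127.0.0.1   tracker.example.net' | clean_hosts
```

    The five input lines collapse to two entries, one for ads.example.com and one for tracker.example.net, each as 0.0.0.0 followed by a single tab and the hostname.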

    The script outputs the file ~/hosts-block. Each time you run it, you'll need to manually replace your existing hosts file with this command:
    Code:
    sudo cp ~/hosts-block /etc/hosts
    I think this would be a neat thing to schedule in /etc/cron.weekly, but I'll need to put some more smarts into it first. At least you can start playing around with it on your own now. Enjoy.
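
    One kind of "smarts" a scheduled run would probably need is a check that the downloads actually worked before the result replaces /etc/hosts. A minimal sketch, assuming the file layout the script produces (a localhost line from the original hosts file plus 0.0.0.0 entries); hosts_file_looks_sane is a hypothetical helper, and the default threshold of 1000 entries is an arbitrary assumption, not a measured value.

```shell
#!/bin/bash

# hosts_file_looks_sane: hypothetical pre-install check.
# Usage: hosts_file_looks_sane FILE [MIN_ENTRIES]
# Succeeds only if FILE is non-empty, still mentions localhost, and has
# at least MIN_ENTRIES lines starting with 0.0.0.0 (default 1000, assumed).
hosts_file_looks_sane() {
    local file=$1
    local min_entries=${2:-1000}
    local blocked

    [ -s "$file" ] || return 1
    grep -q 'localhost' "$file" || return 1

    # grep -c exits non-zero on zero matches but still prints 0
    blocked=$(grep -c '^0\.0\.0\.0[[:space:]]' "$file" || true)
    [ "$blocked" -ge "$min_entries" ]
}
```

    A scheduled job could then copy ~/hosts-block into place only when hosts_file_looks_sane ~/hosts-block succeeds. Keep in mind that jobs in /etc/cron.weekly run as root, so ~ would point at root's home rather than yours and the paths would need adjusting.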

    Minor addition
    If you want to slightly reduce the number of keystrokes required to run the utility, you can create a bin subdirectory in your home folder. On Ubuntu-family systems, the default ~/.profile adds ~/bin to your $PATH at login if the directory exists, so you may need to log out and back in after creating it. Now, place the script in this subdirectory. Then you can simply run
    Code:
    gethosts
    without any of that extra tedious punctuation.
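
    If gethosts isn't found after you move it, a quick check of whether ~/bin is on your $PATH in the current shell can help. This is only a small sketch; path_contains is a hypothetical helper name.

```shell
#!/bin/bash

# path_contains: hypothetical helper; succeeds if directory $1 is listed in $PATH
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if path_contains "$HOME/bin"; then
    echo "$HOME/bin is on PATH"
else
    echo "$HOME/bin is not on PATH yet"
fi
```

    Wrapping $PATH in colons lets the pattern match the directory as a whole entry, whether it is first, last, or in the middle.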

    (Thanks to SecretCode for the idea!)
    Last edited by SteveRiley; Nov 14, 2013, 03:33 AM.