After comparing the performance of browser-based ad blockers to custom-crafted hosts files, I've concluded that hosts files perform better. I've found four reasonably up-to-date sources. The winhelp one is the largest and probably the most familiar, but it seems to be updated less frequently than some of the others.
So I've spent the last couple of hours teaching myself bash scripting, and especially the handy little sed utility. I've built a script that downloads the files, cleans out all their comments, de-duplicates entries, and merges the result with your system's original hosts file.
Save the script into a file called ~/gethosts. Make it executable with the command chmod +x ~/gethosts.
To run the script, simply enter ~/gethosts.
The first time you run the script, it saves a read-only copy of your existing /etc/hosts to ~/hosts-system, which it reuses on every later run. You can re-run the script whenever you feel like updating your ad-blocking hosts.
The script outputs the file ~/hosts-block. Each time you run it, you'll need to manually replace your existing hosts file with the command sudo cp ~/hosts-block /etc/hosts.
I think this would be a neat thing to schedule in /etc/cron.weekly, but I'll need to put some more smarts into it first. At least you can start playing around with it on your own now. Enjoy.
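As a rough idea of the kind of "smarts" an unattended run might want, here is a minimal sketch of a sanity check to run before overwriting /etc/hosts. The function name hosts_file_looks_sane and the default threshold of 100 entries are assumptions made for illustration; they are not part of the script.

```shell
#!/bin/bash
# Hypothetical sanity check (name and threshold are illustrative assumptions).
# Succeeds only if the generated file looks like a usable hosts file.
hosts_file_looks_sane() {
    local file=$1
    local min_entries=${2:-100}
    local count

    # The file must exist and be non-empty (e.g. the downloads did not all fail)
    [ -s "$file" ] || return 1

    # It must still carry a localhost entry from the original system hosts file
    grep -q 'localhost' "$file" || return 1

    # It must contain a plausible number of blocking entries
    count=$(grep -c '^0\.0\.0\.0' "$file")
    [ "$count" -ge "$min_entries" ]
}

# Usage sketch, left commented out so nothing is overwritten by accident:
# if hosts_file_looks_sane ~/hosts-block; then
#     sudo cp ~/hosts-block /etc/hosts
# fi
```

Scripts in /etc/cron.weekly run as root, so the ~ paths would need adjusting to point at the right user's home directory before scheduling anything like this.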
Minor addition
If you want to slightly reduce the number of keystrokes required to run the utility, you can create a bin subdirectory in your home folder and place the script in it. On Ubuntu, the default ~/.profile adds ~/bin to your $PATH at login if the directory exists, so you may need to log out and back in after creating it. Then you can simply run gethosts, without any of that extra tedious punctuation.
(Thanks to SecretCode for the idea!)
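Since ~/bin is only picked up when the shell starts, a freshly created directory may not be on your $PATH yet. A small sketch for checking this follows; the helper name dir_on_path is made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the given directory is listed in $PATH.
# Wrapping both sides in colons makes first, middle, and last entries match alike.
dir_on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if dir_on_path "$HOME/bin"; then
    echo "$HOME/bin is on your PATH"
else
    echo "$HOME/bin is not on your PATH yet; try logging out and back in"
fi
```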
Code:
#!/bin/bash

# If this is our first run, save a copy of the system's original hosts file and set to read-only for safety
if [ ! -f ~/hosts-system ]
then
    echo "Saving copy of system's original hosts file..."
    cp /etc/hosts ~/hosts-system
    chmod 444 ~/hosts-system
fi

# Perform work in temporary files
temphosts1=$(mktemp)
temphosts2=$(mktemp)

# Obtain various hosts files and merge into one
echo "Downloading ad-blocking hosts files..."
wget -nv -O - http://winhelp2002.mvps.org/hosts.txt >> $temphosts1
wget -nv -O - http://hosts-file.net/ad_servers.asp >> $temphosts1
wget -nv -O - http://someonewhocares.org/hosts/hosts >> $temphosts1
wget -nv -O - "http://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&showintro=0&mimetype=plaintext" >> $temphosts1

# Do some work on the file:
# 1. Remove MS-DOS carriage returns
# 2. Delete all lines that don't begin with 127.0.0.1
# 3. Delete any lines containing the word localhost because we'll obtain that from the original hosts file
# 4. Replace 127.0.0.1 with 0.0.0.0 because then we don't have to wait for the resolver to fail
# 5. Scrunch extraneous spaces separating address from name into a single tab
# 6. Delete any comments on lines
# 7. Clean up leftover trailing blanks
# Pass all this through sort with the unique flag to remove duplicates and save the result
echo "Parsing, cleaning, de-duplicating, sorting..."
sed -e 's/\r//' -e '/^127.0.0.1/!d' -e '/localhost/d' -e 's/127.0.0.1/0.0.0.0/' -e 's/ \+/\t/' -e 's/#.*$//' -e 's/[ \t]*$//' < $temphosts1 | sort -u > $temphosts2

# Combine system hosts with adblocks
echo Merging with original system hosts...
echo -e "\n# Ad blocking hosts generated "$(date) | cat ~/hosts-system - $temphosts2 > ~/hosts-block

# Clean up temp files and remind user to copy new file
echo "Cleaning up..."
rm $temphosts1 $temphosts2
echo "Done."
echo
echo "Copy ad-blocking hosts file with this command:"
echo "  sudo cp ~/hosts-block /etc/hosts"
echo
echo "You can always restore your original hosts file with this command:"
echo "  sudo cp ~/hosts-system /etc/hosts"
echo "so don't delete that file! (It's saved read-only for your protection.)"
echo
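To see what the cleaning step does without downloading anything, the same sed expressions can be run on a few made-up lines. This is only a sketch: the wrapper name clean_hosts and the example.com entries are invented for the demonstration, and GNU sed is assumed (for \r, \t and \+).

```shell
#!/bin/bash
# Hypothetical wrapper around the same sed expressions the script uses,
# reading from stdin instead of a temp file. Assumes GNU sed.
clean_hosts() {
    sed -e 's/\r//' \
        -e '/^127.0.0.1/!d' \
        -e '/localhost/d' \
        -e 's/127.0.0.1/0.0.0.0/' \
        -e 's/ \+/\t/' \
        -e 's/#.*$//' \
        -e 's/[ \t]*$//' | sort -u
}

# Made-up sample input: a comment line, a localhost entry, an entry with
# extra spaces and a trailing comment, a CRLF-terminated entry, and a duplicate
{
    printf '# sample comment\n'
    printf '127.0.0.1 localhost\n'
    printf '127.0.0.1   ads.example.com  # tracker\n'
    printf '127.0.0.1 banner.example.net\r\n'
    printf '127.0.0.1 ads.example.com\n'
} | clean_hosts
```

With this input, only two lines should come out, each with 0.0.0.0 and the host name separated by a single tab: one for ads.example.com and one for banner.example.net.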
Code:
chmod +x ~/gethosts
Code:
~/gethosts
Code:
sudo cp ~/hosts-block /etc/hosts
Code:
gethosts