I am experimenting with getting the status code from the source domain
before trying to download from it, as sometimes the sources have been
down or blocked.
I also added an important awk command which removes all the subdomains,
so the list only contains base domains. This makes the list smaller and
improves performance. You can always add or remove whichever lists you
want, but at some point the list just gets too long for the lowly Pi to
parse through on every query.
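The subdomain-stripping step might look something like this. This is a naive sketch, not the script's exact awk, and the input filename `domains.txt` is hypothetical; note that keeping only the last two labels mishandles multi-label public suffixes such as co.uk:

```shell
# Naive sketch: collapse every entry to its last two dot-separated labels
# (e.g. ads.tracker.example.com -> example.com), then de-duplicate.
# NOTE: multi-label public suffixes such as co.uk are mishandled by this.
awk -F. 'NF >= 2 { print $(NF-1) "." $NF }' domains.txt | sort -u
```

De-duplicating after stripping is what actually shrinks the list, since many entries collapse to the same base domain.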
Also added a variable so this script knows about the wormhole.
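The status-code check described above can be sketched as follows. Variable names are assumed, and this is a minimal illustration rather than the script's exact code; the probe would use a HEAD request and print only the numeric status code, e.g. `httpResponse=$(curl -s -o /dev/null -I -w "%{http_code}" "${sources[$i]}")`:

```shell
# Classify an HTTP status code: return non-zero for 4xx/5xx so the
# caller can skip downloading from a source that is down or blocked.
status_ok() {
  case "$1" in
    4[0-9][0-9]|5[0-9][0-9]) return 1 ;;  # client/server error: skip source
    *)                       return 0 ;;  # anything else: proceed
  esac
}
```

Wrapping the classification in a function keeps the download loop readable: `status_ok "$httpResponse" || continue`.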
    4[0-9][0-9]) echo "$httpResponse response from $domain: list will NOT be downloaded."; continue ;;
    5[0-9][0-9]) echo "$httpResponse response from $domain: list will NOT be downloaded."; continue ;;
    *) echo "$httpResponse response from $domain" ;;
esac
# Download file only if newer
data=$(curl -s -z "$saveLocation.$justDomainsExtension" -A "Mozilla/10.0" "${sources[$i]}")
if [ -n "$data" ]; then
echo "Getting $domain list..."
echo "$data" |
# Parse out just the domains
# If field 1 does not contain a "#" or a "/", field 2 does not contain a "#", the line ($0) is not empty, and field 2 is not empty, print field 2, which should be just the domain name
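An awk expression matching that description might look like this, assuming hosts-style input of the form `IP domain` per line (a sketch, not necessarily the script's exact code):

```shell
# Print field 2 (the domain) only for real hosts-file entries:
# skip comment lines, URLs in field 1, inline-comment fields, and blanks.
awk '($1 !~ /#/) && ($1 !~ /\//) && ($2 !~ /#/) && ($0 != "") && ($2 != "") { print $2 }'
```

Checking both the line and field 2 for emptiness guards against blank lines as well as lines that carry only an IP with no domain.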