Scripts

Bash snippets to download data from PhoneArena and extract features.

Download PhoneArena index pages

curl -s "http://www.phonearena.com/phones/page/[1-208]" -o "page_#1.html"

Extract phones list:

sed -n 's|.*href="/phones/\([^"]\+\)".*|\1|p' page_*.html

Download PhoneArena phone pages using curl

mkdir phones
for phone in `cat phones.txt`; do
    if ! [ -f phones/${phone}.html ]; then
        curl -s "http://www.phonearena.com/phones/$phone" -o phones/${phone}.html;
    fi;
done

Extract phones pros-cons lists to a colon-separated file, first value is the phone, the rest are sentences

 for phone in `cat phones.txt`; do
     echo -n $phone"|";
     echo `sed -n '/<div class="proscons">/,/<\/div>/p' phones/${phone}.html | \
     sed -n 's/.*<li>\([^<]*\)<.*/\1/p' | paste -sd "|" -`;
 done > proscons.csv

Extract all unique pros and cons items

There are only about 37 of them

 cat pros_cons.csv | cut -d'|' -f2- | tr "|" "\n" | sed -e 's/^[ \t]*//;s/[ \t]*$//' | sort | uniq > tagged.csv

Extract all descriptions from phone pages

for phone in `cat phones.txt`; do
    echo -n $phone"|";
    echo `sed -n 's| *<meta name="description" content="\([^"]*\)"/>|\1|p' phones/$phone.html | \
    sed 's/[&#][^;]*;//g'`;
done > desc.csv

Download PhoneArena phone reviews using curl

For future reference. I did not have the time to analyze them during this project

mkdir reviews
for phone in `cat phones.txt`; do
    if ! [ -f reviews/${phone}.html ]; then
        curl -s "http://www.phonearena.com/phones/$phone/reviews" -o reviews/${phone}.html;
    fi;
done

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scripts.md

scripts.md

Scripts

Download PhoneArena index pages

Extract phones list:

Download PhoneArena phone pages using curl

Extract phones pros-cons lists to a colon-separated file, first value is the phone, the rest are sentences

Extract all unique pros and cons items

Extract all descriptions from phone pages

Download PhoneArena phone reviews using curl

Files

scripts.md

Latest commit

History

scripts.md

File metadata and controls

Scripts

Download PhoneArena index pages

Extract phones list:

Download PhoneArena phone pages using curl

Extract phones pros-cons lists to a colon-separated file, first value is the phone, the rest are sentences

Extract all unique pros and cons items

Extract all descriptions from phone pages

Download PhoneArena phone reviews using curl