{"id":516,"date":"2007-05-10T07:24:55","date_gmt":"2007-05-10T14:24:55","guid":{"rendered":"http:\/\/www.gubatron.com\/blog\/2007\/05\/10\/ejemplo-de-automatizacion-entre-2-maquinas-remotas-con-bash-scripting-y-python\/"},"modified":"2007-05-10T07:24:55","modified_gmt":"2007-05-10T14:24:55","slug":"ejemplo-de-automatizacion-entre-2-maquinas-remotas-con-bash-scripting-y-python","status":"publish","type":"post","link":"https:\/\/www.gubatron.com\/blog\/ejemplo-de-automatizacion-entre-2-maquinas-remotas-con-bash-scripting-y-python\/","title":{"rendered":"Example of automation between 2 remote machines with bash scripting and Python"},"content":{"rendered":"<p>For friends who are just getting started in the *nix world, whether on a new Mac or on a PC running Linux,<br \/>\nI recommend learning the following well, and the world will be yours:<\/p>\n<p>&#8211; <strong>bash scripting<\/strong> (aliases, variables, exports, loops, conditionals)<br \/>\n&#8211; <strong>python<\/strong> (for more complex logic that is portable to any operating system)<br \/>\n&#8211; Commands such as grep, egrep and awk (stream-oriented text processing), among others<br \/>\n&#8211; Regular expressions<\/p>\n<p>At wedoit4you.com we wrote a simple script that logs visits coming from the registered blogs.<br \/>\nBloggers embed a small piece of JavaScript that, when invoked, writes an entry to a log on the server.<\/p>\n<p>Then we have a script that analyzes that log, discards any attempt to inflate the count with repeated clicks, and so on.<br \/>\nThat script then matches the incoming URLs against the URLs of the posts that wedoit4you.com<br \/>\nhas already read. 
Unfortunately, this script takes a long time to analyze more than 150 MB of data, plus whatever is in the log,<br \/>\nand Dreamhost kills it if it runs for more than a minute or if more than N processes are running.<\/p>\n<p>So what do we do?<\/p>\n<p>We run that script on a local machine, where we have the whole CPU to ourselves, and have the server, at certain times<br \/>\nof the day, do a mysqldump of the tables I care about (BLOGS, BLOG_POSTS, POST_HITS) and put it in a file<br \/>\navailable over HTTP.<\/p>\n<p><code><br \/>\n#!\/bin\/bash<br \/>\nDUMP_DIR=\/home\/cuenta_en_server\/sitio.com\/temp\/<br \/>\nSQL_FILE_NAME=clicktrackr_dump.sql<br \/>\nSQL_FILE=${DUMP_DIR}\/${SQL_FILE_NAME}<br \/>\nTGZ_FILE=${SQL_FILE_NAME}.tar.gz<br \/>\nmysqldump bd_en_servidor BLOGS BLOG_POSTS POST_HITS > ${SQL_FILE}<br \/>\ncd ${DUMP_DIR}<br \/>\npwd<br \/>\necho Making tar<br \/>\n# archive by relative name so the dump extracts into the current directory<br \/>\ntar cfz ${TGZ_FILE} ${SQL_FILE_NAME}<br \/>\necho Tar with SQL dump ready to be downloaded.<br \/>\necho Finished.<br \/>\n<\/code><\/p>\n<p>This script runs on the server at, say, 4am.<\/p>\n<p><strong>Then, from home<\/strong><br \/>\nThen a cronjob runs on the local machine at 4:30am. In my case that is an Intel Apple iMac, and here lies the power of having a Unix-based Mac instead of that piece of crap, Windows.<\/p>\n<p>I wrote a simple bash script that downloads that database dump and the clickTracker log so the calculations run on my own CPU<br \/>\n(with which I can do whatever I please). The calculations are done by a Python script (click_tracker.py, included at the end),<br \/>\nand once it finishes, the bash script FTPs back to the server and uploads a file<br \/>\nof SQL statements that update the posts' hit counts. 
Obviously I don't upload this file to a directory served by Apache,<br \/>\nbecause someone could tamper with it and alter our hit counts&#8230; The local script looks like this:<\/p>\n<p><code><br \/>\n#!\/bin\/bash<br \/>\nrm \/Users\/gubatron\/clicktrackr\/*.tar.gz<br \/>\nrm \/Users\/gubatron\/clicktrackr\/*.sql<br \/>\nrm \/Users\/gubatron\/clicktrackr\/*.dat<br \/>\nrm \/Users\/gubatron\/clicktrackr\/*.log<br \/>\necho \"Downloading dump from server...\"<br \/>\nwget http:\/\/www.wedoit4you.com\/dir_del_dump\/clicktrackr_dump.sql.tar.gz -O \/Users\/gubatron\/clicktrackr\/clicktrackr_dump.sql.tar.gz<br \/>\ncd \/Users\/gubatron\/clicktrackr\/<br \/>\necho \"Uncompressing Dump...\"<br \/>\ntar xfz clicktrackr_dump.sql.tar.gz<br \/>\necho \"Loading data in MySQL\"<br \/>\nmysql --user=usuario --password=password --database=bd_local < clicktrackr_dump.sql\necho \"Downloading latest tracker.log\"\nwget http:\/\/www.wedoit4you.com\/xxxxxxxxx\/logs\/tracker.log -O \/Users\/gubatron\/clicktrackr\/tracker.log\necho \"Crunching Data with python script\"\npython click_tracker.py\necho \"Compressing data crunched\"\ntar cvfz clicktrackr_update_tables.sql.tar.gz clicktrackr_update_tables.sql\necho \"Uploading data\"\n#then upload tar.gz clicktrackr_update_tables.sql\nftp -u ftp:\/\/usuario:password@wedoit4you.com\/directorioNoAccesiblePorApache\/ clicktrackr_update_tables.sql.tar.gz\necho \"Finished\"\n<\/code><\/p>\n<p>When it runs, the script's output looks similar to this:<br \/>\n<code><br \/>\nimac:~ gubatron$ clicktrackr_processing<br \/>\nDownloading dump from server...<br \/>\n--09:45:14--  http:\/\/www.wedoit4you.com\/xxxxxxx\/clicktrackr_dump.sql.tar.gz<br \/>\n           => `\/Users\/gubatron\/clicktrackr\/clicktrackr_dump.sql.tar.gz'<br \/>\nResolving www.wedoit4you.com... 208.113.146.143<br \/>\nConnecting to www.wedoit4you.com|208.113.146.143|:80... 
connected.<br \/>\nHTTP request sent, awaiting response... 200 OK<br \/>\nLength: 41,227,814 [application\/x-tar]<\/p>\n<p>100%[====================================&gt;] 41,227,814    86.84K\/s    ETA 00:00<\/p>\n<p>09:53:18 (83.35 KB\/s) - `\/Users\/gubatron\/clicktrackr\/clicktrackr_dump.sql.tar.gz' saved [41227814\/41227814]<\/p>\n<p>Uncompressing Dump...<br \/>\nLoading data in MySQL<br \/>\n\/Users\/gubatron\/bin\/clicktrackr_processing: line 12: clicktrackr_dump.sql: No such file or directory<br \/>\nDownloading latest tracker.log<br \/>\n--09:53:20--  http:\/\/www.wedoit4you.com\/xxxxxxxxxxxxxx\/tracker.log<br \/>\n           =&gt; `\/Users\/gubatron\/clicktrackr\/tracker.log'<br \/>\nResolving www.wedoit4you.com... 208.113.146.143<br \/>\nConnecting to www.wedoit4you.com|208.113.146.143|:80... connected.<br \/>\nHTTP request sent, awaiting response... 200 OK<br \/>\nLength: 3,220 [text\/plain]<\/p>\n<p>100%[====================================&gt;] 3,220         --.--K\/s             <\/p>\n<p>09:53:23 (33.72 KB\/s) - `\/Users\/gubatron\/clicktrackr\/tracker.log' saved [3220\/3220]<\/p>\n<p>Crunching Data with python script<br \/>\n\/Users\/gubatron\/clicktrackr<br \/>\nNo timestamp from last time found.<br \/>\nLoading data from ClickTrackr log...<br \/>\nSaving ClickTrackr data to File...<br \/>\nSaving completed.<br \/>\nLoading Blogs and Last Posts from DB<br \/>\nSaving Blogs to File...<br \/>\nSaving completed.<br \/>\nLoading Posts from DB...<br \/>\nSaving Posts from DB on file<br \/>\nSaving completed.<br \/>\nCrunching data...<br \/>\n0 converted from blog to last post<br \/>\nDidnt find total 11 urls.<br \/>\nDidn't find distinct 11 urls.<br \/>\nSaving crunched data...<br \/>\nData saved.<br \/>\nWriting SQL...<br \/>\nFinished Writing SQL<br \/>\nWrote last timestamp.<br \/>\nCompressing data crunched<br \/>\nclicktrackr_update_tables.sql -&gt; clicktrackr_update_tables.sql.tar.gz<br \/>\nUploading data<br \/>\nConnected to wedoit4you.com.<br 
\/>\n220 ProFTPD 1.3.0rc2 Server (DreamHost FTP) [208.113.146.143]<br \/>\n331 Password required for wedoit4y.<br \/>\n230 User wedoit4y logged in.<br \/>\nRemote system type is UNIX.<br \/>\nUsing binary mode to transfer files.<br \/>\n200 Type set to I<br \/>\n250 CWD command successful<br \/>\nlocal: clicktrackr_update_tables.sql.tar.gz remote: clicktrackr_update_tables.sql.tar.gz<br \/>\n229 Entering Extended Passive Mode (|||57539|)<br \/>\n150 Opening BINARY mode data connection for clicktrackr_update_tables.sql.tar.gz<br \/>\n100% |*************************************|   550 KB  155.72 KB\/s    00:03<br \/>\n226 Transfer complete.<br \/>\n563661 bytes sent in 00:03 (143.48 KB\/s)<br \/>\nFinished<br \/>\n<\/code><\/p>\n<p>Once the data has been processed and FTPed to the server, another cronjob runs an hour<br \/>\nlater. It assumes the new file with the processed data will already be there. We could add more<br \/>\nchecks, using \"stat\" and recording the timestamp of the SQL file used the previous time, so that we don't<br \/>\nrecord the previous day's hits again...<\/p>\n<p>This is what the script that finally applies the update on the server looks like:<\/p>\n<p><code><br \/>\n#!\/bin\/bash<br \/>\nDIR_PRIVADO=\/home\/usuario\/dirPrivado<br \/>\nPATH_DEL_TRACKER_LOG=\/home\/usuario\/algunaCarpeta\/tracker.log<br \/>\ncd ${DIR_PRIVADO}<br \/>\nrm *.sql<br \/>\ntar xvfz clicktrackr_update_tables.sql.tar.gz<br \/>\nmysql bd_en_servidor < ${DIR_PRIVADO}\/clicktrackr_update_tables.sql\nrm *.tar.gz\nrm *.sql\nrm \ntouch ${PATH_DEL_TRACKER_LOG}\nchmod 777 ${PATH_DEL_TRACKER_LOG}\n<\/code><\/p>\n<p>If you are curious to see how I crunch the data locally, here is the Python code.<br \/>\n(It is still a work in progress)<br \/>\n<code><\/p>\n<pre>\n#!\/home\/wedoit4y\/bin\/python\/bin\/python2.5\n# This is the script that processes the ClickTrackr Log\nimport os\nimport sys\nimport pickle\nimport time\n\n#NAMES OF FILES WHERE WE'LL STORE THE DIFFERENT 
STAGES OF RETRIEVED\n#AND PROCESSED DATA.\n\n#Maximum age (in seconds) a cached file may have and still be reused\nFILE_MAX_AGE=3600*1\nFILE_TIMESTAMP=\"clicktrackr_last_timestamp.dat\"\n#File that holds a dictionary with URLs and HITs we got from the original log file\nFILE_001=\"clicktrackr_001_url_hits.dat\"\nFILE_002=\"clicktrackr_002_blogs_lastposts.dat\"\nFILE_003=\"clicktrackr_003_posts_hits.dat\"\nFILE_004=\"clicktrackr_004_processed_hits.dat\" #...and urls not found\nFILE_SQL=\"clicktrackr_update_tables.sql\"\n\ntry:\n    import snowrss_config\n    from snowrss_config import getDbCursor\n    #from snowrss import *\nexcept Exception,e:\n    print \"Could not import snowrss_config [%s]\" % e\n    sys.exit()\n\ndef dbExec(sql):\n    \"\"\"Runs the given SQL and returns the cursor (None if it failed)\"\"\"\n    try:\n        cursor = getDbCursor()\n        cursor.execute(sql)\n        cursor.connection.close()\n    except Exception, e:\n        #MySQL has gone away\n        print 'dbExec(%s): ' % unicode(sql)\n        print e\n        return None\n    return cursor\n\ndef isFileFresh(fileName):\n    \"\"\"\n    Returns True if the file exists and is still good to be used.\n    Otherwise returns False\n    \"\"\"\n    try:\n        file_stat = os.stat(fileName)\n        file_age = time.time() - file_stat.st_mtime\n        if file_age > FILE_MAX_AGE:\n            return False\n        return True\n    except OSError:\n        #the file does not exist (or cannot be read)\n        return False\n\n\ndef getData(line):\n    \"\"\"Returns a dict with IP, Time, URL and User Agent if found\n\n    Parameters\n        line - A Line with a ClickTracker log entry\n\n    Output\n        {'ip':...,'time':...,'url':....,'ua':...}\n        ip -> IP Address\n        time -> Time of the event\n        url -> Referer Url\n        ua -> User Agent of the referring user ('N\/A' if missing)\n    \"\"\"\n    l = line.split()\n    result = {}\n    result['ip']=l[0]\n    result['time']=l[1]\n    result['url']=l[2]\n\n    result['ua']='N\/A'\n    if len(l)>3:\n        rest = l[3:]\n        ua_name = ''\n        for b in rest:\n            
ua_name = ua_name + ' ' + b\n        result['ua'] = ua_name\n\n    return result\n\n#Minimum time (in seconds) between two counted clicks from the same IP on the same URL\nTIME_BETWEEN_CLICKS = 12*3600\n\n#On the last run (if finished), we write down the time of the last timestamp on file.\n#If we did finish a run, we'll get this number from the timestamp file, and we'll ignore\n#all log entries previous to that timestamp.\nLAST_TIMESTAMP = None\nPOSSIBLE_LAST_TIMESTAMP = None\n\ntry:\n    f = open(FILE_TIMESTAMP,\"rb\")\n    LAST_TIMESTAMP = pickle.load(f)\n    LAST_TIMESTAMP = long(LAST_TIMESTAMP)\n    f.close()\nexcept:\n    print \"No timestamp from last time found.\"\n\nurls = {}\nurls_not_found = {}\nLOG_CLICK_TRACKER='tracker.log'\n\n#check if there is a backed up version of the log file that's still good enough to be used.\nUSABLE_LOG_FILE = LOG_CLICK_TRACKER\n\n#use a copy of the log if we got some pickled data\nif isFileFresh(LOG_CLICK_TRACKER + '.last') and isFileFresh(FILE_001):\n    USABLE_LOG_FILE = LOG_CLICK_TRACKER + \".last\"\n\nIGNORED_ENTRIES = 0\nif not isFileFresh(FILE_001):\n\n    #open the tracker log (current or old)\n    print \"Loading data from ClickTrackr log...\"\n    f = open(USABLE_LOG_FILE,'r')\n\n    f.seek(0,2)\n    eof = f.tell()\n    f.seek(0)\n\n    while f.tell() < eof:\n        entry = getData(f.readline())\n\n        url = entry['url']\n        ip = entry['ip']\n        timestamp = entry['time']\n\n        if LAST_TIMESTAMP is not None and long(timestamp) < LAST_TIMESTAMP:\n            print \"i\",\n            IGNORED_ENTRIES += 1\n            continue\n\n        POSSIBLE_LAST_TIMESTAMP = long(timestamp)\n\n        #skip non-http referers, translation proxies and numeric hosts\n        if (not url.startswith('http') or\n            url.startswith('http:\/\/babelfish.altavista.com') or\n            url.startswith('http:\/\/6')):\n            #IGNORED_ENTRIES += 1\n            continue\n\n        #Ask if this URL is already there\n        if urls.has_key(url):\n            #Ask if this IP is already there\n            if 
urls[url].has_key(ip):\n                #Get the last time stamp inside this IP\n                times = urls[url][ip]\n                last_time = times[-1]\n                delta_time = long(timestamp) - long(last_time)\n                #If its been more than acceptable time\n                if delta_time >= TIME_BETWEEN_CLICKS:\n                    urls[url][ip].append(timestamp)\n\n                    hits = 0\n                    for ipbuffer in urls[url]:\n                        if ipbuffer == 'hits': #just count the keys that are not 'hits'\n                            continue\n                        hits += len(urls[url][ipbuffer])\n\n                    urls[url]['hits'] = hits\n            else:\n                #first click from this IP on a known URL: one more hit\n                urls[url][ip] = [timestamp]\n                urls[url]['hits'] += 1\n        else:\n            urls[url]={}\n            urls[url][ip] = [timestamp]\n            urls[url]['hits']=1\n    f.close()\n\n    urls['POSSIBLE_LAST_TIMESTAMP'] = POSSIBLE_LAST_TIMESTAMP\n\n    #we serialize this data for later\n    if IGNORED_ENTRIES > 0:\n        print \"Ignored %d entries.\" % IGNORED_ENTRIES\n\n    print \"Saving ClickTrackr data to File...\"\n    f = file(FILE_001,\"wb\")\n    pickle.dump(urls,f)\n    f.close()\n    print \"Saving completed.\"\n    urls.pop('POSSIBLE_LAST_TIMESTAMP') #keep only urls in memory, same as in the cached branch below\n\n    #we make a backup of the current ClickTrackr log (.last), in case we need to run again\n    #we can diff with this to know from where to relog in the future\n    os.system(\"cp %s %s\" % (LOG_CLICK_TRACKER,LOG_CLICK_TRACKER + \".last\"))\nelse:\n    #we unserialize the data\n    print \"Loading ClickTrackr data from existing file...\"\n    f = file(FILE_001,\"rb\")\n    urls = pickle.load(f)\n    f.close()\n    POSSIBLE_LAST_TIMESTAMP = urls.pop('POSSIBLE_LAST_TIMESTAMP') #we pop it so we have only urls and we dont process it further down\n    print \"Loading completed.\"\n\n#LOAD ALL BLOG POST URLS, IDS AND CURRENT NUMBER OF HITS.\nblog_urls = {} #blogs hashed by their urls, Buckets have 
{'post_id':<last_post_id>,'post_link':<last_post_link>}\nblog_ids = {} #blogs hashed by their ids, Buckets have {'post_id':<last_post_id>,'post_link':<last_post_link>}\nif not isFileFresh(FILE_002):\n    print \"Loading Blogs and Last Posts from DB\"\n    sql = \"SELECT Blog_pk_id, Blog_url FROM BLOGS WHERE Blog_active=1;\"\n    cursor = dbExec(sql)\n    results = cursor.fetchall()\n\n    for r in results:\n        #Get the ID of the last post on each blog\"\n        sql = u\"SELECT BP_pk_id,BP_link FROM BLOG_POSTS WHERE BP_fk_blog_id = %d ORDER BY BP_pk_id DESC LIMIT 1\" % (r['Blog_pk_id']);\n        cursor = dbExec(sql)\n        last_post = cursor.fetchone()\n\n        if last_post:\n            blog_urls[r['Blog_url']] = {'post_id':last_post['BP_pk_id'],'post_link':last_post['BP_link']}\n            blog_ids[r['Blog_pk_id']] = {'post_id':last_post['BP_pk_id'],'post_link':last_post['BP_link']}\n\n    #serialize blog_urls and blog_ids\n    print \"Saving Blogs to File...\"\n    f = file(FILE_002,\"wb\")\n    pickle.dump(blog_urls,f)\n    pickle.dump(blog_ids,f)\n    f.close()\n    print \"Saving completed.\"\nelse:\n    #load blog_urls from serialized data\n    print \"Loading Blogs and Last Posts from File...\"\n    f = file(FILE_002,\"rb\")\n    blog_urls = pickle.load(f)\n    blog_ids = pickle.load(f)\n    f.close()\n    print \"Loading completed.\"\n\n#LOAD ALL BLOG_POSTS URL AND ITS HITS\npost_hits = {} #posts hashed by url, Buckets have (post_id, post_hits, blog_id)\nif not isFileFresh(FILE_003):\n    print \"Loading Posts from DB...\"\n    sql = \"SELECT SQL_CACHE BP_link, BP_pk_id, BP_fk_blog_id, PH_hits \"\n    sql += \"FROM BLOG_POSTS LEFT JOIN POST_HITS ON BP_pk_id = PH_fk_post_id;\"\n    cursor = dbExec(sql)\n    results = cursor.fetchall()\n\n    for r in results:\n        #hits might be null, if the post has never been reached on our page\n        hit_count = int(r['PH_hits']) if r['PH_hits'] is not None else 0\n        post_hits[r['BP_link']] = 
{'post_id':int(r['BP_pk_id']),\n                                   'post_hits':hit_count,\n                                   'blog_id':r['BP_fk_blog_id']}\n\n    #serialize post_hits\n    print \"Saving Posts from DB on file\"\n    f = file(FILE_003,\"wb\")\n    pickle.dump(post_hits,f)\n    f.close()\n    print \"Saving completed.\"\nelse:\n    print \"Loading Posts from File...\"\n    f = file(FILE_003,\"rb\")\n    post_hits = pickle.load(f)\n    f.close()\n    print \"Loading completed.\"\n\n\n# The stars of the game are:\n# - urls {<url>:{<ip>:[<timestamps>],'hits':<hits>}} \/\/The urls and how many hits we got from the click trackr log\n# - blog_urls {<url>:{'post_id':<last_post_id>,'post_link':<last_post_link>}} \/\/urls of blogs, holding each a dict with last post info\n# - post_hits {<url>:{'post_id':<post_id>,'post_hits':<post_hits>,'blog_id':<blog_id>}} \/\/urls and hit info of all posts\n# - urls_not_found {'<url>':<no_times_found_in_log>}\nif not isFileFresh(FILE_004):\n    total_not_found = 0\n    distinct_not_found = 0\n    total_converted = 0\n\n    print \"Crunching data...\"\n    for log_url in urls:\n        #if the current url is the home of a blog\n        #we credit its hits to the last post of that blog.\n        url = log_url\n        converting_blog_url_to_post_url = False\n        if blog_urls.has_key(log_url):\n            url = blog_urls[log_url]['post_link']\n            converting_blog_url_to_post_url = True\n\n        #if you find a direct match add the hits right away\n        if post_hits.has_key(url):\n            #the hits are stored under the url as it appeared in the log\n            new_hits = 0\n            if urls[log_url].has_key('hits'):\n                new_hits = urls[log_url]['hits']\n                if converting_blog_url_to_post_url:\n                    total_converted+=1\n                    print \"!\",\n\n            old_hits = 0\n            if post_hits[url].has_key('post_hits'):\n                old_hits = post_hits[url]['post_hits']\n\n            total_hits = new_hits + old_hits\n\n            #finally update post_hits arrays.\n     
       post_hits[url]['post_hits'] = total_hits\n            str_a = \"(%(post_id)d):%(post_hits)d:\" % post_hits[url]\n            str_b = \"(%d+%d)\" % (new_hits,old_hits)\n            str_c = str_a + str_b\n            print str_c,\n        else:\n            if urls_not_found.has_key(url):\n                urls_not_found[url] += 1\n            else:\n                urls_not_found[url]=1\n                distinct_not_found +=1\n            total_not_found += 1\n            print \"-\",\n\n    print\n    print \"%d converted from blog to last post\" % total_converted\n    print \"Didnt find total %d urls.\" % total_not_found\n    print \"Didn't find distinct %d urls.\" % distinct_not_found\n\n    #serialize processed data in file 4\n    print \"Saving crunched data...\"\n    f = file(FILE_004,\"wb\")\n    pickle.dump(post_hits,f)\n    pickle.dump(urls_not_found,f)\n    f.close()\n    print \"Data saved.\"\nelse:\n    print \"Loading Previously Crunched Data...\"\n    f = file(FILE_004,\"rb\")\n    post_hits = pickle.load(f)\n    urls_not_found = pickle.load(f)\n    f.close()\n    print \"Loading completed.\"\n\n\n    #If we can't find it on the blog posts, we could try to\n    #slim the URL of this url too http:\/\/servername.com\/folder\n    #and look up on the blog url\n\n    #if nothing, we slim down to http:\/\/servername.com\n\n    #if in any of these 2 cases we find a match then\n    #we add a hit on the last post of this blog\n\n    #the output of this file should be for now a file with SQL update statements\nprint len(post_hits)\n\n#Generate SQL output from post_hits array\n# - post_hits {<url>:{'post_id':<post_id>,'post_hits':<post_hits>,'blog_id':<blog_id>}}\nprint \"Writing SQL...\"\nf = file(FILE_SQL,\"wb\")\nfor url in post_hits:\n    post_id = post_hits[url]['post_id']\n    hits = post_hits[url]['post_hits']\n    #one UPDATE statement per line\n    f.write(\"UPDATE POST_HITS SET PH_hits = %d WHERE PH_fk_post_id = %d;\\n\" % (hits,post_id))\nf.close()\nprint \"Finished Writing 
SQL\"\n\n#If we make it all the way till here, we write down the new LAST_TIMESTAMP\nif POSSIBLE_LAST_TIMESTAMP is not None:\n    f = file(FILE_TIMESTAMP,\"wb\")\n    pickle.dump(POSSIBLE_LAST_TIMESTAMP,f)\n    f.close()\n    print \"Wrote last timestamp.\"\nelse:\n    print \"Did not write LAST timestamp.\"\n\nfor u in urls_not_found:\n    print u\n<\/pre>\n<p><\/code><\/p>\n","protected":false},"excerpt":{"rendered":"<p>For friends who are just getting started in the *nix world, whether on a new Mac or on a PC running Linux, I recommend learning the following well, and the world will be yours: &#8211; bash scripting (aliases, variables, exports, loops, conditionals) &#8211; python (for more complex logic that is portable to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[15,30,43,45,65],"tags":[],"class_list":["post-516","post","type-post","status-publish","format-standard","hentry","category-code","category-geeklife","category-linux","category-mac-osx","category-python"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p5Unzf-8k","jetpack-related-posts":[{"id":3821,"url":"https:\/\/www.gubatron.com\/blog\/bash-scripting-contains_item-bash-function-check-if-an-item-is-in-an
-array\/","url_meta":{"origin":516,"position":0},"title":"[bash scripting] `contains_item` bash function. Check if an item is in an array","author":"gubatron","date":"September 6, 2019","format":false,"excerpt":"","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/www.gubatron.com\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3352,"url":"https:\/\/www.gubatron.com\/blog\/bash-scripting-how-to-get-a-files-name-without-its-extensions\/","url_meta":{"origin":516,"position":1},"title":"[bash scripting] How to get a file&#8217;s name without its extension(s).","author":"gubatron","date":"September 6, 2014","format":false,"excerpt":"Say you have an encrypted file file.foo.gpg and you want to make a shorthand command to decrypt that file, you'll want the resulting file to be named file.foo (without the .gpg), or say you want the name, with no extension?), you can use bash's magic variable voodo for that. 
A\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/www.gubatron.com\/blog\/category\/code\/"},"img":{"alt_text":"Screen Shot 2014-09-06 at 4.29.38 PM","src":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2014\/09\/Screen-Shot-2014-09-06-at-4.29.38-PM.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2014\/09\/Screen-Shot-2014-09-06-at-4.29.38-PM.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2014\/09\/Screen-Shot-2014-09-06-at-4.29.38-PM.png?resize=525%2C300 1.5x"},"classes":[]},{"id":1118,"url":"https:\/\/www.gubatron.com\/blog\/delete-all-direct-messages-of-your-twitter-account-at-once-or-at-least-try\/","url_meta":{"origin":516,"position":2},"title":"Delete All Direct Messages of your Twitter Account at once (or at least try!)","author":"gubatron","date":"February 13, 2009","format":false,"excerpt":"Since Twitter doesn't provide with a \"Delete All Direct Messages\" functionality, Here's a Python script that attempts to delete all the direct messages stored on your Twitter account. Limitations The only problem with it is that given the limitations of the Twitter REST API, I was forced to send a\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/www.gubatron.com\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":360,"url":"https:\/\/www.gubatron.com\/blog\/como-matar-varios-procesos-cuando-killall-no-es-una-opcion\/","url_meta":{"origin":516,"position":3},"title":"Como matar varios procesos cuando killall no es una opcion.","author":"gubatron","date":"August 15, 2006","format":false,"excerpt":"A veces tienes un cronjob que se queda pegado por mucho rato, cuando haces Code: ps aux | grep miPrograma tienes un monton de instancias pegadas!! 
Intentas hacer killall miPrograma pero no funciona porque quizas es un programa que estas arrancando con un interprete, como python, o php, o perl.\u2026","rel":"","context":"In &quot;Geeklife&quot;","block_context":{"text":"Geeklife","link":"https:\/\/www.gubatron.com\/blog\/category\/geeklife\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2767,"url":"https:\/\/www.gubatron.com\/blog\/ubuntu-packages-for-a-kick-ass-web-server\/","url_meta":{"origin":516,"position":4},"title":"ubuntu packages for a kick ass web server","author":"gubatron","date":"September 7, 2012","format":false,"excerpt":"Copy and paste the following list on a file, say \"packages.txt\". To install all just do: sudo apt-get install $(cat packages.txt) accountsservice acpid adduser ant ant-optional apache2-utils apparmor apport apport-symptoms apt apt-transport-https apt-utils apt-xapian-index aptitude at base-files base-passwd bash bash-completion bc bind9-host bsdmainutils bsdutils busybox-initramfs busybox-static byobu bzip2 ca-certificates ca-certificates-java\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/www.gubatron.com\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1457,"url":"https:\/\/www.gubatron.com\/blog\/map-commands-to-servers-via-ssh\/","url_meta":{"origin":516,"position":5},"title":"Quick N Dirty way to Map Commands to remote servers via ssh","author":"gubatron","date":"October 10, 2009","format":false,"excerpt":"You may be running several independent but similar servers at the same time and wasting time by executing commands in all of them one by one. Wouldn't it be nice to send a command to all of them at once? or to monitor all of them at once. 
The following\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/www.gubatron.com\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/posts\/516","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/comments?post=516"}],"version-history":[{"count":0,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/posts\/516\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/media?parent=516"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/categories?post=516"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/tags?post=516"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}