How to build your Docker image using the same Dockerfile regardless of the host architecture

Problem

If you are now using docker on a Mac M1 (arm64 platform), you don’t want to use amd64 as the architecture for your Linux Images.

You could have 2 lines on your Dockerfile and comment each one depending on where you’re building the image

Dockerfile

# Building on Apple Silicon host
FROM --platform=linux/arm64 ubuntu:20.04

# Building on Intel/x86_64 host
#FROM --platform=linux/amd64 ubuntu:20.04

Eventually this becomes very annoying.

Solution

You can pass a build time argument when you invoke docker build

Put this on your Dockerfile:

ARG BUILDPLATFORM
FROM --platform=linux/$BUILDPLATFORM ubuntu:20.04

On your docker_build_image.sh script:

export BUILDPLATFORM=`uname -m`
docker build --build-arg BUILDPLATFORM=${BUILDPLATFORM} -t myimagename .

How to resize a VirtualBox fixed size virtual drive (.vdi) on Mac

So you created a disk for your Windows or Linux VirtualBox VM and you made the mistake of not creating the drive as a dynamically expanding storage drive, you chose fixed size.

Now you’re running out of space and all your Google searches point you to stupid posts on the virtualbox.org forums that lead to nowhere, specially if you are a MacOSX user, look no further, you’ve found the solution to your problem on this post.

VirtualBox for Mac comes with a tool called VBoxManage, this tool is all you need to resize your virtual disk.

When you’re using the VBoxManage tool make sure your VM is not running, stop it completely to be safe.

Step 0. Backup your current drive (optional)

I recommend that before you do anything you clone your existing drive, just in case…

You can clone your drive using guess what? yes, the VBoxManage tool. No need to download any external tools.

[bash]$ VboxManage clonehd <path to your original drive> <path of the copy>[/bash]

in my computer it looked exactly like this:

[bash]$ VBoxManage clonehd /Users/gubatron/VirtualBox VMs/windows7-64bit/windows7-64bit.vdi /Users/gubatron/windows7-64bit.cloned.vdi
0%…10%…20%…30%…40%…50%…60%…70%…80%…90%…100%[/bash]

If you don’t know where your .vdi file is, it’s very simple, right Click your VM > Settings > Storage, and mouse over the .vdi to see it’s physical location on your Mac hardrive.


Mouse over your virtual drive if you don’t know its location

Step 1. Resize!

The command to resize is:
[bash]$ VBoxManage modifyhd <path to your vdi> –resize <new size in megabytes>[/bash]

In my case I had a 20GB drive that I wanted to double in size (40GB), a quick Google search for “40GB to megabytes” yields the number 40960, so that’s what I put on my –resize parameter.

This is how it looked for me:
[bash]$ VBoxManage modifyhd /Users/gubatron/VirtualBox VMs/windows7-64bit/windows7-64bit.vdi –resize 40960
0%…10%…20%…30%…40%…50%…60%…70%…80%…90%…100%[/bash]

The ‘resizing’ is almost immediate.

Step 2. (Re)Start VirtualBox

After you’ve physically resized your drive, both VirtualBox and your OS should know about the changes.

At this point, clicking on your VM in the VirtualBox window will show that it still has the old size:

Before: (Note the size of the drive at the bottom of the image)

After doing the resize, shutdown completely your VirtualBox, and launch it again. When you see the information about the storage drives attached to your VM you should see the new disk size.

After: (Note the size of the drive at the bottom of the image)

Step 3. Let your OS know the new logical size of the drive


Your OS is still not aware of the changes, you need to let it know about them

When you start your VM, it won’t know right away what the new size of the disk is, you’ll have let the operating system know that there is additional space it can use.

If your VM is running Windows 7, the way to let the operating system know that it can use the additional space on your drive is very straight forward.

Click on the Windows (Start) Icon on the Bar > Right Click on “Computer” > Manage (You’ll need to be an Administrator)

The “Computer Management” window will open, in it go to:

Storage > Disk Management > Right Click on the partition you want to extend.

Select the option that says “Extend”, a Wizard will open, just hit Next until it ends if you want to use the whole space.

After you’re done, if you refresh your “Computer” on the File explorer the resized drive should show right away

That’s it, enjoy and let me know how it went.

What’s new in FrostWire 4.13.5

FOR IMMEDIATE RELEASE:

FrostWire 4.13.5 is now available for MS Windows, Mac OSX and Linux. Major updates improve network bootstraping and peer discovery. 4.13.5 includes improvements on the Chatroom tab, Audio Previews and more.

Other improvements have taken place for the FrostWire build process (for developers this means true One-Step builds for all versions). Updates on translations have been made thanks to the feedback from users in Poland and throughout Latin America. .

In more detail users can expect the following:

  • Faster peer discovery on connection bootstraping. No more “Starting Connection…” problems, first time users will connect faster without using the official FixConnecting.zip patch.
  • Smiley Support to the chatroom

Users can see the available smileys by entering the command
/smileys

Now its possible to see and use Smileys from the Community Chat tab, Smiley display can be enabled or disabled from the view menu:

Show Smileys

Users can also toggle Smiley display directly from the chat window by typing the command
/tsmileys

  • Fixed wording on Spanish and Polish translations.

Bug Fixes and other improvements for this release also include:

  • FrostWire Message Update System improved. Per community request, some announcements will not be shown more than once so the user is not annoyed upon every application launch
  • Fixed bugs on the media player and playlists on Preview.
  • Fixed bug on search box auto-focusing while a search was running.
  • Fixed i18n system error for systems which default language is not english
  • Potential bugs related to deprecated code gone

Users can find now by details without the auto-focusing problem.

FrostWire 4.13.5 is expected to be the last of the 4.13.x series.

About FrostWire

FrostWire, a Gnutella Peer-to-Peer client, is a collaborative effort from many Open Source and freelance developers located from all around the world. In late 2005, concerned developers of LimeWire’s open source community announced the start of a new project fork “FrostWire” that would protect the developmental source code of the LimeWire client and any improvements to the Gnutella protocol design. The developers of FrostWire give high regard and respect to the GNU General Public License and consider it to be the ideal foundation of a creative and free enterprise market.

Ejemplo de automatizacion entre 2 maquinas remotas con bash scripting y Python

Para los amigos que se inician en el mundo *nix, ya sea con su nueva Mac, o con una PC corriendo Linux,
les recomiendo que aprendan a manejar bien los siguientes lenguajes, y el mundo sera suyo:

bash scripting (aliases, variables, exports, iteraciones, condicionales)
python (para programar logica mas compleja y portatil en cualquier sistema operativo)
– Uso de comandos como grep, egrep, awk (editor de streams) entre otros
– Expresiones regulares

En wedoit4you.com hicimos un simple script que va logeando visitas desde los blogs registrados.
Los blogeros ponen un pedacito de javascript, que al ser invocado, escribe una entrada en un log en el servidor.

Luego tenemos un script que analiza ese log, elimina cualquier intento de hacer muchos clicks, etc. etc.
Ese script se encarga luego de hacer matching de las URLs entrantes, con las URLs de los posts que wedoit4you.com
ya leyo. Este script lamentablemente tarda mucho en analizar mas de 150mb de data, mas lo que haya en el log,
y Dreamhost lo mata si dura mas de un minuto, o si hay mas de N procesos corriendo.

Que hacemos entonces?

Ponemos ese script en una maquina local, donde tenemos todo el cpu, y hacemos que el servidor a ciertas horas
del dia, haga un mysqldump de las tablas que me interesan (BLOGS, BLOG_POSTS, POST_HITS) y meta eso en un archivo
disponible via HTTP.


#!/bin/bash
DUMP_DIR=/home/cuenta_en_server/sitio.com/temp/
SQL_FILE_NAME=clicktrackr_dump.sql
SQL_FILE=${DUMP_DIR}/${SQL_FILE_NAME}
TGZ_FILE=${SQL_FILE_FNAME}.tar.gz
mysqldump bd_en_servidor BLOGS BLOG_POSTS POST_HITS > ${SQL_FILE}
cd ${DUMP_DIR}
pwd
echo Making tar
tar cfz ${TGZ_FILE} ${SQL_FILE}
echo Tar with SQL dump ready to be downloaded.
echo Finished.

Este script corre digamos a las 4am en el servidor.

Luego desde la casa
Luego en la maquina local corre un cronjob a las 4:30am, En mi caso una apple iMac Intel, y he aqui el poder de tener una Mac basada en Unix, y no la cagada de windows de mierda.

Hice sencillo bash script que se baja ese dump de la base de datos, y se baja el log del clickTracker para hacer los calculos en mi cpu,
(con el cual hago lo que me da la gana), los calculos son realizados con un script en python (click_tracker.py, incluido al final)
y una vez que termina de calcular, hace ftp de vuelta hacia el servidor y sube un archivo SQL
con instrucciones SQL para que se actualizen los hits de los posts. Este archivo de lolas que no lo subo a un directorio disponible en apache
pq alguien podria meterse con el y alterarnos los hits… Este Script local luce asi:


#!/bin/bash
rm /Users/gubatron/clicktrackr/*.tar.gz
rm /Users/gubatron/clicktrackr/*.sql
rm /Users/gubatron/clicktrackr/*.dat
rm /Users/gubatron/clicktrackr/*.log
echo "Downloading dump from server..."
wget http://www.wedoit4you.com/dir_del_dump/clicktrackr_dump.sql.tar.gz -O /Users/gubatron/clicktrackr/clicktrackr_dump.sql.tar.gz
cd /Users/gubatron/clicktrackr/
echo "Uncompressing Dump..."
tar xfz clicktrackr_dump.sql.tar.gz
echo "Loading data in MySQL"
mysql --user=usuario --password=password --database=bd_local < clicktrackr_dump.sql echo "Downloading latest tracker.log" wget http://www.wedoit4you.com/xxxxxxxxx/logs/tracker.log -O /Users/gubatron/clicktrackr/tracker.log echo "Crunching Data with python script" python click_tracker.py echo "Compressing data crunched" tar cvfz clicktrackr_update_tables.sql.tar.gz clicktrackr_update_tables.sql echo "Uploading data" #then upload tar.gz clicktrackr_update_tables.sql ftp -u ftp://usuario:password@wedoit4you.com/directorioNoAccesiblePorApache/ clicktrackr_update_tables.sql.tar.gz echo "Finished"

La salida de este script cuando se ejecuta es similar ea esto

imac:~ gubatron$ clicktrackr_processing
Downloading dump from server...
--09:45:14-- http://www.wedoit4you.com/xxxxxxx/clicktrackr_dump.sql.tar.gz
=> `/Users/gubatron/clicktrackr/clicktrackr_dump.sql.tar.gz'
Resolving www.wedoit4you.com... 208.113.146.143
Connecting to www.wedoit4you.com|208.113.146.143|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 41,227,814 [application/x-tar]

100%[====================================>] 41,227,814 86.84K/s ETA 00:00

09:53:18 (83.35 KB/s) - `/Users/gubatron/clicktrackr/clicktrackr_dump.sql.tar.gz' saved [41227814/41227814]

Uncompressing Dump...
Loading data in MySQL
/Users/gubatron/bin/clicktrackr_processing: line 12: clicktrackr_dump.sql: No such file or directory
Downloading latest tracker.log
--09:53:20-- http://www.wedoit4you.com/xxxxxxxxxxxxxx/tracker.log
=> `/Users/gubatron/clicktrackr/tracker.log'
Resolving www.wedoit4you.com... 208.113.146.143
Connecting to www.wedoit4you.com|208.113.146.143|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3,220 [text/plain]

100%[====================================>] 3,220 --.--K/s

09:53:23 (33.72 KB/s) - `/Users/gubatron/clicktrackr/tracker.log' saved [3220/3220]

Crunching Data with python script
/Users/gubatron/clicktrackr
No timestamp from last time found.
Loading data from ClickTrackr log...
Saving ClickTrackr data to File...
Saving completed.
Loading Blogs and Last Posts from DB
Saving Blogs to File...
Saving completed.
Loading Posts from DB...
Saving Posts from DB on file
Saving completed.
Crunching data...
0 converted from blog to last post
Didnt find total 11 urls.
Didn't find distinct 11 urls.
Saving crunched data...
Data saved.
Writing SQL...
Finished Writing SQL
Wrote last timestamp.
Compressing data crunched
clicktrackr_update_tables.sql -> clicktrackr_update_tables.sql.tar.gz
Uploading data
Connected to wedoit4you.com.
220 ProFTPD 1.3.0rc2 Server (DreamHost FTP) [208.113.146.143]
331 Password required for wedoit4y.
230 User wedoit4y logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
200 Type set to I
250 CWD command successful
local: clicktrackr_update_tables.sql.tar.gz remote: clicktrackr_update_tables.sql.tar.gz
229 Entering Extended Passive Mode (|||57539|)
150 Opening BINARY mode data connection for clicktrackr_update_tables.sql.tar.gz
100% |*************************************| 550 KB 155.72 KB/s 00:03
226 Transfer complete.
563661 bytes sent in 00:03 (143.48 KB/s)
Finished

Una vez que la data fue procesada y FTPeada al servidor, hay otro cronjob que corre una hora mas
tarde, y este asume que el nuevo archivo con la data procesada estara ahi, podriamos agregarle mas
checks, utilizando "stat" y anotando el ultimo timestamp del sql utlizado la vez anterior cosa que no
volvamos a anotar los hits del dia anterior...

Asi luce el script para actualizar finalmente en el servidor:


#!/bin/bash
DIR_PRIVADO=/home/usuario/dirPrivado
PATH_DEL_TRACKER_LOG=/home/usuario/algunaCarpeta/tracker.log
cd ${DIR_PRIVADO}
rm *.sql
tar xvfz clicktrackr_update_tables.sql.tar.gz
mysql bd_en_servidor < ${DIR_PRIVADO}/clicktrackr_update_tables.sql rm *.tar.gz rm *.sql rm touch ${PATH_DEL_TRACKER_LOG} chmod 777 ${PATH_DEL_TRACKER_LOG}

Si tienes curiosidad de ver como cruncheo la data localmente, aqui esta el codigo en python.
(Es aun un trabajo en progreso)

#!/home/wedoit4y/bin/python/bin/python2.5
# This is the script that processes the ClickTrackr Log
import os
import sys
import pickle
import time

#NAMES OF FILES WHERE WE'LL STORE THE DIFFERENT STAGES OF RETRIEVED
#AND PROCESSED DATA.

#File that holds a dictionary with URLs and HITs we got from the original log file
FILE_MAX_AGE=3600*1
FILE_TIMESTAMP="clicktrackr_last_timestamp.dat"
FILE_001="clicktrackr_001_url_hits.dat"
FILE_002="clicktrackr_002_blogs_lastposts.dat"
FILE_003="clicktrackr_003_posts_hits.dat"
FILE_004="clicktrackr_004_processed_hits.dat" #...and urls not found
FILE_SQL="clicktrackr_update_tables.sql"

try:
    import snowrss_config
    from snowrss_config import getDbCursor
    #from snowrss import *
except Exception,e:
    print "Could not import snowrss_config [%s]" % e
    sys.exit()

def dbExec(sql):
    """Give it some SQL and it will return the returning cursor"""
    try:
        cursor = getDbCursor()
        cursor.execute(sql)
        cursor.connection.close()
    except Exception, e:
        #MySQL has gone away
        print 'dbExec(%s): ' % unicode(sql)
        print e
        return None
    return cursor

def isFileFresh(fileName):
    """
    Returns True if the file is still good to be used.
    Othewise returns false
    """
    try:
        file_stat = os.stat(fileName)
        file_age = time.time() - file_stat.st_mtime
        if file_age > FILE_MAX_AGE:
            return False
        return True
    except:
        return False


def getData(line):
    """Returns a dict with, IP, Timestamp, URL and User Agent if found

    Parameters
        line - A Line with a ClickTracker log entry

    Output
        {'ip':...,'timestamp':...,'url':....,'ua':...}
        ip-> IP Addres
        time -> Time of the event
        url -> Referer Url
        ua -> User Agent of the rerferer user
    """
    l = line.split()
    result = {}
    result['ip']=l[0]
    result['time']=l[1]
    result['url']=l[2]

    result['ua']='N/A'
    if len(l)>3:
        rest = l[3:]
        ua_name = ''
        for b in rest:
            ua_name = ua_name + ' ' + b
        result['ua'] = ua_name

    return result

#Maximum time to count a click from the same IP on the same URL
TIME_BETWEEN_CLICKS = 12*3600

#On the last run (if finished, we write down the time of the last timestamp on file)
#If we did finish a run, we'll get this number from the timestamp file, and we'll ignore
#all previous log entries to that timestamp.
LAST_TIMESTAMP = None
POSSIBLE_LAST_TIMESTAMP = None

try:
    f = fopen(FILE_TIMESTAMP,"rb")
    LAST_TIMESTAMP = pickle.load(f)
    LAST_TIMESTAMP = long(LAST_TIMESTAMP)
    f.close()
except:
    print "No timestamp from last time found."

urls = {}
urls_not_found = {}
LOG_CLICK_TRACKER='tracker.log'

#check if there is a version of the log file backed that's still good enough to be used.
USABLE_LOG_FILE = LOG_CLICK_TRACKER

#use a copy of the log if we got some pickled data
if isFileFresh(LOG_CLICK_TRACKER + '.last') and isFileFresh(FILE_001):
    USABLE_LOG_FILE = LOG_CLICK_TRACKER + ".last"

IGNORED_ENTRIES = 0
if not isFileFresh(FILE_001):

    #open the tracker log (current or old)
    print "Loading data from ClickTrackr log..."
    f = open(USABLE_LOG_FILE,'r')

    f.seek(0,2)
    eof = f.tell()
    f.seek(0)

    while f.tell() < eof:
        entry = getData(f.readline())

        url = entry['url']
        ip = entry['ip']
        timestamp = entry['time']

        if LAST_TIMESTAMP is not None and long(timestamp) < LAST_TIMESTAMP:
            print "i",
            IGNORED_ENTRIES += 1
            continue

        POSSIBLE_LAST_TIMESTAMP = long(timestamp)

        if not url.startswith('http') or 
           url.startswith('http://babelfish.altavista.com') or 
           url.startswith('http://6'):
            #IGNORED_ENTRIES += 1
            continue

        #Ask if this URL is already there
        if urls.has_key(url):
            #Ask if this IP is already there
            if urls[url].has_key(ip):
                #Get the last time stamp inside this IP
                times = urls[url][ip]
                last_time = times[len(times)-1]
                delta_time = long(timestamp) - long(last_time)
                #If its been more than acceptable time
                if delta_time >= TIME_BETWEEN_CLICKS:
                    urls[url][ip].append(timestamp)

                    hits = 0
                    for ipbuffer in urls[url]:
                        if ipbuffer == 'hits': #just count the keys that are not 'hits'
                            continue
                        hits += len(urls[url][ipbuffer])

                    urls[url]['hits'] = hits
            else:
                urls[url][ip] = [timestamp]
                urls[url]['hits'] = 1
        else:
            urls[url]={}
            urls[url][ip] = [timestamp]
            urls[url]['hits']=1
    f.close()

    urls['POSSIBLE_LAST_TIMESTAMP'] = POSSIBLE_LAST_TIMESTAMP

    #we serialize this data for later
    if IGNORED_ENTRIES > 0:
        print "Ignored %d entries." % IGNORED_ENTRIES

    print "Saving ClickTrackr data to File..."
    f = file(FILE_001,"wb")
    pickle.dump(urls,f)
    f.close()
    print "Saving completed."

    #we make a backup of the current ClickTrackr log (.last), in case we need to run again
    #we can diff with this to know from where to relog in the future
    os.system("cp %s %s" % (LOG_CLICK_TRACKER,LOG_CLICK_TRACKER + ".last"))
else:
    #we unserialize the data
    print "Loading ClickTrackr data from existing file..."
    f = file(FILE_001,"rb")
    urls = pickle.load(f)
    f.close()
    POSSIBLE_LAST_TIMESTAMP = urls.pop('POSSIBLE_LAST_TIMESTAMP') #we popup so we have only urls and we dont modify further down
    print "Loading completed."

#LOAD ALL BLOG POST URLS, IDS AND CURRENT NUMBER OF HITS.
blog_urls = {} #blogs hashed by their urls, Buckets have {'post_id':,'post_link':}
blog_ids = {} #blogs hashed by their ids, Buckets have {'post_id':,'post_link':}
if not isFileFresh(FILE_002):
    print "Loading Blogs and Last Posts from DB"
    sql = "SELECT Blog_pk_id, Blog_url FROM BLOGS WHERE Blog_active=1;"
    cursor = dbExec(sql)
    results = cursor.fetchall()

    for r in results:
        #Get the ID of the last post on each blog"
        sql = u"SELECT BP_pk_id,BP_link FROM BLOG_POSTS WHERE BP_fk_blog_id = %d ORDER BY BP_pk_id DESC LIMIT 1" % (r['Blog_pk_id']);
        cursor = dbExec(sql)
        last_post = cursor.fetchone()

        if last_post:
            blog_urls[r['Blog_url']] = {'post_id':last_post['BP_pk_id'],'post_link':last_post['BP_link']}
            blog_ids[r['Blog_pk_id']] = {'post_id':last_post['BP_pk_id'],'post_link':last_post['BP_link']}

    #serialize blog_urls and blog_ids
    print "Saving Blogs to File..."
    f = file(FILE_002,"wb")
    pickle.dump(blog_urls,f)
    pickle.dump(blog_ids,f)
    f.close()
    print "Saving completed."
else:
    #load blog_urls from serialized data
    print "Loading Blogs and Last Posts from File..."
    f = file(FILE_002,"rb")
    blog_urls = pickle.load(f)
    blog_ids = pickle.load(f)
    f.close()
    print "Loading completed."

#LOAD ALL BLOG_POSTS URL AND ITS HITS
post_hits = {} #posts hashed by url, Buckets have (post_id, post_hits, blog_id)
if not isFileFresh(FILE_003):
    print "Loading Posts from DB..."
    sql = "SELECT SQL_CACHE BP_link, BP_pk_id, BP_fk_blog_id, PH_hits "
    sql += "FROM BLOG_POSTS LEFT JOIN POST_HITS ON BP_pk_id = PH_fk_post_id;"
    cursor = dbExec(sql)
    results = cursor.fetchall()

    for r in results:
        #hits might be null, if the post has never been reached on our page
        hit_count = int(r['PH_hits']) if r['PH_hits'] is not None else 0
        post_hits[r['BP_link']] = {'post_id':int(r['BP_pk_id']),
                                   'post_hits':hit_count,
                                   'blog_id':r['BP_fk_blog_id']}

    #now get the blog_posts
    print "Saving Posts from DB on file"
    f = file(FILE_003,"wb")
    pickle.dump(post_hits,f)
    f.close()
    print "Saving completed."
else:
    print "Loading Posts from File..."
    f = file(FILE_003,"rb")
    post_hits = pickle.load(f)
    f.close()
    print "Loading completed."


# The stars of the game are:
# - urls {:{'ip':,'hits':}} //The urls and how many hits we got from the click trackr log
# - blog_urls {:{'post_id':,'post_link':}} //urls of blogs, holding each a tuple with last post info
# - post_hits {:{'post_id':,'post_hits':,'blog_id':]} //urls and hit info of all posts
# - urls_not_found {'':}
if not isFileFresh(FILE_004):
    total_not_found = 0
    distinct_not_found = 0
    total_converted = 0

    converting_blog_url_to_post_url = False

    print "Crunching data..."
    for url in urls:
	#if the current url is the home of a blog
        #we try to see if the blog has any hits.
	if blog_urls.has_key(url):
            url = blog_urls[url]['post_link']
            converting_blog_url_to_post_url = True

        #if you find a direct match add the hits right away
        if post_hits.has_key(url):
	    new_hits = 0
	    if urls.has_key(url) and urls[url].has_key('hits'):
                new_hits = urls[url]['hits']
                if converting_blog_url_to_post_url:
                    total_converted+=1
                    print "!",
            
            old_hits = 0
            if post_hits[url].has_key('post_hits'):
                old_hits = post_hits[url]['post_hits']

            total_hits = new_hits + old_hits

            #finally update post_hits arrays.
            post_hits[url]['post_hits'] = total_hits
            str_a = "(%(post_id)d):%(post_hits)d:" % post_hits[url] 
            str_b = "%d+%d)" % (new_hits,old_hits)
            str_c = str_a + str_b
            print str_c,
        else:
            if urls_not_found.has_key(url):
                urls_not_found[url] += 1
            else:
                urls_not_found[url]=1
                distinct_not_found +=1
            total_not_found += 1
            print "-",

    print
    print "%d converted from blog to last post" % total_converted
    print "Didnt find total %d urls." % total_not_found
    print "Didn't find distinct %d urls." % distinct_not_found

    #serialize processed data in file 4
    print "Saving crunched data..."
    f = file(FILE_004,"wb")
    pickle.dump(post_hits,f)
    pickle.dump(urls_not_found,f)
    f.close()
    print "Data saved."
else:
    print "Loading Previously Crunched Data..."
    f = file(FILE_004,"rb")
    post_hits = pickle.load(f)
    urls_not_found = pickle.load(f)
    f.close
    print "Loading completed."


    #If we can't find it on the blog posts, we could try to
    #slim the URL of this url too http://servername.com/folder
    #and look up on the blog url

    #if nothing, we slim down to http://servername.com

    #if in any of these 2 cases we find a match then
    #we add a hit on the last post of this blog

    #the output of this file should be for now a file with SQL insert statements
print len(post_hits)

#Generate SQL output from post_hits array
# - post_hits {:{'post_id':,'post_hits':,'blog_id':]}
print "Writing SQL..."
f = file(FILE_SQL,"wb")
for url in post_hits:
    post_id = post_hits[url]['post_id']
    hits = post_hits[url]['post_hits']
    f.writelines("UPDATE POST_HITS SET PH_hits = %d WHERE PH_fk_post_id = %d;n" % (hits,post_id))
f.close()
print "Finished Writing SQL"

#If we make it all the way till here, we write down the new LAST_TIMESTAMP
if POSSIBLE_LAST_TIMESTAMP is not None:
    f = file(FILE_TIMESTAMP,"wb")
    pickle.dump(POSSIBLE_LAST_TIMESTAMP,f)
    f.close()
    print "Wrote last timestamp."
else:
    print "Did not write LAST timestamp."

for u in urls_not_found:
    print u

VirtueDesktops y Parallels en la iMac Intel

Aqui los dejo con 2 videos.

Instale el VirtueDesktops para cambiar de escritorios con efectos especiales, y el otro video mientras instalo Windows XP SP2 en un disco duro virtual con un ISO torrenteado. La version de Parallels es un trial de 15 dias, si funciona al pelo creo que vale la pena comprarlo, es preferible que comprar una PC nueva.

Installing X11 for Mac OS from the CD

So I wanted to use the GIMP, and OpenOffice, being a good Open Source boy on the new iMac.

The only problem is that you need to have X11 installed.

So first thing you do, is you google and you find this page:

http://www.apple.com/downloads/macosx/apple/x11formacosx.html

which has a 43mb installer that at least for me errored out saying I had something newer installed. Then all READMEs say that if you have Tiger installed you need to install X11 from the Mac OSX installation disc.

If you go to the disc and try to install from the Install Bundled Software Only you are not going to find it.

After wasting time, I found a the silver bullet on the disc:

/System/Installation/Packages/X11User.pkg

Then after I installed, on the Root of the Disc I saw another Icon called "Optional Installs", guess what? It was there also…

Anyway, I hope you find this post if you're in the same trouble I didn't read anything on apple's site on how to get to it on the disc.

Oh, and one cool thing you get by installing X11 -> xterm (yay!)