Python script to download a Google spreadsheet.

I like to automate tasks; I think every software engineer likes that, right? After all, that's our job. I wrote the following script for downloading a Google spreadsheet as CSV. I just found it while going through my old code base; hopefully it will help someone else too.

To run the script you have to install the gdata Python module.

 

You have to run the script like below:


python gdooc.py spread_sheet_id#gid=tab_id

For example, check the following screenshot:

[Screenshot: example run from the command line]

 

After downloading, you will have the CSV file in the same directory. Currently the document id is used as the name of the CSV file; you can change that as you want.
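The original script is no longer embedded here, so below is a minimal sketch of the same idea. Note that instead of the gdata API it uses Google's public CSV export endpoint (that URL format is an assumption of this sketch, and the sheet must be accessible to you via that link):

# gdooc.py - minimal sketch, not the original gdata-based script
import sys
import urllib2

def main():
    # argument format: spread_sheet_id#gid=tab_id
    doc_id, _, gid = sys.argv[1].partition("#gid=")
    url = ("https://docs.google.com/spreadsheets/d/%s/export?format=csv&gid=%s"
           % (doc_id, gid or "0"))
    data = urllib2.urlopen(url).read()
    # the document id is used as the name of the csv file
    with open("%s.csv" % doc_id, "wb") as out:
        out.write(data)

if __name__ == "__main__":
    main()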

Happy Coding 🙂


Attaching a new EBS volume to an AWS EC2 instance.


By default an EC2 instance provides 8GB of space. In a past project I had to extend the storage of one of my development instances as the data was growing fast. From the AWS console, create a new EBS volume and attach it to your instance, then log into your EC2 instance via SSH.

Run the following command:


sudo fdisk -l

This will show the list of volumes, with the newly added volume still unpartitioned. Something like below:

[Screenshot: fdisk -l output showing the new unpartitioned volume]

The next step is to build the file system on the new EBS volume using the Unix command mkfs, like below:


sudo mkfs -t ext4 /dev/xvdf

[Screenshot: mkfs output]

Next you have to mount it at your desired path, e.g. /mnt/ebs1 (create the directory first with sudo mkdir -p /mnt/ebs1). Run the following command:


sudo mount /dev/xvdf /mnt/ebs1

Then add an entry to /etc/fstab; it would be something like this:


"/dev/xvdf  /mnt/ebs1 ext4 defaults 1 1"

Beware: if you add the EBS volume to your /etc/fstab and there is somehow an issue with the volume while booting the instance (file system corruption, unavailability of the zone, etc.), the instance will not boot. While booting, the system looks for the entry, and when the volume is not available the whole instance stays down. Check this AWS forum post for details.
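If you still want the entry in /etc/fstab, one common safeguard (my suggestion, not from the post above) is the nofail mount option (older Ubuntu releases used nobootwait for the same purpose), which lets the boot continue even if the volume is unavailable:

/dev/xvdf  /mnt/ebs1  ext4  defaults,nofail  0  2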

Also check this whole SO discussion for alternative ways to resolve the issue (using a script, for example).

Check the following docs if you want to learn more about the Unix commands used in this post:

fdisk, mount and umount, and mkfs.

Counting lines of files recursively in Unix

I have been working on a project for the last couple of months, and as the days pass, the codebase is getting larger. Suddenly I thought it would be great if I could know how many lines of code I have written so far for each module, and also in total. I knew Unix has a really awesome utility named wc.

After googling and trying different params and commands, I managed to do it by combining two Unix tools (wc and find). The full command for recursive line counting is like below:

wc -l `find . -type f`

The command returns something like below:

[Screenshot: wc -l output with per-file counts and the total]

find . -type f lists all the files recursively, and wc -l counts the lines 🙂
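If you only want to count a specific file type, or the tree holds too many files for backtick substitution, a variation like the following may help (the *.py pattern is just an example; note that for very long file lists xargs may invoke wc more than once, printing more than one total):

find . -type f -name '*.py' | xargs wc -l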

To learn about these two Unix commands in detail, check the wc and find manuals.

MongoDB backup script

Last year, while I was working on a project, I needed to automate the whole backup process, from taking a snapshot of the current db to saving it to AWS S3 buckets. At that time I took most of the stuff from this blog post.


A couple of days ago, I started coding a small backup script that backs up to another cloud machine rather than to AWS S3. Instead of coding it from scratch, I reused my previous script. All I needed to implement was a bash function (save_in_cloud) which runs a simple scp command 🙂

I reused this script almost as-is; all I did was add a new function that copies the current backup data to a remote server, and update do_cleanup so that it now works in any year.
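The gist with the full script is not embedded here, but the added function might look roughly like this sketch (the $BACKUP_DIR variable and the remote user, host, and path are hypothetical):

# copy the newest snapshot archive to the remote backup machine
# ($BACKUP_DIR and the remote user/host/path are hypothetical)
save_in_cloud() {
    latest=$(ls -t "$BACKUP_DIR"/*.tar.gz | head -n 1)
    scp "$latest" backupuser@backup.example.com:/backups/mongo/
}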

The backup script depends on two other js files (fsync_lock.js and fsync_unlock.js), which are responsible for locking mongo during the db snapshot and releasing the lock after the snapshot.

Happy Coding 🙂

Painless deployment with Fabric

Deploying code to test/staging/production servers is one of the important parts of the modern web application development cycle.

Deploying code used to be painful because of the same repetitive tasks we have to do every time we want to push code, and if something goes wrong during deployment, the application goes down too. But the scenario has changed: now we have many tools to make deployment easier and fun. I have used Capistrano and Fabric for deployment. I found Fabric really painless, and as it's a Python battery, it was easier for me to adopt and get things done.

I am going to cover the fundamental operations and finally give a simple Fabric script (like a boilerplate) for writing your own.

env = a dictionary-like Python object where we define specific settings like password, user, etc.

local = runs a command on the local host (where the fabric script is being run)

run = runs a command on a remote host

You can use these tasks in many different ways; to do that, check the official Fabric documentation here.

The first gist is a sample Fabric script; the second one is a bash script to install Fabric on your Ubuntu machine.
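Those gists are not embedded here; below is a minimal boilerplate sketch of what such a fabfile can look like (the host, password, and log path are assumptions, not taken from the original gist):

# fabfile.py - boilerplate sketch; host, password and log path are assumptions
from fabric.api import env, local, run

def test_server():
    # point the tasks that follow at the test server
    env.hosts = ["ubuntu@test.example.com"]
    env.password = "secret"  # better: rely on SSH keys instead

def latest_access_log():
    # show the last lines of the web server's access log on the remote host
    run("tail -n 50 /var/log/apache2/access.log")

def local_uptime():
    # example of the local operation: runs on the machine invoking fab
    local("uptime")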

After setting the username, password, and host information in the script, you can check your server's access log by running fab test_server latest_access_log.

I have been using Fabric for around two years, on various small, medium, and large projects.

There are many interesting open source projects built on top of Fabric. I found these two really promising:

1. Fabtools

2. Graphite_fabric

Search through GitHub and you will find many advanced uses of Fabric.

Happy Coding!

Daily Unix Commands

If you are a dev or a sysadmin, you must rely on Unix commands to get things done.

This is my daily Unix tools list. I am putting it together for myself as a future reference; if it helps anyone else, I will be happy 🙂

Screen

I use 'screen' to keep my sessions alive on my remote cloud machines. If you are new to screen, please check these docs.

List the screens on the current machine: 'screen -ls'
Resume a screen: 'screen -r screen_name' or 'screen -r pid'
Create a new screen: 'screen -S screen_name'
Detach from a screen: 'ctrl+a d'
Start copy mode: 'ctrl+a ['
Kill the current window: 'ctrl+a k'

Screen cheat sheets: cheat sheet 1 and cheat sheet 2.

File Compression

Using zip and unzip

Zip command example is like below:

zip backup.zip filename1 filename2
zip -r backup.zip dir_name

Unzip command example is like below:

unzip backup.zip

tar, gzip and bzip2

tar cvf backup.tar filename1 filename2
tar xvf backup.tar

To extract into a specific directory:

tar xvf backup.tar -C /dir_name/

To use gzip compression, use -z; for bzip2 compression, use -j, like this.

For compression:

tar -zcvf backup.tar.gz filename1
tar -jcvf backup.tbz2 filename

For decompression:

tar -zxvf backup.tar.gz
tar -jxvf backup.tbz2

Unix file compression cheat sheet

And if you want to learn the details of these compression techniques, please check this.

Vim basics

I use vim for editing files on Unix systems (especially on remote machines). This is the list of commands I use:

Basic Navigation:


j->down
k->up
h->left
l->right
gg->first line of the file
G->last line of the file
$->end of the line

Copy and paste:


yy->copy the current line
p->paste the copied line

Check the Vim cheat sheet.

I rarely use the 'sed' command, and I have only used it for easy find-and-replace of strings (it's really handy when you have to do the same find-and-replace across multiple files). The syntax is like below:

sed 's/search_string/replace_string/' old_file.txt > new_file.txt
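For the multiple-files case mentioned above, GNU sed can also edit files in place with -i (the file names are just examples; give -i a suffix if you want backup copies):

sed -i 's/search_string/replace_string/g' file1.txt file2.txt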

Check the sed reference.

Build a periodic crawler with Celery

It's a very common use case: you build a crawler, and it has to run periodically. Generally we set up a Unix cron job to run the crawler on a schedule.

But it's a real pain that when you add a new task, you have to log into the server and add a new entry to the crontab. That is only feasible when you have just a few cron jobs.

I thought it would be great if I could handle it from my Python code and do some interesting things. I had heard about Celery a lot as a message queue system. Truly speaking, at first I couldn't understand how it works or how I could integrate it with my projects. After googling, I understood what Celery is, and then I thought it would be really great to build crawlers around it and use Celery to schedule the periodic work.

Install Celery by following this link; then you will have to install and configure RabbitMQ from this link. BTW, don't forget to add the user and vhost as described in the Setting up RabbitMQ section (you can use MongoDB or Redis as the broker too).

Then you can clone my git repo and change the celeryconfig.py file to match your configuration. Add a new task into tasks.py, following the first method.

I have added a sample method which requests this site and prints the HTTP response status code.
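My repo is not reproduced here, but as a rough sketch (using Celery 2.x-era conventions; the task name, URL, and broker credentials are illustrative, so check the repo for the real thing), the task and its periodic schedule can look like this:

# tasks.py (sketch) - the crawling task
import urllib2

from celery.task import task

@task
def check_site():
    # request the site and print the HTTP response status code
    response = urllib2.urlopen("http://example.com/")
    print(response.getcode())

# celeryconfig.py (sketch) - broker settings and the periodic schedule
from datetime import timedelta

BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "myuser"
BROKER_PASSWORD = "mypassword"
BROKER_VHOST = "myvhost"

CELERY_IMPORTS = ("tasks",)
CELERYBEAT_SCHEDULE = {
    "check-site-every-5-seconds": {
        "task": "tasks.check_site",
        "schedule": timedelta(seconds=5),
    },
}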

To run the project, run "celerybeat"; it will start the beat scheduler and begin sending tasks to the broker.

Run "celeryd" in another terminal window to check the task output.

It prints the response status every 5 seconds.

You can do anything you want after the crawl, like parsing the DOM, saving text, submitting forms, etc.

BTW, don't forget to run


pip install -r requirements.txt

to install the necessary packages for the project.

My GitHub Periodic Crawling Project

Unix sort & uniq

It was around 2 AM and I was working like a caveman, but it's hard to escape bedtime 😦

Suddenly I found that I had set up a wrong cron job on a cloud machine and it had generated duplicate results. I had to make a report from the cron output, and every line should be unique. The file was around 1.2 GB.

It was a JSON file with several thousand lines, many of them redundant. I had to remove the redundant values and make a file in which every line is unique.

I started to write a Python script to do that, and I was halfway through a script that takes a file and creates another file containing the unique elements of the input. As I was too tired, I thought I should search for a Unix command for this job, and I found exactly what I needed 🙂

sort filename.txt | uniq

Or

cat filename.txt | sort -u

If the input file contains:

Line 1
Line 2
Line 2
Line 3
Line 1
Line 3

The command generates:

Line 1
Line 2
Line 3

And I just redirected the output of the command into a new file like below:

sort filename.txt | uniq > result.txt

Explanation of the command:

The 'sort' command sorts all the lines (alphabetically by default), and the 'uniq' command can eliminate or count duplicate lines in a pre-sorted file.
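For example, to count how many times each line occurs instead of eliminating the duplicates:

sort filename.txt | uniq -c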

You can also use sort and uniq in different situations; for details check the following links:

Sort and Uniq

These two utility commands will help me get to sleep early 🙂

Just featured in Ubuntu Cloud!

I am a hardcore Ubuntu user. At home I use Ubuntu for entertainment and development. I fell in love with it when I was a 2nd-year university student. At that time I used it for programming in C, and later I used it for networking labs.

With time and practice I came to know how to play with Ubuntu and how to get the most out of it. The community helped me a lot in making this relationship mature.
After leaving university my love didn't end; it got a new boost. I bought a Rackspace cloud server to make myself an expert in cloud systems and to deploy the applications I develop for my clients. I chose Ubuntu as the OS on my cloud because of its community and my familiarity with it. I love the Ubuntu community for its active participation.
A week ago I was interviewed by the Ubuntu Cloud administrator, and last night my interview was featured on the Ubuntu Cloud home page.
Check it out here 🙂

I hope it will inspire me, as well as others, to spread FOSS in a larger space and make Ubuntu Cloud the ultimate solution for cloud computing 🙂

Manage your VPS/cloud using Webmin.

Webmin is a smart solution for managing a VPS; I played with it a year ago while deploying a LAMP stack. Usually when we use shared hosting we get access via cPanel, which is a paid solution for managing a hosting system.

Among the open source solutions, Webmin is the best. So, for managing my personal Rackspace cloud machine, I decided to install Webmin.

My cloud OS is Ubuntu, and you know how easy it is to install any package on an Ubuntu system. I downloaded the Webmin .deb from here.

Then I copied it over using scp. The command to copy the downloaded Webmin package was as below:

scp webmin_***.deb userName@12.23.45.56:/home/userName/sources

** userName should be your server user name, 12.23.45.56 should be replaced with your server's IP, and /home/userName/sources
should be replaced by the location where you want to put the Webmin package.
To know more about scp, click here.

Alternatively, you can get the Webmin package using the wget tool. To learn about wget, click here.
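For example, something like this on the server itself (the exact URL may change, so grab the current link from the Webmin download page):

wget http://www.webmin.com/download/deb/webmin-current.deb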
Now the real part. All you have to do is go to the folder where you copied/downloaded the Webmin package
and install the deb package.
For me, it was like below:

cd sources
sudo dpkg -i webmin_***.deb

But it showed a dependency problem and the package was left unconfigured!
Hmm, so I tried the apt-get tool directly, using the following command:

sudo apt-get install webmin

Then it said the package had already been installed! That means the previous dpkg run installed the package but left it unconfigured.
Luckily apt-get has a very handy option, "-f", which tries to fix broken dependencies.

sudo apt-get -f install

That fixed the dependencies, and Webmin was configured.

Then I went to my server's default DNS name on port 10000; that means http://example.com:10000 is the Webmin login panel.
You can log in using your cloud user credentials. After login, the interface looks like this:

[Screenshot: Webmin home page]

Installation and configuration on other Unix-like systems will be nearly the same, with some small changes.
Now web server administration is very easy: you can check status, configure new services, and play with your cloud quickly 🙂