Latest articles about Python

28 Jun 2012

Communicating with RESTful APIs in Python

Written by Balthazar

REST defines a way to design an API whose resources you can consume using HTTP methods (GET, POST, etc.) over URLs. Interacting with such an API essentially comes down to sending HTTP requests.

In this article, we’ll see which Python modules are available to solve this problem, and which one you should use. We’ll test each module against the same simple case: creating a new GitHub repository through GitHub’s RESTful API.
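As a rough idea of what that test case involves, here is a minimal sketch using only the standard library's urllib.request. It is not necessarily the module the article ends up recommending. The endpoint (POST to /user/repos), the token-based Authorization header, and the Accept media type reflect GitHub's REST API as commonly documented and should be checked against the current docs. The helper name build_create_repo_request, the repository name and the token placeholder are all made up for illustration. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumed endpoint for creating a repository for the authenticated user.
API_URL = "https://api.github.com/user/repos"


def build_create_repo_request(name, token):
    """Build (but do not send) the POST request that would create a repository.

    Hypothetical helper for illustration; `token` is a personal access token.
    """
    payload = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": "token " + token,
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


# Placeholder values: replace them with a real repository name and token.
request = build_create_repo_request("demo-repo", "<your-token>")
print(request.get_method(), request.full_url)

# Sending the request needs network access and a valid token:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The point is simply that "creating a repository" reduces to one HTTP POST with a JSON body and an authentication header; the modules compared in the article differ mainly in how conveniently they let you express that.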

Read more

08 Jun 2012

Automation of the Tesseract training process

Written by Balthazar

Tesseract is an open-source Optical Character Recognition engine, historically developed by HP and Google, that lets you extract text from images. One of the great features of Tesseract is that it can be trained on a new language, a new set of characters, or even a particular font. The training procedure is fully described here.

This procedure is quite long and tedious. That’s why I’ve written a standalone Python wrapper that takes care of the training process for you when you want to train Tesseract on a new font or a new set of characters. This demo is intended for Unix/Linux users.

Read more

23 Apr 2012

Crawl a website with Scrapy

Written by Balthazar

Introduction

In this article, we are going to see how to scrape information from a website, in particular from all pages that share a common URL pattern. We will do that with Scrapy, a powerful yet simple scraping and web-crawling framework.
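To make the "common URL pattern" idea concrete, here is a small standard-library sketch. It does not use Scrapy, which handles link extraction and following on its own. It only shows the underlying step of keeping the links on a page whose URL matches a regular expression. The sample HTML, the example.com base URL, the /articles/ pattern and the names LinkCollector and matching_links are all invented for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def matching_links(html, base_url, pattern):
    """Return absolute URLs found in `html` that match the regex `pattern`."""
    collector = LinkCollector()
    collector.feed(html)
    absolute = (urljoin(base_url, link) for link in collector.links)
    return [url for url in absolute if re.search(pattern, url)]


# Hypothetical page content and URL pattern.
SAMPLE_HTML = """
<a href="/articles/1">First</a>
<a href="/about">About</a>
<a href="/articles/2">Second</a>
"""
ARTICLE_PATTERN = r"/articles/\d+$"

found = matching_links(SAMPLE_HTML, "http://example.com/", ARTICLE_PATTERN)
print(found)
# ['http://example.com/articles/1', 'http://example.com/articles/2']
```

A crawling framework repeats this step on every page it downloads, then schedules the matching URLs for download in turn.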

Read more