Python Coding Skills



Friday Informal Sharing
Yufan Zheng : zhengyufan@wisers.com


Get Started

Everyone Can Learn Basic Python in 3-4 Hours

Then, We Can Start Python Programming

But, It Requires Much More Time
to Become Really Good At It.


4 Hours Is Far From Enough.

Python Code


Readability is At The Heart of Python Design.


  • Simple is Better Than Complex...
  • Short is Better Than Long...
  • Explicit is Better Than Implicit...
  • Beautiful is Better Than Ugly...
  • Readability Counts...

Code Style

Naming Conventions


Variables, functions, methods, packages, modules :

lower_case_with_underscores

Classes and Exceptions :

CapWords

Private methods :

_single_leading_underscore(self, ...)

Constants :

ALL_CAPS_WITH_UNDERSCORES

Avoid these one-letter variable names (easily confused with 1 and 0) :

l, O, I

Avoid Redundant Labeling


Yes

import audio

core = audio.Core()
controller = audio.Controller()

No

import audio

core = audio.AudioCore()
controller = audio.AudioController()

Prefer "Reverse Notation"


Yes

elements = ...
elements_active = ...
elements_defunct = ...

No

elements = ...
active_elements = ...
defunct_elements = ...

Line Length : Don't Stress Over 80-100 Characters


# Use parentheses for line continuations.
wiki = (
    "The Colt Python is a .357 Magnum caliber revolver formerly "
    "manufactured by Colt's Manufacturing Company of Hartford, "
    "Connecticut. It is sometimes referred to as a Combat Magnum. "
    "It was introduced in the "
    "same year as Smith & Wesson's M29 .44 Magnum."
)
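Parentheses work the same way for long expressions and conditions, not only strings. A small sketch, where `is_valid_record` and its field names are hypothetical, showing implicit string concatenation and a multi-line condition written without backslashes:

```python
# Adjacent string literals inside parentheses are joined at compile time.
message = (
    "Adjacent string literals "
    "are concatenated automatically."
)


def is_valid_record(record):
    """Hypothetical check, used only to show a wrapped condition."""
    # Parentheses let the condition span several lines without a backslash.
    return (
        record.get("name") is not None
        and record.get("age", 0) >= 0
        and record.get("email", "").count("@") == 1
    )


print(message)
print(is_valid_record({"name": "Ann", "age": 30, "email": "ann@example.com"}))
```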

30 - 30 Principle


A function should never exceed 30 lines.


A class can contain at most 30 methods.

Google Docstring Style - Function



def func(arg1, arg2, arg3=None):
    """Summarize the function in one line.

    Rant on here. Make sure it's at least 3 lines long (j/k)

    Args:
        arg1: An argument.
        arg2: Another argument.
        arg3: An optional argument. Defaults to None.

    Returns:
        Describe the returned value(s) here.

    Raises:
        ValueError: Describe when this error is raised.
    """
    pass
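Docstrings written in this style are more than comments: Python stores them on the function object, so `help()` and `inspect.getdoc()` can read them at runtime. A small sketch with a hypothetical `scale` function:

```python
import inspect


def scale(values, factor=2):
    """Multiply every value by a factor.

    Args:
        values: An iterable of numbers.
        factor: The multiplier. Defaults to 2.

    Returns:
        A list with each value multiplied by factor.
    """
    return [v * factor for v in values]


# The docstring is attached to the function object itself;
# inspect.getdoc() also strips the common indentation.
doc = inspect.getdoc(scale)
print(doc.splitlines()[0])   # Multiply every value by a factor.
```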

Google Docstring Style - Class



class SampleClass(object):
    """Summary of class here.

    Longer class information....
    Longer class information....

    Attributes:
        likes_spam: A boolean indicating if we like SPAM or not.
        eggs: An integer count of the eggs we have laid.
    """

    def __init__(self, likes_spam=False):
        """Inits SampleClass with blah."""
        self.likes_spam = likes_spam
        self.eggs = 0

    def public_method(self):
        """Performs operation blah."""

PEP 8 Python Style Guide


A style guide aims to improve the readability of code and
make it consistent across the wide spectrum of Python code.


  • Code Layout

    1. Indentation
    2. Maximum Line Length
    3. Should a Line Break Before or After a Binary Operator?
    4. Blank Lines
    5. Imports
    6. ...
  • Naming Conventions

    1. Names to Avoid
    2. Class Names
    3. Global Variable Names
    4. Function and Parameter Names
    5. Constants
    6. ...
  • Comments

    1. Block Comments
    2. Inline Comments
    3. Documentation Strings
    4. ...
  • Others

    1. When to Use Trailing Commas
    2. Whitespace in Expressions and Statements
    3. String Quotes
    4. ...
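Two of the rules listed above, sketched on PEP 8's own payroll example (the variable values are made up): break a long expression before each binary operator so the operator sits next to its operand, and put a trailing comma after the last item of a multi-line collection so adding an item later touches only one line.

```python
gross_wages = 5000
taxable_interest = 120
dividend_income = 300
ira_deduction = 200
student_loan_interest = 150

# Break before the binary operator.
income = (gross_wages
          + taxable_interest
          + dividend_income
          - ira_deduction
          - student_loan_interest)

# Trailing comma after the last element of a multi-line collection.
FILES = [
    "setup.cfg",
    "tox.ini",
]

print(income)   # 5070
```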

Pythonic Code

What is Pythonic Code ?


Using The Features of The Python Language
to Produce Clear, Concise and Maintainable Code

Pythonic Code ? Yes !


Transforming Code into Beautiful, Idiomatic and Concise Python


Call a function until a sentinel value


lines = []
while True:
    line = f.readline()          # f is an open text file
    if line == '':               # readline() returns '' at end of file
        break
    lines.append(line)

Better Code :
lines = list(iter(f.readline, ''))

With two arguments, iter(callable, sentinel) calls the first argument over and over again and stops as soon as it returns the second one, the sentinel value.
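A runnable sketch of the two-argument form. `io.StringIO` stands in for a real file, and the second call uses `functools.partial` to read fixed-size blocks until `read()` returns the empty-string sentinel; the block size of 4 is arbitrary.

```python
import io
from functools import partial

f = io.StringIO("first\nsecond\nthird\n")

# iter(callable, sentinel): call f.readline until it returns ''.
lines = list(iter(f.readline, ''))
print(lines)   # ['first\n', 'second\n', 'third\n']

# The same idea reads a stream in fixed-size chunks.
f.seek(0)
blocks = list(iter(partial(f.read, 4), ''))
print(blocks)
```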

Pythonic Code

Making The Program More Efficient

Dictionary For Performance

What is the difference between these two algorithms ?

data_list = [...]                       # len = 500,000
interesting_ids = [...]                 # len = 100
interesting_points = []

for i in interesting_ids:
    # Linear scan of the list: O(n) per lookup
    point = find_point_by_id_in_list(data_list, i)
    interesting_points.append(point)

data_lookup = {...}                     # len = 500,000
interesting_ids = [...]                 # len = 100
interesting_points = []

for i in interesting_ids:
    # Hash lookup in the dict: O(1) on average
    point = data_lookup[i]
    interesting_points.append(point)
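A small, self-contained sketch of the comparison above. `Point` and `find_point_by_id_in_list` are hypothetical stand-ins for the elided helpers, and the data is synthetic. The list version scans linearly for every id, while the dict is built once and then answers each lookup by hashing. Timings depend on the machine, so none are printed.

```python
import random
from collections import namedtuple

Point = namedtuple("Point", ["id", "x", "y"])


def find_point_by_id_in_list(points, point_id):
    """Hypothetical helper: linear search, O(n) per call."""
    for point in points:
        if point.id == point_id:
            return point
    return None


data_list = [Point(i, i * 2, i * 3) for i in range(20_000)]
interesting_ids = random.sample(range(20_000), 100)

# Version 1: 100 linear scans over the list.
from_list = [find_point_by_id_in_list(data_list, i) for i in interesting_ids]

# Version 2: build the dict once (O(n)), then 100 average O(1) lookups.
data_lookup = {point.id: point for point in data_list}
from_dict = [data_lookup[i] for i in interesting_ids]

print(from_list == from_dict)   # True
```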

High-Performance Container Datatypes (collections module)

namedtuple() : factory function for creating tuple subclasses with named fields

New in version 2.6.

deque : list-like container with fast appends and pops on either end

New in version 2.4.

Counter : dict subclass for counting hashable objects

New in version 2.7.

OrderedDict : dict subclass that remembers the order entries were added

New in version 2.7.

defaultdict : dict subclass that calls a factory function to supply missing values

New in version 2.5.
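A quick tour of the five containers listed above, using only the standard `collections` module; the sentence and keys are made-up sample data.

```python
from collections import Counter, OrderedDict, defaultdict, deque, namedtuple

words = "the cat sat on the mat the end".split()

# Counter: count hashable items.
counts = Counter(words)
print(counts.most_common(1))          # [('the', 3)]

# defaultdict: missing keys get a value from the factory (here list()).
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# deque: O(1) appends and pops at both ends; maxlen drops the oldest items.
recent = deque(maxlen=3)
for word in words:
    recent.append(word)
print(list(recent))                   # ['mat', 'the', 'end']

# namedtuple: a tuple whose fields can also be read by name.
Pair = namedtuple("Pair", ["key", "value"])
p = Pair("the", 3)

# OrderedDict: keeps insertion order and adds move_to_end().
od = OrderedDict(a=1, b=2, c=3)
od.move_to_end("a")
print(list(od))                       # ['b', 'c', 'a']
```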

Memory Efficiency with Slots


Instances of custom classes store their attributes in a per-instance, dynamic dictionary, self.__dict__. Declaring __slots__ limits the available attribute names and moves the name/key storage out of each instance to the class level, which can significantly reduce memory usage when many instances are created.

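class ImmutableThing:
    # __slots__ saves memory and blocks new attributes, but it does
    # not make instances immutable: a, b and c can still be reassigned.
    __slots__ = ['a', 'b', 'c']

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c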
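A sketch of what `__slots__` changes in practice, using a hypothetical `Point` class: instances have no per-instance `__dict__`, so assigning an attribute that is not listed in `__slots__` raises `AttributeError`, while declared slots can still be reassigned.

```python
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
p.x = 10                      # allowed: 'x' is a declared slot

try:
    p.z = 3                   # not declared in __slots__
    added = True
except AttributeError:
    added = False

print(added, hasattr(p, "__dict__"))   # False False
```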

Pythonic Code

Efficient Built-In Tools

Bisect for Quick Search

Bisect provides support for maintaining a list in sorted order without having to sort the list after each insertion.

import bisect

# A fixed input sequence keeps the output reproducible.
values = [14, 85, 77, 26, 50]

# Use bisect_left to find the insertion point and insort_left to insert.
sorted_values = []
for r in values:
    position = bisect.bisect_left(sorted_values, r)
    bisect.insort_left(sorted_values, r)
    print('%2d %2d' % (r, position), sorted_values)

$ python bisect_example.py

14  0 [14]
85  1 [14, 85]
77  1 [14, 77, 85]
26  1 [14, 26, 77, 85]
50  2 [14, 26, 50, 77, 85]
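Beyond keeping a list sorted, `bisect` gives an O(log n) search on data that is already sorted. The sketch below follows two recipes from the `bisect` module documentation: an exact-match lookup built on `bisect_left`, and a table lookup built on `bisect` (an alias of `bisect_right`). The breakpoints and grade letters are sample values.

```python
from bisect import bisect, bisect_left


def index(a, x):
    """Return the index of the leftmost value equal to x in sorted list a."""
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError(f"{x!r} not found")


def grade(score, breakpoints=(60, 70, 80, 90), grades="FDCBA"):
    """Map a score to a letter by finding its slot among the breakpoints."""
    return grades[bisect(breakpoints, score)]


sorted_values = [14, 26, 50, 77, 85]
print(index(sorted_values, 50))                            # 2
print([grade(s) for s in [33, 99, 77, 70, 89, 90, 100]])   # ['F', 'A', 'C', 'C', 'B', 'A', 'A']
```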

Pickle For Fast Object Serialization


import pickle

a = A_VERY_LARGE_OBJECT

with open('filename.pickle', 'wb') as handle:
    pickle.dump(a, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('filename.pickle', 'rb') as handle:
    b = pickle.load(handle)

print(a == b)

Array Serialization


from array import array
from random import random

floats = array('d', (random() for _ in range(10**7)))

with open('floats.bin', 'wb') as fp:
    floats.tofile(fp)

floats2 = array('d')
with open('floats.bin', 'rb') as fp:
    floats2.fromfile(fp, 10**7)
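A smaller, self-contained version of the round trip above that cleans up after itself by using a temporary directory; the 1,000-element size is arbitrary. Because `tofile()` writes the raw machine values, the file should be exactly `len(floats) * floats.itemsize` bytes, with no per-object overhead.

```python
import os
import tempfile
from array import array
from random import random

floats = array("d", (random() for _ in range(1000)))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "floats.bin")

    with open(path, "wb") as fp:
        floats.tofile(fp)                  # raw binary dump of the C doubles
    size = os.path.getsize(path)

    floats2 = array("d")
    with open(path, "rb") as fp:
        floats2.fromfile(fp, len(floats))  # read the same number of items back

print(size == len(floats) * floats.itemsize, floats2 == floats)   # True True
```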

Pythonic Code

Taking Advantage of Decorators

What Is a Decorator ?

You Might Have Used Decorators
Even Without Knowing It



from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

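A Decorator Is Used To Modify The Decorated Function

The First Two Snippets Have The Same Effect. The Third Shows How a_decorator Is Built.


# State, before defining f, that a_decorator will be applied to it.
@a_decorator
def f():
    ...

def f():
    ...

# After defining f, apply a_decorator to it.
f = a_decorator(f)

# a_decorator takes a function and returns a new function wrapping it.
def a_decorator(func):
    def wrapper(*args, **kwargs):
        ...                               # extra behavior goes here
        return func(*args, **kwargs)
    return wrapper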

Print a Decorated Function's Input and Output


def deco(func):
    def inner(*args):
        print('Input :', args)
        output = func(*args)
        print('Output:', output)
        return output
    return inner

@deco
def add(a, b):
    return a + b

>>> add(1, 2)
Input : (1, 2)
Output: 3
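One drawback of `deco` above is that the wrapper hides the original function's metadata (`add.__name__` becomes `'inner'`), and keyword arguments are not passed through. A sketch of the usual fix with `functools.wraps`, which copies `__name__`, `__doc__` and related attributes onto the wrapper:

```python
import functools


def deco(func):
    @functools.wraps(func)            # keep func's name and docstring
    def inner(*args, **kwargs):       # forward keyword arguments too
        print('Input :', args, kwargs)
        output = func(*args, **kwargs)
        print('Output:', output)
        return output
    return inner


@deco
def add(a, b):
    """Return a + b."""
    return a + b


add(1, b=2)
print(add.__name__, add.__doc__)      # add Return a + b.
```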

Example 1 - Use a Decorator to Log
Input, Output and Execution Time


# timer.py
import time

def timer(func):

    def clock(*args):
        t0 = time.perf_counter()
        result = func(*args)
        elapsed = time.perf_counter() - t0
        name = func.__name__
        arg_str = ', '.join(repr(arg) for arg in args)
        print('[%0.8fs] %s(%s) -> %r' % (elapsed, name, arg_str, result))
        return result

    return clock



# clockdeco_demo.py
from timer import timer

@timer
def factorial(n):
    return 1 if n < 2 else n * factorial(n - 1)

if __name__ == '__main__':
    factorial(4)

$ python3 clockdeco_demo.py
              
[0.00000191s] factorial(1) -> 1
[0.00004911s] factorial(2) -> 2
[0.00008488s] factorial(3) -> 6
[0.00013208s] factorial(4) -> 24

Example 2 - Use Decorator to Make Class Singleton


class Singleton:

    _singletons = dict()

    def __init__(self, decorated):
        self._decorated = decorated

    def getInstance(self):
        key = self._decorated.__name__
        try:
            return Singleton._singletons[key]
        except KeyError:
            Singleton._singletons[key] = self._decorated()
            return Singleton._singletons[key]

    def __call__(self):
        raise Exception(
            'Singletons must be accessed through the `getInstance` method.')



@Singleton
class Foo:
    def __init__(self):
        print('Foo created')

    def bar(self, obj):
        print(obj)

foo = Foo()                  # Wrong, raises Exception

foo = Foo.getInstance()
goo = Foo.getInstance()

print(goo is foo)            # True

foo.bar("Hello, world! I'm a singleton.")

Python Special Methods : __new__, __init__, __call__

Name        Description
__new__     Creates and returns the new object (runs before __init__).
__init__    Initializes the newly created object.
__call__    Makes instances of the class callable like a function.


class Foo:
            
    def __new__(cls, *args, **kwargs):
        print('Calling __new__')
        return super(Foo, cls).__new__(cls)

    def __init__(self, a):
        print('Calling __init__')
        self.a = a

    def __call__(self, *args, **kwargs):
        print('Calling __call__')
        print(self.a)

>>> f = Foo(1)
Calling __new__
Calling __init__
>>> f()
Calling __call__
1
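Because `__new__` decides which object is returned, it can also implement a singleton without a decorator. A minimal sketch with a hypothetical `Config` class; note that `__init__` still runs on every `Config()` call, so initialization must be safe to repeat.

```python
class Config:
    _instance = None

    def __new__(cls, *args, **kwargs):
        # Create the object only once; afterwards return the cached one.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, debug=False):
        # Runs on every Config(...) call, even when __new__ reused the object.
        self.debug = debug


a = Config()
b = Config(debug=True)
print(a is b, a.debug)        # True True
```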

Pythonic Code

Meta Programming

Example I : __len__ & __getitem__ (1)


import collections
            
Card = collections.namedtuple('Card', ['rank', 'suit'])
 
class FrenchDeck:

    ranks = [str(n) for n in range(2, 11)] + list('JQKA')
    suits = 'spades diamonds clubs hearts'.split()
    
    def __init__(self):
        self._cards = [Card(rank, suit) for suit in self.suits
                                        for rank in self.ranks]
        
    def __len__(self):
        return len(self._cards)
        
    def __getitem__(self, position):
        return self._cards[position]

Example I : __len__ & __getitem__ (2)


 >>> deck = FrenchDeck()
 >>> len(deck)
 52

 >>> deck[0]
 Card(rank='2', suit='spades')
 >>> deck[-1]
 Card(rank='A', suit='hearts')

 >>> from random import choice
 >>> choice(deck)
 Card(rank='3', suit='hearts')
 >>> choice(deck)
 Card(rank='K', suit='spades')
 >>> choice(deck)
 Card(rank='2', suit='clubs')

Example I : __len__ & __getitem__ (3)


 >>> deck[:3]
 [Card(rank='2', suit='spades'), Card(rank='3', suit='spades'),
 Card(rank='4', suit='spades')]
 >>> deck[12::13]
 [Card(rank='A', suit='spades'), Card(rank='A', suit='diamonds'),
 Card(rank='A', suit='clubs'), Card(rank='A', suit='hearts')]

 >>> for card in deck: 
 ...   print(card)
 Card(rank='2', suit='spades')
 Card(rank='3', suit='spades')
 Card(rank='4', suit='spades')
 ...

 >>> for card in reversed(deck): 
 ...   print(card)
 Card(rank='A', suit='hearts')
 Card(rank='K', suit='hearts')
 Card(rank='Q', suit='hearts')
 ...
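Because `FrenchDeck` implements `__getitem__`, Python falls back on it for more than indexing: the `in` operator scans the deck sequentially, and `sorted()` works through iteration. A self-contained sketch repeating the class, with an illustrative `spades_high` ranking key (the suit values are an assumption chosen for the example):

```python
import collections

Card = collections.namedtuple('Card', ['rank', 'suit'])


class FrenchDeck:
    ranks = [str(n) for n in range(2, 11)] + list('JQKA')
    suits = 'spades diamonds clubs hearts'.split()

    def __init__(self):
        self._cards = [Card(rank, suit) for suit in self.suits
                                        for rank in self.ranks]

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        return self._cards[position]


deck = FrenchDeck()

# No __contains__ is defined, so `in` falls back to iterating via __getitem__.
print(Card('Q', 'hearts') in deck)    # True
print(Card('7', 'beasts') in deck)    # False

suit_values = dict(spades=3, hearts=2, diamonds=1, clubs=0)


def spades_high(card):
    """Rank cards by rank first, then by suit (spades highest)."""
    rank_value = FrenchDeck.ranks.index(card.rank)
    return rank_value * len(suit_values) + suit_values[card.suit]


ordered = sorted(deck, key=spades_high)
print(ordered[0], ordered[-1])
```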

Example II : __iter__


class Article:
    def __init__(self, sentences):
        self.sentences = sentences

    def __iter__(self):
        return iter(self.sentences)

class Sentence:
    def __init__(self, words):
        self.words = words

    def __iter__(self):
        return iter(self.words)

...

>>> for sentence in article:
...     for word in sentence:
...         print(word)
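A runnable sketch with made-up sample text. It writes `__iter__` as a generator method using `yield from`, an equivalent alternative; either way each `for` loop gets a fresh iterator, so the article can be traversed more than once.

```python
class Sentence:
    def __init__(self, words):
        self.words = words

    def __iter__(self):
        yield from self.words         # a generator method returns a new iterator per call


class Article:
    def __init__(self, sentences):
        self.sentences = sentences

    def __iter__(self):
        yield from self.sentences


article = Article([Sentence(["Python", "is", "fun"]),
                   Sentence(["Readability", "counts"])])

words = [word for sentence in article for word in sentence]
print(words)
print(len(list(article)))            # iterating again still works: 2
```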

Example III : __repr__


from array import array
import math

class Vector2d:
    def __init__(self, x, y):
        self.x = float(x)
        self.y = float(y)

    def __iter__(self):
        return (i for i in (self.x, self.y))

    def __repr__(self):
        class_name = type(self).__name__
        return '{}({!r}, {!r})'.format(class_name, *self)

...
 
>>> v = Vector2d(1, 2)
>>> print(v) 
Vector2d(1.0, 2.0)
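When a class defines only `__repr__`, `print()` and `str()` fall back to it, which is why `print(v)` shows the constructor-style text above. A sketch that adds a separate user-facing `__str__` and an `__eq__` (both are assumptions, not part of the original class) and checks that the `__repr__` output evaluates back into an equal object:

```python
class Vector2d:
    def __init__(self, x, y):
        self.x = float(x)
        self.y = float(y)

    def __iter__(self):
        return (i for i in (self.x, self.y))

    def __repr__(self):
        # Developer-facing: looks like the call that rebuilds the object.
        class_name = type(self).__name__
        return '{}({!r}, {!r})'.format(class_name, *self)

    def __str__(self):
        # User-facing: a plain coordinate pair.
        return str(tuple(self))

    def __eq__(self, other):
        return tuple(self) == tuple(other)


v = Vector2d(1, 2)
print(repr(v))                 # Vector2d(1.0, 2.0)
print(v)                       # (1.0, 2.0)
print(eval(repr(v)) == v)      # True
```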

Object Oriented Design

Why Is This Code Badly Designed ?


class UserSettings:

    def __init__(self, user):
        self.user = user

    def change_setting(self, setting):
        if self.verify_credential():
            # do change setting
            pass

    def verify_credential(self):
        # do verify credential
        pass

Single Responsibility Principle



There should never be more than one reason for a class to change

Single Responsibility Principle - Good Code


class UserSettings:

    def __init__(self, user):
        self.user = user
        self.auth = UserAuth(user)

    def change_setting(self, setting):
        if self.auth.verify_credential():
            # do change setting
            pass

class UserAuth:

    def __init__(self, user):
        self.user = user

    def verify_credential(self):
        # do verify credential
        pass

Is This Good Code ?


class Rectangle:

    def __init__(self, width, height):
        self.width = width
        self.height = height

class AreaCalculator:

    def compute_area(self, shapes):
        """ Compute the sum area of a shape collection """
        area = 0.0
        for shape in shapes:
            area += shape.width * shape.height
        return area

Possible Extension 1

What if the collection also needs to contain circles ?

A Quick Solution Based On Previous Design


class Rectangle:
    ...
        
class Circle:
   
    def __init__(self, radius):
        self.radius = radius

class AreaCalculator:

    def compute_area(self, shapes):
        """ Compute the sum area of a shape collection """
        area = 0.0
        for shape in shapes:
            # Check type here
            if isinstance(shape, Rectangle):  
                area += shape.width * shape.height
            else:
                area += 3.14 * shape.radius * shape.radius
        return area

Possible Extension 2

What if the collection needs to contain triangle, diamond, octagon etc ... ?

Open / Closed Principle



Software entities should be open for extension, but closed for modification

Open / Closed Principle - Good Code


from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self):
        pass

class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height
    
    def area(self):
        return self.width * self.height

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius
        
    def area(self):
        return self.radius * self.radius * 3.14

class AreaCalculator:

    def compute_area(self, shapes):
        """ Compute the sum area of a shape collection """
        area = 0.0
        for shape in shapes:
            area += shape.area()
        return area
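The payoff of this design shows with Possible Extension 2: a new shape is just a new subclass, and `AreaCalculator` does not change. A self-contained sketch with a hypothetical `Triangle` (base × height / 2); it uses `math.pi` instead of 3.14.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        ...


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Triangle(Shape):                 # new shape: no other class is edited
    def __init__(self, base, height):
        self.base = base
        self.height = height

    def area(self):
        return self.base * self.height / 2


class AreaCalculator:
    def compute_area(self, shapes):
        """Compute the sum area of a shape collection."""
        return sum(shape.area() for shape in shapes)


total = AreaCalculator().compute_area([Rectangle(2, 3), Triangle(4, 5), Circle(1)])
print(round(total, 4))                 # 19.1416
```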

Is This A Good Design ?


from abc import ABC, abstractmethod

class IEmployee(ABC):
    @abstractmethod
    def work(self):
        pass
    @abstractmethod
    def eat(self):
        pass

class Researcher(IEmployee):
    def work(self):
        pass
    def eat(self):
        pass

class Programmer(IEmployee):
    def work(self):
        pass
    def eat(self):
        pass

Interface Segregation Principle



Clients should not be forced to depend upon interfaces that they do not use.

Interface Segregation Principle - Good Code


from abc import ABC, abstractmethod

class Workable(ABC):
    @abstractmethod
    def work(self):
        pass

class Feedable(ABC):
    @abstractmethod
    def eat(self):
        pass

class Researcher(Workable, Feedable):
    def work(self):
        pass
    def eat(self):
        pass

class Programmer(Workable, Feedable):
    def work(self):
        pass
    def eat(self):
        pass
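The segregated interfaces pay off when a class needs only part of the behavior. A sketch with a hypothetical `Robot` that works but never eats: it implements `Workable` alone instead of being forced to stub out an `eat()` method it cannot meaningfully provide.

```python
from abc import ABC, abstractmethod


class Workable(ABC):
    @abstractmethod
    def work(self):
        ...


class Feedable(ABC):
    @abstractmethod
    def eat(self):
        ...


class Programmer(Workable, Feedable):
    def work(self):
        return "writing code"

    def eat(self):
        return "eating lunch"


class Robot(Workable):                 # depends only on the interface it uses
    def work(self):
        return "assembling parts"


def run_shift(workers):
    """A client that needs only Workable."""
    return [worker.work() for worker in workers]


print(run_shift([Programmer(), Robot()]))
print(isinstance(Robot(), Feedable))   # False
```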

Interface Segregation Principle - A Real Example


Composition Over Inheritance - Bad Implementation


OO Design Principles

  • SRP

    Single Responsibility Principle

    There should never be more than one reason for a class to change

  • OCP

    Open / Closed Principle

    Software entities should be open for extension, but closed for modification

  • ISP

    Interface Segregation Principle

    Clients should not be forced to depend upon interfaces that they do not use.


Python Inheritance

1. Do Not Inherit From Built-In Data Types.



# Bad Implementation
class DoppelDict(dict):
              
    def __setitem__(self, key, value): 
        super().__setitem__(key, [value] * 2)

>>> dd = DoppelDict(one=1)
>>> dd
{'one': 1}
>>> dd['two'] = 2
>>> dd
{'one': 1, 'two': [2, 2]}
>>> dd.update(three=3)
>>> dd
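{'one': 1, 'two': [2, 2], 'three': 3}

The built-in dict's __init__ and update() ignore the overridden __setitem__, so 'one' and 'three' are stored as plain values.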
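The usual remedy, sketched here, is to subclass `collections.UserDict`, which is written in Python and routes `__init__` and `update()` through `__setitem__`, so the override applies consistently:

```python
import collections


class DoppelDict2(collections.UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, [value] * 2)


dd = DoppelDict2(one=1)
dd['two'] = 2
dd.update(three=3)
print(dd)        # {'one': [1, 1], 'two': [2, 2], 'three': [3, 3]}
```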

2. Use Mixins for Multiple Inheritance


Example I - A General Repr Mixin


import reprlib
            
class ReprLibMixin(object):
   
    def __repr__(self):
        return "<{} {attr}>".format(
            self.__class__.__name__,
            attr=" ".join("{}={}".format(k, reprlib.repr(v)) for k, v in sorted(self.__dict__.items())),
        )
    
class Word(ReprLibMixin):
   
    def __init__(self, text, sentence_index, article_index, pos_tag):
        self.text = text
        self.sentence_index = sentence_index
        self.article_index = article_index
        self.pos_tag = pos_tag
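Any class that mixes in `ReprLibMixin` gets a readable `repr` listing its attributes in sorted order. A usage sketch with sample values (the mixin is repeated so the block runs on its own); `reprlib.repr` abbreviates long values, which keeps the output short for large attributes.

```python
import reprlib


class ReprLibMixin:
    def __repr__(self):
        attrs = " ".join("{}={}".format(k, reprlib.repr(v))
                         for k, v in sorted(self.__dict__.items()))
        return "<{} {}>".format(type(self).__name__, attrs)


class Word(ReprLibMixin):
    def __init__(self, text, sentence_index, article_index, pos_tag):
        self.text = text
        self.sentence_index = sentence_index
        self.article_index = article_index
        self.pos_tag = pos_tag


w = Word("apple", 0, 1, "NN")
print(repr(w))    # <Word article_index=1 pos_tag='NN' sentence_index=0 text='apple'>

big = Word("x", 0, 0, list(range(100)))
print(repr(big))  # the 100-element list is abbreviated by reprlib
```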

Example II - Comparable Mixin


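class Comparable(object):
    # Subclasses must define __eq__ and __le__; the mixin derives the rest.
    def __ne__(self, other):
        return not (self == other)

    def __lt__(self, other):
        return self <= other and (self != other)

    def __gt__(self, other):
        return not self <= other

    def __ge__(self, other):
        return self == other or self > other

class Integer(Comparable):
    def __init__(self, i):
        self.i = i

    def __eq__(self, other):
        return self.i == other.i

    def __le__(self, other):
        return self.i <= other.i

class Char(Comparable):
    def __init__(self, c):
        self.c = c

    def __eq__(self, other):
        return self.c == other.c

    def __le__(self, other):
        return self.c <= other.c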
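The standard library ships this mixin idea as the `functools.total_ordering` class decorator: define `__eq__` and one ordering method, and it fills in the rest. A sketch with a hypothetical `Version` class:

```python
from functools import total_ordering


@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __le__(self, other):
        return (self.major, self.minor) <= (other.major, other.minor)


a, b = Version(1, 2), Version(1, 10)
print(a < b, a >= b, a != b)    # True False True
```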

3. Enrich the Constructor With
Static Factory Methods


class Sentence:
    def __init__(self, words, article_indices, sentence_indices):  # ... and more
        pass

    @staticmethod
    def from_text(sentence_str):
        pass

    @staticmethod
    def from_ltp_json(ltp_json):
        pass
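A runnable sketch of one such factory, with deliberately simplified, hypothetical fields: `from_text` splits on whitespace and fills in the index lists itself, so callers never need the full constructor signature. It is written as a `classmethod`, a common Python alternative to a static method, so that subclasses inherit a factory that builds the subclass.

```python
class Sentence:
    def __init__(self, words, article_indices, sentence_indices):
        self.words = words
        self.article_indices = article_indices
        self.sentence_indices = sentence_indices

    @classmethod
    def from_text(cls, sentence_str, article_index=0):
        """Hypothetical factory: build a Sentence from a plain string."""
        words = sentence_str.split()
        return cls(words,
                   [article_index] * len(words),
                   list(range(len(words))))


s = Sentence.from_text("Readability counts a lot")
print(s.words, s.sentence_indices)
```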

Performance Tuning

Step 1: Make A Better Algorithm


  • 1. Binary Search
  • 2. Quick Sort
  • 3. Data Structures: BST, Trie, Interval Tree, Stack etc.
  • 4. Dynamic Programming (Tabulation, Memoization)
  • 5. Save Previous Results in Dict
  • ...
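Item 5 above, saving previous results in a dict, is manual memoization. A minimal sketch on Fibonacci numbers (the name `fib_memo` is hypothetical): each value is computed once and then read back from the dict, turning exponential recursion into linear work.

```python
def fib_memo(n, memo=None):
    """Return the n-th Fibonacci number (fib(0) = 0, fib(1) = 1)."""
    if memo is None:
        memo = {}
    if n < 2:
        return n
    if n not in memo:
        memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]


print(fib_memo(80))   # 23416728348467685
```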

Step 2: Cache


from functools import lru_cache

@lru_cache(maxsize=None)            # unbounded cache: every result is kept
def fib(n):
    if n <= 0:
        raise ValueError("Incorrect input: n must be positive")
    elif n == 1:
        return 0
    elif n == 2:
        return 1
    else:
        return fib(n-1) + fib(n-2)
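`lru_cache` also exposes statistics through `cache_info()`, a quick way to confirm the cache is working. A sketch using the zero-based definition from the decorator slide: computing `fib(30)` evaluates each value from 0 to 30 exactly once, so there should be 31 misses and 31 cached entries.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(30))                       # 832040
info = fib.cache_info()
print(info.misses, info.currsize)    # 31 31
fib.cache_clear()                    # empty the cache when it is no longer needed
```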

Step 3: Python Coroutine


import asyncio

import aiohttp

URL = 'https://www.youtube.com/'

async def job(sess):
    # The request does not block the event loop while waiting.
    async with sess.get(URL) as resp:
        return str(resp.url)

async def main():
    async with aiohttp.ClientSession() as sess:
        # Run two requests concurrently and wait for both.
        results = await asyncio.gather(*(job(sess) for _ in range(2)))
        for r in results:
            print(r)

asyncio.run(main())
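Network access is not needed to see the effect. A stdlib-only sketch in which `asyncio.sleep` stands in for a slow request (the `fake_request` name and the 0.2-second delay are assumptions): the three waits overlap, so the total time should be close to one delay rather than three.

```python
import asyncio
import time


async def fake_request(name, delay=0.2):
    await asyncio.sleep(delay)        # yields to the event loop while waiting
    return name


async def main():
    # gather() runs the coroutines concurrently and keeps result order.
    return await asyncio.gather(*(fake_request(n) for n in ("a", "b", "c")))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, round(elapsed, 1))     # ['a', 'b', 'c'] and roughly 0.2
```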

Step 4: Python Multiprocessing


from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    # Five worker processes run f in parallel, sidestepping the GIL.
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
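For I/O-bound work, threads are usually enough despite the GIL, because a thread blocked on I/O releases it; for CPU-bound work like `f` above, processes are what give real parallelism. A sketch of the thread side with `concurrent.futures`, where `slow_square` and its sleep are stand-ins for an I/O wait:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    time.sleep(0.2)                   # simulated I/O wait; releases the GIL
    return x * x


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(slow_square, [1, 2, 3, 4, 5]))
elapsed = time.perf_counter() - start

print(results, round(elapsed, 1))     # [1, 4, 9, 16, 25] and roughly 0.2
```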

Step 5: Others...


  • 1. Cython
  • 2. numba
  • 3. PyPy
  • 4. f2py
  • ...

Profiling

import cProfile
import re
cProfile.run('re.compile("foo|bar")')

      197 function calls (192 primitive calls) in 0.002 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.001    0.001 <string>:1(<module>)
     1    0.000    0.000    0.001    0.001 re.py:212(compile)
     1    0.000    0.000    0.001    0.001 re.py:268(_compile)
     1    0.000    0.000    0.000    0.000 sre_compile.py:172(_compile_charset)
     1    0.000    0.000    0.000    0.000 sre_compile.py:201(_optimize_charset)
     4    0.000    0.000    0.000    0.000 sre_compile.py:25(_identityfunction)
   3/1    0.000    0.000    0.000    0.000 sre_compile.py:33(_compile)
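To profile your own code rather than a one-line string, a `cProfile.Profile` object can wrap any block, and `pstats` sorts the report. A sketch with hypothetical functions `slow_sum` and `work`; the report is written to a string buffer so it can be inspected or logged.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    return sum(i * i for i in range(n))


def work():
    return [slow_sum(10_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)   # top 5 entries by cumulative time
report = buffer.getvalue()
print(report)
```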