
This document is based on the Google Python Style Guide. The original document is comprehensive and should be consulted for work outside the MADlib team. This document is specific to work in MADlib.

Python Language rules

 Imports: Use import statements for packages and modules only, not for individual classes or functions. Refer to a specific function through the module that defines it.

- Use import x for importing packages and modules.
- Use from x import y where x is the package prefix and y is the module name with no prefix.
- Use from x import y as z if two modules named y are to be imported or if y is an inconveniently long name.

Example:

Yes: import sound
     sound.effects()

No:  from sound import *

Yes: from sound.effects import echo
     …
     echo.EchoFilter(input, output, delay=0.7, atten=4)

Yes: import os
     import sys

No:  import os, sys

Imports are always put at the top of the file, just after any module comments and doc strings and before module globals and constants. Imports should be grouped with the order being most generic to least generic:

1. standard library imports
2. third-party imports
3. application-specific imports

Within each grouping, imports should be sorted lexicographically, ignoring case, according to each module's full package path.

import foo
from foo import bar
from foo.bar import baz
from foo.bar import Quux
from Foob import ar
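
The ordering rule can be checked mechanically. The sketch below sorts the full package paths of the five imports listed above, ignoring case; the helper name sort_import_paths is hypothetical and only serves the illustration.

```python
def sort_import_paths(paths):
    """Return module paths sorted lexicographically, ignoring case."""
    return sorted(paths, key=lambda path: path.lower())


# Full package paths of the five imports listed above, in shuffled order.
paths = ['Foob.ar', 'foo.bar.Quux', 'foo', 'foo.bar.baz', 'foo.bar']
ordered = sort_import_paths(paths)
```

The result matches the order in which the imports are written above: 'Foob.ar' sorts last because case is ignored and '.' sorts before 'b'.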

 

Global variables: Avoid global variables, except for module-level constants. Name such constants in all caps with underscores.

Example: PI = 3.14159
         LANCZOS_MAX_ITER = 5000

 

List Comprehensions/Generators: Simple list comprehensions can be clearer and simpler than other list creation techniques. Generator expressions can be very efficient, since they avoid the creation of a list entirely.

Example:

 squares = [x * x for x in range(10)]

 result = [(x, y) for x in range(10) for y in range(5) if x*y < 10]

 black_beans = [jelly_bean for jelly_bean in jelly_beans
                if jelly_bean.color == 'black']
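
The examples above are all list comprehensions. As a minimal sketch of the generator-expression case, the two statements below compute the same sum; only the first one materializes a list. The variable names are illustrative.

```python
# List comprehension: builds the full list of squares in memory first.
total_from_list = sum([x * x for x in range(10)])

# Generator expression: yields one square at a time, no intermediate list.
total_from_generator = sum(x * x for x in range(10))
```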

 

Default iterators and operators: Use the default iterators and operators for the built-in types such as lists, dictionaries and sets. Prefer them to methods that return lists, but do not mutate a container while iterating over it.

Example:

 Yes: for key in a_dict:
          …
 No:  for key in a_dict.keys():
          …

 Yes: if key not in a_dict:
          …
 No:  if not a_dict.has_key(key):
          …

 Yes: for k, v in a_dict.iteritems():
          …
 No:  for k in a_dict:
          v = a_dict[k]
          …

 Yes: for index, value in enumerate(a_list):
          …
 No:  for index in range(len(a_list)):
          value = a_list[index]
          ...
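
The one exception noted above is mutation during iteration. A minimal sketch of the usual workaround, iterating over a snapshot of the keys, follows; the function name and the sample dictionary are hypothetical.

```python
def drop_empty_values(a_dict):
    """Remove keys whose value is None, in place."""
    # list(a_dict) snapshots the keys, so deleting entries inside the loop
    # does not disturb the iteration.
    for key in list(a_dict):
        if a_dict[key] is None:
            del a_dict[key]
    return a_dict


settings = {'alpha': 0.1, 'beta': None, 'gamma': 3}
drop_empty_values(settings)
```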

True/False evaluations: Python evaluates certain values as false when in a boolean context. A quick "rule of thumb" is that all "empty" values are considered false so 0, None, [], {}, '' all evaluate as false in a boolean context. This may look strange to C/C++ developers but this is the "Pythonic" way of doing this.

 - Never use == or != to compare singletons like None. Use 'is None' or 'is not None'.

 - Beware of writing 'if x:' when you really mean 'if x is not None:',
   e.g., when testing whether a variable or argument that defaults to None was set to
   some other value. The other value might be a value that's false in a boolean context!

 - Never compare a boolean variable to False/True using ==.
   Use 'if not x:' or 'if x:' instead of 'if x == False:' or 'if x == True:'.
   If you need to distinguish False from None then chain the expressions, such as
   'if not x and x is not None:'.

 - For sequences (strings, lists, tuples), use the fact that empty sequences are false, so
   'if not seq:' or 'if seq:' is preferable to 'if not len(seq):' or 'if len(seq):'.

 - For integers it is safer to compare to 0 explicitly,
   e.g. 'if i % 10 == 0:' instead of 'if not i % 10:'.
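
To illustrate the 'if x:' versus 'if x is not None:' pitfall, here is a small sketch with an argument that defaults to None. The function describe_limit is hypothetical; the point is that 0 is a legitimate value that a plain truth test would treat as "not set".

```python
def describe_limit(limit=None):
    """Return a description of an optional row limit."""
    # 'if limit:' would treat an explicit limit of 0 as "not set".
    if limit is not None:
        return 'limit set to %d' % limit
    return 'no limit'
```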

 

Classes:  If a class inherits from no other base classes, explicitly inherit from object. This also applies to nested classes.

Yes: class SampleClass(object):
         pass

     class ChildClass(ParentClass):
         """Explicitly inherits from another class already."""


Inheriting from object is needed to make properties work properly, and it will protect your code from one particular potential incompatibility with Python 3000. It also defines special methods that implement the default semantics of objects including __new__, __init__, __delattr__, __getattribute__, __setattr__, __hash__, __repr__, and __str__.
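
As a sketch of the property case mentioned above: the class below inherits from object, so the setter is honored on assignment. In Python 2 an old-style class (one without the object base) would bypass the setter. The Circle class is a hypothetical example.

```python
class Circle(object):
    """A circle with a validated radius."""

    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        # Runs on every 'circle.radius = ...' assignment.
        if value < 0:
            raise ValueError('radius must be non-negative')
        self._radius = value
```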


Python style rules:

Doc strings:

Python has a unique commenting style using doc strings. A doc string is a string that is the first statement in a package, module, class or function. These strings can be extracted automatically through the __doc__ member of the object and are used by pydoc.
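
A minimal sketch of the __doc__ lookup described above; the function mean_of is a hypothetical example, not part of MADlib.

```python
def mean_of(values):
    """Returns the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / float(len(values))


# The docstring is stored on the function object itself.
doc_text = mean_of.__doc__
```

Tools such as pydoc and the built-in help() read the same attribute.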

A function must have a docstring unless it meets all of the following criteria: it is not externally visible, it is very short, and it is obvious.

A docstring should give enough information to write a call to the function without reading the function's code. A docstring should describe the function's calling syntax and its semantics, not its implementation. For tricky code, comments alongside the code are more appropriate than using docstrings.

Certain aspects of a function should be documented in special sections, listed below. Each section begins with a heading line, which ends with a colon. Sections should be indented two spaces, except for the heading.

Args:

List each parameter by name. A description should follow the name, and be separated by a colon and a space. If the description is too long to fit on a single 80-character line, use a hanging indent of 2 or 4 spaces (be consistent with the rest of the file). The description should mention required type(s) and the meaning of the argument.

If a function accepts *foo (variable length argument lists) and/or **bar (arbitrary keyword arguments), they should be listed as *foo and **bar.

Returns: (or Yields: for generators)

 Describe the type and semantics of the return value. If the function only returns None, this section is not required.

Raises:

 List all exceptions that are relevant to the interface.

Example:

def fetch_bigtable_rows(big_table, keys, other_silly_variable=None):
    """Fetches rows from a Bigtable.

    @brief Retrieves rows pertaining to the given keys from the Table instance
           represented by big_table.  Silly things may happen if
           other_silly_variable is not None.

    Args:
      @param big_table: An open Bigtable Table instance.
      @param keys: A sequence of strings representing the key of each table
          row to fetch.
      @param other_silly_variable: Another optional variable, that has a much
          longer name than the other args, and which does nothing.

    Returns:
      Dict. A dict mapping keys to the corresponding table row data fetched.
      Each row is represented as a tuple of strings.
      For example:

      {'Serak': ('Rigel VII', 'Preparer'),
       'Zim': ('Irk', 'Invader'),
       'Lrrr': ('Omicron Persei 8', 'Emperor')}

      If a key from the keys argument is missing from the dictionary, then
      that row was not found in the table.

    Raises:
      IOError: An error occurred accessing the bigtable.Table object.
    """

Classes should have a doc string below the class definition describing the class. If your class has public attributes, they should be documented here in an Attributes section and follow the same formatting as a function's Args section.

Example:

class SampleClass(object):
    """Summary of class here.

    Longer class information...
    Longer class information....

    Attributes:
      likes_spam: A boolean indicating if we like SPAM or not.
      eggs: An integer count of the eggs we have laid.
    """

    def __init__(self, likes_spam=False):
        """Inits SampleClass with blah."""
        self.likes_spam = likes_spam
        self.eggs = 0

    def public_method(self):
        """Performs operation blah."""

Block and Inline Comments

The final place to have comments is in tricky parts of the code. If you're going to have to explain it at the next code review, you should comment it now. Complicated operations get a few lines of comments before the operations commence. Non-obvious ones get comments at the end of the line.

# We use a weighted dictionary search to find out where i is in
# the array.  We extrapolate position based on the largest num
# in the array and the array size and then do binary search to
# get the exact number.

if i & (i-1) == 0:     # true iff i is a power of 2

To improve legibility, inline comments should be at least 2 spaces away from the code.
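
The power-of-2 test above is a good candidate for a comment because the bit trick is not obvious. A self-contained sketch follows; the function name and the i > 0 guard are additions for illustration (without the guard, the expression is also true for 0).

```python
def is_power_of_two(i):
    """Return True if i is a positive power of 2."""
    # A power of 2 has exactly one bit set, so clearing the lowest set bit
    # with i & (i - 1) leaves 0.  The i > 0 guard excludes 0 itself.
    return i > 0 and i & (i - 1) == 0
```

In Python, & binds more tightly than ==, so no extra parentheses are needed around i & (i - 1).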

On the other hand, never describe the code. Assume the person reading the code knows Python (though not what you're trying to do) better than you do.

# BAD COMMENT: Now go through the b array and make sure whenever i occurs
# the next element is i+1

Semicolons: Don't end lines with semicolons, and do not use a semicolon to put two statements on the same line.

Line Length: Maximum line length should be kept under 80 characters. (Exception: URLs in docs).

Do not use backslash for continuation. Make use of Python's implicit line joining inside parentheses, brackets and braces. If necessary, you can add an extra pair of parentheses around an expression.

Yes: foo_bar(self, width, height, color='black', design=None, x='foo',
             emphasis=None, highlight=0)

     if (width == 0 and height == 0 and
         color == 'red' and emphasis == 'strong'):

When a literal string won't fit on a single line, use parentheses for implicit line joining.

Yes: x = ("This will build a very long long "
          "long long long long long long string")
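
A short sketch of what implicit joining produces: adjacent string literals inside the parentheses are concatenated exactly as written, with no newline or extra space inserted. The names first, second and message are illustrative.

```python
first = "This will build a very long long "
second = "long long long long long long string"

# The two literals are joined into one string; the trailing space in the
# first piece is the only separator between them.
message = ("This will build a very long long "
           "long long long long long long string")
```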

Parentheses: Use parentheses sparingly, only where they are needed or where they reduce ambiguity.

Yes: if foo:
         bar()
     while x:
         x = bar()
     if x and y:
         bar()
     if not x:
         bar()
     return foo
     for (x, y) in dict.items(): ...

No:  if (x):
         bar()
     if (x) and (y):
         bar()
     if not(x):
         bar()
     return (foo)

Indentation: Indent code with 4 spaces. Never mix tabs and spaces. Use a hanging indent when the parameter list is too long for one line.

Yes: # Aligned with opening delimiter
     foo = long_function_name(var_one, var_two,
                              var_three, var_four)
     # 4-space hanging indent; nothing on first line
     foo = long_function_name(
         var_one, var_two, var_three,
         var_four)

No:  foo = long_function_name(var_one, var_two,
         var_three, var_four)
     # 2-space hanging indent forbidden
     foo = long_function_name(
       var_one, var_two, var_three,
       var_four)

Blank Lines: Two blank lines between top-level definitions, be they function or class definitions. One blank line between method definitions and between the class line and the first method. Use single blank lines as you judge appropriate within functions or methods.

Whitespace:

No whitespace inside parentheses, brackets or braces.

Yes: spam(ham[1], {eggs: 2}, [])
No:  spam( ham[ 1 ], { eggs: 2 }, [ ] )

No whitespace before a comma, semicolon, or colon. Do use whitespace after a comma, semicolon, or colon except at the end of the line.

Yes: if x == 4:
         print x, y
     x, y = y, x

No:  if x == 4 :
         print x , y
     x , y = y , x

No whitespace before the open paren/bracket that starts an argument list, indexing or slicing.

Yes: spam(1)
No:  spam (1)

Yes: dict['key'] = list[index]
No:  dict ['key'] = list [index]

Surround binary operators with a single space on either side.

Yes: x == 1
No:  x<1

Don't use spaces around the '=' sign when used to indicate a keyword argument or a default parameter value.

Yes: def complex(real, imag=0.0): return magic(r=real, i=imag)
No:  def complex(real, imag = 0.0): return magic(r = real, i = imag)

Don't use spaces to vertically align tokens on consecutive lines, since it becomes a maintenance burden (applies to :, #, =, etc.):

Yes:
 foo = 1000  # comment
 long_name = 2  # comment that should not be aligned

 dictionary = {
     'foo': 1,
     'long_name': 2,
 }

No:
 foo       = 1000  # comment
 long_name = 2     # comment that should not be aligned

 dictionary = {
     'foo'      : 1,
     'long_name': 2,
 }

Strings:

Use the format method or the % operator for formatting strings, even when the parameters are all strings. Use your best judgement to decide between + and % (or format) though.

Yes: x = a + b
     x = '{}, {}!'.format(imperative, expletive)
     x = 'name: {}; score: {}'.format(name, n)

No:  x = '{}{}'.format(a, b)  # use + in this case
     x = 'name: ' + name + '; score: ' + str(n)

Avoid using the + and += operators to accumulate a string within a loop. Since strings are immutable, this creates unnecessary temporary objects and results in quadratic rather than linear running time. Instead, add each substring to a list and ''.join the list after the loop terminates (or write each substring to an io.BytesIO buffer).

Yes: items = ['<table>']
     for last_name, first_name in employee_list:
         items.append('<tr><td>%s, %s</td></tr>' % (last_name, first_name))
     items.append('</table>')
     employee_table = ''.join(items)

No:  employee_table = '<table>'
     for last_name, first_name in employee_list:
         employee_table += '<tr><td>%s, %s</td></tr>' % (last_name, first_name)
     employee_table += '</table>'
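
The buffer alternative mentioned above can be sketched as follows. This version uses io.StringIO rather than io.BytesIO because the pieces are text rather than bytes; that choice, the variable names and the sample employee_list are assumptions made for the illustration.

```python
import io

# Hypothetical sample data: (last_name, first_name) pairs.
employee_list = [('Lovelace', 'Ada'), ('Hopper', 'Grace')]

# Each write appends to the in-memory buffer instead of creating a new
# string, so the loop stays linear in the total output size.
out = io.StringIO()
out.write(u'<table>')
for last_name, first_name in employee_list:
    out.write(u'<tr><td>%s, %s</td></tr>' % (last_name, first_name))
out.write(u'</table>')
employee_table = out.getvalue()
```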

Use """ for multi-line strings rather than '''. Note, however, that it is often cleaner to use implicit line joining since multi-line strings do not flow with the indentation of the rest of the program:

Yes: print ("This is much nicer.\n"
            "Do it this way.\n")

No:  print """This is pretty ugly.
Don't do this.
"""

Naming conventions:

module_name, package_name, ClassName, method_name, ExceptionName, function_name,
GLOBAL_CONSTANT_NAME, global_var_name, instance_var_name, function_parameter_name, local_var_name.


Names to Avoid:

- single character names except for counters or iterators
- dashes (-) in any package/module name
- __double_leading_and_trailing_underscore__ names (reserved by Python)
- Python keywords and built-in names (Python 2.x does not prevent rebinding names such as 'sum' or 'False')
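
To show why built-in names belong on this list, the sketch below contrasts a function that rebinds sum with one that picks a different local name. Both functions are hypothetical examples.

```python
def broken_total(scores):
    """Shadows the built-in sum() and then fails when calling it."""
    sum = 0  # from here on, 'sum' is an int inside this function
    for score in scores:
        sum += score
    return sum(scores)  # raises TypeError: 'int' object is not callable


def total_score(scores):
    """Uses a distinct local name, so the built-in stays available."""
    total = sum(scores)
    return total
```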



Guidelines derived from Guido's Recommendations

 

 

Type                        Public              Internal
--------------------------  ------------------  -------------------------------------------------------------
Packages                    lower_with_under
Modules                     lower_with_under    _lower_with_under
Classes                     CapWords            _CapWords
Exceptions                  CapWords
Functions                   lower_with_under()  _lower_with_under()
Global/Class Constants      CAPS_WITH_UNDER     _CAPS_WITH_UNDER
Global/Class Variables      lower_with_under    _lower_with_under
Instance Variables          lower_with_under    _lower_with_under (protected) or __lower_with_under (private)
Method Names                lower_with_under()  _lower_with_under() (protected)
Function/Method Parameters  lower_with_under
Local Variables             lower_with_under

 

 

 



-------------------------------------------------------------------------

Future items to add:


- Properties

- Default Argument Values

- Lexical Scoping

- Function and Method Decorators

-------------------------------------------------------------------------

