
Python for Perl Programmers

Gurhar Khalsa edited this page Jul 14, 2022 · 6 revisions

Perl programmers are used to certain behaviors. Python does things differently. Here are some differences (and gotchas) we've found.

This page uses Python 3, though a few sections describe Python 2 behavior (old-style classes, for example). There are some significant differences between Python 2 and 3, too. See Sebastian Raschka's Key differences between Python 2.7.x and Python 3.x for info on this.

dict.copy is not the same as copy.copy

dict.copy() always returns a dict. copy.copy(DictSubClass()) returns a DictSubClass, but DictSubClass().copy() returns a plain dict. PKDict implements copy for this reason.
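A quick check of the difference, using a made-up DictSubClass. CopyingDict is a hypothetical sketch of how a subclass can make copy() keep its own type; it is not PKDict's actual implementation.

```python
import copy


class DictSubClass(dict):
    """Hypothetical dict subclass that does not override copy()."""


class CopyingDict(dict):
    """Hypothetical subclass whose copy() returns the same class."""

    def copy(self):
        return self.__class__(self)


d = DictSubClass(a=1)
print(type(d.copy()).__name__)      # dict
print(type(copy.copy(d)).__name__)  # DictSubClass

c = CopyingDict(a=1)
print(type(c.copy()).__name__)      # CopyingDict
```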

Strings are Iterables

The fact that strings are iterables is very convenient. In Perl, you would:

for my $x (split('', 'abc')) {
    print("$x\n");
}

In Python, this is simply:

for x in "abc":
    print(x)

Nice. This can be useful, but it can also be a gotcha.
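The gotcha usually shows up when a function expects a list of strings and gets a single string instead. A small sketch with made-up function names; the isinstance guard is one common workaround, not the only one.

```python
def greet_all(names):
    """Hypothetical function that expects an iterable of names."""
    return ["hello " + n for n in names]


def greet_all_safe(names):
    """Same, but treats a lone string as a single name."""
    if isinstance(names, str):
        names = [names]
    return ["hello " + n for n in names]


print(greet_all(["ann", "bob"]))  # ['hello ann', 'hello bob']
print(greet_all("ann"))           # ['hello a', 'hello n', 'hello n']
print(greet_all_safe("ann"))      # ['hello ann']
```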

Single Elements are not Tuples

This is obvious:

for c in ("abc", "def", "ghi"):
    print(c)
# abc
# def
# ghi

This is not:

for c in ("abc"):
    print(c)
# a
# b
# c

A single element in parentheses is not a tuple; it's just a parenthesized expression, so ("abc") is the string "abc", and iterating over it yields its characters. Perl has this issue, too, but Perl defines the evaluation of an expression in terms of the context in which it is evaluated.

In Python, to turn a single expression into a tuple, follow it with a comma (it's the comma that makes the tuple; the parentheses are usually optional):

for c in ("abc",):
    print(c)
# abc

However, the safer code is probably just to make it a list if you are iterating over a constant list:

for c in ["abc"]:
    print(c)
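To see what the parentheses and the comma actually produce, here's a quick check (the variable names are arbitrary):

```python
x = ("abc")    # just a string in parentheses
y = ("abc",)   # a one-element tuple
z = "abc",     # also a tuple: the comma, not the parentheses, makes it
e = ()         # the empty tuple always needs parentheses

print(type(x).__name__, type(y).__name__, type(z).__name__)  # str tuple tuple
print(len(x), len(y), len(e))                                 # 3 1 0
```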

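nonlocal and global are implicit for reads

Variables in an enclosing (nonlocal) or global scope can be read implicitly, but to assign to them you must declare them with nonlocal or global; otherwise the assignment creates a new local variable. For example, no assertion is raised in this case: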

def outer():
    x = 0
    y = 1

    def inner():
        x = 1
        return 1 + y

    assert inner() == 2
    assert x == 0


outer()

This can be confusing, because usually you are not modifying the outer variables in a closure, just reading them. If you are using a variable as a sentinel in the closure, you will not get the proper behavior unless you initialize it first in the outer scope and declare it nonlocal in the closure.
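A short sketch of both sides, with made-up function names: declaring the variables nonlocal lets the closure update them, and forgetting to do so when you read-then-write (as in count += 1) raises UnboundLocalError rather than touching the outer variable.

```python
def outer():
    x = 0
    found = False  # sentinel initialized in the outer scope

    def inner():
        nonlocal x, found  # required to assign; reading alone does not need it
        x = 1
        found = True

    inner()
    return x, found


def broken():
    count = 0

    def bump():
        count += 1  # assignment makes count local to bump, so this read fails

    try:
        bump()
    except UnboundLocalError:
        return "UnboundLocalError"
    return "ok"


print(outer())   # (1, True)
print(broken())  # UnboundLocalError
```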

In Perl, variables are implicitly read or written across scopes.

Private methods: _priv vs __priv

In Perl, you call private methods directly, e.g. _priv($self). Calling them this way avoids method dispatch to a private method of the same name in another class, for example, when C2 inherits from C1 and both define a private method named _priv. The point of private methods is that they are private to the class and can't be overridden accidentally.

In Python, if you don't want a method to be overridden in a subclass, you need to name it with two leading underscores. Python then mangles the name (__priv in class C1 becomes _C1__priv), so a subclass's own __priv can't replace it. This seems to be rarely used, and many classes use self._priv, which is really like a "protected" method in Java without language enforcement. Therefore, you will always want to use the self.__priv form of method dispatch. This too is not exactly private, but it is good enough to prevent mistakes caused by accidentally giving private methods the same name in a class and its subclass.
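Here's a small demonstration of name mangling with made-up classes C1 and C2: the double-underscore method keeps C1's own behavior, while the single-underscore one is silently overridden by the subclass.

```python
class C1:
    def run(self):
        # Compiled as self._C1__priv(), so a subclass can't override it.
        return self.__priv()

    def __priv(self):
        return "C1.__priv"

    def run_prot(self):
        return self._prot()  # ordinary dispatch: subclasses can override

    def _prot(self):
        return "C1._prot"


class C2(C1):
    def __priv(self):  # stored as _C2__priv; does not affect C1.run
        return "C2.__priv"

    def _prot(self):  # silently overrides C1._prot
        return "C2._prot"


c = C2()
print(c.run())       # C1.__priv
print(c.run_prot())  # C2._prot
print(sorted(n for n in dir(C2) if "priv" in n))  # ['_C1__priv', '_C2__priv']
```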

concatenated adjacent strings

This is a legitimate way to concatenate strings:

x = "a" "b"
assert x == "ab"

The problem is something like this:

x = ["a", "b", "c" "d"]
assert x == ["a", "b", "cd"]

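You don't normally put a trailing comma after the last element in Python, as you would in Perl. So when you add an element to the end of a list, it's easy to miss the comma on the previous line, and instead of two elements you silently get one concatenated string. Python does accept a trailing comma in list, tuple, and dict displays, so putting one after every element avoids the problem.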
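The multi-line case is where this bites. A quick sketch of the bug and the trailing-comma style that avoids it:

```python
# Missing comma after "c": the last two literals are silently joined.
buggy = [
    "a",
    "b",
    "c"
    "d",
]

# One element per line, with a trailing comma on every line including the last.
fixed = [
    "a",
    "b",
    "c",
    "d",
]

print(len(buggy), buggy)  # 3 ['a', 'b', 'cd']
print(len(fixed), fixed)  # 4 ['a', 'b', 'c', 'd']
```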

__builtins__ vs __builtin__

There is some confusion on the net about __builtins__. If you want to get a builtin object from its name, you have to:

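try:
    import builtins  # Python 3
except ImportError:
    import __builtin__ as builtins  # Python 2

int_type = getattr(builtins, "int")  # the int type itself, not an instance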

Some people incorrectly suggest using __builtins__. In the __main__ module, __builtins__ is the __builtin__ module itself, so getattr works there, but in other modules (say, when your code is called from a test) __builtins__ is usually the module's dictionary, and getattr fails.

See https://docs.python.org/2/reference/executionmodel.html for more discussion.

Unfortunately, in Python 3, __builtin__ was renamed builtins.
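A minimal Python 3 sketch of the lookup (builtin_by_name is a made-up helper name). It also shows why __builtins__ is unreliable: depending on where the code runs, it may be the builtins module or that module's dictionary, so code that insists on using it has to handle both cases.

```python
import builtins


def builtin_by_name(name):
    """Hypothetical helper: look up a builtin object by name (Python 3)."""
    return getattr(builtins, name)


# __builtins__ is a CPython implementation detail: it may be the builtins
# module (in __main__) or that module's __dict__ (elsewhere, or under exec).
b = __builtins__
namespace = b if isinstance(b, dict) else vars(b)

print(builtin_by_name("int") is int)  # True
print(namespace["len"] is len)        # True
```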

Catching all exceptions

In Perl, if you want to catch all exceptions, you just catch all exceptions. In Python, exceptions are also used for control flow: KeyboardInterrupt and SystemExit are exceptions, but they derive from BaseException rather than Exception. To catch all errors without also catching those, use this:

try:
    ...  # op
except Exception:
    ...  # handle op error

Of course, if you need to do something whenever the try exits, use finally.
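A small sketch (attempt is a made-up name) showing that except Exception handles an ordinary error, lets SystemExit propagate, and that finally runs either way:

```python
import sys


def attempt(op, log):
    """Run op(); record ordinary errors and always record cleanup."""
    try:
        op()
    except Exception as e:  # does not catch SystemExit or KeyboardInterrupt
        log.append("error: " + type(e).__name__)
    finally:
        log.append("cleanup")


log = []
attempt(lambda: 1 / 0, log)
print(log)  # ['error: ZeroDivisionError', 'cleanup']

log = []
try:
    attempt(lambda: sys.exit(2), log)
except SystemExit as e:
    # The exit went straight past "except Exception", but finally still ran.
    print("exited with", e.code, log)  # exited with 2 ['cleanup']
```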

Operating system check ($^O or $OSNAME)

Python is very confused about how to check your operating system. platform.system() is the most modern API, and it doesn't have sys.platform's dreaded linux2 vs linux3 problem.

Cygwin is strange, because platform.system() reports CYGWIN_NT-6.3, which seems to go against the grain of the simplicity of what platform.system() is supposed to return.

Some people recommend:

sys.platform.startswith("linux")

For Windows, however, you have to check for win32:

sys.platform.startswith("win32")

win32 is fixed: sys.platform is win32 even on 64-bit Windows, so there is no win64.

platform.system() can also return Java (when running on Jython), which is also strange.

pybivio.platform abstracts this with calls like is_linux and is_windows.
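For illustration only, here's a hypothetical sketch of such helpers built on sys.platform. These are not pybivio's actual implementations; they just apply the checks discussed above.

```python
import platform
import sys


def is_linux():
    """Hypothetical sketch; matches "linux" (Python 3.3+) and "linux2" (Python 2)."""
    return sys.platform.startswith("linux")


def is_windows():
    """Hypothetical sketch; sys.platform is "win32" on 32- and 64-bit Windows."""
    return sys.platform.startswith("win32")


def is_cygwin():
    """Hypothetical sketch; sys.platform is "cygwin" under Cygwin's Python."""
    return sys.platform.startswith("cygwin")


# platform.system() returns names like "Linux", "Darwin", "Windows", or
# "CYGWIN_NT-..."; handy for display, but harder to compare against.
print(sys.platform, platform.system(), is_linux(), is_windows(), is_cygwin())
```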

super is tricky

Can't really explain better than this.

You'll get this error if you call super() on an old-style class (Python 2 only; every class in Python 3 is new-style):

TypeError: must be type, not classobj

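You can use issubclass (not isinstance, which is always true for a class, since a class is itself an object) to check for a new-style class:

import setuptools.command.test

issubclass(setuptools.command.test.test, object)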

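If it isn't, you have to call the base class method explicitly (like in old-style Perl):

def initialize_options(self):
    # In a subclass of setuptools.command.test.test: call the base directly
    setuptools.command.test.test.initialize_options(self)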
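In Python 3 the old-style error can't happen, but super() is still tricky because it calls the next class in the method resolution order (MRO), not necessarily the literal base class. A sketch with made-up classes: Left's super() call reaches Right, not Base, when the instance is a Both.

```python
class Base:
    def initialize_options(self):
        self.calls = ["Base"]


class Left(Base):
    def initialize_options(self):
        super().initialize_options()
        self.calls.append("Left")


class Right(Base):
    def initialize_options(self):
        super().initialize_options()
        self.calls.append("Right")


class Both(Left, Right):
    def initialize_options(self):
        super().initialize_options()
        self.calls.append("Both")


b = Both()
b.initialize_options()
print([c.__name__ for c in Both.__mro__])  # ['Both', 'Left', 'Right', 'Base', 'object']
print(b.calls)                             # ['Base', 'Right', 'Left', 'Both']
```

Calling Left.initialize_options(self) and Right.initialize_options(self) explicitly from Both would run Base twice, which is one reason cooperative super() is preferred once every class in the hierarchy uses it.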

debugging setup.py

Setting DISTUTILS_DEBUG dumps a lot of useful information:

DISTUTILS_DEBUG=1 pip install -e .

Duck Typing: Maps vs Sequences

One of the most important aspects of dynamic languages is "duck typing": being able to ask questions about an object to determine if it implements a defined interface (collection of methods). Python excels in this regard, but one thing I can't figure out how to do is distinguish between a sequence and a map.

Both maps (dict, etc.) and sequences (str, list, etc.) implement __getitem__ and __iter__, but they behave differently: iterating over a dict yields its keys, while iterating over a sequence yields its values. This is significant if you want to implement a generic initializer for a mapping type (see pykern.pknamespace: https://github.com/radiasoft/pykern/blob/master/pykern/pknamespace.py).

Consider this:

class M(object):
    def __init__(self, values):
        self._map = {}
        for k in values:
            self._map[k] = values[k]

This works, for example, if you pass in an empty sequence, because the loop never runs. However, it fails with a non-empty sequence such as M(['a']): iterating yields the sequence's values, not its indices, so values['a'] raises a TypeError.

You don't want to hardwire the test for dict, because the initializer (values) can be any mapping type as long as it implements __iter__ and __getitem__.
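One workable convention (not necessarily what pknamespace does) is the rule dict.update() itself uses: if the argument has a keys() method, treat it as a mapping; otherwise treat it as an iterable of (key, value) pairs. The classes below are hypothetical. Note that isinstance(x, collections.abc.Mapping) only recognizes classes that inherit from or register with Mapping, so it rejects the duck-typed Minimal.

```python
from collections.abc import Mapping


class M(object):
    """Sketch: initialize from a mapping or an iterable of (key, value) pairs."""

    def __init__(self, values=()):
        self._map = {}
        # Same duck-typing rule as dict.update(): a keys() method means "mapping".
        if hasattr(values, "keys"):
            for k in values.keys():
                self._map[k] = values[k]
        else:
            for k, v in values:
                self._map[k] = v


class Minimal(object):
    """Hypothetical mapping-like class that inherits from neither dict nor Mapping."""

    def __init__(self):
        self._d = {"x": 1}

    def keys(self):
        return self._d.keys()

    def __getitem__(self, k):
        return self._d[k]


print(M({"a": 1})._map)                # {'a': 1}
print(M([("a", 1)])._map)              # {'a': 1}
print(M(Minimal())._map)               # {'x': 1}
print(isinstance(Minimal(), Mapping))  # False
```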

Debugging extensions

Compile the code with debugging information. In gdb, you can then print objects:

(gdb) call _PyObject_Dump(PyExc_ImportError)
object  : <type 'exceptions.ImportError'>
type    : type
refcount: 12
address : 0x7c3680
(gdb) p errno
$1 = 2
(gdb) p PyExc_ImportError.ob_type
$2 = (struct _typeobject *) 0x7d4680 <PyType_Type>
(gdb) p *PyExc_ImportError.ob_type
$3 = {ob_refcnt = 39, ob_type = 0x7d4680 <PyType_Type>, ob_size = 0,
  tp_name = 0x5697f6 "type", tp_basicsize = 872, tp_itemsize = 40,
  tp_dealloc = 0x4967b0 <type_dealloc>, tp_print = 0x0, tp_getattr = 0x0,
  tp_setattr = 0x0, tp_compare = 0x0, tp_repr = 0x49a640 <type_repr>,

You can modify the extension to print an object:

/* PyObject_Print works in Python 2 and 3; the tp_print slot is not usable in Python 3 */
PYCHECK( PyObject_Print((PyObject *)sys_path, stdout, 0) );

Running Python Code

In PyKern, configuration files are Python modules. To parse these files, we would want to use execfile, but it was removed in Python 3, and it is generally considered harmful to use eval.

In BOP, we have used Perl modules for configuration quite successfully over the last 15 years. I (@robnagler) talked with an ex-Bivion a little while ago, and he said that he was struggling with 10-year-old legacy code that relied on YAML for configuration files. Sometimes, he said, you just want to write code. That's what we've found, so that's how pkconfig came about.

However, it turns out that there's no easy way to import a Python file as a module. You can use runpy or equivalent recipes, but they don't return a module object, which is what pkconfig needs to select the appropriate channel configuration at run-time.

That's why we wrote pkrunpy, which does the right thing.
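For comparison, the standard library can load a file as a module object through importlib.util. This is a minimal sketch, not how pkrunpy is implemented; load_module_from_path and example_config are made-up names, and the module is not registered in sys.modules.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(name, path):
    """Sketch: execute the Python file at path and return it as a module object."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Usage with a throwaway config file (file name and contents are made up).
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example_config.py")
    with open(path, "w") as f:
        f.write("def dev():\n    return {'debug': True}\n")
    cfg = load_module_from_path("example_config", path)

print(cfg.__name__, cfg.dev())  # example_config {'debug': True}
```

By contrast, runpy.run_path returns the resulting globals as a plain dict, which is why it doesn't fit a caller that needs a module.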

os.path.join

os.path.join is different from File::Spec->join (an alias for catfile):

use File::Spec;
File::Spec->join('/a', '/b') eq '/a//b' || die;

whereas os.path.join does:

>>> import os.path
>>> os.path.join("/a", "/b")
'/b'

"[os.path.join] joins one or more path components intelligently.", which means: "If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component."

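If you want a join that never silently drops the earlier components, one option is to strip leading slashes from the later ones. join_keep_prefix is a hypothetical helper, and it assumes POSIX paths (hence posixpath rather than os.path).

```python
import posixpath


def join_keep_prefix(first, *rest):
    """Hypothetical helper: like posixpath.join, but never discards `first`.

    Leading slashes are stripped from later components, so an absolute
    component is treated as relative to what came before it.
    """
    return posixpath.join(first, *(p.lstrip("/") for p in rest))


print(posixpath.join("/a", "/b"))        # /b
print(join_keep_prefix("/a", "/b"))      # /a/b
print(join_keep_prefix("a", "b", "/c"))  # a/b/c
```

This doesn't reproduce Perl's '/a//b' exactly; it only shows one way to avoid losing the prefix.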
reverse()

Python's list.reverse() is destructive (it reverses the list in place), but there's a slicing trick you can use to get a non-destructive reverse, e.g.

>>> list(range(3))[::-1]
[2, 1, 0]

The third slice argument is the step (slices with a step are called extended slices), which allows some pretty interesting operations, e.g.

>>> list(range(8))[::2]
[0, 2, 4, 6]
>>> list(range(8))[1::2]
[1, 3, 5, 7]
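A quick comparison of the options (variable names are arbitrary). The list() calls above matter in Python 3, where slicing a range gives back another lazy range rather than a list.

```python
nums = [1, 2, 3]

backwards = nums[::-1]         # new list; nums is unchanged
print(backwards, nums)         # [3, 2, 1] [1, 2, 3]

print(list(reversed(nums)))    # [3, 2, 1]; reversed() returns an iterator

result = nums.reverse()        # reverses in place and returns None
print(result, nums)            # None [3, 2, 1]

print("abc"[::-1])             # cba; slicing works on any sequence
print(list(range(10))[1:8:3])  # [1, 4, 7]; start:stop:step
print(range(3)[::-1])          # range(2, -1, -1)
```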

wat: 1 < {}

If you haven't seen it yet, the WAT talk iz da besta.

Now, let's talk about Python 2:

$ python2
>>> -1 < {}
True
>>> 0 < {}
True
>>> 1 < {}
True

WAT?
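Python 3 fixed this: ordering comparisons between unrelated types raise TypeError. A small check using a made-up compare helper (we don't rely on the exact error text):

```python
def compare(a, b):
    """Return a < b, or a short note if Python 3 refuses to order the types."""
    try:
        return a < b
    except TypeError as e:
        return "TypeError: " + str(e)


print(compare(1, {}))  # TypeError: ...
print(compare(1, 2))   # True
```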