Python for Perl Programmers
Perl programmers are used to certain behaviors. Python does things differently. Here are some differences (and gotchas) we've found.
This page uses Python 3. There are some significant differences between Python 2 and 3, too. See Sebastian Raschka's Key differences between Python 2.7.x and Python 3.x for info on this.
dict.copy always returns a plain dict, even when it is called on an instance of a subclass. copy.copy(DictSubClass()) returns a DictSubClass, but DictSubClass().copy() returns a dict. PKDict implements copy for this reason.
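A minimal sketch of the behavior described above. DictSubClass and CopyingDict are hypothetical names used only for illustration; CopyingDict shows one way a subclass could override copy and is not PKDict's actual implementation.

```python
import copy


class DictSubClass(dict):
    """A dict subclass that does not override copy()."""


class CopyingDict(dict):
    """Hypothetical subclass whose copy() preserves the subclass type."""

    def copy(self):
        # Build the copy with the instance's own class instead of dict.
        return self.__class__(self)


d = DictSubClass(a=1)
print(type(d.copy()).__name__)                 # dict
print(type(copy.copy(d)).__name__)             # DictSubClass
print(type(CopyingDict(a=1).copy()).__name__)  # CopyingDict
```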
The fact that strings are iterables is very convenient. In Perl, you would:
    for my $x (split('', 'abc')) {
        print("$x\n");
    }
In Python, this is simply:
    for x in "abc":
        print(x)
Nice. This can be useful, but it can also be a gotcha.
This is obvious:
    for c in ("abc", "def", "ghi"):
        print(c)
    abc
    def
    ghi
This is not:
    for c in ("abc"):
        print(c)
    a
    b
    c
A single element in parentheses is not a tuple; it's a parenthesized expression, so the loop iterates over the characters of the string. Perl has this issue, too, but Perl defines the evaluation of an expression in terms of the context in which it is evaluated.
In Python, to turn a single expression into a tuple, follow it with a comma (note the parens on tuples are optional):
    for c in ("abc",):
        print(c)
    abc
However, the safer code is probably just to make it a list if you are iterating over a constant list:
    for c in ["abc"]:
        print(c)
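The difference between the three spellings can be checked directly. This is a small sketch; the variable names are made up for the example.

```python
single_string = ("abc")   # parentheses only group; this is still a str
single_tuple = ("abc",)   # the trailing comma is what makes a tuple
bare_tuple = "abc",       # the parentheses are optional

print(type(single_string).__name__)  # str
print(type(single_tuple).__name__)   # tuple
print(bare_tuple == single_tuple)    # True
print(list(single_string))           # ['a', 'b', 'c']
print(list(single_tuple))            # ['abc']
```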
nonlocal or global variables are implicitly read, but must be explicitly declared before they can be written. For example, no assertion is raised in this case:
    def outer():
        x = 0
        y = 1
        def inner():
            x = 1
            return 1 + y
        assert inner() == 2
        assert x == 0
    outer()
This can be confusing because, usually, you are not modifying the outer variables in a closure, just reading them. If you are using a variable as a sentinel in the closure, you will not get the proper behavior unless you initialize it first in the outer scope and declare it nonlocal in the closure.
In Perl, variables are implicitly read or written across scopes.
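A sketch of the sentinel case, with and without the nonlocal declaration. The function and variable names are hypothetical.

```python
def outer():
    seen = None  # sentinel, initialized in the outer scope

    def inner(value):
        nonlocal seen  # required because inner assigns to seen
        if seen is None:
            seen = value
        return seen

    inner("first")
    inner("second")
    return seen


def outer_broken():
    seen = None

    def inner(value):
        # Without nonlocal, the assignment below makes seen local to inner,
        # so this read raises UnboundLocalError.
        if seen is None:
            seen = value
        return seen

    return inner("first")


print(outer())  # first
```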
In Perl, you call private methods directly, e.g. _priv($self). The purpose of calling this way is to avoid method dispatch to a private method by the same name in another class, that is, C2 inherits from C1, and they both have a private method named _priv. The point of private methods is that they are private to the class, and can't be overridden accidentally.
In Python, if you don't want a method to be overridden in a subclass, you need to name it with two leading underscores, which makes Python mangle the name with the class name. This seems to be rarely used, and many classes use self._priv, which is really like a "protected" method in Java without language enforcement. Therefore, you will always want to use the self.__priv form of method dispatch. This too is not exactly private, but it is good enough to prevent mistakes caused by accidentally giving a private method the same name in a class and its subclass.
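The difference between the two naming styles can be sketched as follows, reusing the C1 and C2 names from the Perl discussion. The method names are hypothetical.

```python
class C1:
    def __priv(self):  # stored on the class as _C1__priv
        return "C1"

    def _prot(self):   # a plain name, so subclasses can override it
        return "C1"

    def call(self):
        # self.__priv is compiled as self._C1__priv inside C1.
        return self.__priv(), self._prot()


class C2(C1):
    def __priv(self):  # stored as _C2__priv; does not clash with C1's
        return "C2"

    def _prot(self):   # overrides C1._prot
        return "C2"


print(C2().call())  # ('C1', 'C2')
```

The mangled name is still reachable as obj._C1__priv, which is why this is not truly private.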
This is a legitimate way to concatenate strings:
    x = "a" "b"
    assert x == "ab"
The problem is something like this:
    x = ["a", "b", "c" "d"]
    assert x == ["a", "b", "cd"]
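Unlike in Perl, you don't normally put a trailing comma after the last element of a list in Python. So when you add an element to the end of a list, it is easy to forget the comma that should now follow the previous element, and Python silently concatenates the two strings instead of reporting an error.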
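One common defense, sketched here as a suggestion rather than something the original page prescribes, is to write multi-line lists with a trailing comma after every element, which Python allows. The list names are made up for the example.

```python
# Missing comma after "c": the two literals are silently concatenated.
broken = [
    "a",
    "b",
    "c"
    "d",
]

# A trailing comma on every line means appending a line cannot merge strings.
fixed = [
    "a",
    "b",
    "c",
    "d",
]

print(broken)  # ['a', 'b', 'cd']
print(fixed)   # ['a', 'b', 'c', 'd']
```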
There is some confusion on the net about __builtins__. If you want to get a builtin object from a name in Python 2, you have to:
    import __builtin__
    int_instance = getattr(__builtin__, "int")
Some people say incorrectly to use __builtins__. The value of __builtins__ is initialized to __builtin__ in the __main__ module, so it will work initially, but not, say, when called from a test.
See https://docs.python.org/2/reference/executionmodel.html for more discussion
Unfortunately, in Python 3, __builtin__ becomes builtins.
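The Python 3 equivalent of the lookup can be sketched like this. builtin_by_name is a hypothetical helper name.

```python
import builtins


def builtin_by_name(name):
    """Return the builtin object called name, e.g. "int" -> int."""
    return getattr(builtins, name)


print(builtin_by_name("int") is int)  # True
print(builtin_by_name("len")("abc"))  # 3
```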
In Perl, if you want to catch all exceptions, you just catch all exceptions. In Python, exceptions are also used for control flow, such as KeyboardInterrupt and SystemExit, and you usually do not want to catch those. To catch all ordinary exceptions, use this:
    try:
        ...  # op
    except Exception:
        ...  # handle op error
Of course, if you need to do something whenever the try exits, use finally.
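A small sketch showing that except Exception handles ordinary errors, lets SystemExit pass through, and that finally runs in both cases. The run function and the log list are hypothetical.

```python
import sys

log = []


def run(op):
    """Call op(), handling ordinary errors and always logging cleanup."""
    try:
        return op()
    except Exception as e:
        # ZeroDivisionError derives from Exception; SystemExit does not.
        log.append("handled " + type(e).__name__)
    finally:
        log.append("cleanup")


run(lambda: 1 / 0)
try:
    run(sys.exit)
except SystemExit:
    log.append("SystemExit passed through")

print(log)
```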
Python is very confused about how to check your operating system. platform.system() is the most modern API, and it doesn't have the dreaded linux2 vs linux3 problem. Cygwin is strange, because platform.system() reports CYGWIN_NT-6.3, which seems to go against the grain of the simplicity of what platform.system() is supposed to return.
Some people recommend:
    sys.platform.startswith("linux")
However, on Windows I think you have to do:
    sys.platform.startswith("win32")
The value win32 is fixed, even on 64-bit Windows, because Microsoft doesn't want win64.
platform.system() says it can return Java, which is also strange.
pybivio.platform abstracts this with calls like is_linux and is_windows.
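A rough sketch of what such helpers could look like, built only on platform.system(). The function names are borrowed from the text, but the bodies are assumptions for illustration and not the actual pybivio.platform implementation.

```python
import platform


def is_linux():
    """True when platform.system() reports Linux."""
    return platform.system() == "Linux"


def is_windows():
    """True for native Windows Python (Cygwin reports CYGWIN_NT-...)."""
    return platform.system() == "Windows"


def is_cygwin():
    """Assumption: Cygwin values always start with CYGWIN."""
    return platform.system().startswith("CYGWIN")


print(platform.system(), is_linux(), is_windows(), is_cygwin())
```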
Can't really explain better than this.
You'll get this error if you are trying to call super on an old-style class (a Python 2 issue; in Python 3 every class is new-style):
    TypeError: must be type, not classobj
You can use issubclass (not isinstance) to check for a new-style class:
    issubclass(setuptools.command.test.test, object)
If it isn't, you have to be explicit (like in old-style Perl):
    def initialize_options(self):
        setuptools.command.test.test.initialize_options(self)
Dumps a lot of useful information:
    DISTUTILS_DEBUG=1 pip install -e .
One of the most important aspects of dynamic languages is "duck typing": being able to ask questions about an object to determine if it implements a defined interface (collection of methods). Python excels in this regard, but one thing I can't figure out how to do is distinguish between a sequence and a map.
Both maps (dict, etc.) and sequences (str, list, etc.) implement __getitem__ and __iter__, but they behave differently. Iterating over a dict returns keys, and iterating over a sequence returns values. This is significant if you want to implement a generic initializer for a mapping type [see pykern.pknamespace](https://github.com/radiasoft/pykern/blob/master/pykern/pknamespace.py).
Consider this:
    class M(object):
        def __init__(self, values):
            self._map = {}
            for k in values:
                self._map[k] = values[k]
This works, for example, if you pass in an empty sequence, because the iterator returns nothing. However, it fails with a non-empty sequence, e.g. M(['a']), with a TypeError, because the iterator returns the values of the sequence, and they can't be passed to __getitem__ as indexes.
You don't want to hardwire the test for dict, because the initializer (values) can be any mapping type as long as it implements __iter__ and __getitem__.
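One possible approach, offered as a sketch rather than a definitive answer: test against the collections.abc.Mapping abstract base class, and fall back to checking for a keys method, which is similar to the convention documented for dict.update. The is_mapping helper and the pair-handling branch are assumptions for illustration; an object that implements only __iter__ and __getitem__ would still not be recognized.

```python
from collections.abc import Mapping


def is_mapping(obj):
    """Best-effort check: a Mapping (real or registered) or has keys()."""
    return isinstance(obj, Mapping) or hasattr(obj, "keys")


class M(object):
    def __init__(self, values):
        self._map = {}
        if is_mapping(values):
            for k in values.keys():
                self._map[k] = values[k]
        else:
            # Assumption: a sequence initializer holds (key, value) pairs.
            for k, v in values:
                self._map[k] = v


print(M({"a": 1})._map)    # {'a': 1}
print(M([("a", 1)])._map)  # {'a': 1}
```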
Compile the code with debugging enabled. In gdb, you can then print objects:
    (gdb) call _PyObject_Dump(PyExc_ImportError)
    object  : <type 'exceptions.ImportError'>
    type    : type
    refcount: 12
    address : 0x7c3680
    (gdb) p errno
    $1 = 2
    (gdb) p PyExc_ImportError.ob_type
    $2 = (struct _typeobject *) 0x7d4680 <PyType_Type>
    (gdb) p *PyExc_ImportError.ob_type
    $3 = {ob_refcnt = 39, ob_type = 0x7d4680 <PyType_Type>, ob_size = 0,
      tp_name = 0x5697f6 "type", tp_basicsize = 872, tp_itemsize = 40,
      tp_dealloc = 0x4967b0 <type_dealloc>, tp_print = 0x0, tp_getattr = 0x0,
      tp_setattr = 0x0, tp_compare = 0x0, tp_repr = 0x49a640 <type_repr>,
You can modify the extension to print an object:
    PYCHECK( PyList_Type.tp_print((PyObject *)sys_path, stdout, 0) );
In PyKern, configuration files are Python modules. In order to parse these files, we would want to use execfile, but that has been removed in Python 3, and it is generally considered harmful to use eval.
In BOP, we have used Perl modules for configuration quite successfully over the last 15 years. I (@robnagler) talked with an ex-Bivion a little while ago, and he said that he was struggling with 10 year-old legacy code that relied on YAML for configuration files. Sometimes, he said, you just want to write code. That's what we've found so that's how pkconfig came about.
However, it turns out that there's no easy way to import a Python file as a module. You can use runpy or equivalent recipes, but they don't return a module object, which is what pkconfig needs to select the appropriate channel configuration at run-time. That's why we wrote pkrunpy, which does the right thing.
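For comparison, here is a sketch of loading a file as a module object with the standard library's importlib.util in Python 3. This is not how pkrunpy is implemented; load_file_as_module, the module name, and the CHANNEL setting are all hypothetical.

```python
import importlib.util
import os
import tempfile


def load_file_as_module(path, name="hypothetical_config"):
    """Execute the Python file at path and return it as a module object."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Write a throwaway config file and load it.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example_config.py")
    with open(path, "w") as f:
        f.write("CHANNEL = 'dev'\n")
    cfg = load_file_as_module(path)

print(type(cfg).__name__, cfg.CHANNEL)  # module dev
```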
os.path.join is different from File::Spec->join (catfile):
    use File::Spec;
    File::Spec->join('/a', '/b') eq '/a//b' || die;
whereas os.path.join does:
    >>> import os.path
    >>> os.path.join("/a", "/b")
    '/b'
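The same behavior can be sketched with posixpath, so that the result does not depend on the host operating system. join_under is a hypothetical helper showing one way to keep the base directory when the second component is absolute; it is only a suggestion.

```python
import posixpath


def join_under(base, part):
    """Join part beneath base even when part starts with a slash."""
    return posixpath.join(base, part.lstrip("/"))


print(posixpath.join("/a", "/b"))  # /b
print(posixpath.join("/a", "b"))   # /a/b
print(join_under("/a", "/b"))      # /a/b
```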
"[os.path.join] joins one or more path components intelligently.", which means: "If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component."
Python's list.reverse is destructive (it reverses the list in place), but there's a trick you can use to implement a non-destructive reverse, e.g.
    >>> list(range(3))[::-1]
    [2, 1, 0]
The third slice argument is an Extended Slice, which allows some pretty interesting operations by value, e.g.
    >>> list(range(8))[::2]
    [0, 2, 4, 6]
    >>> list(range(8))[1::2]
    [1, 3, 5, 7]
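A short sketch contrasting the non-destructive options with list.reverse. The built-in reversed is an alternative to slicing that the text above does not mention.

```python
items = [0, 1, 2]

sliced = items[::-1]             # new list; items is untouched
iterated = list(reversed(items)) # reversed() returns an iterator
print(sliced, iterated, items)   # [2, 1, 0] [2, 1, 0] [0, 1, 2]

result = items.reverse()         # reverses in place and returns None
print(result, items)             # None [2, 1, 0]
```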
If you haven't seen it yet, the WAT talk iz da besta.
Now, let's talk about Python 2 (Python 3 raises TypeError for these comparisons):
    $ python
    >>> -1 < {}
    True
    >>> 0 < {}
    True
    >>> 1 < {}
    True
WAT?
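In Python 3, which the rest of this page uses, ordering comparisons between an int and a dict are an error. A minimal sketch, with compare as a hypothetical helper:

```python
def compare(left, right):
    """Return left < right, or the string "TypeError" if not comparable."""
    try:
        return left < right
    except TypeError:
        return "TypeError"


print(compare(-1, {}))  # TypeError
print(compare(-1, 0))   # True
```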