This year's EuroPython in Berlin – the largest European conference dedicated to the Python programming language – is over. It was a great conference, with many interesting talks and some great social events.
For those of you who didn't attend but are interested in Python and programming, we created short summaries of some of the many talks. In contrast to last year's EuroPython blog post, this one is going to be fairly technical. So if you only understand gibberish but would like to know more, I can highly recommend the free Codecademy Python course.
Constanze is the spokesperson of the German Chaos Computer Club and has been involved in various legal fights in German courts regarding basic rights and online freedom. In her talk, she summarized the current state of surveillance around the world and how it affects us.
Bob highlighted some weaknesses of Python in comparison with Haskell and other newer programming languages. He pointed out that due to Python's dynamic nature and the lack of analysis tools, some types of errors are very hard to find before runtime. He then showed different approaches to solving this problem, like using mypy and adding type annotations to functions and variables. He also criticized mutability, missing algebraic data types, performance and the GIL.
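The annotation style a checker like mypy understands looks roughly like this (a minimal sketch; the function and names are made up for illustration):

```python
# Annotations are plain Python 3 syntax; the interpreter ignores them at
# runtime, while a static checker like mypy flags mismatches before the
# code ever runs.

def greet(name: str, times: int) -> str:
    """Repeat a greeting -- the annotations document and constrain the types."""
    return ("Hello, %s! " % name) * times

print(greet("EuroPython", 2))   # fine
# greet(2, "EuroPython")        # mypy would report an argument type error
```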
Petr mostly talked about how attribute access works, what descriptors are and how they are used. Basically, descriptors are everywhere. A descriptor is a simple class that has a __get__ method and may have a __set__ and a __delete__ method. You probably know the @property decorator – well, guess what: that decorator just generates a descriptor in the background. The advantage of a descriptor class over a property is that it can have its own encapsulated state and behavior. Methods are also descriptors, whose job is to bind self to the current instance. To see how attributes are actually accessed, see this magic formula.
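To make the "encapsulated state and behavior" point concrete, here's a small sketch of the descriptor protocol (the class names are made up): a descriptor with __get__ and __set__ that adds validation, reused for two attributes where a bare @property would duplicate the logic.

```python
class Positive:
    """Data descriptor that only accepts positive numbers."""

    def __init__(self, name):
        self.name = name            # key used in the instance __dict__

    def __get__(self, instance, owner):
        if instance is None:        # accessed on the class itself
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("%s must be positive" % self.name)
        instance.__dict__[self.name] = value

class Rectangle:
    width = Positive("width")       # the same descriptor class reused --
    height = Positive("height")     # this is the encapsulated behavior

    def __init__(self, width, height):
        self.width = width          # goes through Positive.__set__
        self.height = height

r = Rectangle(3, 4)
print(r.width * r.height)           # 12
```

Since Positive defines __set__, it's a data descriptor, so reads and writes on the instance always go through it rather than the instance dict directly.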
If you want to call other programming languages from Python (e.g. for reasons of speed or integration), there are three possibilities. Native C extensions allow you to write Python functions directly in C, but they look very complicated. There's also the ctypes module: with ctypes, you don't write your Python function in C directly, but you can call C functions from your Python code. It's still a bit complicated, though. The third possibility is CFFI, a fairly recent library. It also allows you to call functions from shared libraries, it's easier to use and it works on PyPy. You can even write inline C code in your Python module and it will automatically be compiled and initialized. Conclusion: CFFI is easier to use and more portable than the other two solutions. You should give it a try!
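For a taste of the ctypes route, here's a hedged sketch that calls a C function from a shared library without writing any C ourselves; the library lookup assumes a Unix-like system.

```python
import ctypes
import ctypes.util

# Find and load the standard C library (e.g. "libc.so.6" on Linux).
# Passing None loads the running process, whose libc symbols are visible.
libc_name = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_name if libc_name else None)

# Declare the C signature so ctypes converts arguments correctly:
#   size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"EuroPython"))   # 10
```

The CFFI equivalent declares the same prototype as a C string via ffi.cdef() and loads the library with ffi.dlopen(), which is why it scales better to larger APIs.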
High-concurrency hardware is getting cheaper (see the Parallella board). But threads and locking are tough. CSP (Communicating Sequential Processes) is easier. In CSP, programs are made up of sequential processes that can all run at the same time because they don't share any data. Data is passed via channels, and synchronization is done via events. Message passing has long been part of Unix (processes communicating via pipes). It is also a core part of Go and Rust, and Scala supports it too. If the processes are implemented as coroutines, they can be very fast and cheap, but they cannot take advantage of multicore processors. They can also be OS threads or processes, which is slower but allows them to take advantage of multiple cores or CPUs.
Sarah wrote a library called python-csp that also allows this kind of programming in Python. And one of her students implemented a simple language called naulang based on the RPython toolchain that allows this kind of concurrency.
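This isn't python-csp's API, but just to illustrate the model, here is the CSP idea sketched with nothing but the standard library: two sequential processes that share no data and communicate only over a channel (a Queue stands in for a CSP channel).

```python
import threading
import queue

def producer(channel):
    for n in range(5):
        channel.put(n * n)          # pass data over the channel, don't share it
    channel.put(None)               # sentinel: end of stream

def consumer(channel, results):
    while True:
        item = channel.get()
        if item is None:
            break
        results.append(item)

channel = queue.Queue()
results = []
t1 = threading.Thread(target=producer, args=(channel,))
t2 = threading.Thread(target=consumer, args=(channel, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)                      # [0, 1, 4, 9, 16]
```

Because all communication goes through the channel, neither side needs locks, which is exactly the appeal of CSP over shared-state threading.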
My favorite quote from the talk: "Scala as a JVM language is taking up the whole screen for the code example."
Tom (main author of the Django REST Framework) talked about his new tool to generate documentation from Markdown syntax. It looks like a very easy and straightforward way to create simple HTML docs for your project if you don't need all those fancy features that Sphinx offers.
First Armin Rigo came on stage and said "Hi, I'm Armin, and I'm happy to ... let Romain talk." Then Romain Guillebert took over and said "Hi, this is the biggest crowd I've ever spoken before, so ... sorry." After this entertaining introduction (come on, their job is to write code, not to talk!) they presented the current state of the PyPy project and answered questions. No, PyPy is not dead!
Armin presented his work concerning the implementation of software transactional memory (STM) in PyPy. He talked about the GIL and how STM is probably the best approach to get rid of it without breaking backwards compatibility. The results look really promising. You can read everything about their implementation in the docs. Oh, and you should probably donate to the project!
CPUs are getting faster quickly, but memory speed and latency aren't improving at the same pace. As a result, today's CPUs are starving because memory access is too slow. Valentin proposes keeping the data in memory compressed and decompressing it in the CPU.
There's a library for this called Blosc. It's a metacodec: it does not compress data itself, but uses other compression codecs. It cuts data into blocks and optimizes them for compression using filters. The only filter currently available reorders bytes by bit significance, which yields a much better compression ratio and speed. The process is further sped up by using multithreading and SIMD instructions on the CPU. If you want to use Blosc from Python, use python-blosc.
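To see why the byte-reordering filter helps, here's the idea in pure Python (Blosc does this in optimized C; this sketch only demonstrates the principle, using zlib as the stand-in codec): for arrays of small integers stored as 8-byte words, grouping bytes by significance puts long runs of identical, mostly zero bytes next to each other, which generic codecs compress far better.

```python
import struct
import zlib

WORD = 8
values = list(range(10000))
raw = struct.pack("<%dq" % len(values), *values)    # 8-byte little-endian ints

def shuffle(data, wordsize):
    """Reorder bytes: all 1st bytes of each word first, then all 2nd bytes, ..."""
    return b"".join(data[i::wordsize] for i in range(wordsize))

plain = len(zlib.compress(raw))
shuffled = len(zlib.compress(shuffle(raw, WORD)))
print(plain, shuffled)    # the shuffled buffer compresses much smaller
```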
Carl is the author of landscape.io, an online service that does quality checking for your project. He first showed typical code quality problems that can be detected with static code analysis tools. He recommended using these tools often, and measuring and tracking the progress. The number of warnings could, for example, be sent to a Graphite server. How this number changes over time is actually more relevant to your project than its absolute value. In the end, this will save you tons of time.
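Shipping such a number to Graphite is simple, because Graphite's plaintext protocol is just "metric value timestamp" lines over TCP (port 2003 by default). A hedged sketch, with a made-up metric name and host:

```python
import socket
import time

def graphite_line(metric, value, timestamp=None):
    """Format one datapoint in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %d %d\n" % (metric, value, timestamp)

def send_warning_count(count, host="graphite.example.com", port=2003):
    """Ship e.g. a pylint warning count to Graphite (host is a placeholder)."""
    line = graphite_line("myproject.lint.warnings", count)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

print(graphite_line("myproject.lint.warnings", 42, 1405324800))
```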
Dave has been working on the Jedi autocompletion library for two years now. The project has grown and matured; it can handle most Python features via static analysis. In contrast to other tools that build a type tree, it works through recursion and lazy evaluation, which makes it fast. Dave has started implementing a linting feature for Jedi that can find a lot of errors before runtime. Because Jedi understands most Python code, it can find problems that other tools can't. Bazinga!
Mark did a great talk on how to properly write command line applications using Python. He touched on topics like different argument parsing libraries, input/output handling, signals, configuration files, colors, progress output, formatting, setup.py and more. Worth watching!
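The talk compared several argument parsing libraries; as a minimal taste of the topic, here's the standard library's argparse (the flags and filenames are invented for the example):

```python
import argparse

parser = argparse.ArgumentParser(description="Frobnicate some files.")
parser.add_argument("files", nargs="+", help="input files to process")
parser.add_argument("-v", "--verbose", action="store_true",
                    help="print progress while working")
parser.add_argument("-o", "--output", default="out.txt",
                    help="where to write the result (default: %(default)s)")

# Normally you'd call parser.parse_args() to read sys.argv;
# here we pass a list to keep the example self-contained.
args = parser.parse_args(["-v", "a.txt", "b.txt"])
print(args.files, args.verbose, args.output)
```

Among the niceties Mark's topic list hints at, argparse generates --help output and usage errors for free.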
During this talk, Dmitry basically explained the implementation of a full Python remote debugger, starting with a simple trace function and finishing with a tool that supports remote debugging using sockets, conditional breakpoints, multithreading and more. Everything is implemented on top of Python's sys.settrace function.
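The core mechanism is small enough to show in miniature: sys.settrace installs a callback that the interpreter invokes for every function call (and, via the returned local trace function, every line) – everything else in a debugger is built on top of that.

```python
import sys

events = []

def tracer(frame, event, arg):
    if event == "call":
        events.append(frame.f_code.co_name)   # record which function was entered
    return tracer       # keep tracing inside this frame ('line' events, etc.)

def add(a, b):
    return a + b

sys.settrace(tracer)
add(1, 2)
add(3, 4)
sys.settrace(None)      # uninstall the trace function

print(events)           # ['add', 'add']
```

A real debugger's trace function would, at this point, compare the frame against its breakpoint table and pause execution instead of just recording names.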
Maciej and Fabrizio talked about their company, where they started to teach Python to their non-programmer colleagues. They used IPython Notebooks on a centralized server so that nobody had to go through a tedious local installation process. They also started to create libraries and services to wrap different data sources. Their experience with these courses was very positive.
This was a very interesting talk about the memory management internals of CPython. Piotr talked about memory sizes for different data types, about interning, and about how collections are allocated, grown and shrunk. He briefly covered reference counting and garbage collection and then showed a few useful tools to debug memory problems, like psutil, memory_profiler, objgraph, heapy and meliae + runsnakerun.
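Two of those internals are visible from plain Python, no tooling required: per-object memory overhead via sys.getsizeof, and small-integer interning (CPython pre-allocates the integers from -5 to 256, so those are shared singletons).

```python
import sys

print(sys.getsizeof(0))     # even a plain int carries object overhead
print(sys.getsizeof([]))    # an empty list is bigger still

a = 256
b = 256
print(a is b)               # True: small ints are interned in CPython
```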
This talk touched on topics very similar to the previous one. Tomasz showed a very typical case where big memory areas are released by Python, but not returned to the operating system. The reason is probably memory fragmentation. Solutions or workarounds for this are to use an alternative malloc implementation, to use a different Python interpreter like PyPy or Stackless, or to run tasks with big memory consumption in a separate subprocess.
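The subprocess workaround is easy to sketch: run the memory-hungry work in a child interpreter and keep only the small result, so that when the child exits the operating system reclaims all of its memory and fragmentation never accumulates in the parent. The one-liner below is a stand-in for a real memory-hungry task.

```python
import subprocess
import sys

# Stand-in for a task that temporarily allocates a lot of memory.
code = "print(sum(range(10 ** 6)))"

# Run it in a fresh interpreter; only the printed result crosses back.
out = subprocess.check_output([sys.executable, "-c", code])
result = int(out)
print(result)   # 499999500000
```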
The EuroPython 2014 team did a great job organizing this conference. The talks were interesting, the food was great and the atmosphere was very relaxed. All in all, there were 1'226 attendees who used 1'100 GB of internet traffic. On the beverage side of things, they consumed 10'000 bottles of Club Mate and Fritz Cola, and drank 1'700 cocktails as well as 885 liters of beer.