Saturday, July 2, 2011

Getting close to mid-term targets

The last two weeks were quite productive. I've finished most of the issues on my to-do list for the first half of GSoC, and I'm now working on the mid-term target: getting the translator to translate itself.

No more translation errors for python 2.7 syntax

There are no more red items on the libs status page: pyjamas now understands all 2.7 syntax and produces some kind of output for it. Sometimes that output still contains JS syntax errors, but it's good enough to correctly translate most of the python stdlib without any modifications.
This includes:
  • Complex assignments, including deep unpacking: `((a, b), [c, d]), e = someiterable` works as expected (issue #527)
  • Assignment targets in `for` statements: it turns out you can write things like `for x.a.b, d[key] in iterable` in python, and now you can do the same in pyjamas.
  • Tuple key assignment works correctly (issue #496)
  • Set literals
  • Set and dict comprehensions
  • `with` statement, including support for the `with ctxt1() as v1, ctxt2() as v2` notation
  • "Discard expressions" (bare expression statements), often used for NameError checking. They need some more work on corner cases, but work in general (issue #584)
  • `[::]` and `slice()` now return slice objects, which are part of the new slice protocol (issues #585, #364, and some others)
  • Getattr and callfunc over expressions, like `('a' + 'b').strip()` (issue #591)
  • Callfunc decorators, like `@decorator()` (issue #369)
  • Complex type, including complex literals, like `1+1j`
  • Top-level assert statements
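For reference, here is how CPython evaluates several of the constructs listed above; the translated javascript has to produce the same values. This is a small illustrative script, not part of the pyjamas test suite, and `reference_cases`, `Ctx`, `Recorder` and `tagged` are hypothetical names.

```python
# Reference semantics (as CPython evaluates them) for some of the constructs above.
# Translated javascript should produce the same values.


class Obj(object):
    """Bare object used as an attribute-assignment target."""


class Ctx(object):
    """Context manager that records enter/exit order in a shared log."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append('enter ' + self.name)
        return self.name

    def __exit__(self, exc_type, exc_value, tb):
        self.log.append('exit ' + self.name)
        return False  # never suppress exceptions


class Recorder(object):
    """Returns the key passed to __getitem__, exposing slice objects."""

    def __getitem__(self, key):
        return key


def tagged(tag):
    """Decorator factory, applied as @tagged(...) (a "callfunc decorator")."""
    def decorator(func):
        func.tag = tag
        return func
    return decorator


def reference_cases():
    results = {}

    # Deep unpacking into nested tuple/list targets.
    ((a, b), [c, d]), e = [(1, 2), (3, 4)], 5
    results['unpack'] = (a, b, c, d, e)

    # Attribute and subscript targets in a for statement; the last item wins.
    x = Obj()
    x.a = Obj()
    table = {}
    for x.a.b, table['key'] in [(1, 'one'), (2, 'two')]:
        pass
    results['for_targets'] = (x.a.b, table['key'])

    # Set literal, set comprehension and dict comprehension.
    results['set_literal'] = {3, 1, 3}
    results['set_comp'] = {n % 3 for n in range(6)}
    results['dict_comp'] = {n: n * n for n in range(3)}

    # Two managers in one with statement are exited in reverse order.
    log = []
    with Ctx('1', log) as v1, Ctx('2', log) as v2:
        log.append(v1 + v2)
    results['with'] = log

    # Discard expression used as a NameError check.
    try:
        undefined_name_for_demo
        results['discard'] = 'defined'
    except NameError:
        results['discard'] = 'missing'

    # Extended slice: __getitem__ receives slice(None, None, 2).
    results['slice'] = Recorder()[::2]

    # Getattr/callfunc over an arbitrary expression.
    results['expr_call'] = (' a' + 'b ').strip()

    # Callfunc decorator.
    @tagged('demo')
    def f():
        return 1
    results['decorator'] = f.tag

    # Complex literals: (1+1j)*(1-1j) == 2.
    results['complex'] = (1 + 1j) * (1 - 1j)

    return results


if __name__ == '__main__':
    for key, value in sorted(reference_cases().items()):
        print('%s: %r' % (key, value))
```

Under CPython, `unpack` is `(1, 2, 3, 4, 5)`, `for_targets` is `(2, 'two')`, and the `with` log is `['enter 1', 'enter 2', '12', 'exit 2', 'exit 1']`.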
Some builtin modules
I've added some modules that are typically builtin, like `operator`, `errno` and `weakref`. They are generally useless in pyjamas (except operator, but even that is rarely needed), but some stdlib modules depend on them, so it's good to have them.
The `sys` module received some enhancements, especially for the PyV8 environment: it now correctly retrieves the stdin, stdout and stderr pipes from the parent app, and also gets a correctly trimmed argv, so modules like optparse work as expected.

Compiler module passes most of the tests
This is another significant milestone: pyjs can now correctly compile the compiler/parser/grammar modules into javascript, and the resulting javascript can parse python syntax and generate the appropriate AST. Parser/tokenize still fails on some input due to regex incompatibilities and obscure pyjs bugs, but 18 of 21 tests passing is good.

I'm going to keep working on making the compiler pass the existing tests, and on adding new tests, for 2-3 more days, and then switch to `pyjs.translator`. There are some bugs, mainly class-related (like try/except in class definition scope), that I also hoped to fix before the mid-term, but since there are only two weeks left, the translator is top priority for now.

Tuesday, June 14, 2011

Working towards compiler compiling

I missed some weekly posts due to the 'okay, just one more feature and I'll make the post' syndrome and the general exam-time mess. I have one more exam on the 16th, and after that nothing should keep me from hacking on pyjamas full-time.

So here's what I've been doing meanwhile:

Libs status:
I've implemented a simple pyjamas app that tracks the dependency trees of particular modules and provides a report on each dependency: http://pyjslibs.appspot.com/
The big goal is to get the 'translator' target green. Its direct prerequisite is getting the 'compiler' target green.
The app reports some unneeded modules as well, since at the moment the linker tries to compile all imports, even those inside try/except or if statements.
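A dependency scanner could avoid most of those false positives by recording which imports are guarded by try/except or if. A minimal sketch using the standard `ast` module; this is not how the pyjs linker works, `ImportCollector` and `collect_imports` are hypothetical names, relative imports are skipped, and imports inside function bodies are treated as unguarded.

```python
import ast


class ImportCollector(ast.NodeVisitor):
    """Collect imported module names and whether each one is ever unconditional."""

    def __init__(self):
        self.depth = 0     # number of enclosing try/if blocks at the current node
        self.imports = {}  # module name -> True if imported at least once unguarded

    def _record(self, name):
        unguarded = self.depth == 0
        self.imports[name] = self.imports.get(name, False) or unguarded

    def visit_Import(self, node):
        for alias in node.names:
            self._record(alias.name)

    def visit_ImportFrom(self, node):
        # Relative imports (level > 0) are skipped in this sketch.
        if node.module and node.level == 0:
            self._record(node.module)

    def _visit_guarded(self, node):
        self.depth += 1
        self.generic_visit(node)
        self.depth -= 1

    # ast.Try on Python 3; TryExcept/TryFinally on Python 2.
    visit_If = visit_Try = visit_TryExcept = visit_TryFinally = _visit_guarded


def collect_imports(source):
    collector = ImportCollector()
    collector.visit(ast.parse(source))
    return collector.imports


SAMPLE = '''
import os
try:
    import cStringIO as StringIO
except ImportError:
    import StringIO
if sys.platform == 'win32':
    import nt
'''

if __name__ == '__main__':
    for name, hard in sorted(collect_imports(SAMPLE).items()):
        print('%-10s %s' % (name, 'required' if hard else 'optional'))
```

Only `os` comes out as required here; the other three could be reported as optional instead of being pulled into the build.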

Tools:
  • pyv8run, my primary tool for testing and developing pyjamas, is now fixed and significantly improved. I've merged it with my older pyv8shell tool, so it now supports a REPL mode itself. The REPL mode still has some importing issues, but importing is still a big mess in Pyjamas in general: bigger things like `sys.path`, import hooks, the imp module and the __import__ builtin are yet to be implemented properly.
  • depstest: a tool I've created to generate the data for the pyjslibs status app above.
  • generate_stdlib: a simple script that assembles a stdlib from pyjamas/lib, pypy/lib and the stock cpython lib. The idea is to support as many stock (unchanged) cpython modules as possible, so we don't have to maintain them ourselves.
  • test.py: a script that runs all the different tests we have to run each time something is modified. I fixed some import issues with libtest and cpython along the way.
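The post doesn't say how generate_stdlib resolves a module that exists in more than one of those libs. A natural rule, assumed here, is that pyjamas/lib overrides pypy/lib, which overrides the stock cpython lib. A sketch of that precedence; `merge_stdlib`, `list_modules` and the directory paths are hypothetical, and packages are ignored for brevity.

```python
import os


def merge_stdlib(sources):
    """Pick one provider per module from an ordered list of (label, filenames).

    Earlier sources win: a module found in an earlier directory shadows
    the same module in later ones.
    """
    chosen = {}
    for label, filenames in sources:
        for filename in filenames:
            if not filename.endswith('.py'):
                continue
            chosen.setdefault(filename[:-3], label)
    return chosen


def list_modules(directory):
    # Only top-level .py files; packages are ignored in this sketch.
    return sorted(f for f in os.listdir(directory) if f.endswith('.py'))


if __name__ == '__main__':
    # Hypothetical locations, highest priority first.
    dirs = ['pyjamas/lib', 'pypy/lib', '/usr/lib/python2.7']
    sources = [(d, list_modules(d)) for d in dirs if os.path.isdir(d)]
    for module, origin in sorted(merge_stdlib(sources).items()):
        print('%-20s %s' % (module, origin))
```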
Translator, pyjslib, pyjs/lib


I've implemented some missing features and added some libraries, although this wasn't the primary goal yet:
  • A pyjspath (os.path) module, which handles path manipulation but, obviously, lacks filesystem functions
  • A types module: pyjamas doesn't use separate types/classes for some things like tracebacks/frames/code, so it had to be hacked a bit to work.
  • Added __doc__ attribute to modules
  • Implemented __builtins__ alias for builtins
  • Implemented __builtin__ importing
  • Partially implemented globals() 
  • Implemented `from module import *`
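For reference, the rule `from module import *` follows in CPython: if the module defines `__all__`, exactly those names are bound; otherwise every name in the module namespace that doesn't start with an underscore. A sketch of that rule in plain Python; it's not pyjslib's implementation, and `star_import_names`, `star_import` and `demo` are illustrative names.

```python
import types


def star_import_names(module):
    """Names bound by `from module import *`, following CPython's rule.

    If the module defines __all__, exactly those names are exported;
    otherwise every name in its namespace that doesn't start with '_'.
    """
    explicit = getattr(module, '__all__', None)
    if explicit is not None:
        return list(explicit)
    return sorted(name for name in vars(module) if not name.startswith('_'))


def star_import(module, namespace):
    """Bind the exported names into `namespace` (e.g. the importer's globals())."""
    for name in star_import_names(module):
        # Raises AttributeError if __all__ lists a missing name, as CPython does.
        namespace[name] = getattr(module, name)


# Example: a throwaway module object.
demo = types.ModuleType('demo')
demo.public = 1
demo._private = 2
```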
Internal compiler and parser modules

As I've explained on the mailing list, pgen/lib2to3 were too heavy to use directly, so I've spent this week chopping them into small, separate `compiler`, `parser`, `symbol` and `token` modules with the same interfaces as the corresponding cpython lib modules, plus a separate script to generate the grammar, symbol and token files.
This work is done, and the translator in my internalast branch now uses only the internal compiler and related modules. As Luke said, there is indeed some performance drop, but with caching it's not very significant, and there is still a lot of room for optimization.
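The post doesn't describe the caching mechanism itself. One simple way to hide a slower parser is to key parse results on a hash of the source text, so unchanged modules are parsed only once. A sketch under that assumption, with `ast.parse` standing in for the internal parser; `ParseCache` is a hypothetical name, and this version only caches in memory.

```python
import ast
import hashlib


class ParseCache(object):
    """In-memory cache of parse results, keyed by a hash of the source text.

    `parse` defaults to ast.parse as a stand-in for the internal parser.
    A real cache would also persist entries to disk between runs.
    """

    def __init__(self, parse=ast.parse):
        self.parse = parse
        self.entries = {}
        self.misses = 0

    def get(self, source):
        # `source` is a text string; hash its UTF-8 bytes.
        key = hashlib.sha1(source.encode('utf-8')).hexdigest()
        if key not in self.entries:
            self.misses += 1
            self.entries[key] = self.parse(source)
        return self.entries[key]


if __name__ == '__main__':
    cache = ParseCache()
    cache.get('x = 1\n')
    cache.get('x = 1\n')  # served from the cache
    print('misses: %d' % cache.misses)
```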
Here are some time measurements for compiling LibTest and all its dependencies with the cpython compiler and with the internal compiler:

CPython:
real    0m27.505s
user    0m24.262s
sys     0m2.664s

Internal:
real    0m38.957s
user    0m35.622s
sys     0m2.668s
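Put as ratios, the internal compiler takes about 1.42 times the wall-clock time and about 1.47 times the user CPU time of the cpython compiler, while system time is essentially unchanged. The arithmetic, using the numbers above:

```python
# Times from the runs above, in seconds.
TIMINGS = {
    'cpython': {'real': 27.505, 'user': 24.262, 'sys': 2.664},
    'internal': {'real': 38.957, 'user': 35.622, 'sys': 2.668},
}


def slowdown(metric):
    """Ratio of internal-compiler time to cpython-compiler time."""
    return TIMINGS['internal'][metric] / TIMINGS['cpython'][metric]


for metric in ('real', 'user', 'sys'):
    print('%-4s x%.2f' % (metric, slowdown(metric)))
```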

Next week I will continue working towards getting this new compiler/parser to compile via pyjs and pass some basic python-parsing tests. The current status is listed in the pyjslibs status app, under the 'compiler' target.
I've solved all translator errors for the compiler.*/parser.* modules, but there are still more in their cpython dependencies:
 #> ./pyv8run.py --strict ../stdlib/test/test_compiler.py
Traceback (most recent call last):
...
__main__.TranslationError: _weakrefset line 59:
unsupported type (in _stmt)
With(CallFunc(Name('_IterationGuard'), [Name('self')], None, None), None, Stmt([For(AssName('itemref', 'OP_ASSIGN'), Getattr(Name('self'), 'data'), Stmt([Assign([AssName('item', 'OP_ASSIGN')], CallFunc(Name('itemref'), [], None, None)), If([(Compare(Name('item'), [('is not', Name('None'))]), Stmt([Discard(Yield(Name('item')))]))], None)]), None)]))

The `with` statement is the #1 offender now, but a typical pyjamas app, being executed by a browser, has no open() capability anyway, so it's not obvious whether it's worth investing time into implementing `with` now. I'll decide once the other `compiler` issues are solved; maybe it's easier to just pull in the python2.5 modules instead.
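PEP 343 defines `with` as an expansion into try/except/finally, so, assuming those already translate correctly, supporting it may not need new runtime machinery. The expansion written out as a plain Python function; `run_with` and the example manager are illustrative names, and a translator would emit the equivalent statements inline rather than call a helper.

```python
import sys


def run_with(manager, body):
    """Run body(value) as `with manager as value: ...`, per the PEP 343 expansion.

    Returns body's result, or None if __exit__ suppressed an exception.
    Only covers a body that finishes or raises; break/continue/return
    inside a real with-block need extra handling in a translator.
    """
    exit_method = type(manager).__exit__
    value = type(manager).__enter__(manager)
    exc = True
    try:
        try:
            return body(value)
        except BaseException:
            # __exit__ sees the exception; if it returns false, re-raise.
            exc = False
            if not exit_method(manager, *sys.exc_info()):
                raise
    finally:
        # Normal exit (including via return): __exit__(None, None, None).
        if exc:
            exit_method(manager, None, None, None)


class SuppressValueError(object):
    """Example manager: records exit calls and swallows ValueError only."""

    def __init__(self):
        self.exits = []

    def __enter__(self):
        return 'v'

    def __exit__(self, exc_type, exc_value, tb):
        self.exits.append(exc_type)
        return exc_type is ValueError


def fail(value):
    raise ValueError(value)
```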


@jnowl asked for a skulpt vs emscripten vs pyjamas comparison, and since he's not the first to ask about it, I'll post a more detailed answer as a separate post later.
Short answer:
1) Pyjamas is more mature and production-ready
2) Pyjamas overall is geared towards the pyjamas widget set (a GWT port), and is a solid solution for web application development. For any other usage, it can shoot you in the foot, and there are no docs.
3) Pyjamas translates python code into javascript code, which is executed by a javascript VM. Emscripten translates the python interpreter itself into javascript and runs python code through it, adding another layer to the stack. The latter is the more 'correct' approach, while the former is more practical. We are okay with lacking some cpython capabilities as long as it still gets the job done.

Thursday, April 28, 2011

Plan for preliminary work

My project was accepted, so it's time to do some preliminary work before the official coding period starts.

The first and foremost project goal is to make PyJS able to translate and run itself.
Basically, this means that it should be possible to do this:
#> $PyJSRepo/pyv8/pyv8run.py --strict translator --strict -o /tmp/translator.js $PyJSRepo/pyjs/src/pyjs/translator.py
And it should:
1. Translate translator.py and all its dependencies
2. Load result in pyv8
3. Use the translated Translator, loaded in pyv8, to translate translator.py again
4. The result of (3) has to be completely identical to the result of (1)
This goal has to be achieved by the mid-term evaluation (July 12).
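Steps 1 to 4 amount to a fixed-point check: the translator, once translated, must reproduce its own translation. A sketch of the comparison, where `translate_on_host` and `run_translated` are hypothetical hooks standing in for running translator.py under CPython and running the translated javascript in PyV8; `check_self_hosting` is an illustrative name.

```python
import difflib


def check_self_hosting(source, translate_on_host, run_translated):
    """Compare stage-1 and stage-2 translations of the translator's own source.

    translate_on_host(source) -> JS text: translator.py run under CPython (step 1).
    run_translated(js, source) -> JS text: that JS loaded in a JS VM such as
    PyV8 and used to translate the same source again (steps 2 and 3).
    Returns an empty list when the outputs are identical (step 4), otherwise
    a unified diff showing where they diverge.
    """
    stage1 = translate_on_host(source)
    stage2 = run_translated(stage1, source)
    if stage1 == stage2:
        return []
    return list(difflib.unified_diff(
        stage1.splitlines(), stage2.splitlines(),
        fromfile='stage1.js', tofile='stage2.js', lineterm=''))
```

Returning a diff rather than a bare boolean makes it easier to see which construct the translated translator gets wrong.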

To achieve this goal, pyjamas has to support all 180 modules that translator.py depends on.

At the moment, most of them fail with some kind of error, either during translation or during execution. These errors are caused by one of three issues:
1) Lack of support for specific syntax, like a[::] (extended slices, bugs #364, #434, #577, #582)
2) Lack of support for specific features in pyjslib, like type(lambda x:x).
3) Restrictions of the environment, like weakref.py. Such modules cannot be supported directly, so we have to either provide a replacement with a compatible API or remove the dependency on it from other modules.

The first and second issues can be resolved either by fixing pyjs/pyjslib or by providing a replacement module that doesn't use the particular feature.
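For the third kind of issue, a replacement module only has to be API-compatible. A minimal sketch of what a weakref stand-in might look like, assuming the target javascript VM exposes no weak references at all: it keeps strong references, trading memory for compatibility. This is illustrative only, not the module pyjamas ships.

```python
# weakref.py replacement sketch for a VM without weak references (an assumption).
# Objects are held strongly, so nothing is ever collected through these APIs.


class ref(object):
    """Stand-in for weakref.ref that keeps a strong reference."""

    def __init__(self, obj, callback=None):
        self._obj = obj
        self._callback = callback  # kept for API compatibility; never invoked

    def __call__(self):
        # A real weak reference returns None once the referent is collected;
        # here the referent is never collected.
        return self._obj


def proxy(obj, callback=None):
    """Stand-in for weakref.proxy: hands back the object itself."""
    return obj


class WeakValueDictionary(dict):
    """Plain dict: values are never dropped, since they are never collected."""


class WeakKeyDictionary(dict):
    """Plain dict: keys are never dropped, since they are never collected."""
```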

To help myself track progress and prioritize work properly, I'll make a simple pyjamas web app that lists all the modules Translator uses and shows the result of translating/evaluating each of them.

So the plan for next week is as follows:

1) Find/make a tool to produce dependency graphs for translator.py, compiler.py and other base modules we would need.
2) Add a simple compile/eval test suite for all modules from (1)
3) Make a pyjamas-GAE web app to show the results from (2)
4) Fix import path issues in the pyv8* tools
5) Fix issues in master for pyv8run

This will also be useful for tracking stdlib support and for giving a good overview of current pyjs compatibility with cpython.
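Steps 1 and 2 of the plan boil down to a graph traversal followed by a per-module check. A sketch, assuming the import graph is already available as a dict mapping each module to the modules it imports; `dependency_closure` and `status_report` are hypothetical names, and `translate`/`evaluate` stand in for the pyjs translator and a JS VM such as PyV8.

```python
from collections import deque


def dependency_closure(graph, root):
    """All modules reachable from root in an import graph, breadth-first."""
    seen = {root}
    queue = deque([root])
    order = []
    while queue:
        module = queue.popleft()
        order.append(module)
        for dep in graph.get(module, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return order


def status_report(modules, translate, evaluate):
    """Classify each module as 'translate-error', 'eval-error' or 'ok'.

    translate(name) -> JS text and evaluate(js) are hypothetical hooks;
    any exception they raise is recorded with its message.
    """
    report = {}
    for name in modules:
        try:
            js = translate(name)
        except Exception as exc:
            report[name] = ('translate-error', str(exc))
            continue
        try:
            evaluate(js)
        except Exception as exc:
            report[name] = ('eval-error', str(exc))
        else:
            report[name] = ('ok', '')
    return report
```

Separating translation failures from evaluation failures maps directly onto the first two issue categories above, which helps decide whether a module needs a pyjs fix or a pyjslib fix.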

Saturday, April 16, 2011

Introduction

PyJS, which is part of the Pyjamas package, is a very under-appreciated tool in the Python world: it generally isn't recognized as a 'python implementation', and is often not considered separately from the widget set.
This is partly due to the lack of documentation and of a clear API for use cases other than Pyjamas itself. The Translator class has rigid templates that limit its applications, and Linker/Builder are very pyjamas-centric, pulling in the DOM/Widget libraries by default right now.

Moreover, the translator currently lacks the ability to compile itself and to do compilation at run-time independently of CPython (or another base implementation), which makes it look far less capable than it actually is.

As I'm one of those who are interested in PyJS itself rather than the whole Pyjamas suite, I felt that GSoC would be a great opportunity to fix these issues and bring some public attention to PyJS as a stand-alone tool, capable of running arbitrary python on javascript VMs like Google V8 and SpiderMonkey, with the potential to be even faster than CPython thanks to the advanced JITs of these platforms.

While this may sound like a big undertaking, most of the needed infrastructure is already there: we have a pure-python python parser, the translator itself is pure python as well, and the road to making it independent of CPython is straightforward.

Improving the API and supporting statement/expression-level compilation is harder, as the translator currently lacks globals()/locals() support, and implementing it isn't trivial, though definitely possible.

Another step would be to update and improve the CLI utilities we ship, and to include a typical 'python' utility so it feels familiar to non-pyjamas users. I've already worked on this part, but the API update for the translator/linker should come first. While I'm at it, I also need to add tests for the CLI tools themselves, as breaking one of them without noticing is too common right now.

Aside from increased confidence and a better public image, there would be plenty of direct benefits for Pyjamas as well:
  • The ability to compile at run-time could let us send raw python code to browsers, significantly reducing the total javascript size (by up to 10 times).
  • Improved and simplified widget development, bringing the full power of python introspection and interactivity to the browser environment.
Another goal is to evaluate the feasibility of supporting python 3 syntax in the parser and translator, and of doing the initial compilation with python 3 interpreters.
While it's too early to talk about full python 3 support for Pyjamas, as that would take too much time and would be too hard to maintain afterwards, the parser and translator themselves are a much more feasible target for GSoC.
Since we are translating python into not-so-pythonic javascript, it's possible to translate most of python 3 into the same javascript, with (mostly) the same pyjslib, after some prior AST transformations.
The 3to2 project has achieved a lot in figuring out what can and cannot be translated, so the first step here would be to study their findings and see what else could be resolved by python-to-javascript translation.
Most likely we won't be able to support all of python 3 at once, but the bottom line is to document all the changes needed for full support, and to keep python 3 support in mind while re-designing the translator, so we don't end up dead-locked with python 2 later.