Tuesday, December 20, 2005

A whiff of Cheesecake

I've been pretty busy lately, but I wanted to take the time to work a bit on the Cheesecake project, especially because I finally got some feedback on it. I haven't produced an official release yet, but people interested in this project (you two know who you are :-) can grab the source code via either svn or cvs:

SVN from tracos.org:
svn co http://svn.tracos.org/cheesecake
CVS from SourceForge:
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cheesecake co -P cheesecake

(all in one line)
Here are some things that have changed:
  • The cheesecake module now computes 3 partial indexes, in addition to the overall Cheesecake index (thanks to PJE for the suggestion):
    • an INSTALLABILITY index (can the package be downloaded/unpacked/installed in a temporary directory)
    • a DOCUMENTATION index (which of the expected files and directories are present, what is the percentage of modules/classes/methods/functions with docstrings)
    • a CODE KWALITEE index (average pylint score)
  • The license file is now considered a critical file (thanks to Will Guaraldi for the suggestion)
  • The PKG-INFO file is no longer checked (the check was redundant because setup.py is already checked for)
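To make the docstring part of the DOCUMENTATION index more concrete, here is a minimal sketch of how the percentage of modules/classes/methods/functions with docstrings could be measured for a single source file. This is not necessarily how the cheesecake module does it; the function name docstring_percentage is hypothetical, and the sketch relies on the standard library ast module.

```python
import ast


def docstring_percentage(source):
    """Return the percentage of documentable objects in `source` that have a docstring.

    Hypothetical helper, not part of Cheesecake. The module itself, plus every
    class, function and method found in it, counts as one documentable object.
    """
    tree = ast.parse(source)
    documentable = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    # The module node comes first, followed by all classes/functions/methods.
    nodes = [tree] + [n for n in ast.walk(tree) if isinstance(n, documentable)]
    documented = sum(1 for n in nodes if ast.get_docstring(n) is not None)
    return 100.0 * documented / len(nodes)


SAMPLE = '''"""Module docstring."""

def undocumented():
    pass

class Documented:
    """Class docstring."""

    def method(self):
        """Method docstring."""
'''

# 3 of the 4 objects (module, class, method) carry a docstring.
print(docstring_percentage(SAMPLE))
```

A real check would walk every module in the package and combine the counts, but the idea is the same.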
Here are some things that will change in the very near future.

Per Will Guaraldi's suggestion, I'm thinking of changing the way the index is computed for required non-critical files. Will asked: "Would it make sense to use a 3/4 rule for non-critical required files? If a project has 3/4 of the non-critical required files, they get x points, otherwise they get 0 points". I'm actually thinking of a maximum of 100 points for the "required files" check, awarding 50 points if at least 40% of the files are there, 75 points if at least 60% are there, and 100 points if at least 80% are there. This might prove more encouraging for people who want to increase their Cheesecake index.
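The tiered scheme above can be sketched in a few lines. This is only an illustration of the thresholds I have in mind, not code from the cheesecake module: the function name and the sample file names are made up, and awarding 0 points below 40% is an assumption, since I haven't decided on that yet.

```python
def required_files_score(present, required):
    """Return 0, 50, 75 or 100 points for the "required files" check.

    Hypothetical helper. `present` and `required` are iterables of file names.
    """
    required = set(required)
    if not required:
        return 0  # assumption: nothing required means nothing to score
    found = len(required & set(present))
    # Compare with integers to avoid floating-point surprises at the boundaries.
    if found * 100 >= 80 * len(required):
        return 100
    if found * 100 >= 60 * len(required):
        return 75
    if found * 100 >= 40 * len(required):
        return 50
    return 0  # assumption: below 40% earns no points


# Made-up list of non-critical required files, for illustration only.
REQUIRED = ["README", "INSTALL", "TODO", "NEWS", "THANKS"]

print(required_files_score(["README"], REQUIRED))                     # 1 of 5 files
print(required_files_score(["README", "TODO"], REQUIRED))             # 2 of 5 files
print(required_files_score(["README", "TODO", "NEWS"], REQUIRED))     # 3 of 5 files
print(required_files_score(REQUIRED[:4], REQUIRED))                   # 4 of 5 files
```

Compared with an all-or-nothing 3/4 rule, a project that adds one more file has a decent chance of moving up a tier.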

Another one of Will's observations: "For the required files and directories, it'd be nice to have Cheesecake output some documentation as to where to find documentation on such things. For example, what content should README contain and why shouldn't I put the acknowledgements in the README? I don't know if this is covered in the Art of Unix Programming or not (mentioned above)--I don't have a copy of that book. Clearly we're creating standards here, so those standards should have some documentation."

I'm thinking of addressing this issue by computing the Cheesecake index for a variety of projects hosted at the CheeseShop. The results will be saved in a database file (I'm thinking of using Durus for its simplicity) and the cheeseshop module will be able to query the file for things such as:
  • show me the URLs for projects which contain README files
  • show me the top N projects in the area of INSTALLABILITY (or DOCUMENTATION, or CODE KWALITEE, or OVERALL INDEX)
This will allow a package creator to look at stuff other people are (successfully) doing in their packages.
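Here is a rough sketch of what the two queries above could look like. To keep it self-contained it uses a plain in-memory list of dictionaries instead of Durus, and the record layout, function names, project names, URLs and scores are all made up for illustration.

```python
from operator import itemgetter

# Made-up sample records; a real run would store one record per CheeseShop project.
RESULTS = [
    {"name": "pkg_a", "url": "http://example.org/pkg_a",
     "files": {"README", "LICENSE"}, "INSTALLABILITY": 90, "DOCUMENTATION": 40},
    {"name": "pkg_b", "url": "http://example.org/pkg_b",
     "files": {"LICENSE"}, "INSTALLABILITY": 60, "DOCUMENTATION": 85},
    {"name": "pkg_c", "url": "http://example.org/pkg_c",
     "files": {"README"}, "INSTALLABILITY": 75, "DOCUMENTATION": 70},
]


def urls_with_file(results, filename):
    """Return the URLs of projects whose file set contains `filename`."""
    return [r["url"] for r in results if filename in r["files"]]


def top_n(results, index_name, n):
    """Return the names of the `n` projects with the highest `index_name` score."""
    ranked = sorted(results, key=itemgetter(index_name), reverse=True)
    return [r["name"] for r in ranked[:n]]


print(urls_with_file(RESULTS, "README"))
print(top_n(RESULTS, "DOCUMENTATION", 2))
```

With a persistent store the functions would read the records from the database file rather than from a module-level list, but the queries themselves would stay about this simple.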

As a quick fix, I will generate the database file on one of my servers, then make it available via svn.

In the long run, this is obviously not ideal, so I'm thinking about putting a Web interface around this functionality. You will then be able to see the top N projects in a nice graph, issue queries via the Web interface, etc. (if only days had 48 hours).

Some things I also want to add ASAP:
  • use pyflakes in addition to pylint
  • improve CODE KWALITEE index by inspecting unit test stuff: number of unit tests, percentage of methods/functions unit tested, running the coverage module against the unit tests (although it might prove tricky to do this right, since a package can rely on a 3rd party unit test framework such as py.test, nose, etc.)
As always, suggestions/comments are more than welcome.
