Tag Archives: PTVS

New New Project from Existing Code

Last time I wrote about the New Project from Existing Code wizard in Visual Studio and how we extended it to provide support for Python. This time, I’m going to look at the replacement for this dialog.

As discussed, there are a number of issues with reusing the managed languages wizard (C# and VB) for Python, largely due to the fact that it was never intended to be extensible. It is also difficult or impossible to (reliably) provide an alternate implementation, and in any case the feature is not very easy to discover. Because the usual way to create a new project normally involves skipping the menu and going straight to the list of templates, the aim was to add an item to this dialog:

New Project dialog with From Existing Python code selected

The three steps involved are creating the wizard, providing the necessary interfaces for Visual Studio and creating a template to start the wizard. The code is from changeset a8d12570c484, which was added prior to PTVS 1.5RC.

Designing a Wizard

Since this approach does not rely on reusing existing code, we were free to design the wizard in a way that flows nicely for Python developers and only exposes those options that are relevant, such as the interpreter to use with a project and the main script file. Because not every option needs to be used, and there are obviously ways to change them later, the wizard was set up with two pages. The first page collects the information that is essential to importing a project: the source of the files, a filter for file types and any non-standard search paths.

New Project from Existing Python Code Wizard page one

To avoid forcing the user to guess what each value means or how it behaves, lighter explanatory text is included directly in the window. (In earlier days these would have been tooltips, but with touch starting to become more prominent, requiring hovering is poor UI design.) The tone is deliberately casual and reassuring – one of the surprises people often find with Visual Studio is that adding an existing file will copy it into the project folder. When importing very large projects, it is far more desirable to leave the files alone and put the project in a nearby location. Because we support a ProjectHome element in our .pyproj files, we can treat any folder as the root of the project (this will no doubt be the subject of a post in the future).

The second page is entirely optional, and while it cannot be skipped entirely from the first, once users are familiar with the dialog it is very easy to click through without changing the default selections:

New Project from Existing Python Code Wizard page two

The two options on this page relate to Visual Studio settings, specifically, which version of Python should be used when running from within VS and which file should be used for F5/Ctrl+F5 execution (as opposed to using the “Start with/without Debugging” option on a specific file). Again, the light grey explanatory text reassures the user that any selection made here is not permanent and provides directions on how to make changes later. The second option – which file to run on F5 – also suggests that not all files will appear in the list. For performance reasons, only *.py (and *.pyw) files in the root directory are shown, since showing all files would require a recursive directory traversal (which is slow) and produce a much longer list of files (hard to navigate). Since Python does not allow the import statement to traverse up from the started script (in typical uses), most projects will have their main file in the root of the project. (That said, there is a chance that this aspect of the dialog will change for the next release, either by including all files, switching to a treeview or simply not being offered as an option.)
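
As a rough illustration of the "root directory only" choice, here is a small Python sketch. It is not the PTVS code (which is C#); the function name startup_candidates and the demo file names are made up for the example. It lists only the *.py and *.pyw files directly inside a folder, without recursing.

```python
import os
import tempfile

def startup_candidates(root):
    """Return *.py and *.pyw files directly inside root (no recursion)."""
    names = []
    for entry in os.scandir(root):
        if entry.is_file() and entry.name.lower().endswith((".py", ".pyw")):
            names.append(entry.name)
    return sorted(names)

# Small demonstration in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "pkg"))
    for rel in ("main.py", "gui.pyw", "notes.txt", os.path.join("pkg", "mod.py")):
        open(os.path.join(root, rel), "w").close()
    demo_candidates = startup_candidates(root)
```

Here pkg/mod.py never shows up in demo_candidates, which is the trade-off described above: a short, fast list at the cost of hiding files in subfolders.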

When “Finish” is clicked, the rest of the task is quite straightforward: the files in the provided path are scanned for all those matching the filter and the $variables$ in FromExistingCode.pyproj are replaced. This produces a .pyproj file that is then loaded normally. (Contrast with the other approach that creates an empty project and adds each file individually. This way is much faster.) Details are in the following section.

IWizard and replacementsDictionary

Template wizards in Visual Studio are implemented by providing the IWizard interface. The methods on this interface are called at various points to allow customisation of the template, but only one method is of interest here: RunStarted. The how-to page covers the process in detail, but the basic idea is that RunStarted displays the user interface and updates the set of replacement strings, which are then applied to existing template files.

The only template file used is FromExistingCode.pyproj, which contains five variables for replacement: $projecthome$, $startupfile$, $searchpaths$, $interpreter$ and $content$. While the first three are simple values, $interpreter$ will be replaced by the GUID and version (InterpreterId and InterpreterVersion properties) that represents the interpreter selected on page two of the wizard, and $content$ is replaced by the list of files and folders. Strictly, this is a slight misuse of the templating system, which is intended for values rather than code/XML generation, but it works and is quite efficient.
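
To make the substitution step concrete, here is a minimal Python sketch of the idea. Visual Studio performs the real replacement itself; the template text below is a hypothetical, heavily shortened stand-in for FromExistingCode.pyproj, and apply_replacements is an illustrative name, not a PTVS or VS function.

```python
def apply_replacements(template, replacements):
    """Replace each $name$ key that appears in the template with its value."""
    result = template
    for key, value in replacements.items():
        result = result.replace(key, value)
    return result

# Hypothetical, shortened template; the real .pyproj template differs.
TEMPLATE = """<Project>
  <PropertyGroup>
    <ProjectHome>$projecthome$</ProjectHome>
    <StartupFile>$startupfile$</StartupFile>
  </PropertyGroup>
  <ItemGroup>
$content$
  </ItemGroup>
</Project>"""

files = ["main.py", "pkg\\util.py"]
replacements = {
    "$projecthome$": "..\\src",
    "$startupfile$": "main.py",
    # $content$ carries generated XML rather than a plain value.
    "$content$": "\n".join('    <Compile Include="{}" />'.format(f) for f in files),
}
project_xml = apply_replacements(TEMPLATE, replacements)
```

The $content$ entry shows the "slight misuse" mentioned above: the value is a block of generated XML rather than a simple string. A real implementation would also need to escape characters that are special in XML, which this sketch skips.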

When RunStarted is called (by VS), a dictionary is provided for the wizard to fill in with these values. This means that a lot of processing takes place within this one function, which is generally not how callbacks like this should behave. However, in this case, it is perfectly appropriate to use modal dialogs and let exceptions propagate – in particular, WizardBackoutException should be thrown if the user cancels out of the dialog (unlike the WizardCancelledException, backing out returns the user to the template selection dialog).

RunStarted is implemented (along with the other methods) in ImportWizard.cs, with UI and XML generation separated into other methods to allow for easier testing.

public void RunStarted(object automationObject, Dictionary<string, string> replacementsDictionary, WizardRunKind runKind, object[] customParams) {
    try {
        var provider = new ServiceProvider((Microsoft.VisualStudio.OLE.Interop.IServiceProvider)automationObject);
        var settings = ImportWizardDialog.ShowImportDialog(provider);
 
        if (settings == null) {
            throw new WizardBackoutException();
        }
 
        SetReplacements(settings, replacementsDictionary);
    } catch (WizardBackoutException) {
        try {
            Directory.Delete(replacementsDictionary["$destinationdirectory$"]);
        } catch {
            // If it fails (doesn't exist/contains files/read-only), let the directory stay.
        }
        throw;
    } catch (Exception ex) {
        MessageBox.Show(string.Format("Error occurred running wizard:\n\n{0}", ex));
        throw new WizardCancelledException("Internal error", ex);
    }
}

One point worth expanding on is the Directory.Delete call in the cancellation handler. Because this is a new project wizard, VS creates the destination directory based on user input before the wizard starts. However, if the wizard is cancelled from within RunStarted, as opposed to failing before RunStarted can be called, the directory is not removed. To prevent the user from seeing empty directories appear in their Projects folder, we try to remove it. (That said, we don’t try very hard – if the directory has ended up with files in it already then it will not be removed.)
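
The "don't try very hard" behaviour can be sketched in Python, where os.rmdir plays a role similar to the non-recursive Directory.Delete call: it only succeeds on an empty directory. The helper name remove_if_empty and the folder names are invented for this example.

```python
import os
import tempfile

def remove_if_empty(path):
    """Try to remove path; return True on success, False if it was left alone."""
    try:
        os.rmdir(path)  # only succeeds for an existing, empty directory
        return True
    except OSError:
        # Missing, non-empty or read-only: let the directory stay.
        return False

with tempfile.TemporaryDirectory() as base:
    empty = os.path.join(base, "EmptyProject")
    used = os.path.join(base, "UsedProject")
    os.mkdir(empty)
    os.mkdir(used)
    open(os.path.join(used, "keep.txt"), "w").close()
    removed_empty = remove_if_empty(empty)
    removed_used = remove_if_empty(used)
    used_still_exists = os.path.isdir(used)
```

Swallowing the error is deliberate here: failing to tidy up an empty folder is never worth interrupting the user over.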

The .vstemplate File

The final piece of this feature is adding the entry to the New Project dialog and starting the wizard when selected by the user. Templates in Visual Studio are typically added by including .vstemplate files in a registered folder (optionally creating and registering a folder specific to an extension). These files are XML and specify the template properties and the list of files to copy to the destination directory. For templates that include wizards, an optional WizardExtension element can be added, as seen in FromExistingCode.vstemplate:

<WizardExtension>
  <Assembly>Microsoft.PythonTools.ImportWizard, Version=1.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a</Assembly>
  <FullClassName>Microsoft.PythonTools.ImportWizard.Wizard</FullClassName>
</WizardExtension>

Importantly, the assembly name must be a fully qualified name, which is why we used a fixed version number for this file (the rest of our assemblies use an automatically generated build number, which is fine for compiled code but not so simple to include in a data file that is embedded in a .zip file). Wizard is the name of the class in ImportWizard.cs that implements IWizard, and so VS is able to instantiate Wizard and invoke RunStarted as part of creating the new project. The entire template is simply added to our existing templates directory, and VS discovers the .vstemplate file and includes it in the list.

One of the concerns with the previous version of this wizard was discoverability: it was not easy to find the feature. To completely solve this problem, we set the SortOrder value of the template to 10, which is very low and all but guarantees that it will always appear first in the list. So now the first option that will appear to both new and existing users of PTVS is a simple way to use their projects without having to add each file individually.

Summary

In this post we looked at the new New Project from Existing Code wizard that replaced the one from last time. Not using the existing implementation allowed us to design the wizard specifically for Python code and implement it efficiently. We also made the feature more discoverable by placing it first in the list of templates, which is how new projects are usually created. This feature was first released in Python Tools for Visual Studio 1.5RC.

New Project from Existing Code

In this post, we’ll be looking at how version one of a particular feature was implemented. The implementation was not very good (I can say that, since I contributed it), but it filled a necessary gap until it could be done better. Python Tools for Visual Studio 1.1 had this implementation, while PTVS 1.5 uses a completely different approach (that might be the subject of a future blog, but not this one).

When creating a new project in Visual Studio, you typically begin from a list of templates that are specific to your language:

Visual Basic project templates

If you already have a set of existing code, the obvious approach is to select “Empty Project” and add the files to that project. This situation is common when working on cross-platform projects, which often do not include Visual Studio projects, or when you want to migrate from earlier versions without introducing the compatibility ‘hacks’ provided by the upgrade wizard. An alternative to manually importing the files into an empty project is to use the “Project from Existing Code” wizard, which is available under the “New Project” submenu:

Project from Existing Code menu item

By default, this wizard only supports C#, VB and C++, but since very few Python projects ever come with project files, importing existing code is a very common task. This feature added basic support for Python projects to the existing wizard. The code is from changeset 156ceb1b1949, which was part of the original pull request that was merged into PTVS 1.0.

Adding Python to the list

The default wizard only provides support for C#, VB and C++.

Import Wizard window

Luckily, since Visual Studio is endlessly extensible, these project types are not hardcoded, but are found in the registry. For example, the C# entries are found in the Projects\{FAE04EC0-301F-11d3-BF4B-00C04F79EFBC}\ImportTemplates subkey of the Visual Studio key:

C# ImportTemplates registry entry

The meaning of each value is relatively straightforward, but to summarise:

WizardPageObjectAssembly
Full name of the assembly containing the wizard implementation.
WizardPageObjectClass
Full name of the class implementing the wizard.
ProjectType
The name (or a string resource within the project type’s package) to display in the dropdown box.
ImportProjectsDir
Directory containing the following three files.
ClassLibProjectFile
Filename of the project template for a class library project.
ConsoleAppProjectFile
Filename of the project template for a console application project.
WindowsAppProjectFile
Filename of the project template for a windowed application project.

Those last four items are specific to the Microsoft.VsWizards.ImportProjectFolderWizard.Managed.PageManager class, which is used for C# and VB projects only. The C++ wizard uses a different implementation that is very specific to C++, but the one for C# and VB is far more general and can be easily adapted to Python without requiring any new code (except to set up the registry entries). We could also provide a completely new class, but in the interest of minimising new code, that wasn’t done this time around.

To register the existing importer for Python projects, a ProvideImportTemplatesAttribute was created, since package attributes are how registry keys are added for PTVS (rather than specifying them in the installer). It was applied to the project package as:

[ProvideImportTemplates("#111",
                        PythonConstants.ProjectFactoryGuid,
                        "$PackageFolder$\\Templates\\Projects\\ImportProject",
                        "ImportWinApp.pyproj",
                        "ImportConApp.pyproj",
                        "ImportConApp.pyproj")]

The parameters map very closely to the values shown above for C#. PTVS has no concept of a library project, so we use a standard console project: the wizard implementation is… fragile… and while you could hope that leaving the class library template out would disable the option, in reality it simply causes an exception. Each template was created simply by using the existing template with the initial Program.py removed.

Adding this attribute is sufficient to make “Python Project” (the value of the “#111”) appear in the wizard, but not to actually import the project. This second step required a little more work inside the implementation of the PTVS project system.

The AddFromDirectory method

Among all the classes required to implement the project system, it is very rare to see dead code. Methods are only implemented if they are called, and as it turned out, a method required for importing existing code had never been written. This was the AddFromDirectory method in OAProjectItems.cs.

In brief (not that I can be much more brief than that MSDN page), AddFromDirectory recursively adds a folder and all the files and folders it contains into the project. The directory provided is returned as a new project folder (as if the user had clicked on “New Folder” themselves), which means that this function cannot actually be used to import the entire project. As a workaround, the wizard implementation adds any top-level files directly, then calls AddFromDirectory for each top-level folder.

Implementing AddFromDirectory is very simple, since the existing AddDirectory and AddFromFile methods can be used. The original implementation filtered files to only include those that could be imported from Python (in effect, *.py and *.pyw files in directories containing an __init__.py file), though this was later relaxed based on user feedback.
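
The division of labour between the wizard and AddFromDirectory can be sketched in Python. This is only an illustration under assumptions: import_project stands in for the wizard (top-level files added directly, unfiltered), add_from_directory stands in for the PTVS method with its original importable-files filter, and skipping folders without an __init__.py entirely is my guess at the behaviour, not something confirmed by the source.

```python
import os
import tempfile

def add_from_directory(path, project, prefix):
    """Recursively record a folder and the importable files beneath it."""
    names = sorted(os.listdir(path))
    if "__init__.py" not in names:
        return  # not a package, so nothing here can be imported
    project.append(prefix + "/")
    for name in names:
        full = os.path.join(path, name)
        if os.path.isdir(full):
            add_from_directory(full, project, prefix + "/" + name)
        elif name.lower().endswith((".py", ".pyw")):
            project.append(prefix + "/" + name)

def import_project(root):
    """Add top-level files directly, then recurse into each top-level folder."""
    project = []
    for name in sorted(os.listdir(root)):
        full = os.path.join(root, name)
        if os.path.isdir(full):
            add_from_directory(full, project, name)
        else:
            project.append(name)  # the wizard does not filter these
    return project

# Hypothetical layout used only for the demonstration.
with tempfile.TemporaryDirectory() as root:
    for rel in ("setup.py", "README.txt", "pkg/__init__.py", "pkg/core.py",
                "pkg/data.json", "docs/index.rst"):
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    imported = import_project(root)
```

Note that README.txt is imported even though pkg/data.json is not: the filter only applies inside AddFromDirectory, so top-level files slip through.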

Problems

While this approach is very simple, it has a number of drawbacks. Those that have already been mentioned include an irrelevant option for a “Class Library” and the inability to filter top-level files and directories. These could be easily resolved by substituting another wizard class, but this is actually the point where the extensibility breaks down.

Import wizards are .NET classes that implement Microsoft.VsWizards.ImportProjectFolderWizard.IPageManager from the Microsoft.VisualStudio.ImportProjectFolderWizard.dll assembly that ships with Visual Studio. Unfortunately, this assembly is not stored anywhere that can be safely referenced without assuming the full VS installation path, cannot be redistributed (and probably cannot be loaded multiple times from different locations anyway), and is not guaranteed to be loaded before another assembly containing wizards. Basically, while it is possible to create a new wizard, it is not possible to safely load that wizard, making this idea pretty useless.

The final problem that was encountered was discoverability. Despite documentation, a video and announcements, it is not easy to find this feature, and even the other languages suffer from this. It would seem that people generally use other ways to open the New Project window (since there are at least four) and so never even see the “From Existing Code” option. In the 1.5 release of PTVS we’ve tried to solve this by providing the wizard as an item in the normal templates list:

Python project templates with PTVS 1.5

Summary

It is possible to extend the “New Project from Existing Code” wizard by providing some template files and registry entries. The project system that the wizard is for has to implement AddFromDirectory on project items, since this is the function that performs most of the work. Unfortunately, extending by providing a new wizard implementation is not possible because of how the dependent assemblies are distributed and loaded. This feature was originally in Python Tools for Visual Studio version 1.0, and was replaced in version 1.5 with a more appropriate and discoverable wizard.

Smart Indentation for Python

The text editor in Visual Studio provides a number of options related to indentation. Apart from the standard tabs/spaces and “how many” options, you can choose between three behaviours: “None”, “Block” and “Smart.”

The 'Tabs' options dialog in Visual Studio 2012

To my knowledge, the “None” mode is very rarely used. When enabled, pressing Enter will move the caret (the proper name for the text cursor) to the first column of the following line. While this is the normal behaviour in a word processor, it does not suit programming quite as well.

“Block” mode is more useful. In this mode, pressing Enter will move the caret to the following line and insert as many spaces/tabs as appeared at the start of the previous line. This ensures that consecutive lines of text all start in the same column by default, which is commonly used as a hint to the programmer that they are part of the same block.
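
In code terms, block mode amounts to copying the leading whitespace of the previous line. A minimal Python sketch (block_indent is an illustrative name, not an editor API):

```python
def block_indent(previous_line):
    """Return the leading run of spaces and tabs from the previous line."""
    stripped = previous_line.lstrip(" \t")
    return previous_line[:len(previous_line) - len(stripped)]

example_indent = block_indent("    x = 1")
```

Nothing about the content of the line is inspected, which is exactly why this mode can be built into the editor rather than supplied per language.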

However, the most common (and default) mode is “Smart.” Unlike the other two modes, which are built into the editor, smart indentation is provided by a language service (language services are also responsible for providing syntax highlighting and completions). Because each service is targeted at a specific language, it can help the programmer by automatically adding and removing extra indentation in ways that make sense.

For example, the smart indentation for C++ will automatically add an indent after an open brace, which begins a new block, and remove an indent for a close brace, which ends the block. Similarly, an indent is added after “if” or “while” statements, as well as the others that support implicit blocks, and removed after the one statement that may follow. In most cases, a programmer can simply continue typing without ever having to manually indent or unindent their code.

This feature has existed since very early in Python Tools for Visual Studio’s life, but the implementation has changed significantly over time. In this post we will look at the algorithms used in changesets 41aa3fe86341 and 4db951c455d5, as well as the general approach to providing smart indentation from a language service.

ISmartIndentProvider

Since Visual Studio 2010, language services have provided smart indenting by exporting an MEF component implementing ISmartIndentProvider. This interface includes one method, CreateSmartIndent, which returns an object implementing ISmartIndent for a provided text view. ISmartIndent also only includes one method (ignoring Dispose), GetDesiredIndentation, which returns the number of spaces to indent by. VS will convert this to tabs (or a mix of tabs and spaces) depending on the user’s settings.
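
How the editor turns a space count into actual whitespace is internal to VS, but the idea can be sketched in Python. The function render_indent and its parameters are assumptions made for illustration, not a description of the real conversion.

```python
def render_indent(spaces, tab_size, keep_tabs):
    """Turn a desired indentation (in spaces) into whitespace characters."""
    if not keep_tabs:
        return " " * spaces
    # Use as many whole tabs as fit, then pad the remainder with spaces.
    tabs, remainder = divmod(spaces, tab_size)
    return "\t" * tabs + " " * remainder

mixed = render_indent(10, 4, True)
```

The useful consequence for a language service is that GetDesiredIndentation can think purely in columns and leave the tabs-versus-spaces question to the editor.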

The entire implementation of these interfaces in PTVS looks like this:

[Export(typeof(ISmartIndentProvider))]
[ContentType(PythonCoreConstants.ContentType)]
public sealed class SmartIndentProvider : ISmartIndentProvider {
    private sealed class Indent : ISmartIndent {
        private readonly ITextView _textView;
 
        public Indent(ITextView view) {
            _textView = view;
        }
 
        public int? GetDesiredIndentation(ITextSnapshotLine line) {
            if (PythonToolsPackage.Instance.LangPrefs.IndentMode == vsIndentStyle.vsIndentStyleSmart) {
                return AutoIndent.GetLineIndentation(line, _textView);
            } else {
                return null;
            }
        }
 
        public void Dispose() { }
    }
 
    public ISmartIndent CreateSmartIndent(ITextView textView) {
        return new Indent(textView);
    }
}

The AutoIndent class referenced in GetDesiredIndentation contains the algorithm for calculating how many spaces are required. Two algorithms for this are described in the following sections, the first using reverse detection and the second using forward detection.

Reverse Indent Detection

This algorithm was used in PTVS up to changeset 41aa3fe86341, which was shortly before version 1.0 was released. It was designed to be efficient by minimising the amount of code scanned, but ultimately got so many corner cases wrong that it was easier to replace it with the newer algorithm. The full source file is AutoIndent.cs.

At its simplest, indent detection in Python is based entirely on the preceding line. The normal case is to copy the indentation from that line. However, if it ends with a colon then we should add one level, since that is how Python starts blocks. Also, we can safely remove one level if the previous line is a return, raise, break or continue statement, since no more of that block will be executed. (As a bonus, we also do this after a pass statement, since its main use is to indicate an empty block.) The complications come when the preceding textual line is not the preceding line of code.

Take the following example:

1
2
if a == 1 and (b == 2 or
               c == 3):

How many spaces should we add for line 3? According to the above algorithm, we’d add 15 plus the user’s indent size setting (for the colon on line 2). This is clearly not correct, since the if statement has 0 leading spaces, but it is the result when applying the simpler algorithm.
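
The previous-line-only rule is easy to write down, which makes the failure easy to reproduce. The Python sketch below is my own simplification (naive_indent is a made-up name, and comments and strings are ignored); it is not the PTVS implementation.

```python
DEDENT_KEYWORDS = ("return", "raise", "break", "continue", "pass")

def naive_indent(previous_line, indent_size=4):
    """Indentation for the next line, looking only at the previous line."""
    stripped = previous_line.lstrip(" ")
    indent = len(previous_line) - len(stripped)
    words = stripped.split()
    if stripped.rstrip().endswith(":"):
        return indent + indent_size   # a colon starts a new block
    if words and words[0] in DEDENT_KEYWORDS:
        return max(indent - indent_size, 0)   # the block cannot continue
    return indent

# Second line of the two-line if statement above: 15 leading spaces.
continuation = " " * 15 + "c == 3):"
wrong_answer = naive_indent(continuation)
```

With an indent size of 4, wrong_answer comes out as 19 where 4 is wanted, because the rule has no idea that the line it was given is the tail of a longer statement.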

Finding the start of an expression is actually such a common task in a language service that PTVS has a ReverseExpressionParser class. It tokenises the source code as normal, but rather than parsing from the start it parses backwards from an arbitrary point. Since the parser state (things like the number of open brackets) is unknown, the class is most useful for identifying the span of a single expression (hence the name).

For smart indentation, the parser is used twice: once to find the start of the expression assuming we’re not inside a grouping (the zero on line 102) and once assuming we are inside a grouping (the one on line 103). The span provided on line 94 is the location of the last token before the caret, which for smart indenting should be an end of line character (that the parser automatically skips).

Lines 90 to 103 of AutoIndent.cs:
// use the expression parser to figure out if we're in a grouping...
var revParser = new ReverseExpressionParser(
        line.Snapshot,
        line.Snapshot.TextBuffer,
        line.Snapshot.CreateTrackingSpan(
            tokens[tokenIndex].Span.Span,
            SpanTrackingMode.EdgePositive
        )
    );
 
int paramIndex;
SnapshotPoint? sigStart;
var exprRangeNoImplicitOpen = revParser.GetExpressionRange(0, out paramIndex, out sigStart, false);
var exprRangeImplicitOpen = revParser.GetExpressionRange(1, out paramIndex, out sigStart, false);

The values of exprRangeNoImplicitOpen and exprRangeImplicitOpen are best described by example. Consider the following code:

1
2
3
4
5
6
7
def f(a, b, c):
    if a == b and b != c:
        return (a +
                b * c
                + 123)
    while a < b:
        a = a + c

When parsing starts at the end of line 4, exprRangeNoImplicitOpen will reference the span containing b * c, since that is a complete expression (remembering that the parser does not know it is inside parentheses). exprRangeImplicitOpen is initialised with one open grouping, so it will reference (a + b * c. However, if we start parsing at the end of line 7, exprRangeNoImplicitOpen will reference a + c while exprRangeImplicitOpen will be null, since an assignment within a group would be an error.

Using the two expressions, we can create a new set of indentation rules:

  • If exprRangeImplicitOpen was found, exprRangeNoImplicitOpen was not (or is different to exprRangeImplicitOpen), and the expression starts with an open grouping ((, [ or {), we are inside a set of brackets.
    • In this case, we match the indentation of the brackets + 1, as on line 2 of the earlier example.
  • Otherwise, if exprRangeNoImplicitOpen was found and it is preceded by a return, raise, break or continue statement, OR if the last token is one of those keywords, the previous line must be one of those statements.
    • In this case, we copy the indentation and reduce it by one level for the following line.
  • Otherwise, if both ranges were found, we have a valid expression on one line and one that spans multiple lines.
    • This occurred in the example shown above.
    • In this case, we find the lowest indentation on any line of the multi-line expression and use that.
  • Otherwise, if the last non-newline character is a colon, we are at the first line of a new block.
    • In this case, we copy the indentation and increase it by one level.

These rules are implemented on lines 105 through 143 of AutoIndent.cs. However, with this approach there are many cases that need special handling. Most of the above ‘rules’ are the result of these being discovered. For example, issue 157 goes through a lot of these edge cases, and while most of them were resolved, it remained a less-than-robust algorithm. The alternative approach, described below, was added to handle most of these issues directly rather than as workarounds.

Forward Indent Detection

This algorithm replaced the reverse algorithm for PTVS 1.0 and has been used since with very minor modifications. It potentially sacrifices some performance in order to obtain more consistent results, as well as being able to support a slightly wider range of interesting rules. The discussion here is based on the implementation as of changeset 4db951c455d5; full source at AutoIndent.cs.

For this algorithm, the reverse expression parser remains but is used slightly differently. Its definition was changed slightly to allow external code to enumerate tokens from it (by adding an IEnumerable<ClassificationSpan> implementation) and its IsStmtKeyword() method was made public. This allows AutoIndent.GetLineIndentation() to perform its own parsing:

var tokenStack = new System.Collections.Generic.Stack<ClassificationSpan>();
tokenStack.Push(null);  // end with an implicit newline
bool endAtNextNull = false;
 
foreach (var token in revParser) {
    tokenStack.Push(token);
    if (token == null && endAtNextNull) {
        break;
    } else if (token != null &&
       token.ClassificationType == revParser.Classifier.Provider.Keyword &&
       ReverseExpressionParser.IsStmtKeyword(token.Span.GetText())) {
        endAtNextNull = true;
    }
}

The result of this code is a list of tokens leading up to the current location and guaranteed to start from outside any grouping. Depending on the structure of the preceding code, this may result in quite a large list of tokens; only a statement that can never appear in an expression will stop the reverse parse. This specifically excludes if, else, for and yield, which can all appear within expressions, and so all tokens up to the start of the method or class may be included. This is unfortunate, but also required to make guarantees about the parser state without parsing the entire file from the beginning (which is the only other known state).
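
The stopping rule is easier to see with plain strings standing in for tokens. In the Python sketch below, None marks a line break (as in the C# code), the keyword set is a hypothetical stand-in for whatever IsStmtKeyword() actually accepts, and collect_tokens is an illustrative name.

```python
# Hypothetical set: keywords that can only start a statement.
STATEMENT_KEYWORDS = {"def", "class", "while", "return", "raise", "import"}

def collect_tokens(reversed_tokens):
    """Push tokens (walking backwards) until the line start before a statement keyword."""
    stack = [None]  # end with an implicit newline
    end_at_next_newline = False
    for token in reversed_tokens:
        stack.append(token)
        if token is None and end_at_next_newline:
            break
        if token is not None and token in STATEMENT_KEYWORDS:
            end_at_next_newline = True
    return stack

# Tokens for:   x = 1 / while a < b: / a = (a +      (caret at the end)
forward = ["x", "=", "1", None,
           "while", "a", "<", "b", ":", None,
           "a", "=", "(", "a", "+"]
stack = collect_tokens(reversed(forward))
replayed = list(reversed(stack))  # the order in which Pop() would return them
```

The scan stops at the line break in front of while, so x = 1 is never visited, and replaying the stack yields the tokens in source order starting from a point known to be outside any grouping.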

Since the parser state is known at the first token, we can parse forward and track the indentation level. The algorithm now looks like this as we visit each token in order (by popping them off of tokenStack):

  • At a new line, set the indentation level to however many spaces appear at the start of the line.
  • At an open bracket, set the indentation level to the column of the bracket plus one and remember the previous level.
    • This ensures that if we reach the end of the list while inside the group, our current level lines up with the group and not the start of the line.
  • At a close bracket, restore the previous indentation level.
    • This ensures that whatever indentation occurs within a group, we will use the original indentation for the line following.
  • At a line continuation character (a backslash at the end of a line), skip ahead until the end of the entire line of code.
  • If the token is a statement to unindent after (return and friends), set a flag to unindent.
    • This flag is preserved, restored and reset with the indentation level.
  • If the token is a colon character and we are not currently inside a group, set a flag to add an indent.
    • And, if the following token is not an end-of-line token, also set the unindent flag.

After all tokens have been scanned, we will have the required indentation level and two flags indicating whether to add or remove an indent level. These flags are separate because they may both be set (for example, after a single-line if statement such as if a == b: return False). If they don’t cancel each other out, then an indent should be added or removed to the calculated level to find where the next line should appear:

indentation = current.Indentation +
    (current.ShouldIndentAfter ? tabSize : 0) -
    (current.ShouldDedentAfter ? tabSize : 0);

The implementation of this algorithm uses a LineInfo structure and a stack to manage preserving and restoring state:

private struct LineInfo {
    public static readonly LineInfo Empty = new LineInfo();
    public bool NeedsUpdate;
    public int Indentation;
    public bool ShouldIndentAfter;
    public bool ShouldDedentAfter;
}

And the structure of the parsing loop looks like this (edited for length):

while (tokenStack.Count > 0) {
    var token = tokenStack.Pop();
    if (token == null) {
        current.NeedsUpdate = true;
    } else if (token.IsOpenGrouping()) {
        indentStack.Push(current);
        ...
    } else if (token.IsCloseGrouping()) {
        current = indentStack.Pop();
        ...
    } else if (ReverseExpressionParser.IsExplicitLineJoin(token)) {
        ...
    } else if (current.NeedsUpdate == true) {
        current.Indentation = GetIndentation(line.GetText(), tabSize);
        ...
    }
 
    if (ShouldDedentAfterKeyword(token)) {
        current.ShouldDedentAfter = true;
    }
 
    if (token != null && token.Span.GetText() == ":" && indentStack.Count == 0) {
        current.ShouldIndentAfter = true;
        current.ShouldDedentAfter = (tokenStack.Count != 0 && tokenStack.Peek() != null);
    }
}
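
For readers who would rather experiment than read C#, the forward pass can be approximated in a few dozen lines of Python. This is a sketch under assumptions, not a port: the regex tokenizer is deliberately crude (it ignores strings and comments), the input is assumed to start outside any grouping, and all names (LineInfo fields, tokenize, next_indent) are my own.

```python
import re
from dataclasses import dataclass

TOKEN_RE = re.compile(r"\w+|[()\[\]{}:]|[^\s\w()\[\]{}:]+")
DEDENT_KEYWORDS = {"return", "raise", "break", "continue", "pass"}

@dataclass
class LineInfo:
    indentation: int = 0
    indent_after: bool = False
    dedent_after: bool = False

def tokenize(lines):
    """Return (text, column, line) tuples, with None marking each line end."""
    tokens = []
    for line in lines:
        for match in TOKEN_RE.finditer(line):
            tokens.append((match.group(), match.start(), line))
        tokens.append(None)
    return tokens

def next_indent(lines, indent_size=4):
    """Indentation for the line following `lines`."""
    tokens = tokenize(lines)
    current = LineInfo()
    saved = []           # states pushed at each open bracket
    needs_update = True
    joined = False       # previous token was a trailing backslash
    for index, token in enumerate(tokens):
        if token is None:
            needs_update = not joined
            joined = False
            continue
        text, column, line = token
        if needs_update:
            # First token of a line: start from that line's leading spaces.
            current = LineInfo(len(line) - len(line.lstrip(" ")))
            needs_update = False
        if text in ("(", "[", "{"):
            saved.append(current)
            current = LineInfo(column + 1)
        elif text in (")", "]", "}") and saved:
            current = saved.pop()
        elif text.endswith("\\"):
            joined = True
        elif text in DEDENT_KEYWORDS:
            current.dedent_after = True
        elif text == ":" and not saved:
            current.indent_after = True
            # Code after the colon means a one-line block, so cancel the indent.
            current.dedent_after = tokens[index + 1] is not None
    return (current.indentation
            + (indent_size if current.indent_after else 0)
            - (indent_size if current.dedent_after else 0))

multi_line_if = ["if a == 1 and (b == 2 or", " " * 15 + "c == 3):"]
after_multi_line_if = next_indent(multi_line_if)
```

For the two-line if statement that defeated the simple previous-line rule, after_multi_line_if is 4: the close bracket restores the level saved at the open bracket, and the colon then adds one indent on top of it.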

A significant advantage of this algorithm over the reverse indent detection is its obviousness. It is much easier to follow the code for the parsing loop than to attempt to interpret the behaviour and interactions inherent in the reverse algorithm. Further, modifications can be more easily added because of the clear structure. For example, the current behaviour indents the contents of a grouping to the level of the opening token, but some developers prefer to only add one indent level and no more. With the reverse algorithm, finding the section of code requiring a change is difficult, but the forward algorithm has an obvious code path at the start of groups.

Summary

Smart indentation allows Python Tools for Visual Studio to assist the developer by automatically indenting code to the level it is usually required at. Since Python uses white-space to define scopes, much like C-style languages use braces, this makes writing Python code simpler and can reduce “inconsistent-whitespace” errors. Language services provide this feature by exporting (through MEF) an implementation of ISmartIndentProvider. This post looked at two algorithms for determining the required indentation based on the preceding code, the latter of which shipped with PTVS 1.0.

User-unhandled Exceptions

This feature is one that I added to Python Tools for Visual Studio as an intern in 2011. Those familiar with debugging in Visual Studio will know that if your code throws an exception that is not handled, the debugger breaks at the statement that caused the error:

This feature is known as a user-uncaught exception and is incredibly useful in debugging, since it provides an opportunity to inspect local state before the stack unwinds. It is optional but defaults to on for most exceptions:

The “thrown” column breaks before the system traverses the stack looking for a handler (the “first chance”), while the “user-unhandled” column breaks after traversal if no code chooses to handle it, but before the stack actually begins to unwind (the “second chance”).

In Python, an unhandled exception will unwind the stack immediately, executing finally blocks and otherwise destroying state as it goes, printing a basic trace at the end. In other words, the first traversal step does not exist in Python and there is no second chance. Before adding this feature, PTVS only supported breaking on the first chance:

The aim of this feature was to mimic other languages by simulating the first traversal of exception handlers without modifying the program state. Step one of this task was to enable the UI, step two was to find a way to detect active exception handlers, and step three was to extend the debugger to support the feature.
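The unwinding behaviour described above is easy to observe in plain Python. The following is a small standalone sketch (the function names are made up for illustration) showing that a finally block in the raising function has already run by the time an outer handler is reached:

```python
events = []


def inner():
    try:
        raise ValueError("boom")
    finally:
        # Runs while the stack unwinds, before any outer handler is entered.
        events.append("inner finally")


def outer():
    try:
        inner()
    except ValueError:
        # By now inner() has exited and its finally block has already run.
        events.append("outer handler")


outer()
print(events)
```

If there were no handler at all, the finally block would still run first and only the traceback would be left, which is why the check for handlers has to happen before any unwinding starts.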

To follow along at home, the changeset is baff92317760, which I will quote from (and link to) where relevant. (The code has changed again since this commit, but these are the changes relevant to this discussion.) I also recorded a short video around the time this feature was added to demonstrate how it works.

Enabling the UI

Enabling the check boxes for user-unhandled exceptions is simply a case of changing the registration. Each debugging engine can list the exceptions it supports under AD7Metrics\Exception\{engine_guid}\{exception_name}. This allows the user to select whether or not to break on these exceptions, while the debugging engine is still fully responsible for handling the break itself.

Within these keys is a State value that combines values from the EXCEPTION_STATE enumeration. The values we used are EXCEPTION_STOP_USER_UNCAUGHT and EXCEPTION_JUST_MY_CODE_SUPPORTED (0x4020):

The state value for Python exception ArithmeticError is set to 0x4020
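To make the 0x4020 value less opaque, here is a minimal Python sketch of how the two flags combine. The individual flag values are assumptions inferred from the combined value quoted above and should be checked against the debugger SDK's EXCEPTION_STATE definition; the `ExceptionState` class itself is hypothetical and only mirrors the names.

```python
from enum import IntFlag


class ExceptionState(IntFlag):
    # Assumed values: only the combined 0x4020 is given in the text above.
    STOP_FIRST_CHANCE = 0x0001
    STOP_USER_UNCAUGHT = 0x0020
    JUST_MY_CODE_SUPPORTED = 0x4000


state = (ExceptionState.STOP_USER_UNCAUGHT
         | ExceptionState.JUST_MY_CODE_SUPPORTED)
state_hex = hex(int(state))

# Individual options are tested with a bitwise AND.
breaks_when_uncaught = bool(state & ExceptionState.STOP_USER_UNCAUGHT)
breaks_on_first_chance = bool(state & ExceptionState.STOP_FIRST_CHANCE)

print(state_hex, breaks_when_uncaught, breaks_on_first_chance)
```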

In PTVS, the exceptions are registered using the ProvideDebugException attribute on PythonToolsPackage.cs, which includes a state parameter for setting this value. By modifying ProvideDebugExceptionAttribute.cs we can simply change the default for all of our exceptions:

    _state = (int)(enum_EXCEPTION_STATE.EXCEPTION_JUST_MY_CODE_SUPPORTED | 
                   enum_EXCEPTION_STATE.EXCEPTION_STOP_USER_UNCAUGHT);

After adding this line to the attribute constructor and rebuilding, the dialog has the checkboxes enabled:

However, simply making the option available does not provide any functionality, since it is entirely managed by Visual Studio. The options selected by the user are exposed to the debugger through the IDebugEngine2.SetException method, implemented in AD7Engine.cs. The following changes were made to support both forms of exception handling:

         int IDebugEngine2.SetException(EXCEPTION_INFO[] pException) {
+            bool sendUpdate = false;
             for (int i = 0; i < pException.Length; i++) {
                 if (pException[i].guidType == DebugEngineGuid) {
+                    sendUpdate = true;
                     if (pException[i].bstrExceptionName == "Python Exceptions") {
-                        _defaultBreakOnException = true;
+                        _defaultBreakOnExceptionMode =
+                            (int)(pException[i].dwState & (enum_EXCEPTION_STATE.EXCEPTION_STOP_FIRST_CHANCE | enum_EXCEPTION_STATE.EXCEPTION_STOP_USER_UNCAUGHT));
                     } else {
-                        _breakOnException.Add(pException[i].bstrExceptionName);
+                        _breakOnException[pException[i].bstrExceptionName] = 
+                            (int)(pException[i].dwState & (enum_EXCEPTION_STATE.EXCEPTION_STOP_FIRST_CHANCE | enum_EXCEPTION_STATE.EXCEPTION_STOP_USER_UNCAUGHT));
                     }
                 }
             }
 
-            _process.SetExceptionInfo(_defaultBreakOnException, _breakOnException);
+            if (sendUpdate) {
+                _process.SetExceptionInfo(_defaultBreakOnExceptionMode, _breakOnException);
+            }
             return VSConstants.S_OK;
         }

One trick here is that this method is only called where a setting is different from its default, which means that the debug engine has to be aware of all the default values. For example, we made AttributeError not break by default, since it is often handled in a way we cannot detect. SetExceptionInfo now only receives a notification when handling is enabled for it, so the debug engine has to know that, unless it is told otherwise, it should not break on AttributeError. The changes required to make it break when the exception is thrown are discussed later.
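The consequence of only being told about non-default settings can be sketched as a two-level lookup. This is a hedged illustration, not the PTVS implementation: the constant values, the `DEFAULT_MODES` table and the `effective_mode` helper are all hypothetical, and plain exception names are used as keys for brevity.

```python
# Assumed mode values for illustration only.
BREAK_MODE_NEVER = 0
BREAK_MODE_ALWAYS = 1
BREAK_MODE_UNHANDLED = 32

# Hypothetical table of exceptions whose default differs from the rest.
DEFAULT_MODES = {"AttributeError": BREAK_MODE_NEVER}

# Assumed default for every exception that is not listed above.
FALLBACK_MODE = BREAK_MODE_UNHANDLED


def effective_mode(name, overrides):
    """Return the break mode for `name`.

    `overrides` holds only the settings the user changed, which is all the
    engine is ever told about; everything else must come from known defaults.
    """
    if name in overrides:
        return overrides[name]
    return DEFAULT_MODES.get(name, FALLBACK_MODE)


print(effective_mode("AttributeError", {}))
print(effective_mode("ValueError", {}))
print(effective_mode("AttributeError", {"AttributeError": BREAK_MODE_ALWAYS}))
```

The important design point is that an absent entry is not the same as "never break"; it means "use whatever the default is", so the defaults have to live on the engine side.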

Detecting Exception Handlers

As described above (briefly), Python uses a different approach to thrown exceptions than other languages. The main difference is that exception handlers are checked as the stack unwinds, so by the time Python code can be sure that an exception is not handled, the stack is gone, all finally blocks have been run and only the traceback remains. The traceback contains source files and line numbers, but no information about local variables or values. Ideally, the debugger would detect that an exception is unhandled without unwinding and break before the call stack is destroyed.

Since we are restricted to static analysis, a fallback position is necessary. The implementation assumes that all exceptions are unhandled unless proven otherwise, which makes us more likely to break when an exception occurs. The alternative would result in more stack traces and less breaking. Since the developer is already debugging their code, it is fair to assume that a false positive (breaking when a handler exists) is preferable to a false negative (not breaking when there is no handler).

In terms of implementation, the GetHandledExceptionRanges method in PythonProcess.cs does the main detection work:

internal IList<Tuple<int, int, IList<string>>> GetHandledExceptionRanges(string filename) {
    PythonAst ast;
    TryHandlerWalker walker = new TryHandlerWalker();
    var statements = new List<Tuple<int, int, IList<string>>>();

    try {
        using (var source = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read)) {
            ast = Parser.CreateParser(source, LanguageVersion).ParseFile();
            ast.Walk(walker);
        }
    } catch (Exception ex) {
        Debug.WriteLine("Exception in GetHandledExceptionRanges:");
        Debug.WriteLine(string.Format("Filename: {0}", filename));
        Debug.WriteLine(ex);
        return statements;
    }

    foreach (var statement in walker.Statements) {
        int start = statement.GetStart(ast).Line;
        int end = statement.Body.GetEnd(ast).Line + 1;
        var expressions = new List<string>();

        if (statement.Handlers == null) {
            expressions.Add("*");
        } else {
            foreach (var handler in statement.Handlers) {
                Expression expr = handler.Test;
                TupleExpression tuple;
                if (expr == null) {
                    expressions.Clear();
                    expressions.Add("*");
                    break;
                } else if ((tuple = handler.Test as TupleExpression) != null) {
                    foreach (var e in tuple.Items) {
                        var text = ToDottedNameString(e, ast);
                        if (text != null) {
                            expressions.Add(text);
                        }
                    }
                } else {
                    var text = ToDottedNameString(expr, ast);
                    if (text != null) {
                        expressions.Add(text);
                    }
                }
            }
        }

        if (expressions.Count > 0) {
            statements.Add(new Tuple<int, int, IList<string>>(start, end, expressions));
        }
    }

    return statements;
}

Given a filename, we parse the file and produce a syntax tree. With TryHandlerWalker we simply collect all the try nodes, each of which also contains the list of caught expressions and, importantly, the line numbers of the try statement and the first except statement. This information is sent to the debugger in response to a REQH command (discussed below).
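The same kind of analysis can be sketched with Python's own ast module, which may make the shape of the result easier to see. This is an illustrative stand-in rather than a port of the C# code: `dotted_name` and `handled_exception_ranges` are hypothetical names, `end_lineno` requires Python 3.8 or later, and the sketch skips try statements that have no except clauses.

```python
import ast


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if not isinstance(node, ast.Name):
        return None
    parts.append(node.id)
    return ".".join(reversed(parts))


def handled_exception_ranges(source):
    """Return sorted (start_line, end_line, expressions) for each try block."""
    ranges = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Try) or not node.handlers:
            continue
        expressions = []
        for handler in node.handlers:
            if handler.type is None:
                # A bare "except:" catches everything.
                expressions = ["*"]
                break
            if isinstance(handler.type, ast.Tuple):
                items = handler.type.elts
            else:
                items = [handler.type]
            for item in items:
                text = dotted_name(item)
                if text is not None:
                    expressions.append(text)
        if expressions:
            # The range covers the try body only, with an exclusive end line.
            end = node.body[-1].end_lineno + 1
            ranges.append((node.lineno, end, expressions))
    return sorted(ranges)


SAMPLE = """\
try:
    do_something()
except (ValueError, os.error):
    pass
try:
    other()
except get_exception_types()[1]:
    pass
"""

print(handled_exception_ranges(SAMPLE))
```

The second try statement in the sample produces no entry, because its except expression is a call and a subscript rather than a plain or dotted name.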

Later, the debugger will parse expressions from the except statements to compare against the thrown exception. Line numbers are sufficient to determine whether code is in a try block (since try and except are only valid at the start of a line) but anything can be specified as the except expression. For example, the following is working (perhaps not valid, and certainly not easy to read) code:

try:
    do_something()
except get_exception_types()[1]:
    print("handled")

When we are simulating a stack unwind, we cannot call the function to find out what it returns, firstly because we are not in the correct call frame and secondly because of potential side effects. (The closest thing to a sensible use of a call as an except expression is logging, which would mean that if we called it we would produce spurious messages.)

To handle as many safe cases as possible, the debugger will only look up names and modules. As soon as anything more complicated appears in an expression list, we assume that it does not handle the current exception and keep looking for another handler. Because we have access to the call frames, we are able to look up names in the correct scope. For example, in the following code we will determine that the exception has a handler, even though its name (zde) is defined in a different call frame than where the exception is thrown:

def test_1(x):
    return 100 / x

def test_2(x):
    zde = ZeroDivisionError
    try:
        return test_1(x)
    except zde:
        return 0

An issubclass call is used to determine whether the exception will be caught by each possible handler.[1] When it returns true (or a plain except: is encountered), execution is continued and the debugger is not notified of the exception. If we reach a call frame that does not have code available (meaning we can’t get the handlers) or that belongs to the debugger itself, we assume the exception is unhandled and break.

[1] Of course, this means that arbitrary code could still be executed in the form of a __subclasscheck__ method. This method should not have side effects anyway, so all we’d really achieve by calling it is to expose a bug earlier.
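The name lookup and issubclass test described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the debugger's code: `lookup_dotted_name` and `would_catch` are hypothetical helpers, lookup order is assumed to be locals, then globals, then builtins, and a name bound to a tuple of classes is simply treated as not matching.

```python
import builtins
import sys


def lookup_dotted_name(frame, text):
    """Resolve 'a.b.c' in `frame` using only name and attribute lookups."""
    first, *rest = text.split(".")
    for scope in (frame.f_locals, frame.f_globals, vars(builtins)):
        if first in scope:
            value = scope[first]
            break
    else:
        return None
    for attr in rest:
        value = getattr(value, attr, None)
        if value is None:
            return None
    return value


def would_catch(frame, text, ex_type):
    """True if an 'except <text>:' clause in `frame` would catch ex_type."""
    target = lookup_dotted_name(frame, text)
    # Anything that does not resolve to a class is treated as "not a match".
    return isinstance(target, type) and issubclass(ex_type, target)


def demo():
    zde = ZeroDivisionError  # local alias, as in the test_2 example above
    frame = sys._getframe()
    return (would_catch(frame, "zde", ZeroDivisionError),
            would_catch(frame, "ArithmeticError", ZeroDivisionError),
            would_catch(frame, "KeyError", ZeroDivisionError),
            would_catch(frame, "missing_name", ZeroDivisionError))


print(demo())
```

Resolving the name against the frame that contains the try statement, rather than the frame that raised, is what allows the local alias to be found.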

Extending the Debugger

In order for the debugger to support the updated exception handling mechanism, we have to add the new REQH (REQuest Handlers) and SEHI (Set Exception Handler Info) commands. Commands are used for stateless communication between the Python process being debugged and the Visual Studio instance doing the debugging. REQH will be sent from the debuggee to request the list of exception handlers in a particular source file and SEHI is sent with the response. PythonProcess.cs contains the handling for REQH events:

switch (CommandtoString(cmd_buffer)) {
    case "EXCP": HandleException(socket); break;
    case "BRKH": HandleBreakPointHit(socket); break;
    case "NEWT": HandleThreadCreate(socket); break;
    case "EXTT": HandleThreadExit(socket); break;
    case "MODL": HandleModuleLoad(socket); break;
    case "STPD": HandleStepDone(socket); break;
    case "EXIT": HandleProcessExit(socket); return;
    case "BRKS": HandleBreakPointSet(socket); break;
    case "BRKF": HandleBreakPointFailed(socket); break;
    case "LOAD": HandleProcessLoad(socket); break;
    case "THRF": HandleThreadFrameList(socket); break;
    case "EXCR": HandleExecutionResult(socket); break;
    case "EXCE": HandleExecutionException(socket); break;
    case "ASBR": HandleAsyncBreak(socket); break;
    case "SETL": HandleSetLineResult(socket); break;
    case "CHLD": HandleEnumChildren(socket); break;
    case "OUTP": HandleDebuggerOutput(socket); break;
    case "REQH": HandleRequestHandlers(socket); break;
    case "DETC": _process_Exited(this, EventArgs.Empty); break;
}

private void HandleRequestHandlers(Socket socket) {
    string filename = socket.ReadString();
    Debug.WriteLine("Exception handlers requested for: " + filename);
    var statements = GetHandledExceptionRanges(filename);
    _socket.Send(SetExceptionHandlerInfoCommandBytes);
    SendString(_socket, filename);

    _socket.Send(BitConverter.GetBytes(statements.Count));

    foreach (var t in statements) {
        _socket.Send(BitConverter.GetBytes(t.Item1));
        _socket.Send(BitConverter.GetBytes(t.Item2));

        foreach (var expr in t.Item3) {
            SendString(_socket, expr);
        }
        SendString(_socket, "-");
    }
}

The GetHandledExceptionRanges method was shown above; HandleRequestHandlers calls this method and transmits the line numbers and exception types to the debuggee. The implementation on the debuggee side is in visualstudio_py_debugger.py. The handling for the SEHI command looks like this:

self.command_table = {
    cmd('exit') : self.command_exit,
    cmd('stpi') : self.command_step_into,
    cmd('stpo') : self.command_step_out,
    cmd('stpv') : self.command_step_over,
    cmd('brkp') : self.command_set_breakpoint,
    cmd('brkc') : self.command_set_breakpoint_condition,
    cmd('brkr') : self.command_remove_breakpoint,
    cmd('brka') : self.command_break_all,
    cmd('resa') : self.command_resume_all,
    cmd('rest') : self.command_resume_thread,
    cmd('exec') : self.command_execute_code,
    cmd('chld') : self.command_enum_children,
    cmd('setl') : self.command_set_lineno,
    cmd('detc') : self.command_detach,
    cmd('clst') : self.command_clear_stepping,
    cmd('sexi') : self.command_set_exception_info,
    cmd('sehi') : self.command_set_exception_handler_info,
}

def command_set_exception_handler_info(self):
    try:
        filename = read_string(self.conn)
        statement_count = read_int(self.conn)
        handlers = []
        for _ in xrange(statement_count):
            line_start, line_end = read_int(self.conn), read_int(self.conn)
            expressions = set()
            text = read_string(self.conn).strip()
            while text != '-':
                expressions.add(text)
                text = read_string(self.conn)

            if not expressions:
                expressions = set('*')

            handlers.append((line_start, line_end, expressions))
        BREAK_ON.handler_cache[filename] = handlers
    finally:
        BREAK_ON.handler_lock.release()

Notice that the filename is sent both ways. Because the protocol is stateless, the debuggee cannot automatically associate the response with its request. In practice, it will block until it receives the response it expects, but it would require no protocol modifications to support preemptively sending SEHI commands. To avoid parsing source files repeatedly, the debuggee caches the handler lists, which are stored as tuples containing the same information found by GetHandledExceptionRanges.
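The layout of the SEHI payload implied by the two listings above can be sketched as a round trip in Python. The overall order (filename, statement count, then start line, end line, expressions and a "-" terminator per statement) comes from the code shown; the byte-level details are assumptions made for this sketch. In particular, integers are assumed to be 4-byte little-endian, strings are assumed to be a 4-byte length followed by UTF-8 bytes, and the leading command bytes are left out. `encode_sehi`, `decode_sehi` and `Reader` are hypothetical names.

```python
import struct


def pack_int(value):
    # Assumption: 4-byte little-endian, standing in for BitConverter.GetBytes.
    return struct.pack("<i", value)


def pack_string(text):
    # Assumption: length-prefixed UTF-8; the real string framing is not shown.
    data = text.encode("utf-8")
    return pack_int(len(data)) + data


def encode_sehi(filename, statements):
    out = [pack_string(filename), pack_int(len(statements))]
    for start, end, expressions in statements:
        out += [pack_int(start), pack_int(end)]
        out += [pack_string(expr) for expr in expressions]
        out.append(pack_string("-"))  # terminates this statement's list
    return b"".join(out)


class Reader:
    """Minimal cursor over a bytes object, standing in for the socket."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read_int(self):
        (value,) = struct.unpack_from("<i", self.data, self.pos)
        self.pos += 4
        return value

    def read_string(self):
        length = self.read_int()
        text = self.data[self.pos:self.pos + length].decode("utf-8")
        self.pos += length
        return text


def decode_sehi(data):
    reader = Reader(data)
    filename = reader.read_string()
    handlers = []
    for _ in range(reader.read_int()):
        start, end = reader.read_int(), reader.read_int()
        expressions = set()
        text = reader.read_string()
        while text != "-":
            expressions.add(text)
            text = reader.read_string()
        # An empty list is treated as "catches everything", as in the listing.
        handlers.append((start, end, expressions or {"*"}))
    return filename, handlers


payload = encode_sehi("app.py", [(1, 3, ["ValueError", "os.error"])])
print(decode_sehi(payload))
```

Because the filename travels inside the message, the receiver can store the result in its cache without remembering which request it belongs to.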

Breaking on an exception is relatively straightforward. The handler for when an exception is thrown already had a ShouldBreak test, which was changed to check the mode for the exception and, if necessary, whether any handlers exist.

def handle_exception(self, frame, arg):
    if frame.f_code.co_filename != __file__ and BREAK_ON.ShouldBreak(self, *arg):
        self.block(lambda: report_exception(frame, arg, self.id))

-    def ShouldBreak(self, name):
-        return self.break_always or name in self.break_on
 
+    def ShouldBreak(self, thread, ex_type, ex_value, trace):
+        name = ex_type.__module__ + '.' + ex_type.__name__
+        mode = self.break_on.get(name, self.default_mode)
+        return (bool(mode & BREAK_MODE_ALWAYS) or
+                (bool(mode & BREAK_MODE_UNHANDLED) and not self.IsHandled(thread, ex_type, ex_value, trace)))

The IsHandled method performs the call-stack traversal described earlier, returning False at the first frame that has no code available or True at the first frame that has a handler matching the exception type:

    # Edited for length
    def IsHandled(self, thread, ex_type, ex_value, trace):
        ... 
        cur_frame = trace.tb_frame
 
        while should_send_frame(cur_frame) and cur_frame.f_code.co_filename is not None:
            handlers = self.handler_cache.get(cur_frame.f_code.co_filename)
 
            if handlers is None:
                # get handlers, or assume unhandled and return False
                ...
 
            line = cur_frame.f_lineno
            for line_start, line_end, expressions in handlers:
                if line_start <= line < line_end:
                    if '*' in expressions:
                        return True
 
                    for text in expressions:
                        res = lookup_local(cur_frame, text)
                        if res is not None and issubclass(ex_type, res):
                            return True
 
            cur_frame = cur_frame.f_back
 
        return False

The search starts from the first frame in the traceback, which is the frame where the exception was caused, and goes backwards through f_back (that is, towards the caller) searching each file’s handler list. If no handlers are cached, they are requested from the debugger, and the linear search and use of line ranges ensures the correct behaviour for nested try blocks.
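To see how the line ranges behave with nested try blocks, the traversal can be run against a simulated stack. This is a simplified sketch, not the debugger's code: frames are represented as hypothetical (filename, line) pairs with the innermost frame first, `resolve` stands in for the name lookup, and a file missing from the cache is treated as unhandled rather than being requested.

```python
import builtins


def is_handled(stack, handler_cache, ex_type, resolve):
    """Walk outwards through `stack` looking for a matching handler."""
    for filename, line in stack:
        handlers = handler_cache.get(filename)
        if handlers is None:
            return False  # no handler information: assume unhandled
        for start, end, expressions in handlers:
            if start <= line < end:
                if "*" in expressions:
                    return True
                for text in expressions:
                    target = resolve(text)
                    if target is not None and issubclass(ex_type, target):
                        return True
    return False


def resolve_builtin(name):
    # Stand-in for a frame-aware lookup: only builtin names are resolved.
    return getattr(builtins, name, None)


# app.py has an outer try on lines 1-5 and a nested try on lines 2-3;
# lib.py has been analysed and contains no try statements.
cache = {
    "app.py": [(1, 6, {"KeyError"}), (2, 4, {"ValueError"})],
    "lib.py": [],
}

inner_hit = is_handled([("app.py", 3)], cache, ValueError, resolve_builtin)
outer_hit = is_handled([("app.py", 3)], cache, KeyError, resolve_builtin)
outer_miss = is_handled([("app.py", 5)], cache, ValueError, resolve_builtin)
via_caller = is_handled([("lib.py", 9), ("app.py", 3)], cache, ValueError,
                        resolve_builtin)
print(inner_hit, outer_hit, outer_miss, via_caller)
```

Every range containing the current line is checked, so a line inside the nested block is tested against both the inner and the outer handlers, while a line outside the nested block only sees the outer one.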

Summary

This feature added the ability for Python Tools for Visual Studio to detect and break on unhandled exceptions without unwinding the stack. This allows developers to inspect the state of their program and not just the stack trace. Handler detection is based on simple source code analysis, performed when an exception has been thrown, and assumes that if it can’t guarantee that the exception has a handler then it should break. It was first released in PTVS 1.0, August 2011.