Case-insensitive Select-Xml

PowerShell is not case sensitive (most of the time). There are two situations when it is: either you explicitly request it (e.g. using operators like -cmatch), or you depend on something where there is no easy way to turn case sensitivity off. Select-Xml uses XPath and falls into the second category. Whenever I need to query a large XML file I prefer to start with more general queries, and there are very few things more frustrating than being forced to remember the case of e.g. attribute names. There are two ways to go about it.

First of all, we can do it the easy way: change the case of the whole string that defines the XML, so that everything is either lower or upper case. The problem with this approach is that the result we get back differs from the original, and we are a bit “stuck” with “broken” XML.
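A minimal sketch of this first option, using a made-up inline document instead of a real file (the element and attribute names are only for illustration). It lower-cases the whole string and then queries it with lower-case names:

```powershell
# Made-up sample data standing in for a real XML file.
$Content = '<Settings><UserLocale Mode="Auto">en-US</UserLocale></Settings>'

# Option one: lower-case everything, then query using lower-case names only.
$Lowered = $Content.ToLowerInvariant()
$Result  = Select-Xml -Content $Lowered -XPath '//userlocale[@mode = "auto"]'

# The query matches, but names and values have lost their original case.
$Result.Node.OuterXml
```

The node that comes back reads "en-us" rather than "en-US", which is the “broken” XML problem described above.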

The second option is to use XPath's own tools to get similar results, which has the advantage of returning “proper” XML data. Using Select-Xml this way also lets us modify the XML and save it under a different name.

I will run my queries against a file named “Template.xml”, which is just a template for unattend.xml (used for unattended OS deployment). That is more practical than some useless, custom-generated XML document.

XPath 101

When working with Select-Xml there are two things you have to remember:

  • we are talking XPath 1.0 (that is somewhat limiting)
  • we have to take care of XML namespaces, including the default one

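What is a (default) namespace? You will see one in any “real” XML document. It is the reason why most simple Select-Xml examples “don’t work” when you try to apply them to real-life documents. XPath 1.0 treats an unprefixed name as belonging to no namespace, so elements that inherit a default namespace can only be matched through a prefix that we map ourselves. Defining the namespace(s) is the first thing you have to do: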

[Screenshot: Select-Xml-Namespaces]

As you can see, even though the node is named “component”, we had to define the namespace prefix “u” first and refer to the node in the actual XPath using the prefix:name syntax. This is very basic XPath: we don’t filter anything here. Let’s try to filter something next.
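The queries in this post splat a $Param hashtable. Here is a minimal sketch of how such a hashtable could be defined. The stand-in file name and its content are made up so the sketch can run on its own, and it assumes the template uses the standard unattend namespace, urn:schemas-microsoft-com:unattend:

```powershell
# Assumption: a small stand-in file is written to the temp folder so that this
# sketch is self-contained; the post itself queries a real Template.xml.
$Path = Join-Path ([IO.Path]::GetTempPath()) 'Template-Sketch.xml'
@'
<unattend xmlns="urn:schemas-microsoft-com:unattend">
  <settings pass="specialize">
    <component name="Microsoft-Windows-International-Core" processorArchitecture="amd64">
      <UserLocale>en-US</UserLocale>
    </component>
  </settings>
</unattend>
'@ | Set-Content -Path $Path -Encoding UTF8

# Parameters shared by every query: the file, plus a prefix ("u") that we map
# to the document's default namespace.
$Param = @{
    Path      = $Path
    Namespace = @{ u = 'urn:schemas-microsoft-com:unattend' }
}

# Without the prefix nothing is found; with it the component node comes back.
$NoPrefix   = Select-Xml @Param -XPath '//component'
$WithPrefix = Select-Xml @Param -XPath '//u:component'
$WithPrefix.Node.LocalName
```

The prefix itself is arbitrary: only the namespace URI it maps to has to match the one declared in the document.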

XPath filters

The syntax for filtering elements with XPath is relatively simple, but it is best explained with examples. We already listed components; now we care only about those that contain a “UserLocale” node. Remember that we are again looking for elements in the default namespace (which we mapped to the “u” prefix):

Select-Xml @Param "//u:component[u:UserLocale]"

As you can see, the filter is wrapped in square brackets. We can also compare values against the ones we expect:

Select-Xml @Param "//u:component[@processorArchitecture = 'amd64']"

Filters let you use functions that make filtering easier. The one I use most often is contains(). It takes two arguments: a string value and a substring that should be found in it. I’m lousy at typing, so anything that lets me type less is more than welcome. Also, filters can be nested: the “internal” filter produces a list of nodes that the “external” filter uses to check its condition. Combining those two techniques, we will look for any component node that has an attribute (@*) whose name contains the string ‘version’ and whose value contains the string ‘non’. Note that when XPath 1.0 converts a node list to a string, as contains() does with its first argument here, it uses only the first node in the list:

Select-Xml @Param @"
    //u:component[
        contains(
            @*[
                contains(
                    name(),
                    'version'
                )
            ],
            'non'
        )
    ]
"@            

Those are just basic examples, but they should give you an idea of how XML filtering works.

Who cares about case?

So far all our queries were case-sensitive. If you get the case of a node name, attribute name, or value wrong in your XPath query, you won’t get the expected results back. But what if you don’t know which case is used? Or you want all results, regardless of case? That’s where translate() comes in handy. This function takes three arguments: the string that will be modified, a string listing the characters to replace, and a string listing the characters to replace them with. Again, an example should make it clearer: we want to see all nodes whose text contains ‘monad’, regardless of case:

Select-Xml @Param -XPath @"
    //*[
        contains(
            translate(
                text(),
                'ADMNO',
                'admno'
            ),
            'monad'
        )
    ]
"@            

And the result I got:

[Screenshot: SelectXml-With-Translate]

As you can see, the 2nd and 3rd arguments are almost identical: the only difference is the case used. Another thing to note: the order of the letters is not important, as long as both lists use the same order. Finally, if we wanted to remove some characters, we would just list them last in the second argument and not provide any characters to replace them with. And what if we want to do the same with attribute values? Suppose I want to find any node with an attribute whose value contains “nonsxs”. We will use a nested filter again:

Select-Xml @Param -XPath @"
    //*[
        @*[
            contains(
                translate(
                    .,
                    'S',
                    's'
                ),
                'nonsxs'
            )
        ]
    ]
"@            
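The earlier remark about removing characters with translate() can be sketched in the same way. The inline document and its dashed attribute value below are made up for illustration. The dash is listed last in the second argument and has no counterpart in the third, so it is dropped before the comparison:

```powershell
# Made-up sample: the attribute value mixes case and contains a dash.
$Content = '<root><item versionScope="Non-SxS" /></root>'

# 'NSX' maps to 'nsx' position by position; '-' has no counterpart in the
# third argument, so translate() removes it from the string altogether.
$XPath = "//*[@*[translate(., 'NSX-', 'nsx') = 'nonsxs']]"

$Result = Select-Xml -Content $Content -XPath $XPath

# The node comes back unchanged: only the comparison ignored case and dashes.
$Result.Node.GetAttribute('versionScope')
```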

And what if we want to apply the same logic to attribute names? E.g. I want to see any node with an attribute whose name contains “key”. First the code:

Select-Xml @Param -XPath @"
    //*[
        @*[
            contains(
                translate(
                    name(.),
                    'KEY',
                    'key'
                ),
                'key'
            )
        ]
    ]
"@ | Format-Table -AutoSize @{            
    Name = 'Name'            
    Expression = {            
        $_.Node.LocalName
    }            
}, @{            
    Name = 'Attributes'            
    Expression = {            
        $_.Node.PSObject.Properties.Name -join ', '            
    }            
}            

And the results:

[Screenshot: Select-Xml-Attribute-Names]
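Typing the two translate() lists by hand gets repetitive. Here is a rough sketch of a helper that builds them from the search term. Get-LowerCaseXPath is a hypothetical name, not something that ships with PowerShell, and the sketch assumes the search term contains no quote characters:

```powershell
# Hypothetical helper: wraps an XPath expression in translate() so that it can
# be compared against a lower-case search term.
function Get-LowerCaseXPath {
    param(
        [Parameter(Mandatory)] [string] $Expression,  # e.g. 'name()' or '.'
        [Parameter(Mandatory)] [string] $Term         # the text we search for
    )
    # Only the distinct letters of the search term need to be translated.
    $Lower = ''
    foreach ($Char in $Term.ToLowerInvariant().ToCharArray()) {
        if ([char]::IsLetter($Char) -and -not $Lower.Contains($Char)) {
            $Lower += $Char
        }
    }
    $Upper = $Lower.ToUpperInvariant()
    "translate($Expression, '$Upper', '$Lower')"
}

# Made-up sample document with one attribute name that contains "key".
$Content   = '<root><Item KeyName="a" /><item other="b" /></root>'
$NameLower = Get-LowerCaseXPath -Expression 'name()' -Term 'key'
$Result    = Select-Xml -Content $Content -XPath "//*[@*[contains($NameLower, 'key')]]"
$Result.Node.LocalName
```

For the term 'key' the helper produces the same translate(name(), 'KEY', 'key') call that was typed by hand above.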

Finally, the situation where we want to work on any node whose name contains a certain substring (regardless of the case used in it). This time we will also modify the value of this node and save the resulting document to a new file:

Select-Xml @Param -XPath @"
    //*[
        contains(
            translate(
                name(),
                'PAS',
                'pas'
            ),
            'pass'
        )
    ]
            
"@ | ForEach-Object -Process {            
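    # Overwrite the value of every matching node
    $_.Node.InnerText = 'MySecretPass'
    # Remember the last match so the -End block can reach its document
    $Node = $_
} -End {
    # All matches belong to the same document, so saving it once is enough
    $Node.Node.OwnerDocument.Save("$PWD\Updated.xml")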
}

We can check whether we succeeded by changing the Path that our Select-Xml queries point to and looking for any node that has the value “MySecretPass”. Result:

[Screenshot: Select-Xml-Query-For-Node-Value]
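The whole round trip (modify, save, query the new file) can be sketched against a small made-up document. The file names and element names below are assumptions for illustration, and no namespace is used, to keep the sketch short:

```powershell
# Stand-in files in the temp folder (hypothetical names).
$Source  = Join-Path ([IO.Path]::GetTempPath()) 'Sketch-Source.xml'
$Updated = Join-Path ([IO.Path]::GetTempPath()) 'Sketch-Updated.xml'
'<root><AdminPassword>old</AdminPassword><User>me</User></root>' |
    Set-Content -Path $Source

# Change every node whose name contains "pass" in any case, then save a copy.
$Hits = @(Select-Xml -Path $Source -XPath "//*[contains(translate(name(), 'PAS', 'pas'), 'pass')]")
foreach ($Hit in $Hits) { $Hit.Node.InnerText = 'MySecretPass' }
$Hits[0].Node.OwnerDocument.Save($Updated)

# Point the same kind of query at the new file to confirm the change.
$Check = Select-Xml -Path $Updated -XPath "//*[text() = 'MySecretPass']"
$Check.Node.LocalName

# The original file on disk is untouched.
$Original = Select-Xml -Path $Source -XPath "//*[text() = 'MySecretPass']"
```

One thing to keep in mind: setting InnerText replaces everything inside the node, so on an element that has child elements this would wipe the children as well.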

Getting to know all those techniques made my life a lot easier when I needed to analyze a lot of XML data. The data was an export from a tool we use at work, and with all those filters I was able to gather a lot of useful statistics for myself and my colleagues. Doing the same with regex was probably possible too, but I prefer to pick the right tool for the job: XML is structured, and that structure is something regular expressions are not really able to pick up.
