Matevž Gačnik's Weblog

WCF: Passing Collections Through Service Boundaries, Why and How

In WCF, collection data that is passed through the service boundary goes through a type filter - meaning you will not necessarily get the intrinsic service side type on the client, even if you're expecting it.

No matter if you throw back an int[] or List<int>, you will get the int[] by default on the client.

The main reason is that there is no representation for System.Collections.Generic.List or System.Collections.Generic.LinkedList in service metadata. The concept of System.Collections.Generic.List<int>, for example, does not actually have a different semantic meaning from an integer array (it's still a list of ints), but it allows you to program against it with ease.
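If you want to convince yourself, here is a minimal sketch. It assumes you drive DataContractSerializer (the serializer WCF uses by default) by hand; the ListVersusArrayDemo class and its method names are made up for illustration. It serializes a List<int> and an int[] with the same values, compares the XML, and reads the list's XML back as a plain array.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical demo class, not part of WCF.
public static class ListVersusArrayDemo
{
    // Serializes an object with DataContractSerializer and returns the XML text.
    public static string ToXml(object graph)
    {
        DataContractSerializer serializer = new DataContractSerializer(graph.GetType());
        using (MemoryStream stream = new MemoryStream())
        {
            serializer.WriteObject(stream, graph);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    // Reads XML (for example, XML produced from a List<int>) back as an int[].
    public static int[] ReadAsArray(string xml)
    {
        DataContractSerializer serializer = new DataContractSerializer(typeof(int[]));
        using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            return (int[])serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        string fromList = ToXml(new List<int>(new int[] { 1, 2, 3 }));
        string fromArray = ToXml(new int[] { 1, 2, 3 });

        Console.WriteLine(fromList);
        Console.WriteLine("Same XML for list and array: " + (fromList == fromArray));
        Console.WriteLine("Items read back as int[]: " + ReadAsArray(fromList).Length);
    }
}
```

Both calls should yield the same ArrayOfint element, which is why the proxy generator has nothing to go on when it picks the client side type.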

Though, if one asks nicely, it is possible to guarantee the preferred collection type on the client proxy in certain scenarios.

Unidimensional collections, like List<T>, LinkedList<T> or Collection<T>, are exposed as T arrays in the client proxy by default. Dictionary<K, V>, though, is regenerated on the client via an annotation hint in WSDL (XSD if we are precise). More on that later.

Let's look into it.

WCF infrastructure bends over backwards to simplify client development. If the service side contains a really serializable collection (marked with [Serializable], not [DataContract]) that is also concrete (not an interface), and has an Add method with one of the following signatures...

public void Add(object obj);
public void Add(T item);

... then WCF will serialize the data to an array of the collection's element type.

Too complicated? Consider the following:

[ServiceContract]
interface ICollect
{
[OperationContract]
void AddCoin(Coin coin);

[OperationContract]
List<Coin> GetCoins();
}

Since List<T> has a void Add(T item) method and is marked with [Serializable], the following representation will be generated on the client:

[ServiceContract]
interface ICollect
{
[OperationContract]
void AddCoin(Coin coin);

[OperationContract]
Coin[] GetCoins();
}

Note: Coin class should be marked either with a [DataContract] or [Serializable] in this case.

So what happens if one wants the same contract on the client proxy and the service? There is an option in the WCF proxy generator, svcutil.exe, to force generation of class definitions with a specific collection type.

Use the following for List<T>:

svcutil.exe http://service/metadata/address
/collectionType:System.Collections.Generic.List`1

Note: List`1 uses back quote, not normal single quote character.

The /collectionType switch (short form /ct) forces generation of strongly typed collection types. It will generate the holy grail on the client:

[ServiceContract]
interface ICollect
{
[OperationContract]
void AddCoin(Coin coin);

[OperationContract]
List<Coin> GetCoins();
}

In Visual Studio 2008, you will even have an option to specify which types you want to use as collection types and dictionary collection types when adding a service reference.

On the other hand, dictionary collections, as in System.Collections.Generic.Dictionary<K, V> collections, will go through to the client no matter what you specify as a /ct parameter (or don't at all).

If you define the following on the service side...

[OperationContract]
Dictionary<string, int> GetFoo();

... this will get generated on the client:

[OperationContract]
Dictionary<string, int> GetFoo();

Why?

Because using System.Collections.Generic.Dictionary probably means you know there is no guarantee that client side representation will be possible if you are using an alternative platform. There is no way to meaningfully convey the semantics of a .NET dictionary class using WSDL/XSD.

So, how does the client know?

In fact, the values are serialized as joined name value pair elements as the following schema says:

<xs:complexType name="ArrayOfKeyValueOfstringint">
<xs:annotation>
    <xs:appinfo>
      <IsDictionary
        xmlns="http://schemas.microsoft.com/2003/10/Serialization/">
        true
      </IsDictionary>
    </xs:appinfo>
</xs:annotation>
<xs:sequence>
    <xs:element minOccurs="0" maxOccurs="unbounded"
      name="KeyValueOfstringint">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Key" nillable="true" type="xs:string" />
          <xs:element name="Value" type="xs:int" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
</xs:sequence>
</xs:complexType>
<xs:element name="ArrayOfKeyValueOfstringint"
nillable="true" type="tns:ArrayOfKeyValueOfstringint" />

Note: You can find this schema under types definition of the metadata endpoint. Usually ?xsd=xsd2, instead of ?wsdl will suffice.

The meaningful part of carrying the type from service to client resides in the <xs:annotation> element, specifically in the /xs:annotation/xs:appinfo/IsDictionary element, which declares that this complex type represents a System.Collections.Generic.Dictionary class. Annotation elements in XML Schema are application specific and do not convey any structure/data type semantics, but are there for the receiver to interpret.

This must be one of the most excellent textbook cases of using XML Schema annotations. It allows the well-informed client (as in a .NET client, VS 2008 or svcutil.exe) to utilize the semantic meaning if it understands it. If not, no harm is done, since the best possible representation, in the form of joined name value pairs, still goes through to the client.
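To see those joined name value pairs without spinning up a service, here is a small sketch. It assumes you call DataContractSerializer directly; the DictionaryWireDemo class is a made-up name for illustration. It prints the XML produced for a Dictionary<string, int>.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Hypothetical demo class, not part of WCF.
public static class DictionaryWireDemo
{
    // Serializes the dictionary with DataContractSerializer and returns indented XML.
    public static string Serialize(Dictionary<string, int> values)
    {
        DataContractSerializer serializer =
            new DataContractSerializer(typeof(Dictionary<string, int>));

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.OmitXmlDeclaration = true;

        StringBuilder xml = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(xml, settings))
        {
            serializer.WriteObject(writer, values);
        }
        return xml.ToString();
    }

    public static void Main()
    {
        Dictionary<string, int> foo = new Dictionary<string, int>();
        foo.Add("one", 1);
        foo.Add("two", 2);

        // Each entry should show up as a KeyValueOfstringint element
        // with Key and Value children, matching the schema above.
        Console.WriteLine(Serialize(foo));
    }
}
```

Nothing in the instance XML itself says "dictionary"; that hint lives only in the schema annotation, which is the whole point.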

Categories: .NET 3.0 - WCF | .NET 3.5 - WCF | Web Services

Thursday, 27 September 2007 22:04:47 (Central Europe Standard Time, UTC+01:00)

Approaches to Document Style Parameter Models

I'm a huge fan of document style parameter models when implementing a public, programmatic façade to a business functionality that often changes.

[ServiceContract]
public interface IDocumentParameterModel
{
   [OperationContract]
   [FaultContract(typeof(XmlInvalidException))]
   XmlDocument Process(XmlDocument doc);
}

This contract defines a simple method, called Process, which processes the input document. The idea is to define the document schema and validate inbound XML documents, while throwing exceptions on validation errors. The processing semantics is arbitrary and can support any kind of action, depending on the defined invoke document schema.

A simple instance document which validates against a version 1.0 processing schema could look like this:

<?xml version="1.0"?>
<Process xmlns="http://www.gama-system.com/process10.xsd" version="1.0">
   <Instruction>Add</Instruction>
   <Parameter1>10</Parameter1>
   <Parameter2>21</Parameter2>
</Process>

Another processing instruction, supported in version 1.1 of the processing schema, with different semantics could be:

<?xml version="1.0"?>
<Process xmlns="http://www.gama-system.com/process11.xsd" version="1.1">
<Instruction>Store</Instruction>
<Content>77u/PEFwcGxpY2F0aW9uIHhtbG5zPSJod...mdVcCI</Content>
</Process>

Note that the default XML namespace changed, but that is not a requirement. It only allows you to automate schema retrieval using a schema repository (think System.Xml.Schema.XmlSchemaSet), load all supported schemas and validate automatically.

public class ProcessService : IDocumentParameterModel
{
   public XmlDocument Process(XmlDocument doc)
   {
      XmlReaderSettings sett = new XmlReaderSettings();

      sett.Schemas.Add(<document namespace 1>, <schema uri 1>);
      ...
      sett.Schemas.Add(<document namespace n>, <schema uri n>);

      sett.ValidationType = ValidationType.Schema;
      sett.ValidationEventHandler += new
         ValidationEventHandler(XmlInvalidHandler);
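      // XmlReader.Create(string, ...) treats the string as a URI,
      // so wrap the markup in a System.IO.StringReader instead
      XmlReader reader = XmlReader.Create(new StringReader(doc.OuterXml), sett);
      while (reader.Read()) { }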

      // processing goes here
      ...
   }

   static void XmlInvalidHandler(object sender, ValidationEventArgs e)
   {
      if (e.Severity == XmlSeverityType.Error)
         throw new XmlInvalidException(e.Message);
   }
}

The main benefit of this approach is decoupling the parameter model and method processing version from the communication contract. A service maintainer has an option to change the terms of processing over time, while supporting older version-aware document instances.

This notion is of course most beneficial in situations where your processing syntax changes frequently and has complex validation schemas. A simple case presented here is informational only.

So, how do we validate?

1. We check the instance document version first. This is especially important in cases where the document is not qualified with a different namespace when the version changes.
2. We grab the appropriate schema or schema set.
3. We validate the inbound XML document and throw a typed XmlInvalidException if it is invalid.
4. We process the call.
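The steps above can be sketched roughly like this. It is a minimal sketch, assuming the version is identified by the document's namespace; the InvokeDocumentValidator class, the urn:example:process-1.0 namespace and the tiny inline schema are all made up for illustration, and XmlInvalidException is a stand-in for the typed exception from the contract.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Stand-in for the typed exception named in the service contract.
public class XmlInvalidException : Exception
{
    public XmlInvalidException(string message) : base(message) { }
}

// Hypothetical helper, named for illustration only.
public static class InvokeDocumentValidator
{
    public static void Validate(XmlDocument doc, XmlSchemaSet supportedSchemas)
    {
        // Step 1: identify the version (here: by the root element's namespace).
        string ns = doc.DocumentElement.NamespaceURI;

        // Step 2: make sure we actually have a schema for it.
        if (!supportedSchemas.Contains(ns))
            throw new XmlInvalidException("Unsupported document namespace: " + ns);

        // Step 3: validate, turning schema errors into the typed exception.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = supportedSchemas;
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            if (e.Severity == XmlSeverityType.Error)
                throw new XmlInvalidException(e.Message);
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(doc.OuterXml), settings))
        {
            while (reader.Read()) { }
        }
        // Step 4: processing of the call would follow here.
    }

    // Builds a schema set with one made-up schema, just for the demo.
    public static XmlSchemaSet CreateDemoSchemas()
    {
        string xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' " +
            "targetNamespace='urn:example:process-1.0' elementFormDefault='qualified'>" +
            "<xs:element name='Process'><xs:complexType><xs:sequence>" +
            "<xs:element name='Instruction' type='xs:string'/>" +
            "</xs:sequence></xs:complexType></xs:element></xs:schema>";
        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add(XmlSchema.Read(new StringReader(xsd), null));
        return schemas;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<Process xmlns='urn:example:process-1.0'><Instruction>Add</Instruction></Process>");
        Validate(doc, CreateDemoSchemas());
        Console.WriteLine("Document is valid.");
    }
}
```

The explicit namespace check matters: without it, a document in an unknown namespace only raises validation warnings, which the reader does not report by default.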

The service side is quite straightforward.

Let's look at the client and the options for painless generation of service calls using this mechanism.

Generally, one can always produce an instance invoke document by hand on the client. By hand meaning using System.Xml classes and DOM concepts. Since this is highly error prone and gets tedious with increasing complexity, there is a second option, the schema compiler, which automatically translates your XML Schema into the CLR type system. Xsd.exe and XmlSerializer are your friends.

If your schema requires parts of the instance document to be digitally signed or encrypted, you will need to adorn the serializer output with some manual DOM work. This might also be a reason to use the third option.

The third, and easiest option for the general developer, is to provide a local object model, which serializes the requests on the client. This is an example:

ProcessInstruction pi = new ProcessInstruction();
pi.Instruction = "Add";
pi.Parameter1 = 10;
pi.Parameter2 = 21;
pi.Sign(cert); // pi.Encrypt(cert);
pi.Serialize();
proxy.Process(pi.SerializedForm);
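What such a local object model might look like inside is sketched below. This is only one possible shape, assuming XmlSerializer does the heavy lifting against the version 1.0 document shown earlier; the member layout is my assumption, and Sign and Encrypt are left out entirely since they depend on your certificate handling.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical local object model for the version 1.0 "Add" instruction.
[XmlRoot("Process", Namespace = ProcessInstruction.SchemaNamespace)]
public class ProcessInstruction
{
    public const string SchemaNamespace = "http://www.gama-system.com/process10.xsd";

    [XmlAttribute("version")]
    public string Version = "1.0";

    public string Instruction;
    public int Parameter1;
    public int Parameter2;

    private XmlDocument serializedForm;

    // The document to hand over to proxy.Process(...); null until Serialize() is called.
    [XmlIgnore]
    public XmlDocument SerializedForm
    {
        get { return serializedForm; }
    }

    // Serializes this instance into an XmlDocument, declaring only the
    // document's own default namespace.
    public void Serialize()
    {
        XmlSerializerNamespaces namespaces = new XmlSerializerNamespaces();
        namespaces.Add(String.Empty, SchemaNamespace);

        XmlSerializer serializer = new XmlSerializer(typeof(ProcessInstruction));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, this, namespaces);

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(writer.ToString());
        serializedForm = doc;
    }

    public static void Main()
    {
        ProcessInstruction pi = new ProcessInstruction();
        pi.Instruction = "Add";
        pi.Parameter1 = 10;
        pi.Parameter2 = 21;
        pi.Serialize();
        Console.WriteLine(pi.SerializedForm.OuterXml);
    }
}
```

The client developer never touches the wire format; swapping in a version 1.1 object model later would not change the service contract at all.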

The main benefit of this approach comes down to having an option on the server and the client. Client developers have three different levels of complexity for generating service calls. The model allows them to be as close to the wire as they see fit. Or they can be abstracted completely from the wire representation if you provide a local object model to access your services.

Categories: .NET 3.0 - WCF | .NET 3.5 - WCF | Architecture | Web Services | XML

Monday, 24 September 2007 11:19:10 (Central Europe Standard Time, UTC+01:00)

XmlSerializer, Ambient XML Namespaces and Digital Signatures

If you use the XmlSerializer type to serialize documents which are digitally signed later on, you should be careful.

XML namespaces which are included in the serialized form could cause trouble for anyone signing the document after serialization, especially in the case of normalized signature checks.

Let's go step by step.

Suppose we have this simple schema, let's call it problem.xsd:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="http://www.gama-system.com/problems.xsd"
           elementFormDefault="qualified"
           xmlns="http://www.gama-system.com/problems.xsd"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Problem" type="ProblemType"/>
<xs:complexType name="ProblemType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string" />
      <xs:element name="Severity" type="xs:int" />
      <xs:element name="Definition" type="DefinitionType"/>
      <xs:element name="Description" type="xs:string" />
    </xs:sequence>
</xs:complexType>
<xs:complexType name="DefinitionType">
    <xs:simpleContent>
      <xs:extension base="xs:base64Binary">
        <xs:attribute name="Id" type="GUIDType" use="required"/>
      </xs:extension>
    </xs:simpleContent>
</xs:complexType>
<xs:simpleType name="GUIDType">
    <xs:restriction base="xs:string">
      <xs:pattern value="Id-[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-
                         [0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"/>
    </xs:restriction>
</xs:simpleType>
</xs:schema>

This schema describes a problem, which is defined by a name (typed as string), severity (typed as integer), definition (typed as byte array) and description (typed as string). The schema also says that the definition of a problem has an Id attribute, which we will use when digitally signing a specific problem definition. This Id attribute is defined as GUID, as the simple type GUIDType defines.

Instance documents validating against this schema would look like this:

<?xml version="1.0"?>
<Problem xmlns="http://www.gama-system.com/problems.xsd">
<Name>Specific problem</Name>
<Severity>4</Severity>
<Definition Id="Id-c31dd112-dd42-41da-c11d-33ff7d2112a2">MD1sDQ8=</Definition>
<Description>This is a specific problem.</Description>
</Problem>

Or this:

<?xml version="1.0"?>
<Problem xmlns="http://www.gama-system.com/problems.xsd">
<Name>XML DigSig Problem</Name>
<Severity>5</Severity>
<Definition Id="Id-b01cb152-cf93-48df-b07e-97ea7f2ec2e9">CgsMDQ8=</Definition>
<Description>Ambient namespaces break digsigs.</Description>
</Problem>

Mark this one as exhibit A.

Only a few of you out there are still generating XML documents by hand, since there exists a notion of schema compilers. In the .NET Framework world, there is xsd.exe, which bridges the gap between the XML type system and the CLR type system.

xsd.exe /c problem.xsd

The tool compiles the problem.xsd schema into the CLR type system. This allows you to use the classes defined by the schema and serialize them later on with the XmlSerializer class. A program that serializes the second instance document (exhibit A) would look like this:

// generate problem
ProblemType problem = new ProblemType();
problem.Name = "XML DigSig Problem";
problem.Severity = 5;
DefinitionType dt = new DefinitionType();
dt.Id = "Id-" + Guid.NewGuid().ToString(); // GUIDType pattern requires the "Id-" prefix
dt.Value = new byte[] { 0xa, 0xb, 0xc, 0xd, 0xf };
problem.Definition = dt;
problem.Description = "Ambient namespaces break digsigs.";

// serialize problem
XmlSerializer ser = new XmlSerializer(typeof(ProblemType));
FileStream stream = new FileStream("Problem.xml", FileMode.Create, FileAccess.Write);
ser.Serialize(stream, problem);
stream.Close();

Here lie the dragons.

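The XmlSerializer class's default serialization mechanism would output this:

<?xml version="1.0"?>
<Problem xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema"
         xmlns="http://www.gama-system.com/problems.xsd">
<Name>XML DigSig Problem</Name>
<Severity>5</Severity>
<Definition Id="Id-b01cb152-cf93-48df-b07e-97ea7f2ec2e9">CgsMDQ8=</Definition>
<Description>Ambient namespaces break digsigs.</Description>
</Problem>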

Mark this one as exhibit B.

If you look closely, you will notice that exhibit B, compared to exhibit A, carries two additional namespace declarations, bound to the xsi and xsd prefixes.

The fact is that both documents (exhibit A and exhibit B) are valid against the problem.xsd schema.

Prefixed namespaces are part of the XML Infoset. All XML processing is done on the XML Infoset level. Since only declarations (look at prefixes xsi and xsd) are made in exhibit B, the document itself is not semantically different from exhibit A. That said, the instance documents are equivalent and should validate against the same schema.

</theory>

What happens if we sign the Definition element of exhibit B (XmlSerializer generated, prefixed namespaces present)?

We get this:

<?xml version="1.0"?>
<Problem xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema"
         xmlns="http://www.gama-system.com/problems.xsd">
<Name>XML DigSig Problem</Name>
<Severity>5</Severity>
<Definition Id="Id-b01cb152-cf93-48df-b07e-97ea7f2ec2e9">CgsMDQ8=</Definition>
<Description>Ambient namespaces break digsigs.</Description>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
    <SignedInfo>
      <CanonicalizationMethod Algorithm="http://www.w3.org/TR/...20010315" />
      <SignatureMethod Algorithm="http://www.w3.org/...rsa-sha1" />
      <Reference URI="#Id-b01cb152-cf93-48df-b07e-97ea7f2ec2e9">
        <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1" />
        <DigestValue>k3gbdFVJEpv4LWJAvvHUZZo/VUQ=</DigestValue>
      </Reference>
    </SignedInfo>
    <SignatureValue>K8f...p14=</SignatureValue>
    <KeyInfo>
      <KeyValue>
        <RSAKeyValue>
          <Modulus>eVs...rL4=</Modulus>
          <Exponent>AQAB</Exponent>
        </RSAKeyValue>
      </KeyValue>
      <X509Data>
        <X509Certificate>MIIF...Bw==</X509Certificate>
      </X509Data>
    </KeyInfo>
</Signature>
</Problem>

Let's call this document exhibit D.

This document is the same as exhibit B, but has the Definition element digitally signed. Note the /Problem/Signature/SignedInfo/Reference[@URI] value. The digital signature covers only the Definition element and not the complete document.

Now, if one would verify the signature of the same document without the prefixed namespace declarations, as in:

<?xml version="1.0"?>
<Problem xmlns="http://www.gama-system.com/problems.xsd">
<Name>XML DigSig Problem</Name>
<Severity>5</Severity>
<Definition Id="Id-b01cb152-cf93-48df-b07e-97ea7f2ec2e9">CgsMDQ8=</Definition>
<Description>Ambient namespaces break digsigs.</Description>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
...
</Signature>
</Problem>

... the signature verification would fail. Let's call this document exhibit C.

As said earlier, all XML processing is done on the XML Infoset level. Since ambient prefixed namespace declarations are visible in all child elements of the declaring element, exhibits C and D are different. Explicitly, element contexts are different for element Definition, since exhibit C does not have ambient declarations present and exhibit D does. The signature verification fails.
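If you want to try this at home, here is a rough sketch using SignedXml from System.Security.Cryptography.Xml. The AmbientNamespaceSignatureDemo class, its method names and the throwaway RSA key are assumptions made for illustration; a real setup would sign with a certificate. It signs the Definition element of an exhibit B style document, verifies it, then strips the two prefixed declarations from the root and verifies again.

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.Xml;
using System.Xml;

// Hypothetical demo class; the key is a throwaway RSA key, not a certificate.
public static class AmbientNamespaceSignatureDemo
{
    const string DefinitionId = "Id-b01cb152-cf93-48df-b07e-97ea7f2ec2e9";

    const string ExhibitB =
        "<Problem xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " +
        "xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" " +
        "xmlns=\"http://www.gama-system.com/problems.xsd\">" +
        "<Name>XML DigSig Problem</Name><Severity>5</Severity>" +
        "<Definition Id=\"" + DefinitionId + "\">CgsMDQ8=</Definition>" +
        "<Description>Ambient namespaces break digsigs.</Description></Problem>";

    // Signs only the Definition element and appends the Signature to the root.
    public static XmlDocument SignDefinition(RSA key)
    {
        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = true;
        doc.LoadXml(ExhibitB);

        SignedXml signedXml = new SignedXml(doc);
        signedXml.SigningKey = key;
        signedXml.AddReference(new Reference("#" + DefinitionId));
        signedXml.ComputeSignature();

        doc.DocumentElement.AppendChild(doc.ImportNode(signedXml.GetXml(), true));
        return doc;
    }

    // Verifies the first Signature element found in the document.
    public static bool Verify(XmlDocument doc, RSA key)
    {
        XmlElement signature = (XmlElement)doc.GetElementsByTagName(
            "Signature", SignedXml.XmlDsigNamespaceUrl)[0];
        SignedXml signedXml = new SignedXml(doc);
        signedXml.LoadXml(signature);
        return signedXml.CheckSignature(key);
    }

    // Returns a copy of the signed document without the xsi and xsd declarations.
    public static XmlDocument StripAmbientNamespaces(XmlDocument signed)
    {
        XmlDocument copy = new XmlDocument();
        copy.PreserveWhitespace = true;
        copy.LoadXml(signed.OuterXml);
        copy.DocumentElement.RemoveAttribute("xmlns:xsi");
        copy.DocumentElement.RemoveAttribute("xmlns:xsd");
        return copy;
    }

    public static void Main()
    {
        using (RSA key = RSA.Create())
        {
            XmlDocument signed = SignDefinition(key);
            Console.WriteLine("With ambient namespaces: " + Verify(signed, key));
            Console.WriteLine("Without ambient namespaces: " +
                Verify(StripAmbientNamespaces(signed), key));
        }
    }
}
```

If the reasoning above holds, the first check should pass and the second should fail, even though not a single character of the Definition element itself was touched.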

</theory>

Solution?

Much simpler than what's written above. Force the XmlSerializer class to serialize only what should be serialized in the first place. We need to declare the namespace of the serialized document explicitly and prevent XmlSerializer from being too smart. The .NET Framework serialization mechanism contains an XmlSerializerNamespaces class which can be specified during the serialization process.

Since we know the only (and by the way, default) namespace of the serialized document, this makes things work out OK:

// serialize problem
XmlSerializerNamespaces xsn = new XmlSerializerNamespaces();
xsn.Add(String.Empty, "http://www.gama-system.com/problems.xsd");

XmlSerializer ser = new XmlSerializer(typeof(ProblemType));
FileStream stream = new FileStream("Problem.xml", FileMode.Create, FileAccess.Write);
ser.Serialize(stream, problem, xsn);
stream.Close();

This will force XmlSerializer to produce a valid document - with valid XML element contexts, without any ambient namespaces.
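A side by side comparison makes the difference obvious. The sketch below uses a trimmed, hand-written stand-in for the xsd.exe generated class (only Name and Severity, my simplification) and serializes it once with the defaults and once with an XmlSerializerNamespaces instance; NamespaceComparisonDemo is a made-up name.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Trimmed stand-in for the xsd.exe generated class, for illustration only.
[XmlRoot("Problem", Namespace = "http://www.gama-system.com/problems.xsd")]
public class ProblemType
{
    public string Name;
    public int Severity;
}

public static class NamespaceComparisonDemo
{
    const string ProblemNamespace = "http://www.gama-system.com/problems.xsd";

    // Default behavior: XmlSerializer adds the xsi and xsd declarations.
    public static string SerializeWithDefaults(ProblemType problem)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(ProblemType));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, problem);
        return writer.ToString();
    }

    // Explicit namespaces: only the document's own default namespace is declared.
    public static string SerializeWithoutAmbientNamespaces(ProblemType problem)
    {
        XmlSerializerNamespaces namespaces = new XmlSerializerNamespaces();
        namespaces.Add(String.Empty, ProblemNamespace);

        XmlSerializer serializer = new XmlSerializer(typeof(ProblemType));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, problem, namespaces);
        return writer.ToString();
    }

    public static void Main()
    {
        ProblemType problem = new ProblemType();
        problem.Name = "XML DigSig Problem";
        problem.Severity = 5;

        Console.WriteLine(SerializeWithDefaults(problem));
        Console.WriteLine();
        Console.WriteLine(SerializeWithoutAmbientNamespaces(problem));
    }
}
```

Note that serializing to a StringWriter stamps the XML declaration with a utf-16 encoding; that is a property of the writer and unrelated to the namespace question.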

The question is, why does XmlSerializer produce these namespaces by default? That should be a topic for another post.

Categories: CLR | XML

Wednesday, 19 September 2007 21:57:57 (Central Europe Standard Time, UTC+01:00)

Technology Philanthropy