⛵ Argo

Version 1.2.0. Compatible with GraphQL October 2021 Edition.

Argo is a compact and compressible binary serialization format for GraphQL. It aims to:

  • Minimize end-to-end latency of GraphQL responses
    • Including serialization, transport, and deserialization
  • Minimize bytes on the wire, with and without external compression
  • Be easy to implement

Argo:

  • Takes the place of JSON in GraphQL responses
  • Usually meets the needs of mobile clients (and server clients) better than web clients
  • Works best with code generation, but also works well with interpretation
  • Does not currently support GraphQL Input types

Compressed Argo responses are typically 5-15% smaller than corresponding compressed JSON responses.

Uncompressed Argo responses are typically 50-80% smaller than corresponding JSON responses.

Introduction

This document defines Argo. It is intended to be the authoritative specification. Implementations of Argo must adhere to this document.

Design notes and motivations are included, but these sections do not specify necessary technical details.

1 Overview

Argo is designed to work with GraphQL queries and schemas which are known in advance, but executed many times.

In advance, Argo walks over a given GraphQL query against a particular GraphQL schema and generates a Wire schema which captures type and serialization information which will be used when [de]serializing a response. Later, when serializing a GraphQL response, Argo relies on this and does not need to send this information over the network—this reduces the payload size. Similarly when deserializing, each client relies on the Wire schema to read the message.

The serialization format itself is a compact binary format which uses techniques to minimize the payload size, make it unusually compressible (to further reduce the payload size over the network), permit high-performance implementations, and remain relatively simple. These techniques include:

Argo separates its work into two phases: Registration Time and Execution Time.

Registration Time happens once, before a payload is serialized with Argo. On a mobile client, Registration Time is typically compile time. On a server, Registration Time is typically when a query is registered by a client application (perhaps whenever a new client version is compiled).

At Registration Time, Argo generates a Wire schema on both the server and the client. Optionally, code may be generated at Registration Time (based on the Wire schema) to decode messages. See Creating a Wire schema.

Execution Time happens many times, whenever a payload is [de]serialized with Argo.

At Execution Time, Argo relies on the Wire schema (or code previously generated based on it) to read a binary message. This only works because information was collected previously during Registration Time. See Binary encoding.

Note Nothing prevents running Registration Time and Execution Time steps at Execution Time. However, this is a low-performance pattern, and is most likely only useful during development.

2 Types

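GraphQL uses two type systems:

  • GraphQL types, which are defined in the GraphQL spec
  • GraphQL response types, the serialization types sketched in the GraphQL spec

Argo uses these and introduces:

  • Wire types, used to encode GraphQL values as bytes
  • Self-describing types, used to encode values whose types are not known in advance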

2.1 GraphQL types

GraphQL types refers to the types defined in the GraphQL spec. Briefly, these are:

  • Scalar, Enum, Object, Input Object, Interface, Union, List
  • Scalar includes, at minimum: Int, Float, String, Boolean, ID

GraphQL also allows for custom scalars. Argo supports this, though an @ArgoCodec directive is required to tell Argo how to represent it on the wire.

GraphQL response types refers to the serialization types sketched in the GraphQL spec. These do not have rigorous definitions in the GraphQL spec. These include (but are not limited to):

  • Map, List, String, Null
  • Optionally, Boolean, Int, Float, and Enum Value

2.1.1 GraphQL input types

GraphQL also includes Input types, which encode arguments and requests. Presently, Argo does not specify how to encode GraphQL input types because the expected benefits are small. However, this is a natural, straightforward, and backwards-compatible extension which may be added in a future version. See #4 for more discussion.

2.2 Wire types

Wire types are used by Argo to encode GraphQL values as bytes.

Argo uses the following named Wire types:

  • STRING: a UTF-8 string
  • BOOLEAN: true or false
  • VARINT: a variable-length integer
  • FLOAT64: an IEEE 754 double-precision binary floating-point (binary64)
  • BYTES: a variable-length byte string
  • FIXED: a fixed-length byte string
  • RECORD: a selection set, made up of potentially-omittable named fields and their values
  • ARRAY: a variable-length list of a single type
  • BLOCK: describes how to store the underlying type in a block of value data
  • NULLABLE: may be null or have a value
  • DESC: a self-describing type

2.2.1 Self-describing types

Self-describing types are used by Argo to encode values with types which are not known in advance (e.g. at Registration Time). Primarily, this is required to comply with the GraphQL specification on Errors.

Argo uses the following named self-describing types:

  • Null: marks that a value is not present (like null in JSON)
  • Object: an object, made up of named fields and their values
  • List: a variable-length list of (potentially) mixed types
  • String: a UTF-8 string
  • Bytes: a variable-length byte string
  • Boolean: true or false
  • Int: a variable-length integer
  • Float: an IEEE 754 double-precision binary floating-point (binary64)

3 Wire schema

GraphQL uses a Schema to capture the names and types of data. GraphQL queries select a portion of data from that Schema. Given a Schema and a query on it, Argo produces a Wire schema with all the information needed to [de]serialize data powering that query against a compatible GraphQL Schema.

Note In a client-server architecture, GraphQL schemas frequently change on the server according to certain compatibility rules. Therefore, while Argo cannot assume the Schema used for serializing data is the Schema used for deserializing, it does assume they are compatible with each other (i.e. no breaking changes have been made).

The wire schema is a single WireType which describes how to encode an entire payload:

WireType
STRING()|BOOLEAN()|VARINT()|FLOAT64()|BYTES()|DESC()|PATH()
FIXED(lengthInBytes: Int)
RECORD(fields: Field[]) where Field(name: String, of: WireType, omittable: Boolean)
ARRAY(of: WireType)
BLOCK(of: WireType, key: String, dedupe: Boolean)
NULLABLE(of: WireType)
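
For implementers, the grammar above maps naturally onto a discriminated union. The following TypeScript sketch is one possible in-memory representation, not a required one: the names WireType and Field mirror the grammar, while the object layout (a type tag plus attributes) and the helper wrapperDepth are illustrative assumptions.

```typescript
// A sketch of the WireType grammar as a TypeScript discriminated union.
// The `type` tag layout is an illustrative assumption.
type WireType =
  | { type: "STRING" | "BOOLEAN" | "VARINT" | "FLOAT64" | "BYTES" | "DESC" | "PATH" }
  | { type: "FIXED"; lengthInBytes: number }
  | { type: "RECORD"; fields: Field[] }
  | { type: "ARRAY"; of: WireType }
  | { type: "BLOCK"; of: WireType; key: string; dedupe: boolean }
  | { type: "NULLABLE"; of: WireType };

interface Field {
  name: string;
  of: WireType;
  omittable: boolean;
}

// Example: a nullable, deduplicated String stored in the "String" block.
const nullableString: WireType = {
  type: "NULLABLE",
  of: { type: "BLOCK", of: { type: "STRING" }, key: "String", dedupe: true },
};

// Hypothetical helper: counts the wrapper layers (NULLABLE, BLOCK, ARRAY)
// around the innermost type.
function wrapperDepth(t: WireType): number {
  switch (t.type) {
    case "NULLABLE":
    case "BLOCK":
    case "ARRAY":
      return 1 + wrapperDepth(t.of);
    default:
      return 0;
  }
}
```

Because every variant carries a type tag, a reader or code generator can switch on it exhaustively, as wrapperDepth does.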

3.1 Wire schema serialization

A Wire schema or WireType may be serialized in JSON. It must be a JSON object of the form: {"type": "typeName" ...attributes...} where typeName is one of the Wire types as a string, and the attributes are as sketched in WireType above.

Example № 1
{
  "type": "RECORD",
  "fields": [
    {
      "name": "errors",
      "of": {
        "type": "NULLABLE",
        "of": { "type": "ARRAY", "of": { "type": "DESC" } }
      },
      "omittable": true
    }
  ]
}

This serialization may be helpful to avoid recomputing Wire schemas on the server. Other serializations may be more efficient but are out of scope here.
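
A server that loads cached Wire schemas from JSON may want to check their shape before use. The sketch below is a hypothetical validator (checkWireType is not part of Argo) which assumes the attribute names given in WireType above: lengthInBytes, fields, of, key, dedupe, name, and omittable.

```typescript
// Hypothetical validator for the JSON form {"type": "typeName", ...attributes}.
const SIMPLE_TYPES = ["STRING", "BOOLEAN", "VARINT", "FLOAT64", "BYTES", "DESC", "PATH"];

// Returns a list of problems found; an empty list means the value looks well-formed.
function checkWireType(json: any, where: string = "$"): string[] {
  if (typeof json !== "object" || json === null || typeof json.type !== "string") {
    return [where + ': expected an object with a string "type"'];
  }
  if (SIMPLE_TYPES.indexOf(json.type) >= 0) return [];
  switch (json.type) {
    case "FIXED":
      return typeof json.lengthInBytes === "number" ? [] : [where + ": FIXED needs lengthInBytes"];
    case "ARRAY":
    case "NULLABLE":
      return checkWireType(json.of, where + ".of");
    case "BLOCK": {
      const problems = checkWireType(json.of, where + ".of");
      if (typeof json.key !== "string") problems.push(where + ": BLOCK needs a string key");
      if (typeof json.dedupe !== "boolean") problems.push(where + ": BLOCK needs a boolean dedupe");
      return problems;
    }
    case "RECORD": {
      if (!Array.isArray(json.fields)) return [where + ": RECORD needs a fields array"];
      let problems: string[] = [];
      json.fields.forEach((f: any, i: number) => {
        const at = where + ".fields[" + i + "]";
        if (typeof f !== "object" || f === null) {
          problems.push(at + ": expected an object");
          return;
        }
        if (typeof f.name !== "string") problems.push(at + ": needs a string name");
        if (typeof f.omittable !== "boolean") problems.push(at + ": needs a boolean omittable");
        problems = problems.concat(checkWireType(f.of, at + ".of"));
      });
      return problems;
    }
    default:
      return [where + ": unknown type " + json.type];
  }
}

// The Wire schema from Example № 1, parsed from its JSON serialization.
const example = JSON.parse(
  '{"type":"RECORD","fields":[{"name":"errors","of":{"type":"NULLABLE",' +
    '"of":{"type":"ARRAY","of":{"type":"DESC"}}},"omittable":true}]}'
);
console.log(checkWireType(example)); // an empty list for the example above
```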

4 Creating a Wire schema

Argo is designed to work with GraphQL queries and schemas which are known in advance. It generates a description of the types which may be returned by a query (one time), and uses this to serialize or deserialize (many times). This description is called a Wire schema.

It is helpful to first describe this process informally. Each selection set is walked over with the corresponding GraphQL schema easily available. Selection sets and selections are handled differently.

Note Though it seems more efficient to represent Enums as VARINT, there is no guarantee that the writer’s view of the Enum type exactly matches the reader’s. The schema may have changed: in the writer’s schema, if an Enum’s values have been reordered, or if an Enum value has been added before the end (but will otherwise never be sent to this particular reader), the reader and writer do not have enough information to agree on the correct numbering.

The following types will always be marked with BLOCK, with a key set to the name of the GraphQL type it was generated from:

STRING, VARINT, FLOAT64, BYTES, FIXED

These types will be marked to deduplicate within their Block by default (but it may be overridden by @ArgoDeduplicate):

STRING, BYTES
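
The two rules above can be sketched as a small helper. The names ScalarCodec, defaultDedupe, and blockFor are hypothetical, and the override parameter stands in for the argument of an @ArgoDeduplicate directive when one is present.

```typescript
// The Wire types which are always wrapped in BLOCK.
type ScalarCodec = "STRING" | "VARINT" | "FLOAT64" | "BYTES" | "FIXED";

// STRING and BYTES deduplicate by default; the other block-stored types do not.
function defaultDedupe(codec: ScalarCodec): boolean {
  return codec === "STRING" || codec === "BYTES";
}

// Computes the BLOCK attributes for a scalar. `override` models @ArgoDeduplicate.
function blockFor(
  codec: ScalarCodec,
  graphQLTypeName: string,
  override?: boolean
): { key: string; dedupe: boolean } {
  return {
    key: graphQLTypeName, // the Block key is the GraphQL type's name
    dedupe: override === undefined ? defaultDedupe(codec) : override,
  };
}
```

For instance, blockFor("STRING", "ID") yields a Block keyed "ID" with deduplication on, while passing false as the third argument turns it off.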

4.1 Directives

Argo uses the following directives in a GraphQL schema to customize the wire schema:

enum ArgoCodecType {
  String
  Int
  Float
  Boolean
  BYTES
  FIXED
  DESC
}

directive @ArgoCodec(codec: ArgoCodecType!, fixedLength: Int) on SCALAR | ENUM
directive @ArgoDeduplicate(deduplicate: Boolean! = true) on SCALAR | ENUM

These directives may be omitted when custom scalars are not used and default behavior is desired. Otherwise, they must be added to the GraphQL schema.

@ArgoCodec
Specifies the Wire type to use for a scalar or enum. The @ArgoCodec directive is required for custom scalars, and may be used on any scalar or enum.
String, Int, Float, and Boolean match the behavior for these built-in GraphQL types (i.e. they are transformed to STRING, VARINT, FLOAT64, and BOOLEAN respectively). BYTES and FIXED, used for binary data, correspond to those Wire types. DESC corresponds to the self-describing Wire type.
The fixedLength argument is required for FIXED scalars, and specifies the length of the fixed-length binary data. It is an error to specify fixedLength for any other Wire type.
@ArgoDeduplicate
Specifies whether to deduplicate a scalar or enum within a Block.
The @ArgoDeduplicate directive is optional, and may be used on any scalar or enum. The default deduplication behavior (used when the directive is absent) is described above in Creating a Wire schema, and is based on the codec used. Note that deduplication is still only allowed on Labeled types.
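
The @ArgoCodec rules above (the codec mapping and the fixedLength constraint) can be checked mechanically. The following is a minimal sketch; codecToWireTypeName is a hypothetical helper, and returning the Wire type as a display string is only for illustration.

```typescript
type ArgoCodecType = "String" | "Int" | "Float" | "Boolean" | "BYTES" | "FIXED" | "DESC";

// Maps an @ArgoCodec argument to the name of the Wire type it selects,
// enforcing the fixedLength rule along the way.
function codecToWireTypeName(codec: ArgoCodecType, fixedLength?: number): string {
  if (codec === "FIXED") {
    if (fixedLength === undefined) throw new Error("fixedLength is required for FIXED");
    return "FIXED(" + fixedLength + ")";
  }
  if (fixedLength !== undefined) throw new Error("fixedLength is only valid for FIXED");
  switch (codec) {
    case "String":
      return "STRING";
    case "Int":
      return "VARINT";
    case "Float":
      return "FLOAT64";
    case "Boolean":
      return "BOOLEAN";
    default:
      return codec; // BYTES and DESC share their Wire type's name
  }
}
```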

4.2 Algorithms

A wire schema is generated by the following algorithms. Typically, CollectFieldWireTypes() should be called with the root type of the GraphQL Schema’s operation (typically, Query) along with the selection set—usually from a Query, but potentially for that returned by a Mutation or Subscription.

Note Much of this code may be easier to follow in the reference implementation’s Typer class.
Note CollectFieldsStatic() is based on GraphQL’s CollectFields() algorithm.
GraphQLTypeToWireType(graphQLType)
  1. If graphQLType is a Scalar or Enum:
    1. If graphQLType has an @ArgoCodec directive, set codec to its argument
    2. If graphQLType has an @ArgoDeduplicate directive, set deduplicate to its argument
    3. Otherwise, let deduplicate be false
    4. If graphQLType is an Enum:
    5. If graphQLType is a String or ID:
      1. If codec is not set, set it to use the STRING codec
      2. Set deduplicate to true
    6. If graphQLType is an Int:
      1. If codec is not set, set it to use the VARINT codec
    7. If graphQLType is a Float:
      1. If codec is not set, set it to use the FLOAT64 codec
    8. If graphQLType is a Boolean:
      1. If deduplicate is set, fail with an error because BOOLEAN cannot be deduplicated
      2. Return Nullable(BOOLEAN)
    9. If graphQLType is a custom scalar:
      1. If codec is not set, fail with an error because codec is required for custom scalars
      2. If deduplicate is not set, set it to the corresponding value above for the type of codec
    10. Set blockId to the name of the graphQLType’s type
    11. Return Nullable(Block(codec, blockId, deduplicate))
  2. If graphQLType is a List:
    1. Set underlyingType to the result of calling GraphQLTypeToWireType() with the underlying type of the List
    2. Return Nullable(Array(underlyingType))
  3. If graphQLType is an Object, Interface, or Union:
    1. Set fields to the empty list
    2. Return Nullable(Record(fields))
  4. If graphQLType is NonNull:
    1. Set type to the result of calling GraphQLTypeToWireType() with the underlying type of the NonNull, then removing its Nullable wrapper
    2. Return type
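
The steps of GraphQLTypeToWireType() can be sketched in TypeScript as follows. This is an illustration under stated assumptions rather than the reference implementation: GqlType is a simplified stand-in for real GraphQL schema types, the codec and deduplicate properties model the arguments of @ArgoCodec and @ArgoDeduplicate, enums are left out, and deduplicate is resolved as "the directive's value if present, otherwise the default for the codec", which is one reading of the steps.

```typescript
type Wire =
  | { type: "STRING" | "BOOLEAN" | "VARINT" | "FLOAT64" | "BYTES" | "DESC" }
  | { type: "FIXED"; lengthInBytes: number }
  | { type: "RECORD"; fields: { name: string; of: Wire; omittable: boolean }[] }
  | { type: "ARRAY"; of: Wire }
  | { type: "BLOCK"; of: Wire; key: string; dedupe: boolean }
  | { type: "NULLABLE"; of: Wire };

// Simplified stand-in for GraphQL schema types; enums are left out of this sketch.
type GqlType =
  | { kind: "Scalar"; name: string; codec?: Wire; deduplicate?: boolean }
  | { kind: "List"; ofType: GqlType }
  | { kind: "NonNull"; ofType: GqlType }
  | { kind: "Object" | "Interface" | "Union"; name: string };

const nullable = (of: Wire): Wire => ({ type: "NULLABLE", of });

// Default codecs for the built-in scalars which are stored in Blocks.
function builtinCodec(name: string): Wire | undefined {
  if (name === "String" || name === "ID") return { type: "STRING" };
  if (name === "Int") return { type: "VARINT" };
  if (name === "Float") return { type: "FLOAT64" };
  return undefined;
}

function graphQLTypeToWireType(t: GqlType): Wire {
  if (t.kind === "NonNull") {
    // Convert the underlying type, then remove its Nullable wrapper.
    const inner = graphQLTypeToWireType(t.ofType);
    return inner.type === "NULLABLE" ? inner.of : inner;
  }
  if (t.kind === "List") {
    return nullable({ type: "ARRAY", of: graphQLTypeToWireType(t.ofType) });
  }
  if (t.kind !== "Scalar") {
    // Object, Interface, or Union: fields are filled in by CollectFieldWireTypes().
    return nullable({ type: "RECORD", fields: [] });
  }
  if (t.name === "Boolean") {
    if (t.deduplicate !== undefined) throw new Error("BOOLEAN cannot be deduplicated");
    return nullable({ type: "BOOLEAN" });
  }
  const codec = t.codec !== undefined ? t.codec : builtinCodec(t.name);
  if (codec === undefined) throw new Error("@ArgoCodec is required for custom scalar " + t.name);
  const byDefault = codec.type === "STRING" || codec.type === "BYTES";
  const dedupe = t.deduplicate !== undefined ? t.deduplicate : byDefault;
  return nullable({ type: "BLOCK", of: codec, key: t.name, dedupe });
}
```

For example, the GraphQL type [String]! becomes ARRAY(NULLABLE(BLOCK(STRING, "String", true))): the outer Nullable is removed by NonNull, while the list's entries stay nullable.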
CollectFieldWireTypes(selectionType, selectionSet)
  1. Initialize recordFields to an empty list.
  2. For each alias and ordered set of fields in CollectFieldsStatic(selectionSet):
    1. For each field in fields:
      1. Initialize omittable to false
      2. If field was selected by a fragment spread, set typeCondition to the name of the type condition specified in the fragment definition
      3. If field was selected by an inline fragment and a type condition has been specified, set typeCondition to the name of the type condition specified in the inline fragment
      4. If typeCondition is set, but not set to the name of selectionType, set omittable to true
      5. If field provides the directive @include, let includeDirective be that directive.
        1. If includeDirective’s if argument is variable, set omittable to true
      6. If field provides the directive @skip, let skipDirective be that directive.
        1. If skipDirective’s if argument is variable, set omittable to true
      7. If field was selected by a fragment spread or inline fragment that provides the directive @include, let includeDirective be that directive.
        1. If includeDirective’s if argument is variable, set omittable to true
      8. If field was selected by a fragment spread or inline fragment that provides the directive @skip, let skipDirective be that directive.
        1. If skipDirective’s if argument is variable, set omittable to true
      9. If field is a selection set:
        1. Set wrapped to the result of calling GraphQLTypeToWireType() with the field’s GraphQL type
        2. Let wrap(wireType) be a function which recursively applies NULLABLE, BLOCK, and ARRAY wrappers around wireType in the same order they appear in wrapped
        3. Set type to the result of calling CollectFieldWireTypes(field.type, field.selectionSet)
        4. Set type to the result of calling wrap(type)
        5. Append Field(alias, type, omittable) to recordFields
      10. Otherwise:
        1. Set type to the result of calling GraphQLTypeToWireType(field.type)
        2. Append Field(alias, type, omittable) to recordFields
  3. For any field in recordFields which shares a name and is a selection set, recursively combine fields into a single selection set field which orders selections in the same order as in recordFields
  4. For any field in recordFields which shares a name but is not a selection set, remove all but the first from recordFields (these will be equivalent in all valid queries)
  5. Return recordFields
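
The omittable rules in CollectFieldWireTypes() reduce to a small predicate. The sketch below assumes a hypothetical, flattened view of each collected field (CollectedField) in which the directive checks have already been summarized as booleans; a real implementation would read these from the query AST.

```typescript
// Hypothetical, flattened view of one collected field and how it was selected.
interface CollectedField {
  typeCondition?: string; // from the enclosing fragment spread or inline fragment, if any
  includeIfIsVariable: boolean; // @include(if: $var) on the field or its fragment
  skipIfIsVariable: boolean; // @skip(if: $var) on the field or its fragment
}

// A field is omittable when it may be missing from the response at Execution Time.
function isOmittable(field: CollectedField, selectionTypeName: string): boolean {
  // A type condition naming a different type means the field may not apply.
  if (field.typeCondition !== undefined && field.typeCondition !== selectionTypeName) {
    return true;
  }
  // A variable-controlled @include or @skip is only decided at Execution Time.
  return field.includeIfIsVariable || field.skipIfIsVariable;
}
```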
CollectFieldsStatic(selectionSet, visitedFragments)
  1. If visitedFragments is not provided, initialize it to the empty set.
  2. Initialize groupedFields to an empty ordered map of lists.
  3. For each selection in selectionSet:
    1. If selection provides the directive @skip, let skipDirective be that directive.
      1. If skipDirective’s if argument is always true, continue with the next selection in selectionSet.
    2. If selection provides the directive @include, let includeDirective be that directive.
      1. If includeDirective’s if argument is never true, continue with the next selection in selectionSet.
    3. If selection is a Field:
      1. Let responseKey be the response key of selection (the alias if defined, otherwise the field name).
      2. Let groupForResponseKey be the list in groupedFields for responseKey; if no such list exists, create it as an empty list.
      3. Append selection to the groupForResponseKey.
    4. If selection is a FragmentSpread:
      1. Let fragmentSpreadName be the name of selection.
      2. If fragmentSpreadName is in visitedFragments, continue with the next selection in selectionSet.
      3. Add fragmentSpreadName to visitedFragments.
      4. Let fragment be the Fragment in the current Document whose name is fragmentSpreadName.
      5. If no such fragment exists, fail with an error because the referenced fragment must exist.
      6. Let fragmentSelectionSet be the top-level selection set of fragment.
      7. Let fragmentGroupedFieldSet be the result of calling CollectFieldsStatic(fragmentSelectionSet, visitedFragments).
      8. For each fragmentGroup in fragmentGroupedFieldSet:
        1. Let responseKey be the response key shared by all fields in fragmentGroup.
        2. Let groupForResponseKey be the list in groupedFields for responseKey; if no such list exists, create it as an empty list.
        3. Append all items in fragmentGroup to groupForResponseKey.
    5. If selection is an InlineFragment:
      1. Let fragmentSelectionSet be the top-level selection set of selection.
      2. Let fragmentGroupedFieldSet be the result of calling CollectFieldsStatic(fragmentSelectionSet, visitedFragments).
      3. For each fragmentGroup in fragmentGroupedFieldSet:
        1. Let responseKey be the response key shared by all fields in fragmentGroup.
        2. Let groupForResponseKey be the list in groupedFields for responseKey; if no such list exists, create it as an empty list.
        3. Append all items in fragmentGroup to groupForResponseKey.
  4. Return groupedFields.
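
CollectFieldsStatic() can be sketched over a simplified query AST. The node shapes below (FieldNode, SpreadNode, InlineNode) are assumptions made for illustration and are not the types of any particular GraphQL library. Only literal @skip and @include arguments are modeled, with variable arguments left undefined, and results are accumulated directly into one map instead of merging each recursive result, which yields the same grouping and order.

```typescript
// Literal `if` arguments only; a variable argument is represented as undefined.
interface Directives { skip?: boolean; include?: boolean }
interface FieldNode extends Directives { kind: "Field"; name: string; alias?: string }
interface SpreadNode extends Directives { kind: "FragmentSpread"; name: string }
interface InlineNode extends Directives { kind: "InlineFragment"; selections: SelectionNode[] }
type SelectionNode = FieldNode | SpreadNode | InlineNode;

// Groups fields by response key. A plain object serves as the ordered map here:
// GraphQL names never look like integers, so insertion order is preserved.
function collectFieldsStatic(
  selections: SelectionNode[],
  fragments: { [name: string]: SelectionNode[] },
  visited: string[] = [],
  grouped: { [responseKey: string]: FieldNode[] } = {}
): { [responseKey: string]: FieldNode[] } {
  for (const sel of selections) {
    if (sel.skip === true) continue; // @skip(if: true): statically excluded
    if (sel.include === false) continue; // @include(if: false): statically excluded
    if (sel.kind === "Field") {
      const key = sel.alias !== undefined ? sel.alias : sel.name;
      (grouped[key] = grouped[key] || []).push(sel);
    } else if (sel.kind === "FragmentSpread") {
      if (visited.indexOf(sel.name) >= 0) continue; // each fragment is visited once
      visited.push(sel.name);
      const body = fragments[sel.name];
      if (body === undefined) throw new Error("unknown fragment " + sel.name);
      collectFieldsStatic(body, fragments, visited, grouped);
    } else {
      collectFieldsStatic(sel.selections, fragments, visited, grouped);
    }
  }
  return grouped;
}
```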

5 Binary encoding

Argo’s binary encoding does not include field names, self-contained information about the types of individual bytes, nor field or record separators. Therefore readers are wholly reliant on the Wire schema used when the data was encoded (or any compatible Wire schema), along with any information about custom scalar encodings.

Argo always uses a little-endian byte order.

Note Reading Argo messages often involves reading length prefixes followed by that many bytes. As always in situations like this, use bounds checking to avoid buffer over-read.

5.1 Message

An Argo Message consists of these concatenated parts:

  • A variable-length Header
  • 0 or more concatenated Blocks containing scalar values, each prefixed by their length
  • 1 Core, which contains the Message’s structure, prefixed by its length
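
The layout above can be sketched as a simple concatenation. This is a minimal illustration with several assumptions: the Header is passed in as already-encoded bytes, Blocks are written in whatever order the caller supplies, length prefixes are Labels written with the zig-zag coding in a Protocol Buffers style base-128 byte layout, and the message is not in InlineEverything mode. The function names are hypothetical.

```typescript
// Zig-zag, then base-128 varint. Assumes the Protocol Buffers byte layout.
function labelBytes(n: number): number[] {
  let z = n >= 0 ? n * 2 : -n * 2 - 1;
  const out: number[] = [];
  while (z >= 0x80) {
    out.push((z % 0x80) | 0x80); // low 7 bits, with the continuation bit set
    z = Math.floor(z / 0x80);
  }
  out.push(z);
  return out;
}

// Header, then each Block prefixed by its length, then Core prefixed by its length.
function assembleMessage(header: number[], blocks: number[][], core: number[]): number[] {
  let out = header.slice();
  for (const block of blocks) {
    out = out.concat(labelBytes(block.length), block);
  }
  return out.concat(labelBytes(core.length), core);
}
```

Note that a two-byte Block gets the length prefix 0x04 rather than 0x02, because the length is a Label and Labels are zig-zag coded.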

5.2 Header

The Header is encoded as a variable-length BitSet. After decoding into a fixed bit array, each bit in the BitSet has a defined meaning described below.

Numbered least to most significant bits:

0: InlineEverything
1: SelfDescribing
2: OutOfBandFieldErrors
3: SelfDescribingErrors
4: NullTerminatedStrings
5: NoDeduplication
6: HasUserFlags

When a given flag is set, Argo’s behavior is modified as described below. Each may also be referred to as a Mode of operation, and the corresponding bit must be set if and only if the message uses the corresponding Mode.

InlineEverything
In this Mode, Blocks are omitted, along with their length prefixes. Core’s length prefix is also omitted. Instead, scalar values are written inline in Core (i.e. at the current position when they are encountered).
This generally results in smaller messages which do not compress as well. Useful when the Message will not be compressed. For tiny messages (say, dozens of bytes) this usually results in the smallest possible payloads.
SelfDescribing
In this Mode, Core is written exactly as if its type were DESC. This makes the message value self-describing.
This generally makes the payload much larger, and is primarily useful when debugging.
OutOfBandFieldErrors
In this Mode, GraphQL Field errors are guaranteed not to be written inline, and instead appear in the errors array, if any.
This makes it easier to convert JSON payloads to Argo after the fact, but eliminates the benefits of inline errors.
SelfDescribingErrors
In this Mode, errors are not encoded as usual. Instead, each is encoded as a self-describing value (which must adhere to the GraphQL spec). This applies to both Field errors and Request errors.
This makes it easier to convert JSON payloads to Argo after the fact, but gives less type safety and generally results in larger error payloads.
NullTerminatedStrings
In this Mode, all values of type String are suffixed with a UTF-8 NUL (i.e. a 0x00 byte). This byte is not included in the String’s length, and is not considered part of the String. Other NUL bytes may still appear within each String.
This makes it possible to implement zero-copy in language environments relying on NUL-terminated strings, but generally makes the payload larger.
NoDeduplication
In this Mode, the message is guaranteed to never use backreferences. This may be because the encoder chose to duplicate values, or because duplicates were never encountered. The decoder MAY safely skip calculating backreference IDs, which carries a small cost.
HasUserFlags
In this Mode, the Header BitSet is followed by another variable-length BitSet called UserFlags. The meaning of entries in UserFlags is up to the implementation, and remains outside the scope of this specification.
This is useful to prototype custom implementations and extensions of Argo.
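
Once the Header has been decoded, testing for a Mode is a plain bit test. The sketch below uses the bit positions listed above; holding the decoded bits in a single integer is an illustrative choice, and the byte-level BitSet encoding is deliberately not covered. HeaderFlag, hasFlag, and withFlag are hypothetical names.

```typescript
// Bit positions from the list above, numbered from the least significant bit.
const HeaderFlag = {
  InlineEverything: 0,
  SelfDescribing: 1,
  OutOfBandFieldErrors: 2,
  SelfDescribingErrors: 3,
  NullTerminatedStrings: 4,
  NoDeduplication: 5,
  HasUserFlags: 6,
};

// `bits` holds the decoded flag bits as a plain integer.
function hasFlag(bits: number, bit: number): boolean {
  return ((bits >> bit) & 1) === 1;
}

function withFlag(bits: number, bit: number): number {
  return bits | (1 << bit);
}

// A message using both InlineEverything and NoDeduplication.
const flagBits = withFlag(withFlag(0, HeaderFlag.InlineEverything), HeaderFlag.NoDeduplication);
console.log(flagBits.toString(2)); // "100001"
```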

5.3 Blocks

Argo Blocks are named contiguous blocks of encoded scalar values of the same type.

Each begins with a Label encoding the length of the block in bytes (not counting the length prefix).

Concatenated to this is every value in the block. The encoding of each value is defined below. Generally, this will not include any metadata, only values.

The name (or key) of each Block is not encoded in the message.

5.4 Core

The Core of a Message contains the primary structure of the payload.

The Core is prefixed with a Label encoding its length in bytes (not counting the length prefix). This is omitted when operating in InlineEverything mode.

The rest of the Core consists of a single value which encodes the payload. This is almost always a RECORD corresponding to GraphQL’s ExecutionResult.

5.5 Label

Argo uses a multipurpose binary marker called a Label which combines several use cases into one compact representation. A Label is written using the variable-length zig-zag coding. A Label is essentially a number which should be interpreted differently according to its value and the context in which it is used.

  • For variable-length data, such as a STRING or ARRAY, non-negative values represent the length of the data that is to follow
    • The units depend on the data: STRING lengths are in bytes, while ARRAY lengths are in entries
  • For BOOLEANs, 0 means false and 1 means true
  • For NULLABLE values, -1 means null
  • For NULLABLE values which are not null and are not normally prefixed by a Label, 0 means not-null
    • Values which are prefixed by a Label even when non-nullable omit this non-null marker entirely, since we can rely on the Label’s value to tell us it is not null
  • For NULLABLE values, -3 means there was a Field Error which terminated its propagation (if any) here
  • For fields which may be omitted—such as fields that come from a selection set over a Union, and therefore may not appear at all—the value -2 is used to represent absence, called the Absent Label
  • All other negative numbers are used for Backreferences: identification numbers which refer to values which appeared previously in the Message

Types whose values are prefixed with a Label or are themselves a Label are called Labeled. For example, STRING, ARRAY, BOOLEAN, and all NULLABLE types are Labeled.

Types whose values are not prefixed with a Label and are not themselves a Label are called Unlabeled. For example, non-nullable RECORD and non-nullable FLOAT64 are Unlabeled.
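
The reserved Label values and the Labeled/Unlabeled distinction can be summarized in code. This sketch covers only what the text above states: the constant and function names are hypothetical, BYTES is treated as Labeled because its encoding writes a length Label to Core, and types not named in the examples are reported as unclassified.

```typescript
const NULL_LABEL = -1;
const ABSENT_LABEL = -2;
const ERROR_LABEL = -3;
const NON_NULL_LABEL = 0;

type LabelMeaning = "null" | "absent" | "error" | "backreference" | "length-or-value";

// Interprets a decoded Label. Non-negative Labels depend on context:
// a length, a BOOLEAN, or the non-null marker.
function classifyLabel(label: number): LabelMeaning {
  if (label === NULL_LABEL) return "null";
  if (label === ABSENT_LABEL) return "absent";
  if (label === ERROR_LABEL) return "error";
  if (label < ERROR_LABEL) return "backreference";
  return "length-or-value";
}

// `nullable` says whether the type is wrapped in NULLABLE.
function isLabeled(typeName: string, nullable: boolean): boolean | undefined {
  if (nullable) return true; // all NULLABLE types are Labeled
  if (typeName === "STRING" || typeName === "BYTES" || typeName === "ARRAY" || typeName === "BOOLEAN") {
    return true;
  }
  if (typeName === "RECORD" || typeName === "FLOAT64") return false;
  return undefined; // not classified by the examples in the text
}
```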

5.6 Data encoding

Data are encoded in binary as described here.

STRING
STRING values are encoded as UTF-8 and written to their Block. In Core, a Label is written which is the length of the encoded value in bytes. Typically, repeated STRING values may be deduplicated by instead writing a backreference to Core.
In NullTerminatedStrings mode, an additional UTF-8 NUL (0x00) is written to the block following the UTF-8 value (this is not counted in the length written to Core).
BOOLEAN
BOOLEAN values use the value 0 for false and 1 for true, and are written as a Label to Core.
VARINT
VARINT (variable-length integer) values are written to Core and use the variable-length zig-zag coding.
FLOAT64
FLOAT64 values are written to their Block as 8 bytes in little endian order according to IEEE 754’s binary64 variant. Nothing is written to Core.
BYTES
A BYTES is encoded as unaltered contiguous bytes and written to its Block. In Core, a Label is written which is the length of the encoded value in bytes. Typically, repeated BYTES values may be deduplicated by instead writing a backreference to Core.
FIXED
FIXED values are written to their Block as bytes in little endian order. Nothing is written to Core. The number of bytes is not included in the message in any way, since it is in the Wire schema.
RECORD
RECORD values are written as a concatenation of their Fields to Core. Each Field is written recursively in the order it appears in the Wire schema. If a Field is omittable and absent, it is written as the Absent Label. If a Field is omittable and present, but its underlying type is Unlabeled, a non-null Label is written to Core before writing the field’s value. The number of fields and their types are not included in the message in any way, since that is in the Wire schema.
ARRAY
ARRAY values are written as a Label in Core which contains the ARRAY’s length (in entries), followed by a concatenation of its entries recursively.
BLOCK
BLOCK is not written to the Message directly. Instead, it modifies its underlying type, naming which block it should be written to and whether values should be deduplicated. Block keys match the name of the type in the GraphQL schema it is generated from. For example, ‘String’ for the built-in type, or a custom scalar’s name. Deduplication is configurable with the @ArgoDeduplicate directive, with defaults specified under Creating a Wire schema.
NULLABLE
NULLABLE values are written differently depending on whether the underlying value is Labeled or Unlabeled. The value null is always written to Core as the Null Label with the value -1. If the underlying value is present and Labeled, non-null values are simply written recursively and unmodified using the underlying value’s encoding. If the underlying value is present and Unlabeled, first the Non-null Label is written to Core, then the underlying value is written recursively.
DESC
DESC values are self-describing, and primarily used to encode errors. This scheme is described in Self-describing encoding.
PATH
PATH values represent a path into a GraphQL response, such as are used inside Error values. Inline field error paths are relative to the location they appear, and all others are relative to the response root. First, GraphQL spec-compliant paths are transformed to a list of integers as described in Path value transformation. Then, this list of integers is encoded exactly as an ARRAY of VARINT values.
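
A few of these encodings can be sketched with a toy writer. The Writer shape below is an assumption made for illustration: Core is kept as a list of Label values before zig-zag coding, and each Block is a byte list keyed by its Block key. Deduplication is ignored, and the Null Label is taken to be -1 as given in the Label section.

```typescript
// Core holds Label values (before varint coding); blocks hold raw bytes by Block key.
interface Writer {
  core: number[];
  blocks: { [key: string]: number[] };
}

function blockOf(w: Writer, key: string): number[] {
  return w.blocks[key] || (w.blocks[key] = []);
}

// FLOAT64: 8 bytes, little endian, to the Block. Nothing is written to Core.
function writeFloat64(w: Writer, key: string, value: number): void {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value, true); // true selects little-endian byte order
  const block = blockOf(w, key);
  for (let i = 0; i < 8; i++) block.push(view.getUint8(i));
}

// BYTES: raw bytes to the Block, byte length as a Label in Core.
function writeBytes(w: Writer, key: string, value: number[]): void {
  w.core.push(value.length);
  const block = blockOf(w, key);
  for (const b of value) block.push(b);
}

// NULLABLE(FLOAT64): FLOAT64 is Unlabeled, so a present value needs the Non-null Label.
function writeNullableFloat64(w: Writer, key: string, value: number | null): void {
  if (value === null) {
    w.core.push(-1); // Null Label
    return;
  }
  w.core.push(0); // Non-null Label
  writeFloat64(w, key, value);
}

const writer: Writer = { core: [], blocks: {} };
writeNullableFloat64(writer, "Float", 1.0);
writeNullableFloat64(writer, "Float", null);
writeBytes(writer, "Bytes", [0xde, 0xad]);
// writer.core is now [0, -1, 2]
// writer.blocks["Float"] is now [0, 0, 0, 0, 0, 0, 0xf0, 0x3f]
```

The null value costs one Label in Core and nothing in the Block, which is why the "Float" Block holds only eight bytes after two writes.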

5.6.1 Variable-length zig-zag coding

The variable-length zig-zag coding is a way to encode signed integers as a variable-length byte sequence. Argo uses a scheme compatible with Google Protocol Buffers. It uses fewer bytes for values close to zero, which are more common in practice. In short, it “zig-zags” back and forth between positive and negative numbers: 0 is encoded as 0, -1 as 1, 1 as 10, -2 as 11, 2 as 100, and so on (encodings shown in binary). A bigint variable n in TypeScript can be transformed as follows, then written using the minimum number of bytes (without unnecessary leading zeros):

ToZigZag(n)
  1. return n >= 0 ? n << 1n : (n << 1n) ^ (~0n)
FromZigZag(n)
  1. return (n & 0x1n) ? n >> 1n ^ (~0n) : n >> 1n
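
To show the byte-level result, the following sketch pairs the zig-zag mapping with a base-128 varint. It assumes the Protocol Buffers byte layout (7 bits per byte, least significant group first, high bit set on every byte except the last), and it uses plain numbers instead of bigint for brevity, so it is only valid for magnitudes below 2^52. The function names are illustrative.

```typescript
// Zig-zag mapping using plain numbers; valid while |n| stays below 2^52.
function toZigZag(n: number): number {
  return n >= 0 ? n * 2 : -n * 2 - 1;
}

function fromZigZag(z: number): number {
  return z % 2 === 0 ? z / 2 : -(z + 1) / 2;
}

// Base-128 varint: low 7 bits first, high bit set on all but the last byte.
function encodeVarint(z: number): number[] {
  const out: number[] = [];
  while (z >= 0x80) {
    out.push((z % 0x80) + 0x80);
    z = Math.floor(z / 0x80);
  }
  out.push(z);
  return out;
}

function decodeVarint(bytes: number[]): number {
  let result = 0;
  let scale = 1;
  for (const b of bytes) {
    result += (b % 0x80) * scale;
    scale *= 0x80;
    if (b < 0x80) break; // high bit clear: this was the last byte
  }
  return result;
}

// The value 150 zig-zags to 300, which needs two bytes: 0xAC 0x02.
console.log(encodeVarint(toZigZag(150)));
```

Values from -64 to 63 fit in a single byte under this scheme, which covers the reserved Labels and most short lengths.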

5.6.2 Self-describing encoding

Argo is intended to rely on known types taken from GraphQL queries and schemas. However, the errors array in GraphQL is very free-form as specified in the GraphQL Spec. To support this, as well as to ease debugging in certain circumstances, a self-describing format is included.

Self-describing values use a Type marker, a Label written to Core with a predetermined value representing the type of the value to follow.

In the self-describing format, most values are encoded as usual, including using Blocks. However, values in this format only use the following Blocks:

  • A block with key “String”, used for all values marked String
  • A block with key “Bytes”, used for all values marked Bytes
  • A block with key “Int”, used for all values marked Int
  • A block with key “Float”, used for all values marked Float
Note These Blocks may also be used for non-self-describing values. This is intentional.

To write a Type marker, encode the given value as a Label and write it to Core.

To write a self-describing value, first map the desired value to the closest type described in Self-describing types. Then, write each type as below (reading follows the same pattern):

Null (-1)
Written as Type marker -1 in Core.
Boolean false (0)
Written as Type marker 0 in Core.
Boolean true (1)
Written as Type marker 1 in Core.
Object (2)
Begins with Type marker 2 in Core, followed by a second Label in Core encoding the number of fields which follow. All fields follow in order, each written as a STRING capturing the field name (with no preceding Type marker), followed by the field’s value written recursively using the self-describing encoding. These alternate until completion, concatenated together.
List (3)
Begins with Type marker 3 in Core, followed by a second Label in Core encoding the length of the list. Each entry is then written recursively in the self-describing format, concatenated together. Note that heterogeneous types are allowed: this is important for GraphQL’s error representation.
String (4)
Written as Type marker 4 in Core, followed by a non-self-describing STRING with Block key “String”.
Bytes (5)
Written as Type marker 5 in Core, followed by a non-self-describing BYTES with Block key “Bytes”.
Int (6)
Written as Type marker 6 in Core, followed by a non-self-describing VARINT with Block key “Int”.
Float (7)
Written as Type marker 7 in Core, followed by a non-self-describing FLOAT64 with Block key “Float”.
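
The rules above can be sketched as a recursive writer. This is an illustration under several assumptions: Core is kept as a list of Label values before zig-zag coding, Blocks are kept as lists of values instead of bytes, Int values are routed to the “Int” Block following the list of Blocks above, Bytes values and deduplication are left out, and strings are assumed to be well-formed UTF-16. All names are hypothetical.

```typescript
type DescValue = null | boolean | number | string | DescValue[] | { [field: string]: DescValue };

// Core Labels plus the contents of the "String", "Int", and "Float" Blocks.
interface DescOut { core: number[]; strings: string[]; ints: number[]; floats: number[] }

// UTF-8 byte length of a string, assuming well-formed surrogate pairs.
function utf8Length(s: string): number {
  let n = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) n += 1;
    else if (c < 0x800) n += 2;
    else if (c >= 0xd800 && c <= 0xdbff) { n += 4; i++; } // surrogate pair: one 4-byte code point
    else n += 3;
  }
  return n;
}

// A non-self-describing STRING: byte length to Core, value to the "String" Block.
function writeString(s: string, out: DescOut): void {
  out.core.push(utf8Length(s));
  out.strings.push(s);
}

function writeDesc(v: DescValue, out: DescOut): void {
  if (v === null) {
    out.core.push(-1); // Null
  } else if (typeof v === "boolean") {
    out.core.push(v ? 1 : 0); // Boolean true / false
  } else if (typeof v === "string") {
    out.core.push(4); // String marker
    writeString(v, out);
  } else if (typeof v === "number") {
    if (isFinite(v) && Math.floor(v) === v) { out.core.push(6); out.ints.push(v); } // Int
    else { out.core.push(7); out.floats.push(v); } // Float
  } else if (Array.isArray(v)) {
    out.core.push(3, v.length); // List marker, then length
    for (const item of v) writeDesc(item, out);
  } else {
    const keys = Object.keys(v);
    out.core.push(2, keys.length); // Object marker, then field count
    for (const k of keys) {
      writeString(k, out); // field name: STRING with no Type marker
      writeDesc(v[k], out);
    }
  }
}

const descOut: DescOut = { core: [], strings: [], ints: [], floats: [] };
writeDesc({ message: "hi", code: 7 }, descOut);
// descOut.core is now [2, 2, 7, 4, 2, 4, 6]
```

Reading the Core Labels back: Object with 2 fields, a 7-byte field name, a String of 2 bytes, a 4-byte field name, then an Int whose value sits in the “Int” Block.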

5.7 Backreferences

Argo Backreferences are numeric references to values which appeared previously in the Message. Backreferences are encoded as Labels with negative values.

Argo reduces data size on the wire by avoiding repeated values. Whenever a potentially-large value is read or written for the first time in a given Block, it is remembered and given the next available backreference ID number (which is always negative). When it is next used, it can be identified by the backreference ID, eliminating the need to encode (and later decode) the entire value again.

Each Block has a separate backreference ID space. This means backreference IDs are not unique across types: a backreference ID -5 refers to a different value for String than it does for a hypothetical MyEnum. Backreference IDs count down, beginning at the largest non-reserved negative label value: this is -4, since the Error label (-3) is the smallest reserved value.

Note For certain messages, this allows Argo representations to remain small in memory by avoiding duplication even after decompression (and further, after parsing). It also helps keep Argo messages small without compression.

When encoding, the encoder SHOULD deduplicate by returning backreference IDs instead of re-encoding duplicated values. This is typically implemented with a Map data structure.

Note The encoder MAY choose to duplicate values instead of returning backreferences whenever it chooses. For example, an easy optimization is to simply duplicate values which are smaller than the backreference ID itself.

When decoding, the decoder MUST track backreference IDs for Blocks with deduplication enabled, usually by storing an array of previously-encountered values. However, this MAY be skipped for messages in NoDeduplication mode.

In order to maintain a compact data representation, backreferences (and therefore deduplication) are only supported for Labeled types. Note that even Unlabeled values may be written to Blocks, to improve compressibility.
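
The bookkeeping on both sides can be sketched for a single Block of strings. The class names and the callback are hypothetical; the writer receives each value's byte length from the caller so the sketch stays independent of any UTF-8 helper, and the reader assumes that Null, Absent, and Error Labels were handled before it is called.

```typescript
// Writer side: one instance per Block with deduplication enabled.
class DedupingBlock {
  private ids = new Map<string, number>();
  private nextId = -4; // largest non-reserved negative Label; IDs count down from here
  values: string[] = [];

  // Returns the Label to write to Core for this value.
  labelFor(value: string, byteLength: number): number {
    const seen = this.ids.get(value);
    if (seen !== undefined) return seen; // backreference to the earlier occurrence
    this.ids.set(value, this.nextId--);
    this.values.push(value); // first occurrence: the value goes into the Block
    return byteLength; // and its length goes to Core
  }
}

// Reader side: remembers values in the order they were first read.
class BackrefReader {
  private seen: string[] = [];

  // `readFromBlock` stands in for reading and decoding `length` bytes from the Block.
  resolve(label: number, readFromBlock: (length: number) => string): string {
    if (label <= -4) return this.seen[-label - 4]; // -4 is index 0, -5 is index 1, ...
    const value = readFromBlock(label);
    this.seen.push(value);
    return value;
  }
}
```

Writing "hello", "world", "hello" through one DedupingBlock yields the Labels 5, 5, -4, and only two values land in the Block.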

5.8 Errors

Errors require special treatment for three reasons:

  1. Field errors are inlined with data (except in OutOfBandFieldErrors Mode). This makes it easy to distinguish between null and an error as soon as a value is read, and also makes their representation more compact.
  2. The “extensions” portion of each errors object must be self-describing. This is in contrast to all other data: we don’t know the schema/types of “extensions” data, and it may vary between objects.
  3. The “path” portion of each error object is not representable directly in GraphQL (or Argo) types. This is because it mixes String and Int primitive values, which GraphQL forbids for data.
Note Normally, Argo does not allow for direct extension to field error objects outside of the extensions field, even though the GraphQL spec allows for (but discourages) it. This is on the grounds that it is very easy to recover this information by simply moving it to the extensions field when migrating to this data format, and it simplifies Argo. If required, SelfDescribingErrors can be used to allow for this.

5.8.1Error values

Error values are written in a specific format, which has the following schema in GraphQL (and a corresponding schema in Argo):

type Location {
  line: Int!
  column: Int!
}

type Error {
  message: String!
  location: [Location!]
  path: PATH
  extensions: DESC
}

These all take the values described in the GraphQL spec with these exceptions:

  1. The path field uses the Path encoding described below. Paths should be converted to a more convenient format in the reader’s code generator, such as intermixed path strings and integer indexes.
  2. The extensions field is written as a nullable Object in the Self-describing object format with any values the writer chooses, or as Null if there are no extensions.
Note path and extensions are not representable as normal GraphQL responses: path mixes String and Int primitive values, which GraphQL forbids for data; extensions must be a map, and has no other restrictions. Based on path’s behavior (which violates GraphQL’s typing rules), this seems to include values only representable in the transport layer (like JSON, or this spec). There is no information about the extensions map in the schema or any query.

When operating in SelfDescribingErrors mode, errors are not encoded as described here. Instead, each is encoded as a self-describing value (which must adhere to the GraphQL spec). This applies to Field errors and Request errors.

5.8.2Request errors

Request errors are stored in an errors array in the usual response location, encoded as Error values.

5.8.3Field errors

Nullable fields are the only valid location for Field errors. When Field errors are encountered, the errors propagate to the nearest nullable encompassing field, and then an Error Label is written to Core. All relevant field errors should then be written to Core as an ARRAY of Error values using the format above. However, the path field should only encode the path from the field which the error propagated to, down to the field in which the error occurred. This is because the path from the root of the query is knowable from where the Error Label is encountered, which makes the representation more compact. However, implementations should make the full path easily available to users.

When operating in OutOfBandFieldErrors mode, errors are not written as described here. Instead, an Error (preferred) or Null Label is written to Core (with no additional error data following), and the error is written separately to the errors array. The path must include the full path from the root.
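As a small illustration of recovering the full path for an inline field error, a reader can join the path at which the Error Label was encountered with the relative path carried by the error. The function name full_error_path and the field names in the usage lines are hypothetical, and the relative path is assumed to have already been converted back from its wire form.

```python
def full_error_path(label_path, relative_path):
    """Join the path to the Error Label with the error's relative path.

    label_path: path from the query root to the nullable field where the
        Error Label was read (known from the reader's position).
    relative_path: path stored in the inline error, starting below that field.
    """
    return list(label_path) + list(relative_path)


# Hypothetical query: the error occurred in hero.friends[1].name and
# propagated up to the nullable field hero.friends.
full = full_error_path(["hero", "friends"], [1, "name"])
```

In OutOfBandFieldErrors mode no such joining is needed, since the stored path already starts at the root.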

5.8.4Path value transformation

Argo transforms GraphQL location paths before encoding them as PATH in order to make them more compact.

PathToWirePath() is used to transform a GraphQL location path into a list of integers, and WirePathToPath() transforms an encoded list of integers into a GraphQL location path.

PathToWirePath(path, wireType)
  1. If wireType is RECORD:
    1. Set fieldName to the first element of path, which must be a string
    2. Set fieldIndex to the 0-based index of the RECORD field which matches fieldName
    3. Set tail to an array equal to path with its first element omitted
    4. Set underlyingType to the type of the RECORD field at index fieldIndex
    5. Return fieldIndex prepended to PathToWirePath(tail, underlyingType)
  2. If wireType is ARRAY:
    1. Set arrayIdx to the first element of path, which must be an integer index
    2. Set tail to an array equal to path with its first element omitted
    3. Set underlyingType to the underlying type of wireType
    4. Return arrayIdx prepended to PathToWirePath(tail, underlyingType)
  3. If wireType is NULLABLE or BLOCK:
    1. Set underlyingType to the underlying type of wireType
    2. Return PathToWirePath(path, underlyingType)
  4. Otherwise, return path (which must be an empty array)
WirePathToPath(path, wireType)
  1. If wireType is RECORD:
    1. Set fieldIndex to the first element of path, which must be an integer index
    2. Set fieldName to the name of the field at the 0-based index fieldIndex in the RECORD
    3. Set tail to an array equal to path with its first element omitted
    4. Set underlyingType to the type of the RECORD field at index fieldIndex
    5. Return fieldName prepended to WirePathToPath(tail, underlyingType)
  2. If wireType is ARRAY:
    1. Set arrayIdx to the first element of path, which must be an integer index
    2. Set tail to an array equal to path with its first element omitted
    3. Set underlyingType to the underlying type of wireType
    4. Return arrayIdx prepended to WirePathToPath(tail, underlyingType)
  3. If wireType is NULLABLE or BLOCK:
    1. Set underlyingType to the underlying type of wireType
    2. Return WirePathToPath(path, underlyingType)
  4. Otherwise, return path (which must be an empty array)
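The two algorithms above can be sketched in Python as follows. The wire type classes (Record, Array, Nullable, Block, Leaf) are hypothetical stand-ins for a real Wire schema model, and for a RECORD the sketch recurses into the type of the matched field. The early return for an empty path is an added guard for paths that end on a RECORD or ARRAY field; it is an assumption of this sketch, not a step in the algorithms.

```python
from dataclasses import dataclass


@dataclass
class Record:
    fields: list  # (name, wire_type) pairs, in wire order


@dataclass
class Array:
    of: object


@dataclass
class Nullable:
    of: object


@dataclass
class Block:
    of: object


@dataclass
class Leaf:
    name: str  # stands in for any non-container wire type, e.g. "STRING"


def path_to_wire_path(path, wire_type):
    """Turn a GraphQL path (field names and list indexes) into a list of integers."""
    if not path:
        return []  # added guard (assumption): nothing left to transform
    if isinstance(wire_type, Record):
        names = [name for name, _ in wire_type.fields]
        field_index = names.index(path[0])  # raises ValueError for unknown fields
        field_type = wire_type.fields[field_index][1]
        return [field_index] + path_to_wire_path(path[1:], field_type)
    if isinstance(wire_type, Array):
        return [path[0]] + path_to_wire_path(path[1:], wire_type.of)
    if isinstance(wire_type, (Nullable, Block)):
        return path_to_wire_path(path, wire_type.of)
    raise ValueError("path continues past a non-container wire type")


def wire_path_to_path(path, wire_type):
    """Turn an encoded list of integers back into a GraphQL path."""
    if not path:
        return []  # added guard (assumption): nothing left to transform
    if isinstance(wire_type, Record):
        field_name, field_type = wire_type.fields[path[0]]
        return [field_name] + wire_path_to_path(path[1:], field_type)
    if isinstance(wire_type, Array):
        return [path[0]] + wire_path_to_path(path[1:], wire_type.of)
    if isinstance(wire_type, (Nullable, Block)):
        return wire_path_to_path(path, wire_type.of)
    raise ValueError("path continues past a non-container wire type")


# Hypothetical wire type for a query like { hero { id friends { name } } }
friend = Record([("name", Block(Leaf("STRING")))])
hero = Record([("id", Leaf("STRING")), ("friends", Nullable(Array(friend)))])
root = Record([("hero", Nullable(hero))])

wire = path_to_wire_path(["hero", "friends", 1, "name"], root)
restored = wire_path_to_path(wire, root)
```

Here wire is [0, 1, 1, 0]: field index 0 (hero), field index 1 (friends), list index 1, field index 0 (name). NULLABLE and BLOCK wrappers contribute no path elements.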

6Argo APIs

Argo is suitable for a variety of contexts, but it is primarily designed for encoding responses to GraphQL queries over HTTP.

6.1HTTP considerations

If a client initiating an Argo HTTP request prefers a specific Argo Mode, it MAY include the Argo-Mode header with the case-insensitive names of the preferred modes separated by semicolons.

Example № 2
Argo-Mode: SelfDescribingErrors;OutOfBandFieldErrors

6.1.1MIME type

When an HTTP client supports Argo, it SHOULD use the MIME type application/argo in the Accept header, ideally with a Quality Value exceeding that of other encodings (such as application/json).

When an HTTP response is encoded with Argo, the Content-Type header SHOULD also use the MIME type application/argo.
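The header conventions above might be handled along these lines. This is a sketch only: the function names are hypothetical, the quality value 0.9 for the JSON fallback is an arbitrary choice, and no particular HTTP library is assumed.

```python
ARGO_MIME_TYPE = "application/argo"


def argo_request_headers(modes=(), json_fallback=True):
    """Build client request headers that prefer Argo over JSON."""
    accept = ARGO_MIME_TYPE  # implicit quality value of 1
    if json_fallback:
        accept += ", application/json;q=0.9"  # lower quality value (assumed)
    headers = {"Accept": accept}
    if modes:
        headers["Argo-Mode"] = ";".join(modes)
    return headers


def parse_argo_mode(header_value):
    """Split an Argo-Mode header into a set of lowercased mode names."""
    return {name.strip().lower() for name in header_value.split(";") if name.strip()}


def is_argo_response(content_type):
    """Check whether a Content-Type header names the Argo MIME type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == ARGO_MIME_TYPE


headers = argo_request_headers(["SelfDescribingErrors", "OutOfBandFieldErrors"])
```

Lowercasing in parse_argo_mode reflects that mode names are case-insensitive; a server would compare the result against its own lowercased list of supported modes.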

6.1.2Compression

Compression of Argo messages is generally recommended. The Blocks are designed to make Argo particularly amenable to compression.

The reference implementation compares different compression schemes. Based on this, Brotli (at quality level 4) is recommended for most workloads. This is a nice balance of small payloads, fast compression and decompression, and wide support. If Brotli is not available, gzip (at level 6) is a good alternative. Small responses (say, less than 500 bytes) need not be compressed at all.

Without compression, Argo results in much smaller payloads than uncompressed JSON. If CPU usage is a concern, consider using a very fast compression algorithm (e.g. LZ4).
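A server-side compression decision following the guidance above could look like this sketch. It uses gzip at level 6 because that is available in the Python standard library; Brotli at quality level 4 would be preferred where a Brotli binding is available. The function name and the exact 500-byte cutoff are assumptions taken from the rough figure in the text.

```python
import gzip

MIN_COMPRESS_BYTES = 500  # rough threshold; small responses are sent as-is


def maybe_compress(payload: bytes):
    """Return (body, content_encoding); content_encoding is None if uncompressed."""
    if len(payload) < MIN_COMPRESS_BYTES:
        return payload, None
    return gzip.compress(payload, compresslevel=6), "gzip"


small_body, small_encoding = maybe_compress(b"\x00" * 100)
large_body, large_encoding = maybe_compress(b"argo" * 1000)
```

The returned encoding name is meant for the Content-Encoding response header; when it is None, that header would simply be omitted.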

AAppendix: Motivation and background

GraphQL typically serializes data into JSON, but GraphQL is designed to support other serializations as well. Argo is purpose-made to improve on serialization for GraphQL.

A.1JSON

JSON is the standard serialization for GraphQL data. In the context of GraphQL responses, it has many strengths as well as a few weaknesses.

Strengths of JSON:
  • Ubiquitous
    • Many stable, high-performance implementations
    • High quality tools for working with JSON
  • Self-describing (simple and usable even without tools)
    • Independent of GraphQL schemas, documents, queries, and types
  • Human-readable and machine-readable
Weaknesses of JSON:
  • Large data representation (relative to binary formats)
    • Repetitive data format (e.g. field names) leads to large uncompressed sizes
    • Self-delimited self-describing data uses additional space
  • Limited data type availability
    • Byte string data must be “stuffed” into Unicode strings
    • 64-bit integers don’t work reliably across all platforms
    • “Stuffing” other types into e.g. String can introduce inefficiencies

A.2Tradeoffs

In most cases JSON is a great choice for GraphQL. However, it can be difficult to address the weaknesses. Primarily, these are related to performance: reducing the size of payloads, and reading and writing them quickly.

The value of reading and writing data quickly is self-evident. The benefits of reduced payload sizes can be somewhat more subtle:

  • Decreased latency across the stack
    • Most importantly, over the network
      • TCP Slow Start, QUIC Slow Start, and other congestion control mechanisms mean larger first response payloads can significantly increase user-observed latency (especially on app open)
      • Dropped and retried packets are more likely with larger responses, especially over unreliable mobile connections
      • Smaller payloads transfer more quickly
    • Time spent serializing, deserializing, and copying around data
    • Time spent cleaning up data, such as garbage collection
  • Increased I/O throughput across the stack

A.3Argo

To address the aforementioned weaknesses, Argo makes a different set of tradeoffs than JSON.

Strengths of Argo:
  • Compact binary format
    • Not self-describing: relies on GraphQL types instead
  • Unusually compressible
    • Stores data of the same type in blocks to assist compression algorithms, e.g. all Strings are stored together
  • Maximizes re-use
    • Deduplicates repeated values, so deserializing and converting to required types happens once
  • Flexible type availability
    • GraphQL scalars specify their encoding/decoding with a directive
    • Supports all GraphQL types
    • Also natively supports:
      • Variable-length byte strings
      • Fixed-length byte strings
      • Variable-length integers
  • Simple to implement relative to other binary formats (e.g. protobuf, Thrift, Avro)
Weaknesses of Argo:
  • As of today, reference implementation only
    • No stable, high-performance implementations
  • Almost no tools for debugging or analysis
  • Binary format which is not self-describing in its intended mode of operation
    • Relatively difficult to debug or analyze without tools
    • Requires GraphQL schema and query be known
  • Input types not supported
    • Simpler implementation, but JSON still needed

A.4Recommendation

Overall, JSON is the best choice for most GraphQL deployments. However, Argo is a good choice for systems where performance is paramount. Please consider the tradeoffs above.

BAppendix: Design notes

This section is not a part of the technical specification, but instead provides additional background and insight.

B.1Ideas which did not pan out

  • Use bitmasks on each selection set to mark null (or absent) fields. See Design Notes for more.
  • Require all Field Errors be represented inline. This would be nice, but it makes it more difficult to convert JSON responses to Argo. Therefore, this is left as an optional feature (see the OutOfBandFieldErrors flag).
  • Specify a Wire type definition for ExecutionResult, particularly errors. Unfortunately, error paths mix string and int, which has no corresponding GraphQL type. We could serialize this all as string and convert back later.
  • Make the self-describing format able to describe the Wire format. This made things more complex.
  • Avro uses an array format where an arbitrary number of “segments” can be added independently without knowing the final array length, which makes streaming encoding easier. During development Argo was focused on stream support and followed suit, but this was dropped due to lack of compelling GraphQL use cases, and because it conflicted with other techniques (namely Blocks).

B.2Ideas which were not pursued

  • Use Label before Floats to represent the number of 0s to pad it with. Due to JSON/JS, many doubles are actually smallish Ints, which in IEEE 754 means leading with 0-bytes. This might work out on average, especially since 0 takes only 1 byte.
  • Specify how to write compact Paths to values to represent errors. A series of tags.
    • If errors inlined, path up to propagation stop is implicit. The path from there down into any initiating non-nullable field would need to be explicit though, need to account for
  • Whenever a default value is encountered for a scalar type which is deduplicatable, implicitly store it with a backreference ID and use it later. This may break if the schema evolves.
  • Bake in default backreferences for common strings: ‘line’, ‘column’, ‘path’. For certain small messages, this could make a difference. The extra complexity doesn’t seem worth it though.
  • Instead of a self-describing format, simply embed JSON. This is not a knock-out win, especially for the resulting API.
  • Use a variable-length compact float format, such as vf128, compact float, or even ASN.1’s REAL BER/DER. This would be most helpful for GraphQL APIs which return many Floats with values near zero. Other options might be ALP: Adaptive Lossless floating-Point Compression or the “Pseudodecimal Encoding” from BtrBlocks.
  • Encode the entire ExecutionResult’s type in each Wire schema, including the errors array. In particular, the user would need to provide their intended extensions format and stick to it, and we’d need to fudge the type of path (which mixes numbers and strings in the GraphQL spec). The upshot would be total elimination of the self-describing format and the inconvenience, inefficiency, and complexity that causes.
  • Specifying which types actually use backreferences in a given message could be made more granular. For example, the header could be extended with a scheme similar to UserFlags, where a flag is set in the main header and an extra BitSet follows the Flags BitSet. This extra BitSet would set one bit for each potentially-deduplicatable type encountered in the message, in order. This could work around client-side inefficiency in bimodal deduplication patterns. However, this seems unlikely to be enough of a problem to justify the complexity.
  • Default values. Fields could be marked (in the query or the schema, with query taking precedence) with a default value. Ideally, we would reserve a value (similar to Absent, Null, Error) to indicate when the default is used. (Alternatively, we could reserve/pun the first slot in the backreferences when a type ever uses a default.) This would avoid ever sending the full value, instead of sending it once. This would work best for very large strings which first appear very late in the message, or for non-deduplicatable types (like VARINT) with large encodings which appear many many times. These use cases seem too niche to justify the additional complexity.
  • @stream and @defer will likely require additional support. #12 covers some of this. In addition, Blocks will need to become extensible. One scheme for this is to number each block in the same way as Backreferences. Then each new message begins with a Block section, but each block is prefixed with its backreference number. Alternatively, we could include all blocks, but I expect that will mostly be a bunch of zeroes. It will also need to support blocks not seen in the original message (though the possibilities will be known from the query).
  • Constant values, outside of CONST_STRING for stream/defer. These are natural extensions, but have no use yet in GraphQL.

DFormal

D.1Conformance

A conforming implementation of Argo must fulfill all normative requirements. Conformance requirements are described in this document via both descriptive assertions and key words with clearly defined meanings.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative portions of this document are to be interpreted as described in IETF RFC 2119. These key words may appear in lowercase and still retain their meaning unless explicitly declared as non-normative.

A conforming implementation of Argo may provide additional functionality, but must not where explicitly disallowed or would otherwise result in non-conformance.

D.2Versioning

Argo is versioned using SemVer 2.0.0. Each version of Argo explicitly targets one version of the GraphQL spec, which is usually the latest at time of writing.

EAuthors and contributors

Argo was created and authored by Mike Solomon.

A big Thank You to these fine folks who have contributed on GitHub!

FChangelog

F.1Version 1.2

F.1.1v1.2.0

  • Permit self-describing types (i.e. DESC) to be specified in @ArgoCodec directives.
  • Fixed typos in Self-describing encoding section.

F.2Version 1.1

F.2.1v1.1.4

Renamed Field.type to Field.of in the wire schema’s JSON representation.

F.2.2v1.1.3

Clarified merging of fields which are not selection sets in CollectFieldWireTypes().

F.2.3v1.1.2

Added additional notes and links.

F.2.4v1.1.1

BREAKING CHANGE – some changes are backwards incompatible, but no known implementation relied on them.

F.2.5v1.1.0

BREAKING CHANGE – some changes are backwards incompatible, but no known implementation relied on them.

  • Introduced compact paths for errors (and with an eye to streaming) by encoding as a list of integers, described in Path value transformation
  • Added new PATH wire type – closes #1
  • Inline errors are now arrays of errors instead of a single error – closes #2

F.3Version 1.0

Initial release.

§Index

  1. @ArgoCodec
  2. @ArgoDeduplicate
  3. ARRAY
  4. Backreferences
  5. BLOCK
  6. Blocks
  7. BOOLEAN
  8. Boolean false (0)
  9. Boolean true (1)
  10. BYTES
  11. Bytes (5)
  12. CollectFieldsStatic
  13. CollectFieldWireTypes
  14. Core
  15. DESC
  16. Execution Time
  17. FIXED
  18. Float (7)
  19. FLOAT64
  20. FromZigZag
  21. GraphQL response types
  22. GraphQL types
  23. GraphQLTypeToWireType
  24. HasUserFlags
  25. Header
  26. InlineEverything
  27. Int (6)
  28. Label
  29. Labeled
  30. List (3)
  31. Message
  32. Mode
  33. NoDeduplication
  34. Null (-1)
  35. NULLABLE
  36. NullTerminatedStrings
  37. Object (2)
  38. OutOfBandFieldErrors
  39. PATH
  40. PathToWirePath
  41. RECORD
  42. Registration Time
  43. Self-describing types
  44. SelfDescribing
  45. SelfDescribingErrors
  46. STRING
  47. String (4)
  48. ToZigZag
  49. Type marker
  50. Unlabeled
  51. variable-length zig-zag coding
  52. VARINT
  53. Wire schema
  54. Wire types
  55. WirePathToPath
  56. WireType