nobe4 / Vim Syntax Generator

When you feel like knowing all this syntax stuff, perhaps write a blog post ;) - VanLaser

I may not know everything about the syntax mechanisms in Vim, but at least I can share what I learned while building a syntax file generator.

TL;DR

How I created a syntax file generator for displaying StackExchange API values.

I have some data:

{
  "has_more": true,
  "items": [
    {
      "answer_count": 1,
      "creation_date": 1441047662,
      "is_answered": false,
      "last_activity_date": 1441047900, 
      "link": ...

And I define a formatting style:

let g:questions_format="{title}: [{score}/{answers}/{views}]({owner})"

Using only Vim script and syntax rules, I want to display the data like this:


   1 Remove empty list tags in apache cxf (camel): [0/0/2](Bart)
   2 Reference to a Method in a Package: [1/3/22](Sammy Esmail)
   3 Incorporating External Html into a jQuery Mobile page: [2/2/7835](Ben Pearce)
   4 Plotting random point on Function - Pandas: [0/3/14](Nicky Feller)
   5 Optimizing a cycle sort implementation: [1/1/1010](Olathe)
   6 Robot Framwork not recognizing some class keywords: [0/1/13](Ziwdigforbugs)
   7 How do I use the RabbitMQ delayed message queue from PHP?: [0/1/9](Jesse Weigert)
   8 How to pass dynamic value to byte array: [1/1/24](Alok Sharma)
   9 Points as Images: [0/0/2](Vangogh500)
  10 get class name in decorator python 2.7: [0/2/12](brian)
  11 getElementByID for Multiple IDs: [0/4/31](the_new_guy)
  12 Error in api-ms-win-core-path-l1-1-0.dll: [-2/1/38](A. Grifon)
  13 Reading complex numbers in comma separated form a text file into a MATLAB variable: [0/1/6](Naveen)
  14 Change tag of html with regex when content has specific word in SublimeText: [-2/1/21](Magritte)

The code explained below was written to interface with a plugin a friend and I were writing at the time. This plugin displayed results from StackExchange API calls using a user-defined format. In this blog post I will focus only on the rendering part of the process.

Defining the data

There are 4 data types used to build up the syntax file and to display the information:

Dataset

The StackExchange API produces a well-defined dataset, from which I can easily extract the following information:

[ 
  {
    "answer_count": 1,
    "creation_date": 1441047662,
    "is_answered": false,
    "last_activity_date": 1441047900,
    "link": "http://thelink",
    "owner": {
      "display_name": "John Doe",
      "reputation": 2278,
      "user_id": 237115
    },
    "question_id": 32318151,
    "score": 0,
    "tags": ["javascript", "nodejs", "html5"],
    "title": "the title",
    "view_count": 7
  }
]

Formatting String

The user should be able to choose which information to display, in which order, and with which styling. For example, I defined the format as follows:

let g:questions_format="{title}\n
        \[{score}][{answers}][{views}]\n
        \Created {creation} (Modified {last_activity})\n
        \By {owner} {reputation}rep\n
        \{tags}\n\n"

Mapping

The mapping relates each placeholder in the formatting string to the data. I could have used the JSON keys as is, but I wanted to be able to manipulate the data a little more, hence the current mapping:

let s:map = {
      \ 'title':         'title',
      \ 'answers':       'answer_count',
      \ 'creation':      {'function' : 'GetDiffDate', 'param' : 'creation_date'},
      \ 'last_activity': {'function' : 'GetDiffDate', 'param' : 'last_activity_date'},
      \ 'tags':          {'function' : 'GetTagsFromList', 'param' : 'tags' },
      \ 'score':         'score',
      \ 'views':         'view_count',
      \ 'id':            'question_id',
      \ 'owner':         ['owner', 'display_name'],
      \ 'reputation':    ['owner', 'reputation']
      \ }

Some keys remain unchanged, but for others I decided to change the name or add functionality. We will discuss them later on.

Syntax Format

This format is based on the mapping and adds display information for each defined group.

let s:syntax_rules={
      \ "question": {
      \   "id":            { "link" : "Ignore" },
      \   "title":         { "link" : "Tag",        "show" : 1, "matchgroup" : 1},
      \   "score":         { "link" : "Function",   "show" : 1, "matchgroup" : 1},
      \   "answers":       { "link" : "Constant",   "show" : 1, "matchgroup" : 1},
      \   "views":         { "link" : "Identifier", "show" : 1, "matchgroup" : 1},
      \   "creation":      { "link" : "Include",    "show" : 1, "matchgroup" : 1},
      \   "last_activity": { "link" : "Underlined", "show" : 1, "matchgroup" : 1},
      \   "owner":         { "link" : "Structure",  "show" : 1, "matchgroup" : 1},
      \   "reputation":    { "link" : "Special",    "show" : 1, "matchgroup" : 1},
      \   "tags":          { "link" : "Error",      "show" : 1, "matchgroup" : 1},
      \  }
      \ }

We will see the defined options in the next part, but for now we can say that each key defined in the mapping object gets its display options through the syntax.

Workflow

Here is how the data is handled:

Building The Content

This step uses the data and the formatting string to create the formatted dataset that will be displayed.

Main Function

let l:content = []
for l:item in a:items
  let l:description = <SID>SubstituteMap(a:group_name, a:format, l:item, a:map)
  call add(l:content, l:description)
endfor
return l:content

I fill the l:content list with the formatted text of each item.

SubstituteMap

The SubstituteMap is defined as follows:

function! s:SubstituteMap(group_name, format, item, map) abort
  let l:format = a:format
  for [l:key, l:value] in items(a:map)
    if type("string") == type(l:value)
      let l:format = substitute(l:format,
            \ '{'.l:key.'}',
            \ <SID>WrapValue(a:group_name, l:key, a:item[l:value]), '')
    elseif type({}) == type(l:value)
      let l:format = substitute(l:format,
            \ '{'.l:key.'}',
            \ <SID>WrapValue(a:group_name, l:key,
            \   call('<SID>'.l:value['function'],
            \        [a:item[l:value['param']]]))
            \ ,'')
    elseif type([]) == type(l:value)
      let l:format = substitute(l:format,
            \ '{'.l:key.'}',
            \ <SID>WrapValue(a:group_name, l:key,
            \     a:item[l:value[0]][l:value[1]])
            \ , '')
    endif
    unlet l:value
  endfor
  return l:format
endfunction

Let’s break this down:

  let l:format = a:format
  for [l:key, l:value] in items(a:map)
    " ...
    unlet l:value
  endfor
  return l:format

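I process each item of the a:map dictionary. Note that the type of l:value may change between iterations ("string", {object}, or [array]), so I make sure to unlet it before the next iteration; older versions of Vim raise a type mismatch error when a variable is reassigned with a different type.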

Next come the three possible types of l:value.

String

if type("string") == type(l:value)
  let l:format = substitute(l:format,
        \ '{'.l:key.'}',
        \ <SID>WrapValue(a:group_name, l:key, a:item[l:value]), '')

This is pretty straightforward: I replace the {key} block in the l:format string with the wrapped value (I will explain the WrapValue function in a moment).
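One detail worth keeping in mind with this approach: in the replacement argument of substitute(), the characters & and \ have a special meaning, so a title such as "Q&A" would not be inserted literally. The sketch below shows the effect and one possible guard. The helper name s:EscapeReplacement is hypothetical and is not part of the original plugin.

```vim
" Hypothetical helper (not in the original plugin): make a value safe to use
" as the {sub} argument of substitute(), where & and \ are special.
function! s:EscapeReplacement(value) abort
  return escape(a:value, '&\')
endfunction

" Without escaping, & expands to the matched text.
let s:raw = substitute('{title}', '{title}', 'Q&A', '')

" With escaping, the value is inserted literally.
let s:safe = substitute('{title}', '{title}', s:EscapeReplacement('Q&A'), '')
```

Here s:raw ends up as Q{title}A, while s:safe is Q&A. Under this assumption, the escaping would be applied to the result of WrapValue before it is passed to substitute().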

Object

elseif type({}) == type(l:value)
  let l:format = substitute(l:format,
        \ '{'.l:key.'}',
        \ <SID>WrapValue(a:group_name, l:key,
        \   call('<SID>'.l:value['function'],
        \        [a:item[l:value['param']]]))
        \ ,'')

Recall this sample mapping:

'last_activity': { 
    'function' : 'GetDiffDate', 
    'param' : 'last_activity_date'
}

I defined a function named GetDiffDate that computes the difference between a timestamp and the current timestamp.

Here too I replace the block in the l:format string, this time using the result of a dynamic call:

call('<SID>'.l:value['function'], [a:item[l:value['param']]])

This calls the function named in the object and passes it, as its argument, the value of the item field named in the object.

With the previous values, the function call will be:

call <SID>GetDiffDate(a:item['last_activity_date'])
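The post names the GetDiffDate and GetTagsFromList helpers without showing their bodies, so the following is only a sketch of what they might look like. Both bodies are assumptions: the sample output later in the post shows dates such as "31 Aug", so the real GetDiffDate may well format its result differently. The optional second argument is also an addition, used here only to make the function deterministic.

```vim
" Hypothetical bodies: the post names these helpers but does not show them.

" Return how long ago a Unix timestamp was, in whole days.
" The optional second argument overrides "now" (defaults to localtime()).
function! s:GetDiffDate(timestamp, ...) abort
  let l:now = a:0 > 0 ? a:1 : localtime()
  let l:days = (l:now - a:timestamp) / 86400
  return l:days . ' days ago'
endfunction

" Render a list of tags as "[tag1][tag2]", or an empty string for no tags.
function! s:GetTagsFromList(tags) abort
  return empty(a:tags) ? '' : '[' . join(a:tags, '][') . ']'
endfunction
```

Because both functions take the mapped value as their first argument, they fit the call('<SID>'.l:value['function'], [...]) form used above.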

Array

elseif type([]) == type(l:value)
  let l:format = substitute(l:format,
        \ '{'.l:key.'}',
        \ <SID>WrapValue(a:group_name, l:key,
        \     a:item[l:value[0]][l:value[1]])
        \ , '')

You can guess what this part is doing by looking again at the mapping:

'owner': ['owner', 'display_name']

This will take the following value from the dataset:

a:item['owner']['display_name']
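The array case above handles exactly two levels of nesting. If deeper paths were ever needed, the lookup could be generalized as in the sketch below. The function name s:GetPath is hypothetical, and recursion is used so that no local variable has to change type along the way.

```vim
" Hypothetical generalization (not in the original plugin): follow a list of
" keys through nested dictionaries and return the value found at the end.
function! s:GetPath(item, path) abort
  if empty(a:path)
    return a:item
  endif
  return s:GetPath(a:item[a:path[0]], a:path[1:])
endfunction

let s:demo_item = {'owner': {'display_name': 'John Doe', 'reputation': 2278}}
let s:demo_name = s:GetPath(s:demo_item, ['owner', 'display_name'])
```

With the sample item, s:demo_name is John Doe, the same value as a:item['owner']['display_name'].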

WrapValue

This function wraps the value inside custom delimiters that the syntax will later match:

function! s:WrapValue(group, key, value) abort
  return '{'.a:group.'_'.a:key.'}'.a:value.'{/'.a:group.'_'.a:key.'}'
endfunction

The group is question in this example, so a generated value looks like:

{question_title}The Title{/question_title}

Inserting the content

This part will be quicker, as it is not our main concern. I split each generated entry on \n and append the resulting lines to a buffer.

for l:line in a:lines
  call append(0, split(l:line, '\n'))
endfor
setlocal syntax=customsyntax
setlocal concealcursor=n

Then I set the buffer's syntax to our custom syntax and set concealcursor to n, so that concealed text stays hidden on the cursor line in Normal mode (more on this later on).
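Concealing only takes effect when 'conceallevel' is non-zero, and the snippet above does not show where that option is set. A minimal sketch of the buffer-local settings, assuming they are not already set elsewhere (for example in the syntax file itself):

```vim
" Assumption: the post does not show where 'conceallevel' is set.
" With the default value of 0, concealed text is displayed normally.

" 2: hide concealed text completely, unless it has a replacement character.
setlocal conceallevel=2

" n: keep concealed text hidden on the cursor line in Normal mode.
setlocal concealcursor=n
```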

Generating The Syntax

Creating a syntax in Vim is quite a journey. Doing it automatically is even more interesting, and this is what we will see here.

We will only cover three parts of the syntax mechanism: clearing, defining, and linking. There is much more to see in :h syntax.

Introduction On Syntax

I will try to introduce as clearly as possible the few concepts behind Vim’s syntax that we will use.

Setting a syntax with :syntax enable or :syntax on does multiple things, described in :h syntax-loading.

You can put anything in the syntax file: any valid Ex command, just like in your .vimrc. The file will be sourced and will apply its modifications to the current buffer.

There are three types of syntax items (:h :syn-define): keyword, match, and region.

The wrapped format I defined, {start}value{/end}, suits the region item, so this is what I will be using.

The last part is highlighting the matched syntax item. You can define your own background and foreground colors or link the group to an existing one (:h group-name).

One mechanism has fascinated me ever since I understood what can be done with it: conceal. Concealing means that a part of the region is hidden or replaced on screen. The text is still in the file (nothing is replaced, it is only a display setting), but it can be rendered differently.

As an example of this, take a look at the lambdify plugin, which displays λ instead of function. The function keyword is still there (the file remains valid), but you see a lambda instead.

In our case, we want either to display only the content of the region (and not the enclosing delimiters) or to hide the content completely.
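To make the idea concrete, here is a minimal illustration of conceal with a replacement character. This is not lambdify's actual code, only a sketch of the same effect; DemoFunction is a hypothetical group name.

```vim
" Not lambdify's actual code: a minimal illustration of conceal with a
" replacement character. DemoFunction is a hypothetical group name.
syntax keyword DemoFunction function conceal cchar=λ

" At level 1 or 2, the replacement character is shown instead of the keyword.
setlocal conceallevel=1
```

The buffer content is unchanged; only the rendering of the word function differs.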

Syntax Rules

In my syntax rules mapping I defined some options to render the items:

" show:       should the item appear on screen?
" link:       highlight group the item is linked to
" matchgroup: give the delimiters their own highlight group
let rule = {
  \ "show" : 1,
  \ "link" : "Tag",
  \ "matchgroup" : 1
  \ }

There are two basic cases: an element we want on screen, and an element that is hidden (but still in the buffer). Keeping data in the buffer without displaying it can be useful for later reference to an item (for example, hiding its id).

Defining The Regions

Based on these two cases, I defined a function that creates the two possible syntax region definitions:

syntax region QuestionId start="{id}" end="{/id}" conceal
syntax region QuestionTitle matchgroup=QuestionTitleWrapper start="{title}" end="{/title}" concealends

The function simply takes the rule and expands it into the proper format. For each field in the rule, it adds the corresponding part of the syntax command:

" Build the :syntax region command for one group.
let l:syntax_string = "syntax region"
let l:syntax_string .= " ".a:group_name
" Give the delimiters their own group so that concealends can hide them.
if has_key(a:properties, 'matchgroup')
  let l:syntax_string .= " matchgroup=".a:group_name."_wrapper"
endif
let l:syntax_string .= " start='{".a:group_name."}' end='{/".a:group_name."}'"
" Shown items hide only their delimiters; hidden items are concealed entirely.
if has_key(a:properties, 'show')
  let l:syntax_string .= " concealends"
else
  let l:syntax_string .= " conceal"
endif
exec l:syntax_string

Linking the region to a highlight group

We could define a set of colors for each of our syntax regions, but for the sake of this example I decided to reuse the existing highlight groups.

You can see all the predefined groups in the documentation: :h group-name. From there, choosing which color goes to which group is only a matter of taste.

The linking is pretty straightforward:

exec 'highlight link '.a:group_name.' '.a:link_name
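Putting the three steps together (clearing, defining, linking), the whole generation pass could look roughly like the sketch below. It returns the commands as a list instead of executing them, so they can be inspected first. The function name s:SyntaxCommands and the demo rules are assumptions, and get() is used here instead of has_key() so that a value of 0 disables an option.

```vim
" Sketch: build every :syntax and :highlight command for a rules dictionary.
" s:SyntaxCommands is a hypothetical name, not the plugin's actual function.
function! s:SyntaxCommands(rules) abort
  " Clearing: start from an empty syntax definition.
  let l:commands = ['syntax clear']
  for [l:group, l:fields] in items(a:rules)
    for [l:key, l:properties] in items(l:fields)
      let l:name = l:group . '_' . l:key
      " Defining: one region per field, matching {name}...{/name}.
      let l:region = 'syntax region ' . l:name
      if get(l:properties, 'matchgroup', 0)
        let l:region .= ' matchgroup=' . l:name . '_wrapper'
      endif
      let l:region .= " start='{" . l:name . "}' end='{/" . l:name . "}'"
      let l:region .= get(l:properties, 'show', 0) ? ' concealends' : ' conceal'
      call add(l:commands, l:region)
      " Linking: reuse an existing highlight group.
      call add(l:commands, 'highlight link ' . l:name . ' ' . l:properties['link'])
    endfor
  endfor
  return l:commands
endfunction

let s:demo_rules = {'question': {
      \ 'id':    {'link': 'Ignore'},
      \ 'title': {'link': 'Tag', 'show': 1, 'matchgroup': 1},
      \ }}
let s:demo_commands = s:SyntaxCommands(s:demo_rules)
```

To apply the result to the current buffer, each entry of the returned list can be passed to :execute in turn.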

Result!

Here is the fetched result without the syntax:


    1 {question_id}32318063{/question_id}{question_title}Remove empty list tags in apache cxf (camel){/question_title}
    2 [{question_score}0{/question_score}][{question_answers}0{/question_answers}][{question_views}2{/question_views}]
    3 Created {question_creation}31 Aug{/question_creation} (Modified {question_last_activity}31 Aug{/question_last_activity})
    4 By {question_owner}Bart{/question_owner} {question_reputation}126{/question_reputation}rep
    5 {question_tags}[cxf-codegen-plugin]{/question_tags}
    6 
    7 {question_id}32317186{/question_id}{question_title}Reference to a Method in a Package{/question_title}
    8 [{question_score}1{/question_score}][{question_answers}3{/question_answers}][{question_views}22{/question_views}]
    9 Created {question_creation}31 Aug{/question_creation} (Modified {question_last_activity}31 Aug{/question_last_activity})
   10 By {question_owner}Sammy Esmail{/question_owner} {question_reputation}21{/question_reputation}rep
   11 {question_tags}[dispatch]{/question_tags}
   12 
   13 {question_id}11786310{/question_id}{question_title}Incorporating External Html into a jQuery Mobile page{/question_title}
   14 [{question_score}2{/question_score}][{question_answers}2{/question_answers}][{question_views}7835{/question_views}]
   15 Created {question_creation}02 Aug{/question_creation} (Modified {question_last_activity}31 Aug{/question_last_activity})
   16 By {question_owner}Ben Pearce{/question_owner} {question_reputation}1240{/question_reputation}rep
   17 {question_tags}[inject]{/question_tags}
   18 
   19 {question_id}32317251{/question_id}{question_title}Plotting random point on Function - Pandas{/question_title}
   20 [{question_score}0{/question_score}][{question_answers}3{/question_answers}][{question_views}14{/question_views}]
   21 Created {question_creation}31 Aug{/question_creation} (Modified {question_last_activity}31 Aug{/question_last_activity})
   22 By {question_owner}Nicky Feller{/question_owner} {question_reputation}52{/question_reputation}rep
   23 {question_tags}[ipython]{/question_tags}
   24  ...

And after :set syntax=customsyntax


    1 Remove empty list tags in apache cxf (camel)
    2 [0][0][2]
    3 Created 31 Aug (Modified 31 Aug)
    4 By Bart 126rep
    5 [cxf-codegen-plugin]
    6 
    7 Reference to a Method in a Package
    8 [1][3][22]
    9 Created 31 Aug (Modified 31 Aug)
   10 By Sammy Esmail 21rep
   11 [dispatch]
   12 
   13 Incorporating External Html into a jQuery Mobile page
   14 [2][2][7835]
   15 Created 02 Aug (Modified 31 Aug)
   16 By Ben Pearce 1240rep
   17 [inject]
   18 
   19 Plotting random point on Function - Pandas
   20 [0][3][14]
   21 Created 31 Aug (Modified 31 Aug)
   22 By Nicky Feller 52rep
   23 [ipython]
   24  ...

And there you go, you have a fully customizable syntax and highlight generator. Here you can see the result with the colorscheme I use, generated directly within Vim with :TOhtml.

Conclusion

I played a lot with Vim to build this little piece of code, and yet I have barely scratched the surface of what Vim syntax can offer.