Into the Mind Of a Schema Ninja: A Dialogue with Jarno van Driel About Semantic Search, Content and Triples
The year is 2009. Some folks from Drupal have just published a paper about RDF and web pages, titled Produce and Consume Linked Data with Drupal! Around that same time, and even a year earlier, Jarno van Driel was already producing Linked Data, working as a webmaster for several magazine websites and following his penchant for deconstructing things by conceptually modeling their meaning. Back in those days, as you will read in this Dialogue, Jarno was doing what Dan Brickley once warned about: “While you can avoid RDF, it is harder to avoid complicated data and complicated computer problems.” So, not avoiding the complexity of representing things and relationships with web pages, Jarno started experimenting with RDFa, which in turn got his work mentioned in a W3 mailing list post and further led him to participate in an emerging community surrounding schema.org.
This Dialogue is special for a number of reasons. In the first place, it is giant. The start of a book, as Jarno joked. Next, it bears the fruits of more than a decade of Jarno’s labor, which you and I have the opportunity to enjoy today. Last but not least, it is an inspiration for all of us striving, and sometimes failing, to walk the talk of accessible, meaningful and trusted web content and of the Web in general.
Grateful to Jarno for sharing his passion for triples and knowledge, I now leave you to bask in the details of a steep learning curve and level up your semantic SEO practices with down-to-earth advice and examples, straight from the ninja’s mouth.
I have divided the Dialogue into sections for a better webonaut navigation experience.
Semantic Web, Knowledge Graphs, SEO
Semantic SEO Matters
Jarno, let’s start with a blast 🙂 What is Semantic Search and how is it related to doing business on the Web 🙂
For me ‘Semantic Search’ represents a giant leap forward for people, because Google and Bing have moved on from exact match keyword based search (= limitation), towards entity-relation based search (= limitless).
Keyword based search was ultimately the cause for the old ways of keyword stuffing – while playing word-bingo with sentences and links – as not doing so meant your content would not show up in search such that it would get noticed.
The leap forward means people are now able to express themselves through content which resonates in search results based on its meaning, its applicable context, and the concepts and relations it expresses – without having to juggle exact-match words or (parts of) phrases.
And due to the recent advances in Natural Language Processing, people are also no longer forced to do so in just any specific form of syntax either (for example: semantic HTML, RDFa, microdata, JSON-LD).
Just a plain old text document can now suffice to express meaning which is extractable through Natural Language Processing.
All we have to do as creators is focus on making statements about the Things we want to express and inform our audience about, in a structured and meaningful manner. Something semantic search (and the technical reasoning behind it) made possible.
Why is it so hard to do “semantic” SEO by yourself?
Probably its main hurdle is that ‘Semantic SEO’ still does not have a well-defined, communally accepted definition describing what it is and should entail.
There are some (sort of) authoritative sources out there which have attempted to provide some boundaries and insight into its concept:
- Bill Slawski (RIP): What is Semantic SEO?
- Kingsley Idehen: Semantic Search Engine Optimization (SSEO)
- Schema App: Semantic SEO: What you need to know
- Wordlift: What is Semantic SEO?
- Wix: Semantic SEO: How to drive more meaningful traffic
But the biggest issues I have with the boundaries and insights these resources provide are:
- Intended or not, they can be interpreted as being expressions of corporate entities (through people) which have commercial interests in the matter.
Although it is not so much a matter of who wrote them but more the fact that they have been written in isolation, and as a consequence do not properly align with each other.
- There is no official-ish 3rd party (W3C, Wikipedia, SearchEngineLand, or even Search Engine) resource which provides a foundation by means of at least some (short) abstract about the concept, which a majority within the niche can agree on and expand upon.
- The lack of a widely accepted definition of the concept means many have their own interpretations and applications when they think/talk about it.
Making it a very hard sell still as interested leads (= possible clients or internal stakeholders) often seem to have their own thoughts about and interpretations and expectations of SSEO.
Because of my past experiences as an Accessibility engineer I am more than just aware of how writing for accessibility contributes to the semantic optimization of content.
Content which, as a result of that process, is much easier to digest for people, and much easier to translate into Semantic Markup because of its structured and descriptive nature.
So if you ask me what SSEO is/entails then my short answer is: An aggregated and pruned version of the resources I mentioned + writing for accessibility + a probably-too-much-to-ask-for personal desire to have this include WCAG guidelines as well.
And once we have such a definition we can start focussing on educating practitioners while beginning to define quality standards.
Although I am highly doubtful we will ever get that far given that the latter does not even exist for regular SEO.
There are plenty of opinions out there but as an industry we are terrible at getting together to create definitions which the majority embraces.
SEO specialists seem to be stuck at ‘it depends’ and ‘I think’…

Is there a recipe for “semantically and factually correct yet simple content”?
I think there is: educate yourself in writing for accessibility!
The techniques involved in this form of writing 99/100 times end up in very well structured and descriptive content (independent of its final medium), without having to be academically trained to be able to do so.
With the added bonus of producing something which is not only more accessible to people with different forms of disabilities, but also for people of different intellectual and educational levels, over different cultures.
And 99/100 times, well written, substance containing, accessible content is extremely easy to parse for both people and NLPs alike, as well as being easy to translate into machine readable markup (by both people and machines alike).
In my view writing for accessibility only entails WINS while also being the easiest of all methods for directly influencing results without involving lots of other people and large amounts of resources!
Tell me about the early days of schema.org 🙂 and how you ended up being part of the party 🙂
Hehe, well I am not going to spill all my beans here as the answer to this question covers a good part of a chapter in my first attempt at an artistic expression (Short story).
[Updated] Several months after this interview, in February 2024, Jarno finished this “short” story and it almost broke the schema.org validator 🙂
In summary: Accidentally, coincidentally and a causality identified as Aaron Bradley…
When I was working on Drupal themes & templates, for the first time in my career, I was doing a lot of research trying to figure out how Drupal was supposed to work.
And during my search for resources I accidentally ran into publications by some of the people working on Drupal.
Publications which mentioned Linked Open Data markup and because of it got me started experimenting with RDFa, prior to schema.org’s existence.
Which coincidentally led to some of my work being mentioned in a W3 mailing list post I ran into; One that mentioned the upcoming schema.org vocabulary.
And after having lurked around in W3 mailing lists for a few years it was Aaron who pushed me over the edge, causing me to dive into participating in the community surrounding schema.org.
And that is about the gist of it.
What is Jarno van Driel up to lately?
Please share some more information about the #nuggets you started sharing on LinkedIn, containing valuable information about how we can do better with structured data.
Full disclosure?! A combination of my physical disabilities, overenthusiasm, overambition, overreaching, overworking and disappointing outcomes (for me) led to running into the mother of all burnouts some years back.
And as a result of that experience I had to retreat for an N amount of time; to contemplate my career, its impact on my personal life and my health, and attempt to refocus myself by finding out what it is that truly matters to me as a person.
Being on such a personal voyage for some time now, led me to the point where I realized I was really missing being occupied with the ‘semantic web’ and ‘semantic optimization’.
Which was not that much of a shocking conclusion of course, as at my core I am a nerd. And these topics feed right into that personality trait; Which in return is basically the main reason for me ending up being interested in this ‘stuff’, roughly 15 years ago to begin with.
And so my recent online reappearance resembles an expression of me trying to reinvent myself and not being afraid to do it publicly, transparently and without a care about anything professional it might/possibly/maybe lead to or have any influence on.
For the first time it is just me, being who I am, expressing it, and occupying myself with that which interests me, while sharing some of the things I have learned throughout the years with others. Nothing more, and nothing less.
Part of which is a series of #schemaorg #nugget posts I have started sharing on Twitter (X?), LinkedIn and Mastodon. A series which so far only contains 4 posts, though that is because I have taken a summer break (new priorities).
I intend to continue this series in September 2023, covering as many of the things I have run into, throughout the years, as I can.
Although my experiences are finite, and not all of it can be covered in just a short nugget. And yes, I intend to start a blog to cover these, but all in good time. I am not going to stress myself over it.
If you (Teodora), or you (the reader) have food for thought about topics you would like me to cover in my social media series, please reach out to me on any of the social media channels I mentioned!
I do not want the series only to reflect things I have experienced. It should be about things people (plural) experience and are trying to resolve.
So please, and by all means, let me know what it is you are/have been struggling with or want to tell the world about but are shy to do yourself.
How do you see the state of search now?
Confusing, bipolar, difficult to predict, and even more schizophrenic than it already was.
Especially the latter as I am a person living in the Netherlands, and as such I have to deal with the fact that Search within Europe, let alone on the other side of the pond (USA), differs so much per country/industry vertical that it has become very hard to tell which state-of-search applies to you, me, or anybody else working internationally.
And given the differences in local states-of-evolution, acting on search related issues can be very different and have different priorities, depending on where somebody or some organization is located and which areas (locations) they serve.
At least they all do share a similar path ahead though: Things consumed and generated by AI, fed by tons of different sorts of ‘smart’ systems, supported by and corrected through knowledge graphs.
The biggest question I am struggling with right now is: “How do people, globally, get to influence/control the facts contained within the growing depths and amounts of knowledge graphs out there, which are distributed globally?”.
Because even though I am a big fan of the technologies involved, I am feeling increasingly uncomfortable with the fact that some of the biggest corporations and governments in the world are the gatekeepers to and of some very important and influential graphs out there.
Graphs people only get to see the minutest parts of and which we generally have no (editorial) access to. Heck, we do not even know where we exist anymore nowadays.
There is a rapidly growing shortage of insight in and control over any and all data being collected and used globally, even with the arrival of regulations like GDPR.
It is not just about search results and social media anymore…
Semantic Web, Knowledge Graphs and SEO
When did you first meet the Semantic Web as a concept?
That would be around 2008-ish? I think… It was just around the same time people like Stephane Corlosquet (a person I would like to read about in an interview by you) started publishing about machine-readable LOD markup through Drupal (an initiative which he was a big driving force behind).
Tell me about your first immediate experience with RDFa and website building?
When I discovered its concept for the first time, it felt like the gates to geek-heaven had opened up for me…
Only to come crashing down like the dark angel into hell once I dared to try to implement RDFa, plus things called ‘ontologies/vocabularies’ (schema.org did not exist yet).
Back then, any resources about either the syntax or any ontologies one could/should use, why to use them, how to use them, and which constraints one should apply while doing so, simply were nowhere to be found outside academic resources.
Resources which were ‘accessible’ on ‘servers’, under or on top of the desks, of professors at universities I had never heard of; sometimes only available during university opening hours.
And the only alternative for these were either W3 technical guidelines (which in those days should have been declared a crime to read) or – almost just as terrible – W3 mailing lists…
Lists that nearly exclusively consisted of those who understood things at an academic level.
People who were publicly fighting amongst themselves like teenagers, only using fancy words (I think this is called: academic discussions. But hey, what would I know, I have no PhD).
The harsh truth of the matter is that if it had not been for my discovery of the initiative which would result in schema.org, I think I would have given up on the whole idea behind the semantic web and its application for web content and beyond.
Back then it was just too complex, too ambiguous and too hard to tell how and why it should be used by the rest of us (outside the academic world).
Let alone finding justification for the amount of efforts and resources the implementation of semantic web methodologies required.
Though when schema.org arrived at the scene, we became friends again.
And we even became best of friends after the arrival of JSON-LD and the consequential decoupling of content and its semantic annotations.
May it “Live long and prosper”!
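To make that decoupling concrete, here is a minimal sketch (an editorial illustration, not something taken from the interview). It assumes a made-up article record and a hypothetical `render_page` helper; the only real-world pieces are the schema.org terms `Article`, `Person`, `headline`, `author` and `datePublished`, plus the `application/ld+json` script type. The visible HTML and the machine-readable annotations are generated from the same data but live in separate parts of the page.

```python
import html
import json

# Hypothetical content record; none of these values come from the interview.
article = {
    "headline": "Example headline",
    "author": "Example Author",
    "date_published": "2023-09-01",
}


def to_json_ld(data: dict) -> dict:
    """Map plain content fields onto schema.org terms."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": data["headline"],
        "author": {"@type": "Person", "name": data["author"]},
        "datePublished": data["date_published"],
    }


def render_page(data: dict) -> str:
    """Return HTML in which visible content and annotations are kept apart."""
    # Escape "</" so a stray "</script>" in the data cannot close the block early.
    json_ld = json.dumps(to_json_ld(data), indent=2).replace("</", "<\\/")
    visible = (
        f"<article><h1>{html.escape(data['headline'])}</h1>"
        f"<p>By {html.escape(data['author'])}</p></article>"
    )
    annotations = f'<script type="application/ld+json">\n{json_ld}\n</script>'
    return visible + "\n" + annotations


print(render_page(article))
```

With RDFa or microdata the same statements would have to be woven into the `<article>` markup itself as attributes, which is why a template change could so easily break them.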
Why should SEO specialists care about LOD?
Wow, that is a hard one to answer without dedicating an entire article (possibly even a paper to get an official degree for myself).
The absolute shortest answer I can come up with is that if you want to learn about why LOD matters, you should probably start by reading about (writing for) web accessibility and grammar.
Once you start feeling comfortable with those, it becomes a whole lot easier to recognize how Semantic triples fit into the expression of things through speech, written words and links (and do not get me started on how this can help add to/express content of a more ‘artistic’ nature).
And when you start to see the connection between semantics and relations expressed via natural language (implicit) and machine-readable language (explicit) – in just about any sentence – you will no longer need my words.
You will come to realize how both influence each other and how the relation and interaction between them can be of enormous value for both people and machines alike.
There is an easy business case to be made of you becoming able to connect the dots yourself. Because once you do you will see so many opportunities to improve things, you will be retired before you finish.
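As a small illustration of that connection (a sketch added for clarity, not Jarno's own method), the snippet below takes two plain statements made in this Dialogue and writes them down as subject-predicate-object triples. The `https://example.org/` namespace and the predicate names `livesIn` and `studied` are placeholders invented for this example; they are not schema.org terms.

```python
from typing import NamedTuple

BASE = "https://example.org/"  # placeholder namespace, assumed for this sketch


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Natural language (implicit): "Jarno van Driel lives in the Netherlands."
# Natural language (implicit): "Teodora studied Classical philology."
triples = [
    Triple("JarnoVanDriel", "livesIn", "Netherlands"),
    Triple("Teodora", "studied", "ClassicalPhilology"),
]


def to_ntriples_line(triple: Triple) -> str:
    """Write one triple as an N-Triples-style line (explicit, machine-readable)."""
    return f"<{BASE}{triple.subject}> <{BASE}{triple.predicate}> <{BASE}{triple.obj}> ."


for triple in triples:
    print(to_ntriples_line(triple))
```

The grammar of the sentence (who, does or is what, to or with what) maps one-to-one onto the three slots, which is the pattern to look for in your own content.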
You once called LLMs and generative AI “the ultimate End-boss” use case for KGs. Can you elaborate on that?
This mostly has to do with the current state of affairs, most notably the hallucinations of generative AI which half the web seems to be (rightfully) falling over.
So far publicly available AIs produce content that is ignorant and sometimes even blatantly false and misleading, while also failing to cite resources which ‘inspired’ them.
In a very short time machines have just become really good at grammar, language, and at producing different forms of media.
I am somewhat scared though (call me boomer) of the possible negative impact AI generated text, images and videos might have on societies. This is not the fault of machines though, but people.
Content which has been created through machines that currently lack knowledge (graphs) and ‘understanding’.
Which by no means implies I feel that KGs are all that we need because it is nearly impossible for anybody without, what feels like, 2 related PhDs to get any ‘knowledge’ out of them.
And with that statement I am assuming somebody actually has access to a KG which contains what it is they need for their purpose(s). Unless I am mistaken, none of my private life friends or family members have any KGs lying around for me (or them) to use.
Any guesses as to how many people in this world, without a PhD, have their own KG at home and use it themselves (with them being aware of it)?
It is therefore that I consider the integration of AI and KG technologies (and the systems feeding these) to be the End-boss use-case for all of them. The interfaces for interacting with, and getting output from AI = the people friendly interface the LOD/Semantic web was lacking.
And vice versa, KGs offer the information AIs need to prevent them from hallucinating, while also facilitating them in being able to cite resources.
I have not figured out yet which machine is supposed to handle the ‘understanding’ part of it all though?
But yeah, to me it seems to be a match made in geek-heaven; One which will hopefully serve all people. And not just certain groups of.
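A toy sketch of that pairing, under loud assumptions: the "knowledge graph" here is just an in-memory list, the predicate names are made up, and `grounded_answer` is a hypothetical stand-in for whatever layer sits between a language model and a real graph. It only illustrates the two behaviours described above: answer with a cited source when the graph holds the fact, and decline rather than invent one when it does not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    source: str  # where the statement was found, so it can be cited


# Tiny stand-in for a knowledge graph; both facts are stated in this Dialogue.
GRAPH = [
    Fact("Jarno van Driel", "livesIn", "the Netherlands", "this interview"),
    Fact("TerminusDB", "licenseType", "open-source", "this interview"),
]


def lookup(subject: str, predicate: str) -> Optional[Fact]:
    """Return the first matching fact, or None when the graph has no answer."""
    for fact in GRAPH:
        if fact.subject == subject and fact.predicate == predicate:
            return fact
    return None


def grounded_answer(subject: str, predicate: str) -> str:
    """Answer only from the graph and cite the source; never guess."""
    fact = lookup(subject, predicate)
    if fact is None:
        return "I do not know."
    return f"{fact.obj} (source: {fact.source})"


print(grounded_answer("Jarno van Driel", "livesIn"))
print(grounded_answer("Jarno van Driel", "birthYear"))
```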
Where do conceptual models start? How do we build them with a view to website architecture and good UX?
Again a question which is very hard to answer simply because of the fact that there are so many angles and entry-points to take into account, all leading to variants of such an answer. Sometimes even being completely different and conflicting variants. Meaning it is not possible to give any ultimate and conclusive answer.
It is the technology stack underneath a website (but also apps, or full blown software applications) and any additional plugins these use, that mostly facilitates/limits what you are able to realize.
A reality, more often than not, one simply gets confronted with and is not able to change. It often is a matter of: It is just the way it is…
I have had plenty of run-ins with developers, throughout the years, who perfectly understood the results I was looking for, yet who had no other option than to inform me that what I wanted simply was not possible due to platform constraints, and therefore data constraints.
Constraints which make it impossible to create, edit, store and retrieve information in a certain structured manner; which you need to be able to generate entities and express their descriptive values and relations with other entities.
So one should always start by studying what the technology stack available to you will allow you to do. Prior to coming up with any very specific entity, content, application or architecture models.
Next to that it is always good to create some abstracts of the things you hope to be able to express – upfront – so these can act as a guide during your research into the technology stack(s) available to you. So that you at least have some idea of the things you need to investigate.
Now since this, most of the time, covers the markup generated at a web page template level, this can lead to needing to make changes to the presentational layer people are served. As a rule of thumb it is therefore always a good idea to talk to a Conversion Rate Optimization specialist – prior to – introducing new information on a page and start shifting information around.
So – to prevent – you from making changes which might have a negative impact on business conversion targets. Although CROs often – also – want things which are hard to substantiate prior to the act, yet useful nonetheless…
Use the latter as ground for discussions during a business meeting and you are done for the day. So you might want to try to create a business case together with a CRO.
Plus, in my experience, there are often a lot of – insights – to be gained by hearing out CROs on their desires as they often want – more and/or better information – as opposed to less. Which can be the reason for a – happy data accident – as in, there is a chance this will lead to adding even more than you were hoping for.
And lastly, entity and content models for the web content itself…
Which also has no straightforward answer as this greatly depends on the type of content you have to deal with and its intent.
Creating a content model for a product detail page (and the products it contains) encompasses totally different needs than when you are writing markup for a webinar page (and the video/stream it contains).
Though the one thing all different types of web content have in common is that the first step in creating entity/content models is to figure out the intent of the content by documenting the needs of its intended target audience first.
Once you figured that out, the rest is easy peasy.
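One way to picture this, as a rough sketch rather than a recommended toolchain: write down the audience's questions per content type first, then map each question to the property that answers it. The question wording and the `missing_properties` helper are assumptions made for this example; `Product`, `Event`, `name`, `description`, `offers`, `startDate` and `location` are existing schema.org terms.

```python
# Audience needs per content type, each mapped to the property that answers it.
# The questions are illustrative assumptions, not a complete content model.
CONTENT_MODELS = {
    "Product": {
        "What is it called?": "name",
        "What is it?": "description",
        "What does it cost?": "offers",
    },
    "Event": {  # e.g. a webinar
        "What is it called?": "name",
        "When does it start?": "startDate",
        "Where can I watch it?": "location",
    },
}


def missing_properties(schema_type: str, entity: dict) -> list:
    """List the properties the audience needs but the entity does not provide."""
    required = CONTENT_MODELS[schema_type].values()
    return [prop for prop in required if prop not in entity]


# A hypothetical webinar record that forgot to say where the stream can be found.
webinar = {"name": "Example webinar", "startDate": "2024-01-01T10:00"}
print(missing_properties("Event", webinar))
```

The two models share only `name`, which is the point made above: a product detail page and a webinar page answer different questions, so they need different models.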
What’s wrong with the CMSs these days 🙂
“Wrong”? Well, nothing much really. Although I do understand what you mean to ask though…
What is happening here is that you are looking at current CMS platforms while being biased. A bias which is based on a primary building block of the semantic web: semantic triples.
Metaphorically speaking it is like you have traveled back to the winter of 1984, walked into a coffee shop, ordered a warm chocolate milk with whipped cream and/or marshmallows, and asked the shop’s employee for their wifi password.
The foundations of the, currently most used, CMS platforms were created in an age in which semantic triples were not much of a thing yet. Let alone being a thing taken into account during the first moments of imagination and creation of those platforms.
And given that semantic methodologies require different technologies and processes, adapting platforms to accommodate triples simply is not possible for 99.9% of them. It would require rebuilding those platforms from the ground up.
Which in return would upset roughly 60% of the internet (I’m guesstimating here). There is hope on the horizon though…
I recently participated in a workshop by Philippe Höij of DFRNT 🧨, which he and I continued in another session, directly after the workshop had finished, because I was utterly mind-blown 🤯 by all I was hearing. Which is also why I am bouncing around 🤸‍♂️ like a kid on his birthday 🎉, opening gifts 🎁, in the second video.
Reason being that he is building something on top of TerminusDB (open-source), which could rock our world!
Now this is only the first initiative of its kind that I have come across, though I can not imagine it is the only such initiative out there. So to anybody reading this I have to ask:
If you know of, or encounter anything similar, please ping me on social media?
Semantic Modeling for the Fun of it
What’s the weirdest thing you have had to model?
Well…
The ‘weirdest’ would definitely have to be some of the products I had to model for my last employer.
For whom I developed an entire business/manufacturer/wholesale/retail/ecommerce/marketplaces/affiliate/advertising vocabulary and their taxonomies…
For [insert drum-roll] ‘erotic lifestyle’ products.
Now I will not describe the actual products I had to model here as this interview would probably get an ‘adult’ label and disappear behind Google’s safe-search filters, but enough should be said by stating that, at a certain moment, even my colleagues were afraid to glance over my screens for more than half a second. It even got me a nickname I can not repeat here.
Times which were ‘different’ to say the least; Times which covered the full spectrum of the rainbow inspired flags and then some…
Just data, text, image, video and audio files and physical products though. No intimacies, nothing rancid nor disrespectful, for the bigger part empowered by women, highly professional and utterly customer service focused.
It just included a whole lot more laughter and goofing around than normal for 99% of the offices out there, as it is extremely difficult to stay 100% serious when working in an adult-industry environment.
There were simply too many stimuli lying around at every step we took not to burst out into tears of laughter with each other, at least 3 times per day.
With your over-the-top interest in Things, what is the most challenging concept you have tried to make sense of?
‘…coerce the coalescing of entities…’, and everything directly involved in it. Literally!!!
I am a non-native English speaker, with an interesting case of dyslexia. And that specific fragment of a sentence has felt like Latin to me, for what feels like infinity.
Translating and understanding the individual words simply does not equal understanding the bigger concept underlying it, and urbandictionary.com was not of any help either.
Luckily many people involved in the creation of semantic web ‘stuff’ were very patient and considerate with me during their and my efforts in getting it through my thick skull.
And I think that because of the combination of these things, it took me only close to a decade before the concepts behind that expression, and its translation to Dutch, finally made sense to me (or at least, I dare to hope it does).
A sense which – until the moment that bomb went off in my brain – was solely based on literal technical W3 documents, and the lack thereof by the search engines.
Though when I finally grasped the full concept, it was like discovering there is a universe beyond the milky-way.
A discovery which led me way beyond just a “5 year mission to boldly go…” and dive into – what feels like – the Quantum universe.
A universe which has proven to be so tempting for me that I decided to make a comeback (hopefully a version from a parallel universe; one who will not push things too far and go over the edge).
Who’s gonna interview the interviewer?
I have this section where we change roles with people I talk to 🙂 And I am supposed to answer your questions. I would be grateful if you could spare several minutes to ask me a question or two. 🙂
Jarno: Hehehe, be careful with what you request…
You should not have asked for a property-value pair though. So do not mind me, but I have made it a triple questionnaire:
- In your words, though such that my ‘dyslexian’ brain can make sense of it, could you describe what a philologist is, does and dreams of?
- How does the study, which resulted in that title, demonstrate itself in what it is you have done in the past, do these days, as well as into the future?
- In which directions would you like to evolve, and in which branches of those directions do you see the Semantic web contribute to what it is you dream of and how so?
Teodora: Lovely questions, rivaling my attempts to sneak peek in your head.
Jarno: In your words, though such that my ‘dyslexian’ brain can make sense of it, could you describe what a philologist is, does and dreams of?
Teodora: A philologist swims in the immense ocean of written and spoken language. In my case, I studied Classical philology, meaning I studied Latin and Ancient Greek and the Roman and Greek cultures inevitably related to language. As my professor in Ancient Greek Culture, prof. Bogdan Bogdanov, taught us:
“Talking about texts can and should go hand in hand with talking about contexts and discourses, since each text connects and relates to another, with which it forms a complementary whole.”
What does a philologist do? I don’t know in general, I can tell you what I do. I travel worlds, entering them through portals called words. For example, I had to translate the quote above. I had to build a bridge from a world full of a certain context into another one full of a different context. And that bridge – what is it built of? This is what I do – I wonder. I also have some answers. Do you know where the word translate comes from? It comes from “transfero”, it is the past participle, and transfero means to bring from one place to another. How cool it is to bring things and concepts from one place into another, from one paradigm into another, from one context into another, thinks me the philologist.
What do I dream of? A world where we understand the archetypal need for connection, exchange and understanding, and are not divided by the vocabularies that make these forces look different.
Jarno: How does the study, which resulted in that title, demonstrate itself in what it is you have done in the past, do these days, as well as into the future?
Teodora: After I studied Classics, I studied Creative Writing. Both led me to text. I love text. And this is what I do these days. Now at 42, I can say that without a framing that I would otherwise use, as to sound anything. So the answer is: I love text. This is what I do. And I express this love for text also as a Love for the Web. For me, personally, the Web is a textual pond of a peculiar type in which I see my words making ripples across systems and minds.
The future? Love knows no past and future. It is a constant flow of being and dancing.
Jarno: In which directions would you like to evolve, and in which branches of those directions do you see the Semantic web contribute to what it is you dream of and how so?
Teodora: I would like to evolve in my understanding of the business processes language, and in Jira narratives. Also in the practice of understanding the limitations people work with and the hardship they see when having to think about the future, the important vs. the urgent work. This sounds strange and too non-visionary to me too, but this is what I need if I want to fight for semantic web standards applied to content creation and publishing.
Why do I want these standards? Because I want to be able to express my thoughts in triples, so that future systems might read them. In other words, I want to codify knowledge, to set it in stone… or rather in standard code. Why would I want to do that for marketing? Because marketing is now about knowledge management. I know that in my bones.
So, I want to work towards knowledge graph based marketing communications on the Web, and in general for a cleaner, smarter and ethical Cyberia, where we evolve together, and like in relationship marketing, we are farmers, not hunters of data or knowledge.
The Semantic Web is just that. A fertile soil for the Web of People.
Instead of an Epilogue:
What’s your advice for people who just want to add some markup and do things right, that is publish content with clean metadata and mind their own business, can this be achieved?
Chances are most will not need much advice any longer, though it does depend on the type of content and platform it is published on.
For the major ecommerce and publishing platforms, free (but upgradable) plugins like those of Yoast, Rankmath, Schema App and Wordlift will already, automagically, do much of the heavy lifting for you. They can help you publish most of the basics, as long as you make use of a CMS platform the way it is intended to be used.
If the basics are enough for what you want/need, nowadays you should be covered through the functionality of plugin providers like the ones I mentioned.
And for those that need that ‘extra’ there are options via SaaS/service providers like Schema App, Wordlift and Inlinks. Although each has its differences in regards to what it is they do and how they go about it.
Meaning that before you go vendor shopping, you should first figure out what it is you want and why.
Now of course, maybe you do not want to be locked into any tool, SaaS or service provider, in which case you always have the option to hire a specialized consultant who can create templates for you. Templates which then have to be implemented on your site by somebody like a programmer.
Which means no monthly repeating bills, though at the start it requires a more substantial sum of money compared to when you start where I started my answer, plugins…
With that our Dialogue is over. But the fight for a better Web experience and accessibility is not. Follow Jarno van Driel on Twitter: @jarnovandriel and on LinkedIn for more (there’s more! :)) tips for Schema markup and other structured data insights. Also make sure you keep an eye on the Invisible Graph.