-
I normally record myself too, but you guys are going to record it, right?
-
Yeah. I just started recording.
-
Excellent, excellent. This is fine. It’s really good. The trip has been good so far. I’ve been in Taiwan now for nearly two weeks, actually over Chinese New Year. My Chinese skills are gradually developing. I can now order some food and things.
-
(laughter)
-
That’s awesome.
-
I hope in a year’s time when I come back, I’m practicing quite heavily, that I would maybe be up to giving a lecture in Chinese. That would be a bit ambitious. [laughs] I went to Tianjin yesterday and gave a talk to the city government there.
-
I think it was great. I know the audience was already selected on the basis they could understand English, but if I could improve my Chinese, it would make a big difference.
-
Anything I can help you with?
-
I guess what we could start with is I’d love to hear a little bit from you, really, about what’s on your mind lately.
-
What are issues or what are you really interested in at the moment or are you seeing as challenges that are coming up? What I’m interested in, my overlap, that’s what it is.
-
Challenges.
-
And opportunities.
-
Hopes and fears. [laughs]
-
Hopes and fears, yeah.
-
I’ve been here for four months or so. I’m the first so‑called anarchist minister, which is kind of an oxymoron. But really, what we mean is that I bring what we do in the open‑source communities ‑‑ which is a volunteer, cat‑herding kind of organization style ‑‑ into the administration.
-
These are my anarchist principles: I don’t take commands. I don’t give commands. I only work with volunteers, so all my staff here are volunteers. Which means that aside from two executive secretaries, everybody else on the 15‑ or 16‑person team was originally working with different ministries.
-
But because they’re interested in this work, they sign up for what we call the PDIS, which is roughly modeled after the Policy Lab or GDS, whatever.
-
Because the PDIS is not an officially ordained unit yet, we’re not at a stage as in Singapore where they have hundreds of people working at the Singapore GDS. Some PDIS members are career public servants in different ministries, who tell their ministers or deputy ministers that ‑‑ instead of working on, for example, tax or other duties ‑‑ they want to work on open government.
-
Then, I sign a letter of intent, saying that this person now works in the office.
-
Yes, I understand.
-
Exactly. For as long as I’m the Digital Minister, I have this temporary staff. This is very interesting, because in Agile software development, we have this concept of an on‑site customer.
-
If we are doing e‑government planning, what’s better than having those on‑site customers evaluating on behalf of their colleagues, that the system would actually work, or these open data infrastructures actually reduce their workload instead of increasing their workload, and so on?
-
We do all our system designs in this co‑design kind of way, so that we know for sure that their original colleagues will be up to this kind of design.
-
That will be of value.
-
So we’ve been experimenting with the 15 people team. Two, three designers now, about nine coders, and legal experts, and so on. It’s been good. Then, we were scaling this up as of this week, to this wider network, to what we call our "participation officers". It’s similar to "engagement officers" in other jurisdictions.
-
That means, in addition to volunteers, every single ministry in the national government needs to assign one to three people to work as officers to drive engagement with non‑specific people.
-
Engagement mostly starts online of course, because it’s easier to engage non‑specific people online. But it also includes offline modes, like public hearings and other modes of consultation, and eventually collaboration.
-
For the largest ministries ‑‑ Ministry of Interior, Ministry of Health and Welfare, and so on ‑‑ they already have these kinds of systems in place, where they have facilitators and people who use data visualization, and other engagement tools.
-
For the more inward‑facing ministries, they don’t really have this kind of training, or even recognition for this kind of engagement. What we’re doing is building a learning circle, where we meet every week to discuss one topic pertaining to one particular ministry’s participation officer’s duties. Then we get other participation officers to help them get a user journey going, or a co‑design process going, so that they know for sure what to tell their ministries to implement.
-
The first topic, which was this Friday, was around an "internal Join platform". We already have an external join.gov.tw platform. Have you seen this website?
-
I haven’t, no.
-
Let’s give it a look. We also do a lot of telerobotics and virtual reality, but it’s not the current subject. The Join platform has four different sections. The first one is the "Petition" section, which is like "We the People" and it has a threshold of 5,000 countersignatures, and there’s a response process. You probably know the drill.
-
The "Talk" section is similar to regulations.gov in the US: before any regulation is announced, there has to be a public commentary period of at least 60 days. Instead of posting just, say, a notice with an email address or telephone number, we now post every single one of them automatically to this public forum, where we engage in a discourse like in chat room discussions.
-
For example, the NDC is planning to have a Foreign Talent Act to encourage more foreign contributors. We post the consultations in both languages, and then we get a lot of feedback, both in English and Chinese, as to how this should be implemented.
-
We can see the Ministry of Interior here, and the National Development Council, and so on. Each ministry has those participation officers, who then facilitate reflection on people’s input, and draw a consensus, and so on. This is the consultative section.
-
In addition to petitions and consultations for every regulation, we also have a "Supervision" part. This part has not seen anything quite at the scale of the previous two sections, but we are going to post 1,000s of government projects here soon. Oh, the first one is the open data plan. How quaint.
-
In any case, every project here is allocated a multi‑year, or single‑year fund, which gets a visualization here. For example, this plan is about preservation of historical sites.
-
I randomly click on it, and then you can see very easily that it’s a four‑year project, and that we are on the year one of four. It’s expected to be completed somewhere over here.
-
Then it shows two charts: the KPIs, the Key Performance Indicators, and how they’re accomplished; and the budget that has been spent; and then whether it’s updated quarterly or monthly. Then the details, like all these dates, correspond to the procurements that they did to accomplish this project. This then can connect to the open procurement part.
-
Anybody can then share their ideas, or discussion, or their five‑star rating. I’m not quite sure what the five‑star ratings are for. Anyway, this section at least assigns a single URL for each of those projects.
-
Then there’s umbrella projects, like the Asia Silicon Valley as well as other major presidential promises. On the presidential promises, you can then see its budget drilldown. Have you seen Budget.Taipei before?
-
No, you can show me it, as well.
-
Budget.Taipei is this whole new thing.
-
Yes, we’ve seen it.
-
This is the classical open spending visualization. Then you can drill down to each of those details. Then this corresponds to the original projects, describing where the plan came from, and then it goes back to a discussion forum.
-
The main innovation here is to have each budget item be a discussion area, and then have the public servant carrying this to go here. They respond directly, instead of through an intermediary, which may have any kind of distortions.
-
That’s fantastic.
-
That’s the main platform.
-
A couple questions. One is are you guys building the platform? Who’s building this Join platform?
-
The National Development Council is in charge of doing this. The majority of the budget is actually not spent on programming, which is off the shelf nowadays, but on training.
-
It’s on getting the public servants to see this as an important engagement tool, but, also, now they must essentially become their own investigative journalists. They need to respond in a way that people hold them to the same standard as media, which is not a very high standard, anyway. [laughs]
-
What we’re saying is that they have to provide context, the narrative, the lead, and then everything, in order to get people the whole context, because people see one random sentence and then... So we give them journalism training as part of this plan.
-
One of the questions, I suppose, then, mostly when I’ve seen other countries, one of the big challenges of doing what you just showed me with the project stuff is having the data linked up that way. Is that already there in Taiwan’s government budget? It already has that level of linkage? That data’s already well‑structured?
-
That’s right.
-
Because, for example, in the UK that doesn’t really exist.
-
That’s right. We took a lot of time on doing this. This is called the GPMNet, which is used to monitor government projects.
-
The fortunate part is that the National Development Council is a merge of two previous councils. One is for the audit projects, the Office of Research and Development. The other one is the Council for Economic Planning And Development, including presidential promises.
-
Because they’re now merged together, they do have data for both sides of things, the planning stage and the execution stage. These are already structured data, and it’s already managed in a way that’s entirely paperless, so it’s very easy to say, "Since everything here is unrelated to private data, this is open by default in Taiwan."
-
What we’re doing is, essentially, taking these from those relational databases and saying, "Now they must be CSVs on the data.gov.tw platform." Once this goes here, then the Join platform consumes, every day, those open data, and then turns this into thousands of topics for public discussion.
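To make the relational-database-to-CSV step concrete, here is a minimal sketch. It assumes an in-memory SQLite database standing in for a ministry's relational store; the `projects` table, its columns, and the `export_table_to_csv` helper are hypothetical and not taken from GPMNet or data.gov.tw.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table, out):
    """Write every row of `table` to `out` as CSV, header row first."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out, lineterminator="\n")
    # cursor.description holds one 7-tuple per column; item 0 is the column name.
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)


# Hypothetical project table standing in for a ministry's relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER, title TEXT, budget REAL)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [(1, "Historic site preservation", 1200.5), (2, "Open data plan", 300.0)],
)

buf = io.StringIO()
export_table_to_csv(conn, "projects", buf)
csv_text = buf.getvalue()
print(csv_text)
```

A daily consumer such as the Join platform would then only need to fetch and parse the published CSV, without any access to the source database.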
-
Got you. It makes a lot of sense. That’s super impressive in the sense of how it is joined up. It’s great to see, over the years, how things are getting really polished and standardized.
-
One of the questions would be about, then, in terms of your data. One of the things I see with a lot of open data projects ‑‑ I’ve been involved in many, obviously ‑‑ is data quality.
-
One of the projects I’ve been involved in, I don’t know if you know, it’s the Frictionless Data. You know about that, then, the Frictionless Data stuff?
-
Mm‑hmm.
-
You might want to bring it up, if you don’t mind. One of the things they are about is the data package model. I don’t know whether you guys are using that at all.
-
You mentioned using CSVs. That’s, obviously, one of the things that we’ve put a lot of time into thinking about. One is that CSV’s an incredibly easy to consume format that most people use.
-
CSV, the decked one, would be least exciting. If you go to the Specifications at the top and pick, for example, Data Package Specifications, that’s the one you probably want to look at.
-
I’ve seen this one.
-
You know this one, but things like that. I also wonder how are you finding it, getting people to produce high quality CSV data and to produce good metadata, be it data package, or otherwise? You don’t find that a problem?
-
It used to be. In the previous administration, the previous Vice Premier Simon Chang issued this open data mandate and everything. After I became Digital Minister, because I was understudy to the previous ministers, I was aware that there were a lot of problems with chasing the sheer quantity of data sets, because then a lot of them are DocX files or ODTs.
-
ODTs are technically XMLs, so they are two‑stars or even three‑star open data [laughs], published on the open data platform.
-
As a natural language processing geek, these are actually useful. I wouldn’t say it’s not data, but it’s not in a data format that’s very easy to consume, because from one meeting to another, the markup is different.
-
I’m a programmer, myself, so I understand all those issues.
-
Exactly. One of the things that I did is to standardize on the Akoma Ntoso format for the transcripts.
-
I mean more on the data, rather than textual stuff, so more CSVs. I absolutely...
-
Numbers?
-
Yes. I know data can mean anything that was bits. I mean less textual, like meeting notes, and more things like spreadsheets, essentially, and tabular data.
-
One of the things that we’ve been doing here is...
-
Have you thought of using the Data Package Specification at all, yet, for doing that?
-
No, but we’re using JSON Schema. [laughs] After I became the Digital Minister, for numeric data, what we’ve been doing here is to standardize on GraphQL, OData, and the OpenAPI standard from the OAI. It’s a different route to get to frictionless data.
-
One route is saying you have things in relational databases, usually SQL databases ‑‑ and then we provide those automated tools, so that they are published automatically as a package, and so on.
-
Then, from that, into API? Then, from that, into services?
-
Right, because services are just read and write interfaces to this data store.
-
Yes, but it’s a fundamental tenet of open data thought to not go API-first, because once you tend to go the API-first route, you tend to lose bulk access. You tend to have much more complex service architecture.
-
You tend to start running together service obligations, by running APIs...
-
That’s exactly right.
-
...and run together data provision obligations, which are distinct things.
-
I can argue both ways.
-
They’re both valuable.
-
Switching to the API side, what we’ve been doing is that, because open data is "format, license, and access", these three are actually orthogonal. You can have the best structured data, but under a non‑free license.
-
Or, you can have a free license, fitting the open definition, but with terrible structure. None of these would be called "open data," because it has to satisfy those three criteria.
-
Strictly, the definition of open data is that it’s under a license that is open and it is machine readable.
-
I thought it must also provide access to people. (Section 1.2 of Open Definition)
-
No, that’s not a requirement. First of all, that would be incredibly difficult. Strictly, the machine readable definition can always be debated a bit. For example, you gave a really great example before. Is DocX a machine‑readable format, for example, for tabular data?
-
I would say no, it isn’t. It’s extremely difficult for machines, on a regular basis, to consume DocX‑encoded, like a Word document version of a table, because it will change all these semantics. The machine readable has some flexibility in it, but in general, the requirement that it be accessible to non‑technical people is not a requirement.
-
No, I mean accessible, to non‑specific people.
-
What do you mean by non‑specific?
-
Like you can click to download it, instead of filing a form and waiting for 17 days.
-
Oh, non‑specific, so there can’t be requirements like that, formalities involved. Yet, it should be automatedly accessible.
-
"Accessible," that’s the word.
-
Absolutely. You’re right. So open data, in the definition, says you must be free to access it. You must have an open license. You actually must be able to access it. You must then be able to do whatever you like when you’ve got the data. The other aspect is the data must be machine readable.
-
Right, so: License, Format, Access.
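The point that license, format, and access are orthogonal can be sketched as three independent flags that must all hold. This is an illustrative model only; the `Dataset` class and the example entries are hypothetical, and the flags simplify the Open Definition considerably.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    open_license: bool      # license: free to use, reuse, and redistribute
    machine_readable: bool  # format: structured enough for programs to consume
    open_access: bool       # access: downloadable by anyone, with no formalities

    def is_open_data(self) -> bool:
        # Each criterion is independent; failing any one of them disqualifies.
        return self.open_license and self.machine_readable and self.open_access


examples = [
    Dataset("well-structured but non-free", False, True, True),
    Dataset("free license, scanned documents", True, False, True),
    Dataset("schema-described API, intranet only", True, True, False),
    Dataset("CSV on the open data portal", True, True, True),
]
verdicts = {d.name: d.is_open_data() for d in examples}
for name, verdict in verdicts.items():
    print(f"{name}: {verdict}")
```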
-
Yes, that’s great.
-
The approach that I was explaining was that if we use an API‑based, API‑first approach, my current thinking ‑‑ which may change ‑‑ is I tell those people in the government that you can first forget about the access part. Then you start working only on format, which is government‑to‑government only, because then they’re interested in getting data that the other ministries’ people are interested in.
-
That’s a very wise approach.
-
Then they care about the data quality, because then it’s part of the daily work pipeline. If this breaks, their job is at stake.
-
I’ve always said that, "Focus on the government as a consumer." The external users, they are important, but they’re much less real to an ordinary civil servant than their colleague next door, absolutely.
-
That’s right. Then the question they will invariably ask is, "So, this OpenAPI format, does this mean that it’s world readable? Because our servers are not ready to be world readable?" I always tell them that, "No, you don’t have to. You can just spread to the two ministries around you."
-
It’s still an API, because it’s conforming to the OAI format. But it’s not open data, because it’s then not openly accessible.
-
I understand. I just think the term "OpenAPI"...
-
It’s a dilution.
-
It’s a massive dilution, it just confuses people. It has almost no meaning, an open API.
-
It’s the name of the standard now. [laughs]
-
Now they’ve made Swagger into OpenAPI. People have used open APIs for years. I don’t quite know what an open API stack is, and I know it’s a very nice term, but it is a massive dilution in this area.
-
Exactly, of the term "open."
-
It has almost no meaning in here.
-
It means it’s machine readable and human readable.
-
But APIs were machine readable.
-
This includes API documentation, the schema metadata.
-
I know that. I know what Swagger is because I use it. I’ve got it, but I’m just saying. I’ve got it, and there’s GraphQL coming along, which is completely different.
-
Exactly.
-
The question, I’m just saying that the "open" part of it is a massive dilution.
-
It is. It is.
-
It’s misleading people, so I would avoid using it, in general, if I could.
-
What else would you say for things like GraphQL and Swagger?
-
I would just say they’re an API definition language for people. I wouldn’t call it OpenAPI. First of all, it’s even worse here, because open APIs were traditionally used ‑‑ I still think badly ‑‑ to simply mean, "Our API is documented."
-
For a decade or more, 15 years, since Web APIs existed, people talk about, "Google has an open API."
-
But that’s the old days.
-
They’re not just the old days. For the last five, 10 years, people said that. I would say that OpenAPI here, as an initiative, has been borrowing this term. What they’re really saying is, "We’re going to take Swagger..." Basically it’s an API definition language, and it’s got nothing to do with an open API, per se, in the traditional sense.
-
Also, open APIs, even originally, had no useful meaning. You could have an open API, which you charge for usage. The meaning originally arose because back in the old days, you had APIs ‑‑ for example SDKs or something ‑‑ where you might not even get the documentation without paying money. So the idea was that open APIs were APIs that were documented...
-
And in a uniform way. That’s in the definition.
-
And it’s free to get that.
-
That’s right.
-
But it wasn’t free to use. For me, that usage of "open" is massively misleading and diluting for people, so I just don’t use it. Words matter. [laughs]
-
I think we’re on the same page. When we say "OpenAPI," we meant an API format that is open. That’s the definition part.
-
Let’s not say "the OpenAPI format." Let’s just say "the OpenAPI definition."
-
Then this says nothing about the license nor the access.
-
Yeah, we have an API and it’s got a definition.
-
Right, it’s format‑only.
-
It’s not even quite format. This is why I’d be cautious. APIs as a simply data definition language... Swagger has got advantages. I’ve looked at a lot when we looked at data packages and other stuff.
-
First of all, it’s generally JSON‑oriented, which may not be the only structure you have with your data. In addition, I’m just saying, having built...
-
I’m sure you built a lot of data platforms. APIs are great, but if you like one representation of your own, you might ship the data as bulk data to people. In many cases, for example, on spending data, APIs are completely insufficient.
-
But Google Takeout is also an API, which is a bulk download API.
-
But then, what do we mean by API? Google Takeout, as I understand it as an API, is an API to make requests, to then be given a response.
-
And the response is a zip file.
-
Right. But my point is to simply say we could use API as a data definition language. The request protocol is like "GET, POST, PUT" along with the actual data structure protocol. That data structure is very useful.
-
But we’re getting into a lot of definition...
-
I think we have very few opportunities to meet each other, and I think you both are great minds and also with technical understanding of how these work. But I think one thing that is most invaluable is to share a vision. How you get to that vision, everybody will find their own path. Both Rufus and you have, at heart, something very core and similar.
-
For me, this meeting is more about sharing vision, understanding each other’s vision, to then see, how can both, together, support or contribute to that vision.
-
Right, but the frame dilution, I actually care about that a lot. I really want to get this figured out.
-
(laughter)
-
So if I may.
-
Yeah.
-
Behind Swagger and GraphQL, as well as OData, the most important thing is the schema, because they refer to a schema as their core data model. On top of JSON Schema, they have these metadata fields, where they associate, with each schema field, a human‑readable, one‑line description of each column.
-
My strategy so far was, just by pushing government‑to‑government API bridging, to get them then to build a JSON schema. If you have a JSON schema, you have a description language that not only the engineers, but also people across the other ministries can understand. Then, once they switch...
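To show what a schema with a one-line description per field can look like, here is a minimal sketch that turns a JSON Schema into a plain data dictionary that non-engineers can read. The schema contents, field names, and the `data_dictionary` helper are hypothetical examples, not an actual ministry schema.

```python
import json

# Hypothetical schema for one government dataset; the field names are made up.
SCHEMA = json.loads("""
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Project spending record",
  "type": "object",
  "required": ["project_id", "amount"],
  "properties": {
    "project_id": {"type": "string", "description": "Identifier of the government project"},
    "amount": {"type": "number", "description": "Amount spent, in New Taiwan dollars"},
    "quarter": {"type": "string", "description": "Reporting quarter, e.g. 2017-Q1"}
  }
}
""")


def data_dictionary(schema):
    """Return (field, type, required, description) rows for a flat object schema."""
    required = set(schema.get("required", []))
    return [
        (name, spec.get("type", "any"), name in required, spec.get("description", ""))
        for name, spec in schema.get("properties", {}).items()
    ]


for name, typ, req, desc in data_dictionary(SCHEMA):
    flag = "required" if req else "optional"
    print(f"{name} ({typ}, {flag}): {desc}")
```

The same `description` values can be reused by API documentation tools, so the human-readable part is written once.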
-
We already have a national mandate that says, if it’s FOIA and it’s in a data form, then it must be under this one, single license, which is what I just submitted to the open definition forum, which is the Taiwan open government data license, OGDL. It is compatible with Creative Commons attribution license through a switch-over clause ‑‑ we get this for free.
-
Then the only other thing to fix is the general access, for which we have data.gov.tw, which means that for ministry‑to‑ministry data, they just need to check a checkbox. Then it will go there.
-
It’s already got a well‑known URL here. It’s just intranet only. Once they check that checkbox, the NDC people will then take care of putting this on data.gov.tw, in which case it becomes automagically open data.
-
Each unit can just say, "we crawl this weekly," or daily or whatever. Then the "GET" part of the API automagically becomes open data, and in a universal structural way, because they already have this JSON schema documented. Then we can wrap this up in a data package presentation form for outsiders to consume.
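The last step, wrapping an already-documented JSON schema in a data package, can be sketched as follows. This assumes a flat schema with scalar fields and produces a descriptor in the general shape of a Frictionless `datapackage.json` (a `name`, plus `resources` carrying a `path` and a `schema` with `fields`). The `to_data_package` helper, the file name, and the example schema are hypothetical.

```python
import json

# JSON Schema scalar type -> Table Schema field type. Anything else maps to "any".
TYPE_MAP = {
    "string": "string",
    "number": "number",
    "integer": "integer",
    "boolean": "boolean",
}


def to_data_package(name, csv_path, json_schema):
    """Build a data package descriptor for one CSV from a flat JSON Schema."""
    fields = [
        {
            "name": column,
            "type": TYPE_MAP.get(spec.get("type"), "any"),
            "description": spec.get("description", ""),
        }
        for column, spec in json_schema["properties"].items()
    ]
    return {
        "name": name,
        "resources": [
            {"name": name, "path": csv_path, "format": "csv", "schema": {"fields": fields}}
        ],
    }


# Hypothetical schema, as a ministry might already have documented it.
example_schema = {
    "properties": {
        "project_id": {"type": "string", "description": "Identifier of the project"},
        "amount": {"type": "number", "description": "Amount spent"},
        "tags": {"type": "array"},
    }
}
descriptor = to_data_package("spending", "spending.csv", example_schema)
print(json.dumps(descriptor, indent=2))
```

Publishing the descriptor next to the bulk CSV keeps the field descriptions attached to the data for outside consumers.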
-
This has been my road map, but I do agree that calling this open format or open definition language "OpenAPI" is a dilution of the term "open." That I do agree.
-
Good. You really know your stuff. Having seen a lot of government stuff, I’m really impressed at how much you’re managing to roll through here.
-
Normally, there’s a lot of roadblocks. Six years ago, seven years ago now, I was involved just trying to get the UK government to publish consistent spend data. It went nowhere. [laughs] Six years on, the data is really poor quality and not consistent.
-
I’m really impressed. It’s great. This part that you’re doing here is crucial. The quality of the data is one of the main themes missing in many open data initiatives right now, many data initiatives in government. Whether it’s JSON schema or...
-
...or data packages. [laughs]
-
JSON schema is a great choice. To have any structure that’s consistent, validatable, and has some even human‑readable description as a bonus, is just great. The one thing I’ll add and I’m sure you know is that validating that is...
-
Right. As of next month, this will be our procurement requirement, so that any mission‑facing part of the new procurement must now declare their JSON schema. As part of the procurement validation process, this now will be machine validated. We’ll run a schema validator, an API validator.
-
Unless it becomes green, they don’t get this part in their score card. If multiple vendors bid for government procurement, those with the validatable schema win, basically.
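The "green or not" check can be illustrated with a small sketch. It assumes a flat schema and hand-rolls only the `required` and scalar `type` checks; a real procurement pipeline would presumably run a full JSON Schema validator instead. The `validate` and `scorecard` functions, the schema, and the sample records are all hypothetical.

```python
# JSON Schema scalar type -> accepted Python types.
PY_TYPES = {
    "string": (str,),
    "number": (int, float),
    "integer": (int,),
    "boolean": (bool,),
}


def validate(record, schema):
    """Return a list of error strings; an empty list means the record conforms."""
    errors = []
    for field in schema.get("required", []):
        if field not in record:
            errors.append(f"missing required field: {field}")
    for field, spec in schema.get("properties", {}).items():
        if field not in record:
            continue
        value, expected = record[field], spec.get("type")
        ok = expected not in PY_TYPES or isinstance(value, PY_TYPES[expected])
        # In Python, bool is a subclass of int, so reject it for numeric types explicitly.
        if expected in ("number", "integer") and isinstance(value, bool):
            ok = False
        if not ok:
            errors.append(f"{field}: expected {expected}")
    return errors


def scorecard(samples, schema):
    """Green only if every sample record validates against the declared schema."""
    return "green" if all(not validate(r, schema) for r in samples) else "red"


SCHEMA = {
    "required": ["project_id", "amount"],
    "properties": {
        "project_id": {"type": "string"},
        "amount": {"type": "number"},
    },
}
good = [{"project_id": "A-1", "amount": 10.5}]
bad = [{"project_id": "A-1", "amount": "ten"}, {"amount": 3}]
print(scorecard(good, SCHEMA))
print(scorecard(bad, SCHEMA))
```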
-
I got that. First of all, this is great. I think the automatable...there’s some very minor questions there, but I agree on the open one.
-
I have used the term "open format," in the sense that it has an open format and that it’s free to use, but I would reserve "open" for the whole bundle, just say, "This is open." I would say this is like a well‑defined API or a defined API.
-
In that case, instead of "open license," what would you say?
-
The license I would say an open license, but it’s a big thing. The thing is that open API has been so abused for so long. It’s been so used...
-
Say it was the open format. Would that make it better? Open license? Open format? Open access?
-
Then taken together...
-
You know what I mean? Since the license is so huge. The license is such a big right. For example, why I emphasize this is, while it’s not part and parcel, it’s not easy, if you have bulk access, and remember the fundamental requirements of open data are an open license, the right to use, reuse, and redistribute, bulk access, and machine readable access.
-
If you have those things, ultimately you can produce whatever format you want. It’s true that, in practice, if the format isn’t... My point is, that this thing generally, with it actually having access, allows you to generate whatever formats you want.
-
That’s right.
-
Let me say why. Whether we get technical, I will say overall, open data is all those things. You can’t say open format. The thing is, nowadays, it’s not true.
-
In a way, it’s like, "Yeah, you could definitely say open format." You can say open format.
-
Open access has the academic meaning, but its meaning actually overlaps a lot, which was what we’re saying here.
-
You could say the freedoms. If you wanted to be Stallman-like, it would be the data freedoms.
-
[laughs]
-
You could also talk about freedoms act as freedoms, but yeah, exactly. I just say on the API stuff, it’s just open APIs have been so abused, and it needs to... On the point you said at the end about procurement, that is great.
-
One thing that I think is important, that I think at the moment, is that sometimes it’s like Google Takeout. People think a little bit, that if they have access to the data or they’ve got a well‑defined API, it doesn’t matter whether they buy proprietary or not buy it proprietary.
-
I see this argument sometimes. For example, there’s a vendor who supplies data platforms called Socrata. Socrata make a big play of that, because they competed for years with CKAN. I’ve seen their pitches. They go on and on about how, "Look, we..."
-
They first of all try to pretend they’re open source, but then they’re not. Then they sometimes say, "Oh, you know, it doesn’t matter that we’re not open source, because everything is going open API."
-
We’ve got this Socrata open data API. It doesn’t matter that you’ve got...
-
It’s much more writable. It’s much more readable. However, it’s still a black box.
-
What does it matter? The truth of it is, first of all, APIs are quite brittle.
-
If you look at, for example, S3: I’ve implemented S3‑based APIs for years, because everyone copied the S3 APIs. If you look at the stack for Boto, and you look at Google, which is a very, very powerful, rich company, they struggle to make it compatible with the Amazon API. You have a dominant position. Boto attracts, and obviously Google is trying to have that storage system be compatible with S3, and so on.
-
It’s a struggle to do that. APIs ultimately, first of all, if you get very rigid about APIs, for example you talked about your scorecard. There’s a good aspect of that, but there’s also a bad aspect, which is you can write an API at procurement time that turns out to be wrong on an Agile basis a year later.
-
It’s parts of a project, as I showed, so that each component...
-
They can evolve.
-
What I’m saying is that, the individual sub‑components, as we are now essentially mandating decoupled architecture for procurement. So already, the back‑ends, the middle tier and the front‑end must talk with APIs.
-
If they’re very brittle and they break, the whole service breaks. It’s sensitive to keep this...
-
I got that, but what I’m offering, what I’m just saying is basically there can be a tendency when you also have this API focus, and people talking about open APIs generally that they still have to think that the software stuff, like you were talking about, the black box doesn’t matter.
-
In my experience, it does matter, a lot. It also goes back, the other reason I’m a bit concerned about APIs, why I get on a bit...
-
No, that’s fine. I do agree.
-
I bet a lot of governments now, the last 6 to 12 months, who are going, "Oh! API-first. We don’t need to do bulk data download." No one wants it. We just want to do APIs first.
-
That’s the way to deliver services. I’ve seen it in Finland recently, and so on.
-
That is the trend.
-
There’s a trend in this direction.
-
It’s a trend.
-
Personally, I am quite concerned about it. First of all, it leads to a point where you don’t need bulk data. No one’s asking for it.
-
The best story I have on this one is from when the UK government was releasing data, and they released, for example, financial data. A lot of people used the data, for example, The Guardian used it, and other people...
-
Then, they did a police one. They decided they wanted to release police crime data. They said, "Well, this one? One of the things last time was, we didn’t have enough traffic on our...not enough people came to our website, so we’ll build our own map."
-
We’ll go take the data and build our own app. It was very successful. Millions of people visited. Obviously it completely substituted for other third‑party usage. It’s similar on the API a little bit, obviously government does need its own APIs, but as you go the route of API, you tend to neglect basically the shipping the bulk data, and also you tend to crowd out the use of it, in certain ways.
-
One of my points is, once you’ve got bulk data, you can build APIs. We’re being a little bit...but in terms of the principles, so go back here.
-
What you’re saying is that, if we say import and export, which means bulk, essentially, what’s the term for that? It’s not portable data, because "portable data" are more like My Data, it only means personal.
-
I guess, as a designer, with this architecture...
-
(laughter)
-
One design you could think about in the technical architecture, and sometimes a little bit artificial, but if you’ve got a database here, which is normally your internal read‑write database. It’s going to be Oracle or something in some ministry. One model you can have is like, "Hey, basically we put it into a read only replica."
-
It depends on your model ‑‑ Read API, which is you pull the data into your own local database, so you don’t endanger the SLA on the main one. Then, you’ve got the world out here.
-
The thing I try to emphasize is saying, "Hey, rather than doing that directly, why don’t you...?" This should really be bulk export into this database. That bulk export also goes to that, say, S3. Then, from here, you pull it into your read API.
-
One of the nicer things about that is, you have a nicer sandboxing and partitions. For example, you can then open source stuff. You can say, "We’ve published from," whatever. You talked about the NDC, every week, pulling in the new version... That should go to S3, or whatever your flat file, or whatever your structured data storage is.
-
Then, the part of it that pulls it in to run an API, you’ve got a nice partition line, here, a nice service break line where you can, A, test.
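
The partition described here can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite stands in for both the internal ministry database and the read API's local store, a temporary directory stands in for S3 or any flat-file store, and the names `bulk_export`, `load_read_store` and the `crimes` table are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def bulk_export(source: sqlite3.Connection, table: str, out_dir: Path) -> Path:
    """Dump one table from the internal database to a flat CSV file."""
    cursor = source.execute(f'SELECT * FROM "{table}"')
    columns = [col[0] for col in cursor.description]
    out_path = out_dir / f"{table}.csv"
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(cursor)
    return out_path


def load_read_store(bulk_file: Path) -> sqlite3.Connection:
    """Build the read API's own database purely from the published bulk file."""
    with bulk_file.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        columns = next(reader)
        rows = list(reader)
    store = sqlite3.connect(":memory:")
    column_sql = ", ".join(f'"{name}" TEXT' for name in columns)
    store.execute(f'CREATE TABLE "{bulk_file.stem}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    store.executemany(
        f'INSERT INTO "{bulk_file.stem}" VALUES ({placeholders})', rows
    )
    return store


def demo() -> list:
    # Hypothetical internal read-write database with two example rows.
    internal = sqlite3.connect(":memory:")
    internal.execute("CREATE TABLE crimes (id INTEGER, category TEXT)")
    internal.executemany(
        "INSERT INTO crimes VALUES (?, ?)", [(1, "theft"), (2, "fraud")]
    )
    with tempfile.TemporaryDirectory() as tmp:
        bulk_file = bulk_export(internal, "crimes", Path(tmp))
        # The read side never touches `internal`; it only sees the bulk file.
        read_store = load_read_store(bulk_file)
    return read_store.execute('SELECT category FROM "crimes" ORDER BY id').fetchall()
```

The point of the split is that `load_read_store` can be tested, open sourced and run by anyone who has the bulk file, without any access to the internal database.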
-
You can also give out the software if you’re doing that. If other people want to run APIs, also a point about that, and this is actually the one. So that’s point one. The other point is, what’s the logic of open data?
-
Open data makes sense because it’s cheap to give out data. It’s not cheap to write APIs. That way, you can...
-
Well, if it’s machine-written, is it still expensive?
-
In my experiences, orders of magnitude more expensive. You need sys admins and DevOps people. Having to run a bunch of APIs, they go down...
-
You mean, creating new APIs. OK. But S3 is a very uniform API.
-
S3, my point is that, the cost of raw data read, let’s be clear what we mean by APIs. There’s a knowledge API, which is thin layer format. Otherwise, we’re overusing the word API.
-
Agreed. It’s the same format basically for any...
-
There’s a web API. Could we just not use that term, API? Let’s call that a format.
-
That’s just fine. Which is why I ask, if I can switch to say "open formats."
-
Perfect idea.
-
But my point is that, this system down here, which is an API, running APIs they need a DevOps person, they go down, they need to deal with spikes in traffic. The cost of running web servers for APIs, if you just look at even AWS pricing models, the cost of running S3 will be 10, to 100, to 1,000 times cheaper than running a server to serve an API to a reasonable amount of traffic.
-
That’s right.
-
So, the cost implications. Generally, it’s not necessarily reasonable to say people should have free APIs. That APIs should be costless. Most people, most companies in the world, they charge for API usage at some level.
-
It might be that you get, 10,000 requests a month free, it might be 20,000, but at some point you pay for requests.
-
What you’re saying is that API has this cultural connotation of "Freemium" in it.
-
It should do. That’s not just that it’s accidental. That’s a good idea, because as an economic model it has a cost.
-
It’s called the "API economy." Go ahead.
-
(laughter)
-
APIs, basically the cost per request is not trivial. Whereas, relatively the cost of serving off S3 is again not truly zero, but it’s very, very low relative to the cost of serving each API request.
-
On this I have something to say, but please continue.
-
[laughs]
-
The point I was saying, at some point they might converge, that’s fine.
-
In my experience, when you go to, whether it’s governments or even corporations, running those APIs has a significant cost ‑‑ possibly running to tens or hundreds of thousands of dollars a year ‑‑ versus providing raw bulk data, has a cost in the dollars, or hundreds of dollars, or even cents. That difference is very significant.
-
One of the reasons I was explaining is, the logic of open data makes sense, because it’s reasonable to go to government and say the cost of supply of this data is essentially zero once you’ve produced it.
-
Whereas, that is not true as a cost of supply over an API, not zero.
-
Over individual, granular, APIs.
-
Granular APIs. Just to be clear, I don’t consider S3 an API in that sense. It’s just basically a basic bulk file download tool. Otherwise, we’re just starting to confuse our term API.
-
I know that you could say...
-
But what you store on S3, may also be either individual records, or even the granularity itself...
-
You could split it into the individual records. Even there, I would offer it is better, compared to running an API.
-
First of all, by the way, we’re getting a little bit technical here, but to explode the query space and the official API into a completely reified form, to basically cache onto S3, generally has a huge space explosion complexity.
-
It depends. It varies. I’ve done this with OpenSpending.
-
That’s my favorite style.
-
We have a cube, right? You can reify the whole cube. We’ve done it, and you can look at the space constraints...
-
That’s because of the disk cost. It’s now essentially zero, compared to the bandwidth.
-
Yeah. What I’m saying is, even then you can have enough explosion in the size of your cache space, that you don’t do it. That’s why. The point is, in general, is yes you could.
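
A rough way to see the explosion being discussed is to count how many files you would need if every possible filter combination were pre-computed and cached. The sketch below assumes a simple model in which each dimension is either left unfiltered or fixed to exactly one value; the dimension names and sizes are invented for illustration.

```python
from math import prod
from typing import Mapping


def cached_file_count(dimension_sizes: Mapping[str, int]) -> int:
    """Count the pre-computed responses needed to cover every filter combination.

    Each dimension contributes (size + 1) choices: one per value, plus
    "no filter on this dimension".
    """
    return prod(size + 1 for size in dimension_sizes.values())


# Hypothetical dimensions of a spending dataset.
dimensions = {"year": 10, "ministry": 30, "region": 20, "category": 200}
total_files = cached_file_count(dimensions)
```

With these made-up sizes the count is already over a million files for four dimensions, and it multiplies again with every dimension added, which is why fully caching a granular API onto flat storage is often not done even when disk is cheap.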
-
Even then, the functionality of API first is that disc. How do I put it? Normally, once you’ve put that, let’s say you did take your cache, your whole structure in API, onto S3.
-
You’ve normally got a problem, because it becomes almost unusable to people, because they need something like an API to find the stuff [laughs] inside your S3. I’ve got you. That’s what we do in OpenSpending.
-
By ID, by user, whatever, right?
-
Yeah, you can cache onto S3. That wasn’t what I meant by bulk download. That leads the way of now designing APIs, which is you could bulk cache off S3. But we’re now talking about just doing it, how we actually implement an API.
-
My point is though, going back, open data makes sense as bulk.
-
Which one in the picture?
-
This one. This, and maybe if I could actually get one more sheet. Can I get one more sheet?
-
If you’re to draw this, you’ve got this database, and I’ve spent most of this conversation having a very detailed discussion about APIs. This is, I drew this for the Finland guys, last autumn, as well.
-
I can send you the drawing.
-
(drawing sounds)
-
This one is like...
-
(drawing sounds)
-
Yeah?
-
(drawing sounds)
-
This is just what I see happening, is people starting to do this. They’re basically architecting their systems to go straight into their read only APIs, via non-clearly-partitioned, open data bulk store. I think that’s an error, because as that happens this system will atrophy.
-
If you do it, with this model, you’ve built a system where you are first of all eating your own dog food. You don’t get access to the original raw database. You have to run off your open data store, so you end up, you force your government IT system to eat its own dog food from open data.
-
You don’t have this risk that the bulk open data, and the data they’re getting live from the API, end up getting out of sync. It also means that if the government ever went away, or simply deprecated that API or stopped maintaining it...
-
The data’s still there.
-
The data’s still there, but even better, if this stuff was properly designed, anyone outside government can pick up your open source stack and boot their own API. You might have API limits, for example, and you might well do if it gets very popular, because it costs a lot to run an API.
-
Someone in your government might simply say to you, "God, we’re spending $1 million a year running these servers!" That’s too much. We’re going to start throttling it or whatever. Other people outside government can go and start taking your API, but they offer it for pay, or for free, or they run it on Sandstorm, which I noticed you have.
-
I knew the Sandstorm guys back in the day, in Berlin. You can go boot this stuff yourself, and so it enforces a nice service architecture, and one that’s reproducible in or outside of government, because you’re always running through your bulk open data export rather than off your original, raw, gov‑only DB.
-
That’s exactly right. I agree with the whole argument.
-
You’re a rare minister, by the way.
-
(laughter)
-
I haven’t met that many people around the world that have this conversation within government at this level.
-
I do completely agree. If I may just recap a little bit. First, we don’t call this scenario open API. We stop calling this open API. We say, this is a government, for example this person may very well be another government. Let’s call this another ministry.
-
In this scenario, which is, they have a well‑defined whatever, API, to connect to one government database to another government service, which then talks to people. This makes it a round‑trip whatever.
-
Then, when we say this, what we’re also saying is that this, because it’s well‑structured ‑‑ it’s got a JSON schema, whatever ‑‑ we encourage them to run an API proxy here. So that it can do a lot of extraction and transformation...
-
Absolutely. It’s also a great point you just made, here, which is by having the open data in whatever structure format, you can boot up. Sometimes you may even want to have a proxy or sometimes you might want to boot an entire stack that runs a different type of API, because it has different performance requirements, different structure, or whatever.
-
By the second year of this integration running, we expect that the revision here will be relatively solidified. Then, once it does that, we basically take a snapshot saying, "OK, now we know these views are free of privacy issues, FOIA‑compatible."
-
Then the checkbox, what I mentioned before, can be checked. Then, we take this, and then make an open data repository out of the now‑frozen part of this. Then this switches to this scenario, where it’s not dependent on the original data store anymore.
-
Even better, this one should not be coming from their API that you take the frozen, if possible.
-
Of course.
-
Whichever way...
-
What we’re saying is that because this is frozen, we can take the DDL of here, and then we say, this, the DDL, it’s now the canonical metadata.
-
Now, it goes here. Because the DDL is an exact mirror of this, the proxy can just reconfigure their upstream sources here.
-
That requires zero lines of source code changes.
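
One way to picture the "zero source code changes" claim: if the proxy only knows the frozen column list and the name of its upstream, moving from the ministry database to the open data store is a configuration change. The sketch below is hypothetical throughout; the fetch functions, the `UPSTREAMS` table and the example rows are invented, and real upstreams would be network calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]
Fetcher = Callable[[str], List[Row]]


def fetch_from_ministry_db(table: str) -> List[Row]:
    # Stand-in for the original internal source; it carries an extra internal column.
    return [{"id": "1", "category": "theft", "officer_note": "internal only"}]


def fetch_from_open_data(table: str) -> List[Row]:
    # Stand-in for the frozen open data repository with the same public columns.
    return [{"id": "1", "category": "theft"}]


UPSTREAMS: Dict[str, Fetcher] = {
    "ministry_db": fetch_from_ministry_db,
    "open_data": fetch_from_open_data,
}


@dataclass
class Proxy:
    columns: List[str]   # column names taken from the frozen DDL
    upstream_name: str   # chosen by configuration, not by code

    def query(self, table: str) -> List[Row]:
        fetch = UPSTREAMS[self.upstream_name]
        # Project every row onto the agreed schema, whatever the upstream is.
        return [{c: row[c] for c in self.columns} for row in fetch(table)]


before = Proxy(columns=["id", "category"], upstream_name="ministry_db")
after = Proxy(columns=["id", "category"], upstream_name="open_data")
```

Both proxies return identical rows for the same query, so consumers of the proxy would not notice the switch.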
-
Absolutely, and that is perfect. By the way, you just made me think. I don’t know whether it makes sense, but the idea is that the schema is clean, and maybe you could call this stuff clean API. When it goes public, then this is open data API, or open API.
-
This is clean API. It’s well‑defined. It allows for clean government where different people can...everything’s tidy, but it’s not open.
-
In Taiwan, we have two words. We have 公開, which is the freedom of information access, which is read only, and we say 開放, which means it permits derived work.
-
We have two words for this. So when we say 開放API, it implies an open format with an open license. For a "clean" format we would probably say 結構化. Maybe that’s...
-
...confuse it with open in any way. In English, "clean" is very different from "open", so you probably weren’t saying...
-
By the way, we don’t have the "free software" ambiguity in Chinese. (自由 is "freedom"; 免費 is "gratis".)
-
No, open software.
-
Yeah, open also works for software here. (開源軟體 for "open-source software")
-
That brings me back, actually to one other thing. Maybe, a challenge you’ll have. Which is, it sounds like you’re doing it all really well, you’re doing alright. For example, when you say that NDC’s spending most of their money on training.
-
That’s totally right. The training is going to be more important, and tougher, than the software. The one thing, and I mentioned this just in having thought a lot about the data package.
-
What data package represents is, for me, first of all it’s basically borrowed from elsewhere. Nothing here is technically original, it is borrowed from packaging and software that’s based on the node package model.
-
It was over the last 10 or 12 years, and it’s also an effort to be Zen. Basically, as a specification, it was an attempt to take everything out rather than to add anything in. It’s very different from linked data or the Semantic Web, which I am not a huge fan of now, because no developers use it.
-
The thing I just want to mention about it though, that’s come up a lot, which is the challenge. One of the things, the data quality you mentioned, was just how we fill descriptions in...
-
...the DDLs.
-
The DDLs. One of the things that we’ve done a lot is that, maybe in one big push now you get a lot of JSON schema. One of the challenges I see is that a lot of the time, the people who really will know the metadata information will be civil servants who are not technical.
-
I’m not saying it will happen here, but sometimes there’s a big push. People will write a format, they’ll do all this work now. But then in three years, as the schema evolves, you’ll get someone who says, "God, I don’t even remember how we edit this JSON schema." They work in Excel, whatever they work in.
-
One of the things we’ve thought about a lot in Data Package ‑‑ and isn’t quite resolved but is in progress ‑‑ is for example how to have a version where you can write the metadata in, for example, Excel as part of defining your spreadsheet, and then have it automatically come out into the data package JSON, whatever you have, or even have that be a first‑class version of Data Package.
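
As a minimal sketch of that idea, assume the civil servant maintains a small metadata sheet with the columns `name`, `type` and `description` and exports it as CSV. Those column headings and the function name are assumptions for illustration, not part of the Data Package specification; the output shape (`resources`, `path`, `schema.fields`) follows the usual descriptor layout.

```python
import csv
import io
import json


def descriptor_from_sheet(sheet_csv: str, package_name: str, data_path: str) -> dict:
    """Turn a metadata sheet (one row per field) into a Data Package style descriptor."""
    reader = csv.DictReader(io.StringIO(sheet_csv))
    fields = [
        {
            "name": row["name"].strip(),
            "type": row["type"].strip(),
            "description": row["description"].strip(),
        }
        for row in reader
    ]
    return {
        "name": package_name,
        "resources": [
            {"name": package_name, "path": data_path, "schema": {"fields": fields}}
        ],
    }


# Hypothetical sheet as exported from Excel or a collaborative spreadsheet.
sheet = (
    "name,type,description\n"
    "id,integer,Record identifier\n"
    "amount,number,Amount spent\n"
)
descriptor = descriptor_from_sheet(sheet, "spending", "spending.csv")
descriptor_json = json.dumps(descriptor, indent=2, ensure_ascii=False)
```

The person editing the sheet never sees quotes or braces; the JSON is a generated artifact.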
-
It’s something that I care about. We don’t need to talk about it in technical detail but just to think about, which is doing some of your user research as you go along on how, particularly practicing the change problem, right now everyone is agreed on an API, but imagine you had to add some new fields.
-
You may have already done that work, but how will that work? How will someone who is not technical in one of the ministries deal with that problem?
-
So you’re proposing to establish a standard operating procedure for ALTER TABLEs.
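
One small building block for such a procedure could be an automatic comparison of the old and new field lists, so that a schema change is reported in plain terms before anyone edits JSON or runs an ALTER TABLE. This is a sketch under the assumption that each schema is a list of `{"name", "type"}` entries; the function name is hypothetical.

```python
from typing import Dict, List


def schema_changes(old_fields: List[dict], new_fields: List[dict]) -> Dict[str, List[str]]:
    """Report which fields were added, removed, or changed type between two versions."""
    old = {field["name"]: field["type"] for field in old_fields}
    new = {field["name"]: field["type"] for field in new_fields}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


# Hypothetical before/after versions of a field list.
version_1 = [{"name": "id", "type": "integer"}, {"name": "amount", "type": "string"}]
version_2 = [
    {"name": "id", "type": "integer"},
    {"name": "amount", "type": "number"},
    {"name": "ministry", "type": "string"},
]
changes = schema_changes(version_1, version_2)
```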
-
For ALTER TABLE. Particularly, to take an example, one of my learnings ‑‑ and I’m sure you know already ‑‑ is that whereas ultimately nontechnical people can just about grasp the concept of best data in Excel, most of JSON is foreign.
-
Certainly, JSON is very picky, like you need quotes here and there. You know that, but I’m saying it’s a machine exchange, but it’s not a language. I’m saying this to someone who originally thought maybe five years ago, "Oh, I could get people to write JSON who are not..."
-
It’s easier than XML.
-
It’s easier than XML, that’s for sure.
-
(laughter)
-
That’s not saying a lot.
-
I don’t want to say too much now, just as a flag to you to think about it.
-
Well, you already know Sandstorm.
-
Yeah, I know Sandstorm. What I know is that it’s invented by this protobuf hacker.
-
Right. It’s Kenton’s work. We actually commissioned Kenton to build what they call a powerbox, an intent‑based, open‑format‑based exchange between the Sandstorm‑hosted apps, and also with the external world.
-
What we paid him to develop, which was just finished now, is a way for a grain ‑‑ which is the technical term for a microcontainer, a single‑document container running in Sandstorm ‑‑ to serve an HTTP proxy so that if it wants OpenID login or a process that feeds its spreadsheet updates to external services, now it can do that as of this week.
-
What we are saying ‑‑ and we are getting far more technical than usual, but I think it’s important ‑‑ is that the spreadsheet that we use internally, and inside our institutional platform, is EtherCalc. I happen to be the maintainer, so I can get whatever features in.
-
What we are doing here is essentially saying that for any metadata maintenance, which is often collaborative, you can not only share access with internal stakeholders, but because it is a capability‑based architecture, you can also do a bulk download.
-
If you do a bulk download, it basically packages everything written, because that application services are mounted read‑only. By definition, any file that’s writable would be specific to this instance. So it’s data portability. If you click this arrow and you migrate somewhere else, then you automagically get data portability.
-
But all this is manual. As we see with Blue Button and any other manual steps, many people would overlook its existence. Not many people would know how to use it, even if they can bulk download everything.
-
Right. You need an actual API connection.
-
Right. This is the webkey. You can grant others read‑only access to this metadata table, or write‑only access if you have set this up. What I’m saying is, back to this original drawing, if you maintain DDL in Data Package or in other description format in spreadsheets...
-
...exactly. I think doing one based on spreadsheets would work, because we’ve got tabular data here, and on the Data Package spec group we’re getting close to something. There are loads of ways you could serialize JSON. You could get really generic and do JSON in spreadsheets. Particularly if you’re doing tabular data, there’s probably something nice.
-
Exactly. That would be perfect. You could use collaborative spreadsheets.
-
The point here is, for your data definition language, thinking about doing it this way... it would be interesting to collaborate on, because we don’t have yet a perfect answer.
-
I think part of it is the process could be used while you’re using Data Package from others, you could use a similar process, but getting the pattern for that, maybe patterns here are a better concept for it.
-
Getting a pattern for how we do that well, and doing it in a way that was actually being road tested in a government by civil servants would be very interesting. One of my favorites, I don’t know if you have it now, but if you just look up bad data, and you look at...
-
( Website: http://okfnlabs.org/bad-data/ )
-
Look at the second... This one. This was just a fun project at one point, but was just an example of how people do this.
-
If you scroll down, this is one I wrote because I did it. This one is one of my favorite... They seem to have...
-
403 ?
-
Yeah, they’ve now obviously prevented login to that anymore.
-
Wow, this is bad data for sure.
-
(laughter)
-
This is my favorite. They even changed this, because this is me just proxying it, but first of all they have 65 separate files. They have 25 different data structures for the CSVs, but the one I really liked was this one.
-
This was my favorite of all time, because they saved this as a CSV and it was actually a 401 page. The woman uploading it to the website hadn’t realized that she’d saved her 401 access denied HTML page as a CSV.
-
The best part was at the bottom, if you scrolled all the way to the bottom of the file you can actually see her name. She hadn’t realized that she saved her access denied page as the CSV file for the site.
-
One of the things, having worked with people in government, I know that the simpler that we can produce these DDLs, the simpler we can make these steps and test them.
-
For example here, they have no tests, clearly that the data publish each month...
-
Yes. The NDC already is working on automatic checks for simple things like this...
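
A check of that kind can be very small. The sketch below is not the NDC's actual tooling, which is not described here; it is a hypothetical `check_csv` function that would catch the "access denied page saved as CSV" case and rows with the wrong number of columns.

```python
import csv
import io
from typing import List


def check_csv(text: str) -> List[str]:
    """Return a list of human-readable problems found in a published CSV file."""
    problems: List[str] = []
    head = text.lstrip()[:200].lower()
    if head.startswith("<!doctype") or head.startswith("<html"):
        # An error or login page was saved under a .csv name.
        problems.append("file looks like an HTML page, not CSV")
        return problems
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        problems.append("file is empty")
        return problems
    width = len(rows[0])  # the header row defines the expected column count
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"row {number} has {len(row)} columns, expected {width}")
    return problems
```

Run on every monthly upload before publication, a check like this costs almost nothing and turns a public embarrassment into a rejected upload.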
-
The other thing you’re probably encountering is the human aspect of data maintenance.
-
This has been amazing because also I’m aware of time, and we all run out of time.
-
That’s OK. I have until either 11:00 or 12:00. I don’t know what’s your next...?
-
We have to go back to work a quarter‑to, roughly.
-
11:25 I think?
-
11:30.
-
We have another 30 minutes.
-
11:30.
-
One of the questions it gets is on the open software side, and you know my bigger thing that I’m interested in, and obviously this is very interesting, but my really big interest is how do we make an open world?
-
By an open world I mean a world in which all public information, and by public it has to be non‑personal, I don’t mean government.
-
So all movies, music, research, software is open, and innovators and creators are recognized and rewarded. That’s the world I want to see, and obviously that would mean all software, all stuff like that is open.
-
Sure. All my work is under Creative Commons Zero.
-
I know you are, but the people we have normally, if we ever tried to persuade government about that at the moment for example, they’d be like, "Well how do we pay for music? How do we pay for movies, or how do we pay for this software, or how do we pay for the chip designs?"
-
We pay for their time. It’s a solved problem.
-
Right. What we need to do is systematically pay people. One question I have at the moment, I’m talking quite a bit, is let’s say about one area that government does control its IT spending, it buys IT.
-
Another area, which is outside of your area a bit more, is that it buys drugs through its health service. Right?
-
Yup.
-
In both of these areas at the moment, to look at IT, and it buys software, most governments still mainly buy closed‑source software in terms of the proportion of spending.
-
We now mostly pay to rent, but yes.
-
Exactly, you buy from SaaS providers, but you buy from closed SaaS whose software is closed...
-
OK, not you, but speaking generally.
-
The majority of the spend, maybe not the majority of the use by the way, there is a lot of open source software use, but the majority of the spending is on closed software.
-
By definition, almost. Yes. [laughs]
-
It doesn’t have to be.
-
If they don’t have the budget for Oracle, of course they would use PostgreSQL... [laughs]
-
Let me give you an example of why I care about this. For example CKAN as you know is this open data portal. Now CKAN has been very successful. Hundreds of governments use it.
-
The number of them who’ve actually contributed to core development is about two, which is the US government and the UK government, and the Canadian government.
-
One of the big challenges with CKAN has been finding people... There are many people who now go and even win projects and contracts and deliver them, but that money doesn’t necessarily flow back to core development.
-
You probably know with EtherCalc. Think of the things you could do if...
-
Yeah. At PDIS, we do have full‑time staff to work on EtherCalc integration to Sandstorm, or fixing bugs.
-
And that’s unusual. One of the things I’ve seen is that’s often not very systematic.
-
It will happen when you needed some new feature...
-
That’s right.
-
There’s often not a lot. Particularly core things that may support your features, like, "We need to rewrite this. We need to upgrade to Python 3.0."
-
Well, it’s usually outside government’s scope. They don’t do that for Oracle, either.
-
They do. They pay that for Oracle in their subscription fees. They do pay that. They just don’t think they do.
-
My point is to say we’ve got to change that. We’ve got to define openness systematically. Being honest, the current open‑source business model doesn’t work. It gets by, because open is such a good model for producing software. At the same time, while it is better in terms of feature addition, community participation, in terms of funding, there is a real problem.
-
One of the things I like to be saying to governments is rather than just paying for features on open source or occasionally procuring it, can you start creating open funds?
-
An open fund would look like this. You take the IT budget of, say, national government. You say, "OK, let’s just start with 10 percent. We’re going to put 10 percent of that money into a fund."
-
It goes into a fund, and then what we do with that fund is we don’t use it to buy or procure things during the year. All that happens is that at the end of the year, we look at what we’ve used that’s open software and we take that fund and we give it out, proportionally roughly based on use and value ‑‑ very crudely estimated. We don’t get too obsessed about value.
-
Maybe we only use one Drupal instance, but it was a really big one, versus we installed 20 WordPress instances, but they were small. We start using this fund on a transparent and semi‑automated basis to open projects.
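
The year-end payout described here is a proportional split. A minimal sketch, assuming each project has already been given a crude usage-and-value weight (how those weights are estimated is exactly the part left open above, and the numbers below are invented):

```python
from typing import Dict


def allocate_fund(fund: float, usage_weights: Dict[str, float]) -> Dict[str, float]:
    """Split a fund across open projects in proportion to their usage weights."""
    total = sum(usage_weights.values())
    if total <= 0:
        raise ValueError("at least one project must have a positive weight")
    return {
        project: fund * weight / total
        for project, weight in usage_weights.items()
    }


# Hypothetical weights: one large Drupal instance counted as 20,
# twenty small WordPress instances counted as 1 each.
weights = {"Drupal": 1 * 20, "WordPress": 20 * 1}
payouts = allocate_fund(100_000.0, weights)
```

With these made-up weights the two projects receive the same share, which illustrates why instance counts alone are not enough and some crude measure of size or value has to enter the weight.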
-
What does that do? First of all, it starts paying based on usage, and for things you’re actually using. The money doesn’t go for just new features. It goes to pay for that...
-
It goes to basic maintenance.
-
Right, and it means you don’t have the Heartbleed bug. You have basic support. Second, it means also there’s a real problem for open source software ‑‑ marketing. Generally, it can’t afford the sales and business development.
-
Often what will happen, and I’ve had this experience with CKAN, "Great, I can go and get a deal one year, but three years later, I’ll have a load of competition." That’s good, because that’s the whole point of open source.
-
Three years down the line, anyone can supply CKAN. However, I’m competing in that first deal with Socrata.
-
Socrata will come and say, "Look, we’ll get you a whole instance for free. We’ll make it beautiful. We’ll import all your data. We’ll come and have three executives give you a pitch, we’ll spend tens of thousands of dollars, maybe, up front, and we’ll give you a really cheap instance. It will cost $10,000 this year."
-
The fact that in two‑and‑a‑half years’ time, when your contract runs out and you’re now locked in, you’ll get charged 10 times as much. I’ve watched it happen. New York City, first time around, $15,000. Two years later, $100,000 was their price. I remember arguing with the guy. He would say, "But they’re just so cheap."
-
I’m like, "Can’t you use CKAN?" He said, "Oh, well. I’ve just got it through procurement. I didn’t want to deal. You guys don’t look as professional as they do."
-
While I introduced Sandstorm internally, I did get visits from Facebook for Workplace. I’ve got other visitors too ‑‑ it’s all on my public website ‑‑ doing exactly the same pitch. I’m very well aware of this.
-
One day, you will leave office, and so you’ve got to try and put in place changes, at a system level, that will live beyond you. One of them would be things like this fund.
-
Because by doing it, what it would allow is open companies, for example, to come and say, "OK, we’re willing to do some of the up‑front business development to get in the door with Taiwan."
-
"Because we know, if our software is being used, even if someone else is the SaaS maintainer next year, that money..."
-
Remember, in this fund, it doesn’t go to the person running the service this year. It goes to the underlying software. It doesn’t go to the SaaS provider, this fund. It goes to the actual underlying software team maintaining it.
-
These ideas are kind of old. There’s various ideas for these. I like to call them also "open offsets," like carbon offsets. You’re using carbon dioxide. You may pay an offset on your flight. Similarly, if you’re using open software, but you’re not really paying for it, then you should offset your use.
-
And then you can renew the commons, so it’s sustainable.
-
Yeah, I’ve made that argument many times. [laughs]
-
Exactly! My question is, is there a way to persuade you guys, say, "Look, we tried this experiment."
-
I would pitch it to them, not on the explanation I’ve had, but to say, "Look. Look at OpenSSL and your Heartbleed bug. That caused vulnerabilities for you. It’s not that it’s more vulnerable than closed software, but what was missing was you weren’t paying a regular license fee every year to make sure it gets quickly patched."
-
"We want you to start putting in place this fund, so that the software we use, which was hugely high‑quality and high‑value that’s open software, does see regular license fees. So this fund is there to really provide regular support to those companies and organizations to maintain this material year after year."
-
Yes, sure. I agree 100 percent. I’m a long‑time donor to the Software Freedom Conservancy. I was at your position [laughs] arguing for the same things.
-
As I see it, this is an incentive non‑alignment here, especially for the public sector. People made similar pitches to Apple, and I used to work with Apple ‑‑ until just four months ago ‑‑ for six years as an independent contractor.
-
For Apple to open‑source Swift, for Apple to donate to OpenSSL or WebKit or whatever, there’s a clear incentive alignment, because then Apple don’t have to maintain this whole thing anymore. Then subsidizing one or two core members to maintain interoperability...
-
Right.
-
...is clearly in Apple’s interest. Otherwise, they’d have to develop the same stack themselves, which is not a very interesting kind of work.
-
But for government, the incentive is not the same. For Google and Apple, supporting commons like this is a net cost‑saver for them, and actually a huge one.
-
Just be careful. It only works because they’re massive, abusive monopolists.
-
It only works because you’ve ended up with these massive monopolies. It was like AT&T.
-
Well, governments are monopolists of violence...
-
I agree, but I’m just saying why it works there is that Google are so dominant that they will subsidize some kinds of software, and government doesn’t have a stake in the same way. They won’t subsidize software that competes with them.
-
Of course.
-
What I’m saying is, "No government is relatively as big as Google in the particular software verticals that Google operates in..."
-
That’s right.
-
Which is the incentive alignment issue.
-
Which is the incentive alignment problem, because then it’s not seen as crucial to government’s core promise to their people. The idea that I am trying to introduce here is what we call "Process as Commons."
-
I don’t have time to get into details, but the whole idea is that the government does have an incentive in establishing legitimacy, which is getting the policy‑making process as transparent as possible, so that people won’t see the government itself as a black box and go to the streets; this is what they call a "democracy crisis" nowadays.
-
One of the ways to fix that is to adopt the ICANN, IETF kind of a multistakeholder model, an open governance based on discourse...
-
I’m not actually an open governance guy, by the way. Open Knowledge is not open in that way...
-
Sure, sure, sure. I understand. IETF is not an open government organization, either.
-
No.
-
But what I’m saying is that, they get the same kind of legitimacy as even the UN has, because they conduct all their business in a transparent way.
-
It’s also because they’re geeks.
-
It’s true... [laughs]
-
It’s also they work in a very technical, narrow area, in which consensus is... much of the reason governments have so much a harder problem, and it’s important that we, as geeks, remember this, is because governments deal with issues in which there’s much less... IETF debating about Internet protocols, even there, geeks can get very angry with each other and have forks.
-
In government, you’re dealing with issues of a much less informed population, dealing with issues in which there’s much less consensus sometimes.
-
Yes, but I trust citizens more, which is why we’re doing this consultative process.
-
This consultative process gives legitimacy to the government‑initiated policies for things like Uber, which is basically a PR, or what I like to say, a memetic issue. If we don’t get this kind of transparent and accountable process, Uber will automatically seem more legitimate, just because they have better marketing. So we did this work.
-
Wow, that’s fantastic.
-
[laughs] What I’m saying is that doing this work, for example for Uber, we used this open‑source platform...
-
The Pol.is thing.
-
With the Pol.is system. Then we can go back to the Process as Commons idea, saying, "You know, the reason we get this to work is because we have this democracy‑enabling infrastructure in the commons. So we need to invest to bring Pol.is on Sandstorm, and you can call it whatever."
-
This process needs to be trusted by people. Otherwise, we have another voting machine crisis. Technology is part of democratic process.
-
I’ve talked about this in my big debate with Tom Steinberg. I’ll draw a diagram maybe.
-
Just to finish this one sentence. Then we get incentive for governments to fund infrastructure work, because now it’s part of democracy infrastructure, not just IT infrastructure. That’s my main thought.
-
So open...
-
(drawing sounds)
-
Basically, you can tell we should really have Sylvie drawing here. These are terrible drawings.
-
No, it’s fine. I’ve got it. [laughs]
-
I have a long piece. I don’t know if you can bring up here?
-
Yeah, sure.
-
There’s a post that would be great to read. It’s called "Managing Expectations," and then type "tech" and "open" or something, "tech open." Hit that. I think it should...
-
This one?
-
Yeah, that second one. No, sorry, the first one. I misread the first list. It’s the first one.
-
You can read it yourself. Basically, when I first went around in the UK, I’d say five, six years ago, 2010, this woman came up. She said, "Look, I’ve heard stuff about government 2.0. It’s about open data, it’s about participation."
-
One of the things I would be skeptical is like this drawing. You can read this afterwards. Can we get back to the drawing just for a moment?
-
Which one?
-
The drawing on your...
-
Oh, OK. Just a second. OK, right. "Open data is necessary, but not sufficient."
-
But also there’s an orthogonality. Basically, what I’m trying to draw here, there’s open information.
-
Yeah, it satisfied one pillar of a working open government, transparency.
-
Yeah, but also I’m not necessarily that interested in open government. [laughs]
-
That’s fine.
-
Or, I’m skeptical about open government. What I would say is, if you read the article, people are interested in better governance. I think that’s hugely important. I am very interested in it.
-
I just think the role of open information, transparency, is actually relatively small and I think the technology is relatively small. I think the problems involved in governance are hard problems that technology has little impact on.
-
They are problems of collective action. They’re problems of informing yourself, of which...
-
It’s more of culture and education.
-
No, it’s culture and education and a collective action problem and a principal agent problem.
-
Maybe culture could help solve it, in the sense that if we all were pro‑social in a very strong sense, but there’s no magic fix. There’s no technology fix for those things.
-
That’s right. I think we need about five generations to get there.
-
I would say it’s not generations. It’s to do with maybe then changes in the way that we behave.
-
It comes back to this project I’m doing on society, that we don’t have time to go into today. One of the things that I wrote in that post in 2012, about managing expectations, was be very realistic about what tech will do.
-
For example, I see a bunch of the funders in this space, who funded Open Knowledge, who are all into transparency and accountability. They’re like, "Why are we not making all this difference to accountability?" I’m like, "Well, why was it ever going to do it?"
-
I’ve done activism for 15 years. I did a lot of activism around IT. My experience was, yeah, I could email people rather than write a letter. It was some help, but a campaign on issues was this part.
-
The irony was once you could email, everyone else could email. I remember all the representatives would say, "Now I get hundreds of thousands of emails. I don’t even read them anymore. [laughs] You have to come visit me to have me listen to you."
-
So the challenge I wrote here is I think open as participation in governance is hugely important. The role of tech, I think tech can have an impact here (open governance), but the impact here (open data) is big.
-
The other thing I’m trying to get as a story made into is that, for me, what I say about open world, I want a world in which all information is open. That means all movies, all music, all apps, all algorithms, all designs, etc.
-
The reason I’m particularly interested in that, is not specifically because I think of its impact on governance immediately. It would have an impact, because government information would be open, as well. We’d be able to see what we’re doing and may be able to hold them to account better.
-
Then there’s at least a foundation on which the other side is built.
-
Exactly, it’s a foundation. It’s a foundational stone, but it’s almost a joke ‑‑ Open Knowledge without open minds has little impact. The open minds part is very hard.
-
That’s exactly right.
-
Right! However, you’ve got to get more of them interested in the open world.
-
Why do I want an open world? Not especially, and I’m not saying this, I don’t want this because of government.
-
Yeah, I agree.
-
It’s because this transforms the economy, and that has an impact on inequality.
-
Because it removes artificial scarcity.
-
On top of it, it removes non‑artificial scarcity, because it’s an enabler of many other sectors.
-
It’s more than that. The information age has two big impacts. One is non‑rival goods. I’m going to use tech terminology, because you get it. The other is what I’d call platforms, otherwise known for many hundreds of years as marketplaces, but people now call them platforms.
-
It’s a non‑discriminating marketplace.
-
Right, but it’s why all the debate about Uber... when I go to lots of countries, people go, "Uber’s so new and it’s so digital." It’s got nothing. It’s like a marketplace. It’s like the market in Lyon in 1250.
-
The only thing new is the portable GPS device.
-
Yes, exactly, but these two things then combine with a choice. What then comes in is a big choice here. That choice is open versus closed.
-
This part already happens. The top part, this is just technology, if you like, or more the structure of the economy. The question is this choice.
-
I’ve drawn this diagram badly, but you’ve got the choice, plus the tech change. If you put them together, they give you two different worlds. I’ve drawn this diagram in Denmark a couple of months ago and did it better.
-
The point is, if we go the open route, we get innovation, we get freedom, etc. But if we go the closed route ‑‑ and this is what’s important ‑‑ most people in government, in my experience of your colleagues, don’t get that the change of the information age, it isn’t about the fancy gizmos. It isn’t about the AI. It isn’t about virtual reality.
-
It’s about we stop playing zero‑sum games with a competing mindset.
-
Right. That goes both ways. The problem with this is you take non‑rival and platforms and you combine them with closed, you get massive inequality.
-
That’s right.
-
It’s not like the old world. This is a world in which one person gets everything. As we said, we get monopoly, basically.
-
Yeah, plus you get scalable surveillance for free.
-
Right, but forget even surveillance. Just think about Uber, eBay, even Android if you like, Wikipedia. This is a world where there’s one thing.
-
Sure, it’s network effect taken to extreme on any particular field.
-
Right, but two effects there ‑‑ the network effect is the platforms. The non‑rival good is the left‑hand side, which is the infinite economy of scale. One of the things is this choice.
-
Going back here, the point is that the open world, it isn’t so much about governance. It’s about if you transform the economy in this way, it’s kind of like if socialism with capitalism got together and had a child.
-
Yeah, it’s a shift in the human condition.
-
Yeah, capitalism and socialism had a baby. It’s like suddenly we’ve got the best of both worlds. We could have the innovation. We could have market‑based innovation, but we could have it all be open. The underlying good that we’re producing, the information, is fully open.
-
That’s why, also, this phrase I want to share with you, if there’s one thing you take from this talk. We talked about brand. We had a very good concession that way, on "open," you get this.
-
I would offer to you, by the way, not to use the word "commons." I’m being provocative, but I’ve done loads of stuff. I did stuff from commons, as well.
-
The word "commons" is associated with physical commons, like fisheries or land or the atmosphere.
-
It’s like intellectual "property."
-
The thing is, the commons of land is a mirror image. If this were a mirror...
-
(drawing sounds)
-
...of the commons of information. The thing is this is scarce, and it is overused. This is abundant and under produced.
-
Yes. What would you call it?
-
We don’t need to worry about the technical term. Because when we’re talking about what world we want, often like what you were saying, you think like, "We wanted a commons."
-
Even earlier on you mentioned very commonly that many people don’t...They’ll even have events, like there’s one in Poland, the thing for the commons. Or like, we want digital commons.
-
I’m like, the commons, it brings all the wrong intellectual associations. I would say, just, we constantly say, "I want an open world."
-
That is the t‑shirt brand. We want an open world. Because we don’t get into the detail of the commons which also is controversial to people.
-
Basically, instead of saying "process as commons", do we say "common processes"...
-
What the process is...
-
What about "open processes?"
-
...which are shared narratives.
-
It doesn’t dilute. It’s open, including API and source code.
-
The source, I’m not so concerned there. The process is quite technical, so calling them commons wouldn’t matter so much.
-
You’d say, "What world do you want?" You might say, "I want a digital commons." Instead say, "I want an open world." I even start to use the word "open" systematically. I don’t even say, "Open source software." I say, "Open software."
-
We do that here too. (開源 instead of 開放原始碼)
-
Exactly. My point though is, what is the definition of open world?
-
This is what we as a community and a set of movements, an open movement, need to be clear on and get: that across the world we could go anywhere and say, "What’s an open world?" I’d say, "What’s an open world?"
-
They’d say, "An open world is a world in which all public information is open, blah blah blah."
-
Just one quick thing. There’s that leadership program. We’re running a leadership program.
-
I have emailed you.
-
If there’s anybody that you recommend, that you think...
-
You’re probably already in touch with the Open Culture Foundation.
-
Yes, those I’ve dealt with.
-
You’ll want to contact Whisky Chang, of course.
-
Whisky Chang?
-
You probably want to contact him.
-
If we didn’t have...
-
Opendata.tw. Here is his contact.
-
If you know anyone in particular in g0v, I’m particularly interested in people who are civil servants who might come. It’s invite‑only in the sense, it’s got limited space. I will run it. It’s highly interactive. It is about this stuff.
-
That’s our public system architect, designer and everything.
-
(laughter)
-
Come, please come.
-
It’s on Saturday, 10:00 till 5:00. Here in Taipei. Write down the time.
-
This Saturday.
-
This Saturday? 10 till 5:00.
-
10:00 AM to 5:00 PM. It’s open‑leaders.com.
-
Yeah, that’s our website.
-
Who’s coming?
-
There’s several people from the OCF, and several others. There’s a couple of entrepreneurs. There are some people from the NTU, National Taiwan University. There’s also at least one civil servant.
-
Because we have some informal connections to young public servants who focus very much on this thing.
-
Could you mail them alternatively?
-
Our youth councilors, too. They may be interested.
-
That would be amazing. Could you do that today by any chance? Because it’s getting close to...
-
Sure.
-
We just weren’t sure who else is coming.
-
It’s not going to be a public event. Many people must be in the policy, PhD candidate, deputy CEO of the Open Culture Foundation, which is obviously... OK. Thank you so much.
-
No problem.
-
I’m here until next Wednesday. Next Tuesday, Wednesday. If there’s anything you want to follow up on, let me know. This has been very much a privilege, to meet you.
-
Sure. Let me just say a final thing.
-
Lovely.
-
I don’t see myself as a government person. [laughs]
-
No, I said governance.
-
Governance, even. What I’m trying to say here is that this is what everybody agrees, that we’ll get here.
-
Many people base their argument on solving inequality, which is I think the main motivation for your speech.
-
In Taiwan, it’s maybe not the best angle.
-
No. I’ve already got that. I think what I would say...
-
Because people here are...
-
...there will be innovation too.
-
Innovation. Here that word sometimes implies something competitive. It’s part of our education curriculum. We’re fixing that.
-
However, for the current generation... [laughs] That’s the old argument between Eric Raymond and Richard Stallman.
-
ESR wanted to argue that open source makes better innovation. Period. But the FSF doesn’t like this argument because they want to focus on equality of access and software freedom.
-
What I’m saying is that, historically in Taiwan and around East Asia, the innovation part...
-
Is the stronger one.
-
Is the much stronger one.
-
I did write freedom, innovation, healthy and wealthy. I agree.
-
I think the challenge of both for us is how we go outside of the tech sphere.
-
Ultimately to build an open movement of the current, you need to go beyond... We need to reach out. That’s why for example, inequality seems to be quite a powerful message globally.
-
Globally of course. Especially in Europe.
-
I suspect that it may not be here right now, but I think it will be in 20 years.
-
Definitely. It was so good to meet you.
-
Very good to meet you. Thank you.
-
Great fan of your work. [laughs]
-
Thank you. [laughs]
-
Real pleasure.
-
Real pleasure.
-
We’ll be in Taiwan probably...
-
We’ll be back. Sylvie’s Taiwanese.
-
If we can solve the DDL Data Package generation together, that will be great technologically.
-
You have our email, so we can just keep emailing.
-
Thank you so much as well. It’s a pleasure to meet you. Maybe we’ll see you on Saturday, who knows?
-
OK.
-
Thank you.