2017-02-14 Open discussion with Rufus Pollock

( Whiteboard: https://ey.pdis.nat.gov.tw/shared/mz-dL7DJofqEZLbuZ00CG8AsgK13jqlHMV_WfUXKlPa )

2:00 PM today, we’re going to design one of the larger common APIs, not with the OpenAPI spec, but as a common, standards‑based refactoring of our national disaster response systems.

Audrey Tang

That is actually a good, concrete example to talk about, because that would involve merging two established governmental units, at least their front‑end websites, together, and establishing a common data exchange pipeline around disaster data.

Audrey Tang

Also, we need to somehow demonstrate that it’s better if you open‑source the part of code which may suffer from performance problems, rather than keeping it tied to proprietary DBs and throwing more hardware at it.

Audrey Tang

There are many dimensions to that operation, and the kick‑off meeting is at 2:00 PM.

Rufus Pollock

Perfect.

Audrey Tang

So that’s actually a good case we can talk about.

Audrey Tang

I’m interested in your book, also. If you want to develop some thoughts, I can help with philosophizing, too.

Rufus Pollock

We can talk about that, as well. One other thing, let me see if I can go and open...where did I write these notes?

Rufus Pollock

Let’s discuss these topics. One is the practical platform question.

Rufus Pollock

Second, there’s obviously the book, but particularly out of that is what I call the "open fund" through use open software in gov. Then we might add to that. Let me have a look. Here we go. This is the one I want. I guess I can actually open my OECD slides on data‑driven government.

( Website: http://rufuspollock.org/2016/09/14/visiting-estonia-21-23-september-2016/ )

Audrey Tang

Also allocations.

Rufus Pollock

In terms of the platforming, do you want to tell me a little bit about... if we take 20 minutes on that. One thing I could also talk about a little bit is the Frictionless Data or data containerization ideas, if that would be useful.

Audrey Tang

Sure, of course.

Rufus Pollock

Here we go. I’ve got some things here. That’s the one I want, view, into full screen.

Audrey Tang

You want to project this out?

Rufus Pollock

I could just talk about...some of the ideas might be quite common for the other things. The one question I would have at the moment is you guys all... I will speak in English. Is that OK?

Audrey Tang

Yeah, of course.

Rufus Pollock

My Chinese is not good enough.

Audrey Tang

It’s OK. If you speak JavaScript or SQL all of us also understand.

(laughter)

Rufus Pollock

There’s a really general problem, which is we want to reduce the friction of getting stuff from A to B, from tool A to tool B.

Rufus Pollock

One thing that you could imagine would be that you are a government official, working in Excel. An ordinary policy‑maker or analyst in one of your departments is probably never going to use JSON, but they will use Excel.

Rufus Pollock

Imagine a journalist at a newspaper. They might use Excel. They might use JSON if they’re a data journalist. Imagine what it would be like if it was one click to import data from anyone or anything on one of the data platforms into the tool of your choice, into R, into whatever, and then even to do things with it in there. What would you need to have that?

Rufus Pollock

You need an exchange format, obviously. You need some way to exchange between these tools, and the other is you need tooling.

Rufus Pollock

I emphasize that often we — certainly myself — as technologists, tend to focus on the exchange format, "If only everyone used RDF. If only everyone used JSON and JSON Schema."

Rufus Pollock

That is part of the story, in that if you tried to convert from every format out there to every other format out there it would be a nightmare. There are too many of them.

Rufus Pollock

At the same time, the tooling is hugely important. The integration into people’s existing tool workflow is crucial for adoption. We sometimes think, "If we build it, they will come. If we have our wonderful format, everyone will just adopt it." They won’t.

Rufus Pollock

Also, in the Frictionless Data project here, the other principles that aren’t necessarily specific, but are distinguishing, are the Zen‑like simplicity.

Rufus Pollock

For example, RDF does not have Zen‑like simplicity. A focus on existing tooling, for example, in the Frictionless Data project, is that we don’t really invent any data formats. We just invent some degree of a wrapper around some of them.

Rufus Pollock

For example, we use CSV for tabular data because every tool on the planet that supports tabular data basically supports CSV. Every tool out there supports CSV. [laughs] It’s the most basic tabular structure you could have and it’s web oriented.

Rufus Pollock

That’s another distinguishing feature, which is data containers. It’s this idea around containerization, which is a little bit different. It has a relationship with APIs, but a little bit different, this idea that you want to ship from different tools to different tools and allow the tool to unwrap what’s in the box, if you can.

Rufus Pollock

Let’s talk about CSV as maybe exemplifying our approach. For example, why CSV as the basic tabular format rather than, let’s say, JSON? Excel doesn’t support JSON as a native import format. Even if, in our format, you throw away all the data package stuff, you are still left with CSV.

Rufus Pollock

Even if you don’t understand data packages, you could open the CSV. Whereas most average bureaucrats will not know what JSON is. You will need to build some tool that takes JSON to whatever they have, to some extent.

Rufus Pollock

I’ll maybe just make a comment here. I was involved in an effort called IATI, the International Aid Transparency Initiative. I‑A‑T‑I. It started about 10 years ago now, maybe more. I was involved in it from quite early on. Its aim was that people wanted to report about aid projects. There are people who pay for aid projects, and there are people who receive money from aid projects.

Rufus Pollock

You have issues where you have multiple donors funding the same project without realizing it. Donors don’t realize that there are some projects which are not being funded at all that are important. There’s just a lack of joined up thinking, which is similar to government sometimes. [laughs]

Rufus Pollock

Against some of my advice at the time, they built their standard around XML, which was the JSON of 10 years ago. [laughs] JSON’s maybe better. It’s simpler. It’s web oriented and so on.

Rufus Pollock

The thing is, they ended up having to spend huge amounts of time. First of all, no one can natively support XML. For example, we spent a load of time writing converters to convert XML back into CSV. That’s what all the analysts want, all the people who want to look at the data: they have a spreadsheet tool and they want to import a spreadsheet. I could just go on about that.

Rufus Pollock

Just to explain this format, and when you went for this format, basically the structure is to try and take something people are already using, which is CSV.

Rufus Pollock

I don’t know if you know what I mean by progressive enhancement in JavaScript... It’s less done now, but traditionally progressive enhancement in web applications.

Rufus Pollock

This is your HTML page. Then you can add JavaScript and progressively enhance it. You’ve got the CSV. Then you can add basically a schema, which is what’s missing in CSV. You don’t know the types. You don’t have rich types.

Rufus Pollock

Then you can add a descriptor which said, "This data set was created by Audrey," or it was created by you. It was created on this day. This is the license. This is where you got it from. For example, even if you were doing JSON schema based APIs, you still probably need an agreement on what that standard metadata that you ship at the top of your API is.
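
( Sketch: the layering described above, a plain CSV, then a schema giving the column types, then descriptive metadata, could look roughly like the following descriptor. Every name and value here is an illustrative assumption rather than the contents of a real package, and the property names follow one reading of the Data Package and Table Schema conventions that should be checked against the current spec. The helper `field_types` is hypothetical. )

```python
import json

# Hypothetical descriptor: every name and value below is illustrative,
# not copied from any published data package.
descriptor = {
    # Layer 3: descriptive metadata (who, what, which license, from where).
    "name": "gdp",
    "title": "GDP by country and year",
    "licenses": [{"name": "ODC-PDDL-1.0"}],
    "sources": [{"title": "Example statistics office"}],
    "resources": [
        {
            "name": "gdp",
            # Layer 1: the data itself stays an ordinary CSV file.
            "path": "data/gdp.csv",
            # Layer 2: the schema adds the types that CSV cannot express.
            "schema": {
                "fields": [
                    {"name": "Country Name", "type": "string"},
                    {"name": "Country Code", "type": "string"},
                    {"name": "Year", "type": "year"},
                    {"name": "Value", "type": "number"},
                ]
            },
        }
    ],
}


def field_types(package, resource_name):
    """Return {field name: declared type} for one resource in the package."""
    for resource in package["resources"]:
        if resource["name"] == resource_name:
            return {f["name"]: f["type"] for f in resource["schema"]["fields"]}
    raise KeyError(f"No resource named {resource_name!r}")


if __name__ == "__main__":
    # The descriptor is plain JSON, so it can sit next to the CSV as a file.
    print(json.dumps(descriptor, indent=2))
```

A tool that does not understand the descriptor can ignore it and still open `data/gdp.csv` directly, which is the point of the progressive approach.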

Rufus Pollock

You’ve got it perfect. You’re going to bring up an example. You can look at the data package JSON.

Audrey Tang

Sure. Let’s look at the CSV.

( Website: https://github.com/datasets/gdp/blob/master/data/gdp.csv )

Rufus Pollock

Exactly.

Audrey Tang

Because it’s progressive. Right?

Rufus Pollock

Exactly.

Audrey Tang

[laughs] It should be self‑descriptive. As you can see, the existing GitHub tooling is happy with that.

Rufus Pollock

Yes. Exactly.

Audrey Tang

You can store it in everything...

Rufus Pollock

You can store it.

Audrey Tang

...and find whatever. Then you can also see what’s a valid GDP value. Now that you know the country code has to be part of the iso-3-geo-codes/id geo codes.

Rufus Pollock

Exactly. If you like, this is the part which I think is the least developed, but most exciting. What was the exciting thing about linked data? Linked data, I think, ultimately hasn’t really worked, but what was exciting was foreign keys.

Rufus Pollock

One of the problems at the moment with the web, if you’ve ever done... You guys have probably taken part in hack days or anything like that. You spend 90 percent of your time preparing the data, downloading it, merging it with some other data set, tidying it up. One of the things here, like that foreign key would be a breakthrough, would be we start to be able to if we had some...

Audrey Tang

...core data packages...

Rufus Pollock

...that we could reference them again and again.

Audrey Tang

It’s there.

( Website: http://data.okfn.org/roadmap/core-datasets )

Rufus Pollock

Exactly.

Audrey Tang

[laughs]

Rufus Pollock

There it is. Exactly. The irony I’d also say is you don’t need the whole link. One of the things I’d emphasize here. First of all, going back to SQL, foreign keys are more familiar. They’re very simple here. We’re still working on that, but the idea would then be you’d be able to do a join, and for example join into this data set some other information.

Rufus Pollock

Here, by the way, someone’s already bothered to denormalize a little bit. They’ve already inlined the country name. In perfectly normal form, like in a database, you wouldn’t have the country name. You’d just have the ID, which is the foreign key, and then the GDP values. Someone’s already denormalized a little bit.

Rufus Pollock

Let’s say you wanted the continent or you wanted the region. For example a classic thing you might want to do is like, "I’ve got GDP, but I want to aggregate by region." In fact, if you look at the World Bank data, they kind of do this. Often they’ll include in their data set aggregates, for example, GDP for Europe.
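
( Sketch: the join-then-aggregate step mentioned above, written with only the Python standard library. The two inline CSVs, their column names (`Country Code`, `id`, `region`) and all figures are made up for illustration; the function `aggregate_by_region` is hypothetical and is not part of any Frictionless Data tooling. )

```python
import csv
import io
from collections import defaultdict

# Made-up sample data: the country codes are ordinary ISO alpha-3 codes,
# but the values are NOT real GDP figures.
GDP_CSV = """Country Code,Year,Value
FRA,2015,10
DEU,2015,20
JPN,2015,30
"""

# The reference table that the foreign key points at.
REGIONS_CSV = """id,region
FRA,Europe
DEU,Europe
JPN,Asia
"""


def aggregate_by_region(gdp_text, regions_text):
    """Join GDP rows to regions on the country code, then sum per region and year."""
    region_of = {
        row["id"]: row["region"]
        for row in csv.DictReader(io.StringIO(regions_text))
    }
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(gdp_text)):
        code = row["Country Code"]
        if code not in region_of:
            # A foreign key violation: the code is not in the reference table.
            raise KeyError(f"Country Code {code!r} has no match in the reference table")
        totals[(region_of[code], row["Year"])] += float(row["Value"])
    return dict(totals)


if __name__ == "__main__":
    for (region, year), value in sorted(aggregate_by_region(GDP_CSV, REGIONS_CSV).items()):
        print(region, year, value)
```

Because the reference table is shared, the same lookup could be reused by any data set that keys on the same country codes.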

Rufus Pollock

Let me give you another one actually that I’m...I could, on my laptop, just very quickly give you a stupid example.

Audrey Tang

There’s a text file here that describes all the core data sets.

Rufus Pollock

At the moment linked most of the core data. There’s actually more coming. This is the list of all of them. You can build data. What’s also nice is, if you start up to their package, if you want to build really lightweight data catalogs, you don’t even need full‑on CKAN or something like that. You can just write a text file and then almost do it in JavaScript. Just have a little browser of your data packages.

Rufus Pollock

Let me get you data.oecd. Here’s an example of something I’m actually interested in at the moment, just by coincidence. I’m putting up data.oecd. I think if you just search for OECD pharmaceutical expenditure, I think it’s the top, possibly that one. Here we go.

Rufus Pollock

What’s great is if you just change the time by the way. Just go down a little bit here, and change the time to all the time. If you drag that over...

Audrey Tang

Like this?

Rufus Pollock

Just as far over as you can drag that. Then if you just change, for example, this to GDP per capita. Actually US dollars per capita is even better.

Audrey Tang

I see.

Rufus Pollock

I can’t remember who this is. I think this is the US.

Rufus Pollock

By the way, this is how much we spend on buying...

Audrey Tang

You’re right, by the way. This is the US.

Rufus Pollock

One of the things, the US has gone crazy. This is partly Hepatitis C. There’s one drug category in 2014 that’s released. For example, I know you’re in Taiwan. I don’t know the numbers. In mainland China, there are 30 million people with Hepatitis C who will need this expensive...

Rufus Pollock

The point being, just by the way, is if you just look at this dropdown list. Let’s say I wanted total spending on pharmaceuticals, which is the number I actually want. Guess what? It’s not in this list. I would need to join this data set with population per capita over time to do this.
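
( Sketch: the missing number can be derived as total spending = spending per capita × population, once the two series are joined on country and year. The country codes `AAA`, `BBB`, `CCC` and every figure below are invented placeholders, and `total_spending` is a hypothetical helper. )

```python
def total_spending(per_capita, population):
    """Multiply per-capita spending by population wherever both series have a value.

    Both arguments map (country_code, year) -> number. Keys present in only
    one of the two series are skipped, which mirrors an inner join.
    """
    return {
        key: usd_per_person * population[key]
        for key, usd_per_person in per_capita.items()
        if key in population
    }


# Invented placeholder figures, not OECD data.
per_capita_usd = {("AAA", 2014): 100.0, ("BBB", 2014): 50.0}
population = {("AAA", 2014): 1000, ("CCC", 2014): 5}

if __name__ == "__main__":
    print(total_spending(per_capita_usd, population))
```

Only `("AAA", 2014)` appears in both series here, so it is the only key in the result; the dropped keys are exactly the gaps a publisher would have to fill by hand today.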

Rufus Pollock

That was just one example. We constantly need to do joins, and often on the same data sets: population, GDP. In fact, the data sets you normally need to join on are quite common. Not always the same, but often they’re the same data sets. Going back, I think you had a nice example of the data package that you...

Audrey Tang

The GDP one?

Rufus Pollock

Yeah. The GDP one was perfect.

Audrey Tang

It was linking to the registry. The thing that it links is not the core yet because it’s not yet curated.

Rufus Pollock

I should check that, whether it’s there or not. It’s a good...

Audrey Tang

...it is a good example anyway.

Rufus Pollock

It’s a good example anyway. I would check that. The other thing we have here that you can see by the way, this is the data packages. Just to look at it, it is actually in JSON. We’re also thinking that you could describe these in JSON, but also in possibly actually CSV itself.

Rufus Pollock

My point is to the top. First of all, even if you’re using Swagger, the very top part of this before we have the resources...

Rufus Pollock

The resources look a little bit like how you describe your fields in Swagger.

Audrey Tang

Yeah. It’s the JSON Schema part.

Rufus Pollock

This part here could still be useful as just a convention whenever you’re shipping a file in your API. You may need to send some metadata with it. What’s the license? What is the title? I don’t know. When was it last updated? There’s some standard metadata that you might adopt, that’s just based on having gone through NPM or looking ASCII code metadata.

Rufus Pollock

The thing that then is interesting is the description. This is the resources, which is very simple, which is the description of the fields in the file. It’s heavily based actually, some parts are on JSON schema. It’s unfortunately not exactly identical with it. It’s difficult to exactly converge it.

Rufus Pollock

The question then is also you can extend the metadata. For example, this is a data package for describing tabular data. Your resources can be anything.

Rufus Pollock

In a data package, you can put inside a container. You can put anything. You can put binary data in. It wouldn’t be able to be described so usefully. The resource here wouldn’t have any schema. It would just say name. It would just have those exactly, is the JSON one, exactly.

Rufus Pollock

This is actually one with also a remote reference, where it says, "Hey. My resource lives somewhere else." You can package up stuff that lives somewhere else.

Audrey Tang

Which is our usual way of doing things.

Rufus Pollock

Right. You can also extend it. Finally, for example, I don’t know if we pull up fiscal data package...

( Website: http://specs.frictionlessdata.io/fiscal-data-package/ )

Audrey Tang

Sure.

Rufus Pollock

Looking at this diagram, you can see the tabular data package, which builds on top of Data Package, JSON Table Schema, and CSV. Then you can extend it further by saying, "I must have these fields." You might have restaurant inspections. This is sort of what you’re doing with your Swagger APIs probably. You’re going to have a list of schemas.

Audrey Tang

This is the DITA idea back in the XML era.

Rufus Pollock

Exactly.

Audrey Tang

The Darwin Information Typing Architecture.

Rufus Pollock

Exactly.

Rufus Pollock

There’s a lot of people who have talked about this in the last few years, or last decade. Obviously even I talked in the US quite a bit. Where standardizing really simple specifications for like, "Hey. You’re going to put out restaurant inspection data, like ’Was the restaurant clean or not?’ What would be the columns you’d have?"

Rufus Pollock

The thing I think for that to work is an obvious idea. It needs to be super simple, and it needs feedback. It’s so easy to write a spec. It’s really easy to say, "You should have column this, this, and this." It’s a totally different thing for people to actually do that. [laughs] The question is, for example, you need good validation.

Rufus Pollock

For example, there’s this site called Good Tables that we’ve put up. Obviously they’re schema validated. Here we go. It’s the first one. There are themes like JSON schema validated. All I would say is having looked at them, and even worked at our own site, is let’s say you pick an example. Click on this one here.

( Website: http://goodtables.okfnlabs.org/ )

Audrey Tang

Sure.

Rufus Pollock

If you click on that and then hit validate, you really need to work on your user experience. For example, I don’t think this is still a great user experience. You really need to find a way, and this is harder than you think, to tell people who are going to produce data for you, what’s not right about it if they make errors.

Rufus Pollock

In a certain way, when you’re doing your JSON API, it all gets easy because you’ve got tech geeks that are producing the API. If you assume people further down the chain have to...In a way that just hides your complexity. That means someone inside the department needs to get the data in a form that can be published through that API.

Audrey Tang

Yeah. Other systems depend on it. It would break...

Rufus Pollock

It will break.

Audrey Tang

...if it somehow invalidates.

Rufus Pollock

Right. I got it. That requires a lot of tech. How do I put it? Your real goal ultimately has got to be that civil servants can publish quite a lot of data themselves.

Rufus Pollock

At the moment we tend to have a model where basically the civil servants won’t do stuff, but there will be an IT team who will take that data and put it into some schema that we have defined. That has a lot of demand on your IT team.

Audrey Tang

I don’t think that’s the case in Taiwan though.

Rufus Pollock

My question is here. Let’s pick something really...I don’t know. Inside the department you’ve probably done some research already. What’s a data set that people are producing by hand, where they’re filling out the Excel spreadsheet by hand?

Audrey Tang

Meeting logs. That’s the primary example.

Rufus Pollock

Great. Do they do that in Excel or do they have now a web form?

Audrey Tang

Meeting logs are done in free text, mostly through emails and as Word documents in official document systems. Mostly printed in paper form and re‑scanned as JPEG files, if I’m not mistaken, or at best, as attachments to emails.

Rufus Pollock

Perfect. Do you want that to become structured data like a spreadsheet?

Audrey Tang

Yes.

Rufus Pollock

In each of the departments you’re looking a little bit... What do you think is going to happen?

Shuyang Lin

I never thought about that. That’s a good question.

Audrey Tang

What we’re trying to do here is to... I was explaining the Taiwan situation a bit. I understand that there is usually an IT department in other countries, like in each siloed unit.

Rufus Pollock

There might be an IT department. There might be a common one, as well.

Audrey Tang

Sure, of course. They would be in a larger IT office or something. In some cases they report directly to an upper level IT unit, sometimes they report to the organization; the command structure doesn’t matter here.

Audrey Tang

What matters is that whenever we want to do any data exchange in a normal way, there has to be either a hub at the meta-level, or a direct connection between these IT guys. Then the public servants are blissfully ignorant of it. That’s your picture.

Audrey Tang

In Taiwan it doesn’t work that way. [laughs] In Taiwan the “IT” people are just a part of the picture. Let’s put it lowercase, like “it”.

(laughter)

Audrey Tang

Most of the time, it’s the private sector vendors...

Rufus Pollock

Who do all the stuff?

Audrey Tang

...who do all the stuff in building systems. Correct me if I’m wrong, but all of the cases that I’ve seen, it’s an externally procured provider that’s renewed annually, that provides the user-visible experiences. Maybe the infrastructure, the actual machines are maintained by the IT department, but the IT are more like MIS that way than actual information technology.

Rufus Pollock

That’s true in many other governments too. They’ve communicated they’ve outsourced a lot of stuff.

Audrey Tang

We outsource maybe most of our stuff.

Rufus Pollock

Let’s go to the question of structured logs. Let’s call him or her who has to report the meeting logs? What’s her name or his name? Should we give them a name, just for the moment, for doing user research?

Audrey Tang

Alice.

Rufus Pollock

Alice here has to report the meeting log. You know what it’s like having structured data. How is that going to work? Which unit are you going to create? Are you going to create a web form where she has to type it in?

Audrey Tang

No. In PDIS we usually just send her an Etherpad link.

Rufus Pollock

You send her an EtherCalc link?

Audrey Tang

An Etherpad link.

Rufus Pollock

An Etherpad link?

Audrey Tang

Yeah.

Rufus Pollock

How do you get the structured data out of that, like the time our meeting started?

Audrey Tang

We write a parser.

( Website: https://github.com/audreyt/pad2an )

Rufus Pollock

You give her a structure there in the Etherpad that they must follow?

Audrey Tang

We just gave them a template basically.

Rufus Pollock

You give them a template.

Audrey Tang

The Etherpad always starts with the date of the meeting, the title of the meeting, and then speeches.

Audrey Tang

It feels very automated; just start typing whatever. We put on a line of description that said, "Try to put a colon between the speaker and the words they speak." That’s it.
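
( Sketch: a minimal parser for a pad that follows the template just described, a date line, a title line, then "speaker: words" lines. This is not the pad2an implementation. It assumes the date is written as YYYY-MM-DD and that the first colon on a line separates speaker from speech; `parse_pad` and the sample pad are hypothetical. )

```python
from datetime import date


def parse_pad(text):
    """Turn a templated pad into a dict with date, title and a list of speeches."""
    # Ignore blank lines so that spacing in the pad does not matter.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("Expected a date line and a title line at the top of the pad.")

    meeting_date = date.fromisoformat(lines[0])  # assumes YYYY-MM-DD
    title = lines[1]

    speeches = []
    for line in lines[2:]:
        speaker, colon, words = line.partition(":")
        if colon:
            speeches.append({"speaker": speaker.strip(), "text": words.strip()})
        else:
            # No colon: keep the line as narrative, e.g. "(laughter)".
            speeches.append({"speaker": None, "text": line})
    return {"date": meeting_date, "title": title, "speeches": speeches}


SAMPLE_PAD = """2017-02-14
Open discussion

Alice: Shall we start?
Bob: Yes, let us begin.
(laughter)
"""

if __name__ == "__main__":
    print(parse_pad(SAMPLE_PAD))
```

Splitting on the first colon is deliberately naive: a narrative line that contains a time such as "2:00 PM" would be misread as a speech, which is the kind of case a real parser has to report clearly.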

Rufus Pollock

My point though is that as you do this, you’ve created a little format, which is your Etherpad format for reporting speeches. You will need it to be machine-validated.

Audrey Tang

Yeah.

Rufus Pollock

Sometimes people won’t follow the format. For example, one common thing I’ll do is type dates that you can’t parse as dates. Going back to Good Tables, one of the general challenges you have is of giving people good format.

Rufus Pollock

What’s wrong here is there was a header column that was empty. Here the row dimensions are incorrect compared to the thing. For example, here there’s no value for amount of description.
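
( Sketch: the two structural problems pointed out above, an empty header cell and a row whose length does not match the header, can be detected with the standard `csv` module. This is an independent illustration, not how Good Tables is implemented; `structural_problems` and the sample file are hypothetical. )

```python
import csv
import io


def structural_problems(csv_text):
    """Return a list of plain-language messages about the shape of a CSV file."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["The file is empty."]

    header = rows[0]
    problems = []

    # Every column needs a name in the header row.
    for index, name in enumerate(header, start=1):
        if not name.strip():
            problems.append(f"Column {index} has no name in the header row.")

    # Every data row should have as many cells as the header has columns.
    # Row numbers count the header as row 1, as a spreadsheet would.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"Row {number} has {len(row)} cells but the header has {len(header)} columns."
            )
    return problems


SAMPLE = "id,,amount\n1,a,10\n2,b\n"

if __name__ == "__main__":
    for message in structural_problems(SAMPLE):
        print(message)
```

Row numbers follow what the person sees in their spreadsheet, so the message points at a place they can actually find.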

Audrey Tang

When they hit publish, what we need to do is to have this very clear, instead of a random XML error from SayIt.

Audrey Tang

Yes, I do agree, it’s an experience we need to work on.

Rufus Pollock

I’m just trying to find...

Audrey Tang

Currently, it’s either completely blocked through some cryptic error messages, or it goes through.

Rufus Pollock

Let me see if this works. Thank you very much. I probably actually won’t have to, but it’s very interesting to see how I do on the...Is this going to work here?

Audrey Tang

Just to illustrate, basically, this is from the CastingWords folks. CastingWords is an external vendor. They have their own format. They can produce a VTT or an SRT, which would then timecode it with the YouTube.

Audrey Tang

We handle that, too. Oftentimes, we pay for the cheaper option when we don’t care about the timecodes, which is most of the time, where they just paragraph delimit the speakers, and so on. Then we write a parser that translates it into a command console, and then feed it to SayIt. That’s our current platform.

( Website: https://sayit.archive.tw/speeches )

Rufus Pollock

All I’m going to say is, some things, for example, this is the UK dashboard we built for these. We’re also trying to build these dashboards at the moment. This is an example where this is spending data. Each government department’s supposed to publish spending data every month on what they spent money on.

Rufus Pollock

Not just their budget, but individual line items, like we bought this mobile subscription, or whatever. I was involved, when I was on the government transparency board, in designing the format. Guess what? It’s written in CSV. It’s super simple. It’s in Excel or CSV.

Rufus Pollock

It has basically eight columns. That’s all it has. Maybe six columns are compulsory. Now, I want you to guess. They’ve actually published now about 3,000 CSVs, because they publish roughly one every month. How many of them are valid?

Shuyang Lin

How many?

Rufus Pollock

How many out of 1,234 are valid?

Shuyang Lin

Valid, you mean?

Rufus Pollock

They actually have the six columns they’re supposed to, and the structure of those columns, if it’s supposed to be a number, it’s a number. If it’s supposed to be a string...

Shuyang Lin

Is there a person who really fills out the form?

Rufus Pollock

They did it according to the structure.

Shuyang Lin

Is it half, 50 percent?

Rufus Pollock

Half, 50 percent. What do you guess?

(pause)

Rufus Pollock

80 percent, 20 percent, 10 percent, 100 percent?

(discussions)

Rufus Pollock

30 percent. OK, 30 percent.

Rufus Pollock

I hear 20 percent.

Audrey Tang

Five percent.

Rufus Pollock

Five percent. The answer is zero. They didn’t manage to publish a single valid...

Audrey Tang

I’m too optimistic. That’s great to know.

(laughter)

Shuyang Lin

For this one.

Rufus Pollock

In addition, you need to track if they keep publishing. Many of them just stop publishing the files. Since the government’s got less interested in transparency, you can just tell here, and we haven’t been running this quite as recently. For example, many of these seem to have not published for quite a long time.

Rufus Pollock

For example, these guys, this is one department, which is the Ministry of Defense has published since December 2016. The others are often quite a long time ago, like April 2016, etc. The file, one of the things that we experience, that we went and did user research with the people...

Rufus Pollock

Because the civil servants, they’re not trying to be bad. It’s not like they don’t like this. Actually, many of the departments love this. In fact, their heads of department love this data, because they find it really useful for tracking...

Audrey Tang

It’s like their GPS for quality.

Rufus Pollock

They don’t dislike this. They like the data. They like having the data. They’re not against it, but they struggle to publish, and do the Good Tables. What they found is just they don’t understand why changing the column names matters.

Rufus Pollock

For example, they often just change the name of the column in the spreadsheet in random ways. They also do things. They often publish dates that aren’t valid, or change the format of the date. Sometimes they save from Excel when it’s switched to US mode, and then they put the month first, rather than the day first, all of this kind of stuff.
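
( Sketch: checks for the two failure modes described above, renamed columns and dates saved month-first, with messages aimed at the person filling in the spreadsheet rather than at a developer. The column list is a hypothetical stand-in, not the actual UK spending format, and the day/month/year convention is an assumption; `check_header` and `check_date` are illustrative names. )

```python
from datetime import datetime

# Hypothetical required columns, standing in for whatever the real spec lists.
EXPECTED_COLUMNS = ["Date", "Supplier", "Amount"]


def check_header(header):
    """Explain, in plain language, how the header differs from the expected one."""
    messages = []
    for expected in EXPECTED_COLUMNS:
        if expected in header:
            continue
        # Look for a near miss: same name apart from case or stray spaces.
        near = [h for h in header if h.strip().lower() == expected.lower()]
        if near:
            messages.append(
                f'The column "{near[0]}" should be named exactly "{expected}".'
            )
        else:
            messages.append(f'The column "{expected}" is missing.')
    return messages


def check_date(value):
    """Return None if the value is a valid day/month/year date, else a message."""
    try:
        datetime.strptime(value, "%d/%m/%Y")
    except ValueError:
        return (
            f'"{value}" is not a valid date. '
            "Please write it as day/month/year, for example 31/01/2017."
        )
    return None


if __name__ == "__main__":
    print(check_header(["date ", "Supplier"]))
    print(check_date("01/31/2017"))  # month-first, as saved in US mode
```

A month-first date is only caught when the day is greater than 12; a value such as 03/04/2017 parses either way, so a stricter check would need context, for example the reporting month of the file.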

Rufus Pollock

What it means, by the way, is if you want to actually do a dashboard of government spending, which you’d quite like to build out of this, it’s very painful. It’s very time‑consuming to clean this data up, and actually use it.

Rufus Pollock

What I’m trying to get at is that it’s not just having the schema. It’s building tooling that gives feedback to users on whether they are valid against the format.

Link in context Link

Rufus Pollock

I don’t know, in your Etherpad parser, have you had any experience of handling invalid Etherpad documents? What happens when they get it wrong?

Link in context Link

Audrey Tang

Then you see a broken XML message, a random message that says it’s invalid XML, basically, which is not helpful for the person. You have to click “back,” and then try to figure out where it’s wrong.

Link in context Link

Rufus Pollock

Try to figure out what’s gone wrong.

Link in context Link

Audrey Tang

It’s right. I wrote the conversion script myself, and I didn’t do that well on the error reporting thing.

Link in context Link

Rufus Pollock

It’s hard. I tell you, it’s like good UX for Apple iPhones. It’s harder than you think.

Link in context Link

Shuyang Lin

Every time I found a problem, I go to her.

Link in context Link

Audrey Tang

Well, I know exactly how hard it is, because I did write a parser and error reporter for the Perl 6 language.

Link in context Link

Audrey Tang

Reporting the exact position where it goes wrong, and giving the user a helpful message, is really difficult. I spent maybe a year with Larry Wall on that. [laughs]

Link in context Link

Audrey Tang

I know how hard it is. It’s just in the case of meeting logs, I was like, "No, it’s not worth spending that much time on it."

Link in context Link

Rufus Pollock

I got you. It’s stuff like also we’re geeks. For us, line column numbers and rows are really valuable. It turns out, in our experience, when we try to do it that way — I don’t know if you have just the good tables, if you just put it up again just for a moment here — it’s not perfect.

Link in context Link

Rufus Pollock

Originally, we could just say something like, "Header column number one is empty." This is a geek message. This was written by the...

Link in context Link

Audrey Tang

Yeah, it’s like, "I don’t know what to do." This is like my XML messages. It’s like saying the starter tag is empty.

Link in context Link

Rufus Pollock

Most ordinary users don’t even know what column number one is. You want to draw it like they visualize it. That’s why we try to start drawing out, but it’s harder than you think. [laughs] For example, how do you explain date formats?

Link in context Link

Rufus Pollock

It turns out that just explaining good date formats is harder than you think. It’s not just explaining the date format. For example, many of them go, "Oh, OK, but how do I go to Excel and change the date format?" They don’t know how to change date formats in Excel.

Link in context Link

Rufus Pollock

There’s also weird stuff, like Excel may show you one date format, but when it exports, it exports in a different date format. It’s things like that that turn out to be hard to do...

Link in context Link

Audrey Tang

Which is always the case here, by the way, because we start counting years from 1911 in the government.

Link in context Link

Rufus Pollock

From the revolution.

Link in context Link

Audrey Tang

From the revolution. It means that Excel is never storing our date formats natively.
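
As a small illustration of the calendar issue, the sketch below converts a date written with a Republic of China year into an ISO 8601 date. The slash-separated `year/month/day` input format is an assumption; real files may use other layouts.

```python
from datetime import date

ROC_YEAR_OFFSET = 1911  # ROC year 1 corresponds to Gregorian year 1912


def roc_to_iso(text):
    """Convert an ROC-calendar date such as '106/02/14' to ISO 8601."""
    year, month, day = (int(part) for part in text.strip().split("/"))
    # date() raises ValueError for impossible dates such as month 13.
    return date(year + ROC_YEAR_OFFSET, month, day).isoformat()
```

For example, `roc_to_iso("106/02/14")` gives `"2017-02-14"`.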

Link in context Link

Rufus Pollock

Right. The thing is, all I’m saying is, this tooling, that’s the really valuable part of a format. I can make up a format, whatever; the value is in building this kind of tooling. This tooling is built for JSON Table Schema.

Link in context Link

Rufus Pollock

This is an error reporting system that’s built for JSON table schema. You can do, if you want, cool things like once you have this, and this is a Python library, and there’s also a JavaScript library. Now, we’re implementing libraries in every language.

Link in context Link

Rufus Pollock

You could do things like install it in Travis. Every time you push a file to GitHub, it will check whether your data is valid, and fail or succeed on your pull request.
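
The mechanism behind this kind of integration is the process exit status: a continuous integration job fails when the check command exits with a non-zero code. The sketch below is a stand-in written with the standard library, not the Good Tables command itself, and the header it checks is hypothetical.

```python
import csv
import sys


def file_is_valid(path, expected_header):
    """True when the CSV at `path` starts with exactly the expected header."""
    with open(path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return header == expected_header


def main(paths, expected_header):
    """Return a process exit status: 0 if every file passes, 1 otherwise."""
    failures = [p for p in paths if not file_is_valid(p, expected_header)]
    for path in failures:
        print(f"INVALID: {path}", file=sys.stderr)
    return 1 if failures else 0
```

A check script would end with something like `sys.exit(main(sys.argv[1:], ["department", "date", "amount"]))`, so that a push containing an invalid file marks the build as failed.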

Link in context Link

Audrey Tang

The PDIS.tw team here wrote that for our meeting tracking systems.

Link in context Link

Rufus Pollock

You know that. You can do that kind of integration. Here, the other thing I was going to mention, actually, is fiscal data package. There’s a whole specification that goes beyond just having a tabular data package. I think you search for fiscal data package.

Link in context Link

( Website: http://specs.frictionlessdata.io/fiscal-data-package/ )

Link in context Link

Rufus Pollock

There you go, the fiscal data package. This, what it does is it says, "It’s a data package. It’s a tabular data package." It’s got to have tables in it, CSV. Then it says what columns you’ve got to have. It has a model for that.

Link in context Link

Rufus Pollock

This, it basically says, "Be in CSV." Basically, it says, "It must be a tabular data package." Then it says what extra columns or stuff you must have.
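
To make the layering concrete, here is a sketch of a tabular data package descriptor written as a Python dict, plus a check for extra required columns. The `name`, `resources`, `path`, `schema` and `fields` keys follow the Data Package and Table Schema conventions; the package name, file name, field names and the `REQUIRED_COLUMNS` set are illustrative. The actual Fiscal Data Package specification expresses its requirements through a model rather than a flat set of names, so this is a simplification.

```python
# A minimal Tabular Data Package descriptor, written as a Python dict.
# File names and field names here are illustrative only.
descriptor = {
    "name": "example-spending",
    "resources": [
        {
            "name": "spending",
            "path": "spending.csv",
            "schema": {
                "fields": [
                    {"name": "department", "type": "string"},
                    {"name": "date", "type": "date"},
                    {"name": "amount", "type": "number"},
                ]
            },
        }
    ],
}

# Hypothetical extra requirement layered on top of the tabular package.
REQUIRED_COLUMNS = {"date", "amount"}


def missing_columns(descriptor, required):
    """Map each resource name to the required columns its schema lacks."""
    report = {}
    for resource in descriptor["resources"]:
        present = {field["name"] for field in resource["schema"]["fields"]}
        missing = sorted(required - present)
        if missing:
            report[resource["name"]] = missing
    return report
```

An empty result from `missing_columns(descriptor, REQUIRED_COLUMNS)` means every resource carries the columns the more specific profile asks for.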

Link in context Link

Audrey Tang

Like the Darwin information typing architecture.

Link in context Link

Rufus Pollock

Yes, that’s how you described it.

Link in context Link

Rufus Pollock

Really, by the way, the next level that you probably want if you’re going up, is you’ve got a basic container, like a steel box. Then you say, "My steel box is designed for pallets."

Link in context Link

Rufus Pollock

People know what I mean by a pallet. By the way, containers don’t all look the same, it turns out. There’s ones specialized for particular functions. Then you could say, "Mine is a steel box for pallets that hold bicycles."

Link in context Link

Rufus Pollock

Here, what you also have, really, is, if you know what I mean by OLAP, you probably want to get into dimensions, not just saying, "Hey, I’ve got these columns," but that columns go together. For example, these three columns together describe a customer. These three columns together describe a department, or so on.

Link in context Link

Rufus Pollock

The question is, then, on the tools, what we want to remember at the very beginning is I should have one click to install, just like you can type npm install, and the library installs. Now, geeks do that, but whether you’re in R, or whether you’re in Excel, or whatever you’re in, you could install data packages or data into your system with one click.

Link in context Link

Rufus Pollock

The next step has to be, in all of this, it’s just like, "Here is all the tooling you can do. What platform integration can you build?" Going back, just a point about JSON: JSON’s awesome, you can do all kinds of stuff with your JSON.

Link in context Link

Rufus Pollock

Just take an example. One of the standard things people do with data is put it in a relational database, or put it in spreadsheets. They dominate almost any other system — even NoSQL — by orders of magnitude today in terms of installed base.

Link in context Link

Rufus Pollock

Now, when you have JSON, people do all kinds of fun stuff with JSON. For example, they nest entities. You take one entity, and then inside of it, you embed something else. For example, you might have users, and then you put their posts inside them.

Link in context Link

Rufus Pollock

Once you start doing that, you have all kinds of fun normalization games. It’s not trivial to take JSON back into a relational database.
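
The users-and-posts case can be sketched as follows. The record layout (`id`, `name`, `posts`, `title`) is made up for illustration; the point is that each nested level becomes its own table, linked by a foreign key.

```python
def normalize(users):
    """Split nested user records into flat `users` and `posts` tables."""
    user_rows, post_rows = [], []
    for user in users:
        user_rows.append({"id": user["id"], "name": user["name"]})
        for post in user.get("posts", []):
            # user_id is the foreign key back to the users table.
            post_rows.append(
                {"id": post["id"], "user_id": user["id"], "title": post["title"]}
            )
    return user_rows, post_rows


nested = [
    {"id": 1, "name": "Ann", "posts": [{"id": 10, "title": "Hello"}]},
    {"id": 2, "name": "Bo", "posts": []},
]
```

Even in this two-level case the function has to know which keys are entities and which are plain values. With deeper or irregular nesting, that knowledge has to come from somewhere outside the JSON itself.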

Link in context Link

Audrey Tang

Hmm, you can just create a field with type JSON. Problem solved.

Link in context Link

(laughter)

Link in context Link

Rufus Pollock

Yeah, but in that case, you’re basically just doing NoSQL in a relational database. You can do it.

Link in context Link

Rufus Pollock

All I’m pointing out is that, for example, I’ve had this debate with standardization efforts. For example, this thing called the Open Contracting Partnership.

Link in context Link

Rufus Pollock

Tim Davies is the guy who was quite involved in IATI. The thing is, in life, how do I put it? Many formats get dominated by publisher concerns, not by consumer concerns.

Link in context Link

Rufus Pollock

Because the people who write the spec are often involved in producing the data, they tend to focus on their concerns as publishers to express what they want to express. Actually, the success of many of your formats is going to depend on consumption.

Link in context Link

Rufus Pollock

From a point of view of publishers, HTML is pretty rubbish. HTML has no semantic structure, basically, until HTML5.

Link in context Link

Audrey Tang

I already wrote that on the whiteboard for you. [laughs]

Link in context Link

Rufus Pollock

Even HTML4 is better than XHTML.

Link in context Link

Audrey Tang

Agreed.

Link in context Link

Rufus Pollock

It was worse as a structured publishing format, but it was easier to consume. It won. One of my comments is that JSON is great, and you can convert to it, but for example, it’s great for APIs for geeks.

Link in context Link

Rufus Pollock

You can express it however you want. We could even have GraphQL etc. In some sense, those are optimized for a certain kind of consumption. GraphQL is a JSON API optimized for consumption by mobile or other devices, where bandwidth matters, or complexity of queries matters.

Link in context Link

Rufus Pollock

For exchanging a lot of your data, once you’ve got a JSON field, for a lot of the tools you want to get data back into, it’s not especially convenient, unless you’re building for a web application, unless your web APIs are optimized for consumption by web applications.

Link in context Link

Audrey Tang

Which is the case we’re doing now.

Link in context Link

Rufus Pollock

Right, but you’ve got to ask yourself, is most of your data designed for consumption by web applications, or is it for consumption back into a database, or back into something else? I’m only pointing this out. I’m not saying...you can convert CSV to JSON incredibly easily, like it’s trivial.

Link in context Link

Rufus Pollock

If you have foreign keys, you can even do complex joins. You can convert JSON back into a normalized structure with tables with no nesting.
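
The flat-to-nested direction can be sketched with the standard library alone. The two CSV tables and their column names are invented for the example; `user_id` plays the role of the foreign key.

```python
import csv
import io
import json

USERS_CSV = "id,name\n1,Ann\n2,Bo\n"
POSTS_CSV = "id,user_id,title\n10,1,Hello\n11,1,Again\n"


def read_table(text):
    """Parse CSV text into a list of dicts (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def nest_posts(users, posts):
    """Attach each post to its user by following the user_id foreign key."""
    by_id = {user["id"]: {**user, "posts": []} for user in users}
    for post in posts:
        # A post pointing at an unknown user raises KeyError here.
        by_id[post["user_id"]]["posts"].append(
            {"id": post["id"], "title": post["title"]}
        )
    return list(by_id.values())


document = json.dumps(nest_posts(read_table(USERS_CSV), read_table(POSTS_CSV)))
```

Because the foreign key is explicit in the tables, the join needs no guessing about which values belong together.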

Link in context Link

Audrey Tang

The only thing you can do is either explode it into individual tables, or declare bankruptcy and say it’s a JSON field.

Link in context Link

Rufus Pollock

Right, or it’s a JSON field, but is it a good example? Who here knows Redux?

Link in context Link

Audrey Tang

R‑E‑D‑U‑X?

Link in context Link

Rufus Pollock

Like React?

Link in context Link

Audrey Tang

Sure, of course.

Link in context Link

Rufus Pollock

Like front end. Do people use React and Redux?

Link in context Link

Audrey Tang

We use Angular 2, React, Vue, whatever.

Link in context Link

Rufus Pollock

One of the funny things about Redux is that if you notice the design of a Redux store, your Redux store is like a local cache of your remote database. One of the things you do in Redux is they encourage you to normalize your data.

Link in context Link

Rufus Pollock

To normalize it, to un‑nest it. If you read the Redux docs, they’ll talk quite a bit about why you need to normalize your data.

Link in context Link

Audrey Tang

Because it doesn’t want nesting.

Link in context Link

Rufus Pollock

It doesn’t want nesting. Why? It’s because it’s hard to do atomic updates of individual items. It becomes a very complex thing to track. Let’s say, for example, you retrieve a post from the server. If that’s nested under the user, then you have to go to...

Link in context Link

Audrey Tang

It’s expensive, but we can solve it with JSON operational transformation.

Link in context Link

Audrey Tang

That’s a lot of technical details that we don’t want to get into, though.

Link in context Link

Audrey Tang

We know nesting is more trouble than flat. That’s a consensus.

Link in context Link

Rufus Pollock

There’s the thing. I’ve got you. OData, by the way, is like, OData and data packages have quite a lot in common. OData is quite similar. I want to stop there on what I’m saying about data packages. What I would offer, in terms of value, whatever route you’re going down, is two things.

Link in context Link

Rufus Pollock

One is common descriptive metadata. You could adopt some of the similar descriptive metadata. The other is this question on publisher/user experience. For example, you might want to have a JSON API. At the end, that’s what you’re defining in Swagger.

Link in context Link

Rufus Pollock

As an experience, maybe you want to have a data package intermediary there. They’re going to publish from maybe Excel.

Link in context Link

Audrey Tang

Yes. We already agreed on that in the previous meeting. [laughs]

Link in context Link

Rufus Pollock

We did, exactly. I’m just saying, you might want to publish to that, and then convert to JSON for your JSON API.

Link in context Link

Audrey Tang

We also agreed to that.

Link in context Link

Rufus Pollock

Those are things that are valuable. We can stop on that one, then, for the moment. I’d like if you could tell me a little bit about this...

Link in context Link

Audrey Tang

No, it’s just fine. To recap, to augment our realities a little bit, there’s a note that we made some time ago. I think it’s on your Twitter. It’s also on my Twitter, so maybe we’ll bring it up on Twitter.

Link in context Link

( Website: https://twitter.com/audreyt/status/829201093368356864 )

Link in context Link

Audrey Tang

As the road map of what we’re doing this afternoon, it is a great example. This is the drawing that we had.

Link in context Link

Audrey Tang

What we essentially said was that there’s an existing system, which for one reason or another, doesn’t have a vendor that responds as quickly as we would like to feature amendments, which is the E‑M‑I‑C system, just to put names to it. It’s the EMIC System.

Link in context Link

Audrey Tang

The EMIC System is where people go when we see typhoons, or any kind of natural disasters. The EMIC System has a back end for all levels of governmental units to register whatever information they have on, for example, a road is broken, or for example, a tree has fallen.

Link in context Link

Audrey Tang

Usually, their user experience is that of instant messages, which means that it’s unstructured data. Well, at least it’s plaintext, not handwriting.

Link in context Link

Audrey Tang

Maybe not as unstructured as it can get, but still, for textual data, it’s just a line of description.

Link in context Link

Audrey Tang

People then look at that line, and decide what to do with it. Most of the time, they just shove it into this time‑stamped append‑only log of disaster data exchange, and then forward to people who want to subscribe to this kind of alert information.

Link in context Link

Audrey Tang

Then for those subscribers, they maybe add a tag, or maybe even without a tag, meaning they just want to subscribe to all the information from Taipei City, for example.

Link in context Link

Audrey Tang

Then they get a relay of all that aggregated information out of it.

Link in context Link

Audrey Tang

What we would like to do, is if it describes a particular place, and we know the place already, we would like to convert it to some kind of geotagging. We’re geotagging it.

Link in context Link

Rufus Pollock

You enrich it, basically. There’s an enrichment.

Link in context Link

Audrey Tang

There’s an enrichment, exactly. Then, of course, the reported time may not be the same as the time that it actually happened. You also want that part of the ETL, which is the time part, not just the space part.
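
A sketch of one such enrichment step, covering both the space part and the time part. Everything named here is hypothetical: the gazetteer entry and its coordinates, the `reported_at` field, and the `delay_minutes` hint standing in for whatever tells the system when the event actually happened.

```python
from datetime import datetime, timedelta

# Hypothetical gazetteer: place name -> (longitude, latitude). Values are made up.
GAZETTEER = {"Zhongshan Road": (121.52, 25.05)}


def enrich(message):
    """Return a copy of a raw report with optional location and event time."""
    enriched = dict(message)
    enriched["location"] = None
    for place, coords in GAZETTEER.items():
        if place in message["text"]:
            enriched["location"] = {"type": "Point", "coordinates": list(coords)}
            break
    # Keep the reported time; record the event time separately when known.
    reported = datetime.fromisoformat(message["reported_at"])
    delay = message.get("delay_minutes")
    enriched["occurred_at"] = (
        (reported - timedelta(minutes=delay)).isoformat() if delay is not None else None
    )
    return enriched
```

The raw message is left untouched and the new fields are optional, so a consumer that only reads the time and the text keeps working.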

Link in context Link

Audrey Tang

The end result, what we want, of course, is a layer on top of a publicly visible map, like OpenStreetMap, with something that lets people query what the disaster situation is like around their neighborhood.

Link in context Link

Audrey Tang

Of course, the EMIC System is already under a lot of stress performance‑wise from the existing governmental users, and the existing aggregation, and the queries.

Link in context Link

Audrey Tang

The existing hardware would not support any more ETL. What we said last meeting, even though we didn’t say EMIC specifically, is that because it’s all in this huge relational database, we need somehow to get it to export in bulk regularly, with a clean data description language that we can amend as the enrichment goes on.

Link in context Link

Audrey Tang

For example, out of every message, we currently only extract time and text. At some point we would like to extend it with a TopoJSON-compatible description; for example, when it talks about a road, we want the road there, which is a line or a path, and so on. We don’t have the complete listing yet.

Link in context Link

Audrey Tang

We want the DDL to be able to grow, and when we grow it we want the API to stay clean, compatible with existing consumers but also versioned in some way.
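
One way to read "grow but stay compatible" is that a new schema version may add fields but may not remove or retype existing ones. The sketch below checks that rule; the `version` key, the field names and the `geometry` field are assumptions for illustration, not part of any agreed schema.

```python
SCHEMA_V1 = {
    "version": 1,
    "fields": [
        {"name": "time", "type": "datetime"},
        {"name": "text", "type": "string"},
    ],
}

# Version 2 only adds a field, so version 1 consumers keep working.
SCHEMA_V2 = {
    "version": 2,
    "fields": SCHEMA_V1["fields"] + [{"name": "geometry", "type": "object"}],
}


def is_backward_compatible(old, new):
    """True if `new` is a later version and keeps every old field and type."""
    new_types = {field["name"]: field["type"] for field in new["fields"]}
    return new["version"] > old["version"] and all(
        new_types.get(field["name"]) == field["type"] for field in old["fields"]
    )
```

A check like this could run whenever the schema is amended, so that an incompatible change is caught before existing consumers see it.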

Link in context Link

Audrey Tang

There are already users in other government agencies who broadcast push notifications, and we don’t want those services to stop. It would be silly to rebuild that. We want this to continue, but against a clean API from now on, which is where a data schema comes in. It’s already communicating over HTTP anyway, so why not structure it?

Link in context Link

Audrey Tang

What we talk about is we take the DDL here and say, "Every hour get me a data package of everything that happens around EMIC, and then publish it as open data packages," and then send it to a second system which is called the NCDR system. The NCDR system currently is for disaster prediction and reduction. The R stands for reduction I think.

Link in context Link

Audrey Tang

While NCDR doesn’t have the official mandate to publish alerts around disasters, their developers are in‑house, and they’ve got some good ArcGIS developers there. They can very easily do visualizations that not only look beautiful, but are also very useful.

Link in context Link

Audrey Tang

What they don’t have at the moment is first‑class access to the aggregated database of EMIC.

Link in context Link

Audrey Tang

Data packages are obviously a way for people who don’t have a relationship, let’s say, with each other, to nevertheless fully consume the other person’s data with the confidence that it will not break. What we are saying is that on the national open‑data platform, we would now house a schema for the data package.

Link in context Link

Audrey Tang

The EMIC will start producing structured data, and then the existing National Development Council’s open data validation team...I don’t know whether you’ve talked to these people.

Link in context Link

Rufus Pollock

No. I haven’t.

Link in context Link

Audrey Tang

They already have a data quality program this year. They will use machines to ensure that the EMIC keeps honoring its open data promise.

Link in context Link

Audrey Tang

Note that it’s not checking some API description. It’s just plain open data.

Link in context Link

Audrey Tang

Now we have NCDR consuming this data, and during disaster we want the frequency to pick up.

Link in context Link

Audrey Tang

We also want to do enrichments at the NCDR, because there are now developers in‑house. We can tell them to get all sorts of push notifications, as long as it’s not called government official alerts, which is outside of their purview. They can add even more value with, for example, the weather data, water dam control data and so on.

Link in context Link

Audrey Tang

Also they already partner with Google and other vendors, with standard CAP protocols. Then they can publish not with their own API, but with the existing web‑based APIs for disaster recovery to the consumers of Facebook Safety Check.

Link in context Link

Rufus Pollock

Yeah.

Link in context Link

Audrey Tang

And Google, and whatever. This is standardized. We don’t need to do anything here. It can also publish its own enriched data as a kind of derived data set, as also open data on the national data platform. People who want data don’t really need to look at the raw EMIC data.

Link in context Link

Audrey Tang

They can look at the enriched data that NCDR will already have, not only the raw fields but also the enriched data fields. That’s essentially what we planned in our meeting, without me telling you the use case. That’s essentially the idea.

Link in context Link

Rufus Pollock

That’s really great. One of the things I think about is that certainly when you’re having large ingestion of raw data, obviously one of the other things you can do is start... often, at the beginning, you could just be dumping...

Link in context Link

Rufus Pollock

Rather than put it into a structured database, one of the things I like at the moment is you could just use flat files. If you have a DDL and you’re dumping data packages, if you’re getting a lot of raw log data in, you could actually run a lot of your enrichment pipeline.

Link in context Link

Rufus Pollock

For example, we have this thing at the moment we’re calling (that’s about to be a post, I think) data package pipelines. Basically, ETL pipelines are built around data packages: each stage takes a data package as input and outputs the data package for the next stage.

Link in context Link

Rufus Pollock

If you do quite a bit of ETL, one of your pain points is normally, as they get complicated, "What is the agreement between different stages of the pipeline?"
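
The idea of an explicit agreement between stages can be sketched independently of any particular tool. This is not the datapackage-pipelines API; the stage functions and field names are hypothetical. Each stage declares which fields it needs and which it promises to add, and the runner checks both.

```python
def run_pipeline(rows, stages):
    """Run stages in order, checking each stage's declared contract."""
    for requires, adds, step in stages:
        rows = list(rows)
        for row in rows:
            missing = requires - set(row)
            if missing:
                raise ValueError(f"{step.__name__} is missing {sorted(missing)}")
        rows = [step(row) for row in rows]
        for row in rows:
            if not adds <= set(row):
                raise ValueError(f"{step.__name__} did not add {sorted(adds)}")
    return rows


def add_length(row):
    """Add the length of the text as a new field."""
    return {**row, "length": len(row["text"])}


def flag_long(row):
    """Mark rows whose text is longer than ten characters."""
    return {**row, "is_long": row["length"] > 10}


# Each stage: (fields it requires, fields it promises to add, function).
STAGES = [
    ({"text"}, {"length"}, add_length),
    ({"length"}, {"is_long"}, flag_long),
]
```

Running the stages in the wrong order fails immediately with a message naming the stage and the missing field, instead of failing somewhere downstream.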

Link in context Link

Rufus Pollock

We’re using this at the moment. This is very crude in the sense it’s a mini‑ETL language for a pipeline that’s based around tabular data, particularly.

Link in context Link

( Website: https://github.com/frictionlessdata/datapackage-pipelines )

Link in context Link

Audrey Tang

It’s kind of naive.

Link in context Link

Rufus Pollock

Yes, a very naive one, but even if you’re doing a complex code one, you could still have your contract between different parts of the pipeline be written around it. I’m doing one at the moment where I’m looking at cyber security incident data, and more also infection data, like your machine is not yet used for anything, but it’s vulnerable. We have billions of rows come in.

Link in context Link

Rufus Pollock

You can have a similar enrichment. For example, you just have an IP address. You need to add the geo location. You need to add other information.

Link in context Link

Rufus Pollock

This kind of aspect of, "How do you build that pipeline?" comes up a lot. I think it’s very useful.

Link in context Link

Audrey Tang

The "run" here is arbitrary dot‑separated identifiers that mean something to the tools. It doesn’t have a typology?

Link in context Link

Rufus Pollock

This is the name of an ETL stage. You could imagine this getting more enriched, where you even have a little library of the things that you can do. At the moment, it’s very basic.

Link in context Link

Audrey Tang

So it’s designed like a Docker Compose file?

Link in context Link

Rufus Pollock

Yeah, and we do run this in Docker, exactly. What you just say is, "OK, yeah, exactly." You can add things, format, and you can attach things.

Link in context Link

Rufus Pollock

It’s a really simple thing. It’s just a very basic convention of data packages, but it’s the tooling around it.

Link in context Link

Audrey Tang

So all the drawings here that we do can be described by a less naive, but still useful (usefully inspectable) variant of data package pipeline.

Link in context Link

Rufus Pollock

Yeah. Just to be clear, you don’t have to do data packages with CSV. To really emphasize it, you can have data packages with JSON in them.

Link in context Link

Rufus Pollock

Data packages, at the most naive level, are just like a steel container. You can put what you like inside them. You can have JSON data and a JSON Schema.

Link in context Link

Rufus Pollock

For tabular data, in my experience, the simplest thing is CSV and using JSON for table schema. Why? Size constraints: you can still use CSV on gigabytes of data. You can stream it. You can stream JSON, but it’s a bit of a pain...the parser is much more complicated to stream.

Link in context Link

Rufus Pollock

You can stream CSV, gigabytes of it, over the web. You can stream it. Every language has well‑built, iterative structures for reading gigabytes of CSV and not breaking, like not having to load it all into memory ‑‑ just stuff like that, which, when you’re building ETL pipelines, really matters.
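
A short sketch of the iterative reading described here, using Python’s `csv` module. The `amount` column and the generated rows are made up; the generator stands in for a file too large to load at once.

```python
import csv


def total_amount(lines):
    """Sum the `amount` column while holding only one row in memory at a time."""
    total = 0.0
    for row in csv.DictReader(lines):
        total += float(row["amount"])
    return total


def fake_big_file(n_rows):
    """Generate CSV lines lazily, standing in for a multi-gigabyte file."""
    yield "department,amount\n"
    for i in range(n_rows):
        yield f"dept-{i % 3},{i}\n"
```

With a real file, the same function takes `open(path, newline="")` directly, since `csv.DictReader` accepts any iterable of lines.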

Link in context Link

Rufus Pollock

Should we talk about the open fund?

Link in context Link

Audrey Tang

Sure, of course.

Link in context Link

Audrey Tang

Thank you for the consultation, wizard.

Link in context Link

(laughter)

Link in context Link

Audrey Tang

Really, this is great. Before this case, we’ve never really dealt with true cross‑ministry regular pipelines.

Link in context Link

Audrey Tang

We dealt with different units in the National Development Council, and that’s working pretty well with the regulation preview and also visualization of the national budget.

Link in context Link

Audrey Tang

These are managed by different units within the NDC, so it’s easy, and they have a common IT department.

Link in context Link

Audrey Tang

This afternoon is the first case where we’re going in and saying to two different ministries, "Hey, you’re going to do things the new way." I’m pretty excited. If this can be made to work and is well‑documented, we can do this with more ministries.

Link in context Link

Rufus Pollock

That would be great. The one other question that I wanted to talk about was open software, actually more than the open data side. That also comes to the open fund.

Link in context Link

Rufus Pollock

I could start with a question, which is, does anyone know the list of IT vendors for the national government or city governments, and how much was spent on them last year?

Link in context Link

Audrey Tang

Of course. It’s our basic procurement data.

Link in context Link

Rufus Pollock

Great.

Link in context Link

Audrey Tang

This shouldn’t take long to find; it’s between $2-3 billion TWD per year.

Link in context Link

Rufus Pollock

The question is, has anyone then checked how much of that spending went on open software, to pay for or buy software or services? It could be actually running a system that used open software versus closed software.

Link in context Link

Audrey Tang

It’s got to be less than 0.1 percent, so I don’t think anyone...

Link in context Link

Rufus Pollock

No one bothers looking at it, got it.

Link in context Link

Audrey Tang

Exactly. No one bothered counting.

Link in context Link

Rufus Pollock

It did not even register.

Link in context Link

Audrey Tang

Actually, when I look at the raw numbers now, a lot of it is in services. It’s in building an IT system or maintaining an IT system, not on software licensing.

Link in context Link

Rufus Pollock

I got that, but the question would be how much of the software inside of that service was open software versus closed software, how much that would be. Often, governments don’t know. That would be my question.

Link in context Link

Audrey Tang

We do have a breakdown.

Link in context Link

Audrey Tang

It’s mostly hiring services, it’s building, it’s maintaining, and very little on licensing actually.

Link in context Link

Rufus Pollock

The classic example I also give is, and I won’t bring up the diagram, let’s say this is a piece of medicine. This is a pill that you take. How much of the cost of that is the information and how much of that is the cost of the manufacturing, the actual chemicals?

Link in context Link

Rufus Pollock

Let’s say you buy a treatment. You buy $100 of pills. How much of that $100 is the manufacturing cost? How much of that is the information cost? It will vary, right?

Link in context Link

Audrey Tang

It will vary depending on whether it’s a generic or a branded drug.

Link in context Link

Rufus Pollock

Let’s say, an in‑patent drug, it might be that a dollar of that, if we would really draw this diagram correctly, this tiny bit down here would be the cost of the actual chemicals and the actual manufacturing of it. $99 would be the cost of the information. You can ask the same thing when you buy services.

Link in context Link

Rufus Pollock

It’s true people don’t normally buy a drug recipe. They buy a drug. They buy the actual pill. Similarly, most people actually buy a working software system with servers that are running and doing stuff, but, obviously, inside of that is the sysadmins, the electricity, the energy to run all this service. The other part is the software, the know‑how.

Link in context Link

Rufus Pollock

The question would be to ask of how much was spent on IT last year, how much of it was crudely for buying information, and the know‑how, and the actual rest of the software, and running that, and how much of it was paying for the engineers and the electricity to run the computers and stuff like that? I’m going to ask that as one question.

Link in context Link

Rufus Pollock

The question would be, how much of that went on information or buying information as opposed to paying for just the electricity and all the other stuff, which you pay for whether you’re using open software or closed software? How much of that went on open software? Then you could ask, what’s your goal?

Link in context Link

Rufus Pollock

How much of government spending should go on open software or closed software and why? The other question I would ask you is, if you went around who is in charge of IT budgets in given departments or across government, what’s the name of the guy who runs that right now?

Link in context Link

Audrey Tang

There’s a fixed portion that goes to SMEs.

Link in context Link

Rufus Pollock

A fixed portion?

Link in context Link

Audrey Tang

Yeah. There’s a fixed SME portion.

Link in context Link

Rufus Pollock

How much is it?

Link in context Link

Audrey Tang

Just a second. Let me double‑check before I say anything silly.

Link in context Link

( For 2016, it’s 40% at a minimum. Source: http://gpw.moeasmea.gov.tw/moeasmea/wSite/ct?xItem=60540&ctNode=822&mp=00205/ )

Link in context Link

Rufus Pollock

There’s a fixed portion? We could ask ourselves, what is their goal?

Link in context Link

Rufus Pollock

For me, a goal of having open software, that isn’t my goal. My goal is value for money, a good deal for Taiwan citizens or for British citizens, because we don’t care about open software or closed software for its own sake, unless we’re Richard Stallman maybe. I’m joking.

Link in context Link

Rufus Pollock

Probably he doesn’t care. He cares about freedom. Richard Stallman doesn’t care about free software. Actually, he cares about freedom. Similarly, maybe whoever runs spending here probably cares about or they at least say they care about value for money.

Link in context Link

Rufus Pollock

My question is, if you went around now and asked decision makers in Taiwan’s government what role they think open software has in getting value for money, would they even know what open software was? Would they know what was good or not good about it?

Link in context Link

Audrey Tang

There was a huge push around OpenOffice in 2013, which is what I was trying to find, because there was this justification paper that says how much licensing cost it saves over Microsoft Office subscriptions and so on.

Link in context Link

Audrey Tang

I couldn’t quite find the report, but the original justification was very close to what you have said, which is to promote a local community around the then OpenOffice, now LibreOffice community and for people to get in touch with the Chinese text processing part of it instead of waiting for Microsoft to roll next.

Link in context Link

Audrey Tang

The main argument was that we can save this much on licensing costs, which I was trying to find but couldn’t.

Link in context Link

( The 2013 report on OpenOffice.org: http://www.dgbas.gov.tw/public/Data/3258342471.pdf )

Link in context Link

Rufus Pollock

My question here is let’s say you had a goal to have more open software; we’re convinced that more open software would mean better value for money for Taiwan. It would mean better IT systems probably, that they were more agile, they were more adaptable to Taiwan’s needs, and that they were probably in the long run cheaper.

Rufus Pollock

I actually come to the cheaper item last. The number one item is freedom to adapt to your needs faster and better.

Audrey Tang

I’m pretty sure the IT decision makers know about this, but they don’t usually consider it practical. [laughs]

Audrey Tang

They have heard of this argument a lot.

Rufus Pollock

What I want to talk about now is practical. I went and talked to the OECD leaders. I was saying like, "What can you change? What can you guys, what can you change while you’re here in office?" You could build systems. The thing is you’ll, at some point, probably all leave. It’s natural. You’ll get hired by somebody else. Maybe you’re here anyway on a six‑month period.

Rufus Pollock

What often stays around in bureaucracies is process: the rules that you’ve put in place. If you want to buy differently, you need to change the rules for how you buy. Rather than focusing on what you buy, rather than us going and having an argument directly about, "We should buy more open software," you want to talk about how you’re going to buy. I think it’s actually on my website, if you wanted to put it up.

Rufus Pollock

Let me just see. Rufus Pollock. I think if you search for Rufus Pollock. Let’s just have a look. If you look for Rufus Pollock, Rufus OECD, I think that might work. I think the second slide or item which needs to be Twitted [indecipherable 70:09] my website. I have to say Google is constantly reducing the priority of individual personal websites. If you then click on the link, I think it’s here. You’ll have them. This slide does not show up...

Audrey Tang

I don’t think it’s personal. It’s insecure, not HTTPS, and embedded from an HTTPS site.

Rufus Pollock

Is that what it is?

Audrey Tang

That is what it is. [laughs]

Rufus Pollock

If you click on this...

Audrey Tang

You probably want to run Let’s Encrypt on your website.

Rufus Pollock

Yeah, I could get a cert. I could turn it on.

Rufus Pollock

If you just scroll down quite a way, there were some examples. Keep going.

( Website: http://rufuspollock.org/wp-content/uploads/2016/09/data-driven-government-oecd-cio-sep-2016.pdf )

Rufus Pollock

One more thing: You can only procure what you know. One of the big problems for IT also is if you want to go and ask someone, one great question to ask people who are buying IT of any kind. Just ask them, "What’s the difference between buying software, or buying an IT service, and buying a chair?" What’s the difference?

Audrey Tang

The stock of the common procurement doesn’t run dry when you buy software.

Rufus Pollock

That’s one point, absolutely.

Rufus Pollock

What’s the other thing? How much have chairs changed in the last 100 years? How different are these chairs from 15 or 20 years ago?

Audrey Tang

Not since the Bauhaus.

Rufus Pollock

Bauhaus.

Audrey Tang

[laughs]

Rufus Pollock

These chairs have not changed. The basic design of a chair has not changed in hundreds of years, but computers have changed quite quickly. In addition, you can only buy what you know.

Rufus Pollock

I think it was a famous saying, everyone attributes stuff to Steve Jobs, like they attribute it to Einstein or whatever. Basically, you can only buy what you can imagine.

Rufus Pollock

If you’re going to create something that people can’t imagine, you have to create it for them to know it.

Rufus Pollock

The problem with buying software is people are trying to procure things they don’t always know that they can have. They only go with what they can already imagine.

Rufus Pollock

Also, chairs are very standardized. You can compete on price. There are a few different attributes that you can rate. Software’s not like that.

Rufus Pollock

In addition, when you buy a chair, you use it for 10 years. Then it gets old. Then you maybe throw it away. You buy new chairs. Let’s say these chairs wear out and we want to buy new chairs.

Rufus Pollock

There’s basically no connection between the chairs we buy next and the chairs we had before.

Audrey Tang

Except they have the same API.

Rufus Pollock

We’re talking about chairs here.

(laughter)

Rufus Pollock

Yeah. They have the same API. You can still sit on them, that does not change. That’s a very good point.

Rufus Pollock

But the point is there’s no relationship. If these chairs are really bad and really uncomfortable, you can buy a different set of chairs. It’s not a problem.

Rufus Pollock

With software, when you buy one thing, normally you have lock‑in. Even the people who come along and say, "Don’t worry. We’ll have standard open APIs. You don’t need to worry what’s behind the scenes anymore," that’s just rubbish.

Rufus Pollock

Don’t believe them. Don’t even believe them if they’re Audrey Tang. It’s not true. The truth is...

Audrey Tang

...they’re still bad chairs. [laughs]

Rufus Pollock

...software is way more complicated than its API.

Rufus Pollock

APIs are entirely surface in the interaction between different things.

Rufus Pollock

Just for example, go back to a story. Late 1980s, IBM is still one of the richest companies in the world. IBM say, "Well, look. We know what the problem is. People are locked in to the DOS APIs. All we need to do is clone the DOS APIs. It’s not that hard. And you’re going to be completely competitive with them."

Rufus Pollock

IBM spent like four or five years and hundreds of millions, if not billions, of dollars trying to clone the Microsoft APIs to be feature compatible, so that people can run their applications on IBM’s system just like they can run them on Windows or DOS. They don’t succeed.

Audrey Tang

That was also because the standard was set by a single vendor. It’s a very old debate.

Rufus Pollock

It’s not just that. Look at AWS. Look at it. There’s AWS for cloud. People say, "Cloud APIs aren’t that hard to clone." Yes, they are. Look how fast AWS is producing new services. Now there’s Lambda and there’s this. Every week there’s a new service from them.

Rufus Pollock

People always start out and say, "Oh, API’s so simple." Also what happens when you need to add stuff to the API? Actually the software beneath determines how quickly and how performantly you can add new features to your API.

Rufus Pollock

Often, by the way, web APIs still don’t cut it. Often you actually do need to integrate at the machine level. Web APIs all still have significant latency and performance issues. Serializing JSON in and out of APIs is not hugely performant. You often end up having to go back to the system.
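
( Illustration: a minimal sketch of the serialization cost mentioned here, not a benchmark from the discussion. The payload and both function names are hypothetical, and network latency, which a real web API would add on top, is left out entirely. )

```python
import json
import timeit

# Hypothetical payload: one record carrying 1,000 integers.
record = {"id": 1, "values": list(range(1000))}


def total_in_process(rec):
    """Use the data directly, as code linked into the same process would."""
    return sum(rec["values"])


def total_via_json(rec):
    """Serialize and re-parse the record first, as a JSON API makes both sides do."""
    wire = json.dumps(rec)
    return sum(json.loads(wire)["values"])


runs = 1000
direct_s = timeit.timeit(lambda: total_in_process(record), number=runs)
json_s = timeit.timeit(lambda: total_via_json(record), number=runs)
print(f"in-process: {direct_s:.4f}s, via JSON: {json_s:.4f}s for {runs} calls")
```

Both functions return the same total; only the time spent differs, and the gap depends on the machine and the payload.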

Rufus Pollock

Believe me, I know it. Also, when people actually want to switch vendors...

Rufus Pollock

For example, I went on a contract, bid on a contract years ago. The World Bank and their enthusiasm at the beginning of the open data boom...

Rufus Pollock

The finance section at the World Bank, not the World Bank main part, bought from Socrata, which was a proprietary vendor of data platforms. They did an open bid. It wasn’t really an open bid, but they said it was an open bid.

Rufus Pollock

They said, "OK. We’re opening bidding." The bidding ended at the end of April. They said, "By the beginning of June, when our contract expires, you need to have migrated all the data to our new system and be compatible with the old API from Socrata."

Rufus Pollock

Socrata had that open API. No one else implemented it. Also you had to migrate all the data, comply with all the performance quality [indecipherable 75:55] . That’s classic stuff.

Rufus Pollock

"Oh, yes. The API is open, but you need to be able to stand up, migrate all the data, do all the other stuff, to really have competition." Remember, the guy who’s losing the business is not going to be super helpful about migrating all this stuff in.

Rufus Pollock

There will always be the features that they added that you really needed, whatever it is. It’s really hard to maintain true backwards compatibility for all the edge cases, for all the little bits of API that have been added that you didn’t notice.

Rufus Pollock

The real guarantee of competition is open software. Ultimately you must have the stuff underneath open, or you don’t get real freedom of choice.

Rufus Pollock

The thing is, "How do you get more open software?" Let’s just go to the next slide for a moment. This is what I want to emphasize. A lot of the time we start by saying, "We want more open software or something." The question is how you buy. That’s what you can change. Let’s just go to the next slide.

Rufus Pollock

I’m sorry to have this annoying scroll.

Audrey Tang

No. It’s fine.

Rufus Pollock

What can you do? This is a little bit for the OECD. It’s a little bit like I customized it for them.

Rufus Pollock

One of them is agile‑lean; another is to explicitly favor open software in procurement deals. In Taiwan, do you have an explicit category that allows for open software?

Rufus Pollock

How big is it? Have you guys ever bought...

Audrey Tang

We do.

Rufus Pollock

How big are the points relative to other items?

Audrey Tang

It’s a section called intellectual property ownership. We just revamped that section.

Rufus Pollock

Often what happens is people say, "We want to just have the right to the software." It’s often clever by the way.

Rufus Pollock

If you’re smart, what you’ll do is you’ll say, "Everything that you’re going to buy..." I won’t name a vendor. Let’s say I’m vendor X. You’ll come and buy from me. I’ll say, "Yes. Yes. Everything I build under this contract, everything I build under this contract, will be open software."

Rufus Pollock

What I build under this contract will just be the layer, a thin layer, on top of all my existing proprietary technologies. Yes. At the end of this contract, you will own this layer or it will be open, this layer of open software. It’s completely useless without the rest of the stack.

Audrey Tang

Yes. We’re all painfully familiar with this phenomenon.

Rufus Pollock

You’re all painfully familiar.

Audrey Tang

We’ve been through this many, many times. [laughs]

Rufus Pollock

This is why, guys, it matters that you’re not just being coders, but getting in there and writing the procurement rules really carefully, saying stuff like, "This scoring is not just for the software we buy under this contract, but for the entire solution that is needed to run it."

Audrey Tang

Down to operating system level.

Rufus Pollock

Exactly. The other one is like principles for buying agile. The two I want to talk about: you’ve got open, and you’ve got buying agile, which makes a huge difference. It’s hard for governments, because they like to have spec and deliver. They find that hard to do.

Audrey Tang

Yeah, we don’t have spec-free procurements here either.

Audrey Tang

But I do see your point.

Rufus Pollock

Open offsets. That’s the one I wanted to talk about, which is setting up, rather than the whole thing, that gets you out of...

Rufus Pollock

The thing still of traditional procurement, even when you procure open software, is you know what you’re buying. I want a chair. How much will it cost to have a chair? I want a website with this, this, and this feature.

Rufus Pollock

To get out of that space, you either need to go the agile route, where you stop buying software and you buy development time basically.

Audrey Tang

Which is what we’re doing in PDIS.tw.

Rufus Pollock

Right. Good idea. The challenge of that is you end up having to build most of the software yourself.

Audrey Tang

Well, we have a time frame. We put out a budget. We can still outsource standard-compliant solutions and welcome bidding.

Audrey Tang

People who come up with WordPress or Drupal in their stacks automatically get a better seat in the bidding.

Rufus Pollock

I got that. What’s tough about it is still...

Rufus Pollock

Let’s have a piece of software that will be valuable if we build it, to Taiwan, to Great Britain, to Brazil. Here in Taiwan you can procure agilely, but you then have to absorb either all of the cost for that piece of software...

Audrey Tang

Yeah, all of the cost, mostly.

Rufus Pollock

Right. Then that makes it hard for the pieces of software that are only feasible when they’re supplied to multiple countries. You have a coordination problem.

Rufus Pollock

Now one route around that is rather than going through traditional procurement, where you say, "I want X, and I’m going to put up this amount of money for it," you can go to an open offsets model where we have this fund. We put money in it. Then we just pay out of the fund for what we turned out to use.

Rufus Pollock

That means I, as an open software vendor, like the classic VC company, I still have to take the risk. I am going to build this piece of software. If it gets used, I know I get paid out of these funds.

Rufus Pollock

It means you invert it. The government no longer needs to know it has chairs. It just looks at the end of the year and says, "What did we use? Oh, we used 5 tables, 10 chairs, 3 tanks." [laughs] Whatever we bought, we’ll pay for out of our open offset fund. The vendors take the risk capital of creating the software in the first place.

Audrey Tang

It’s an open alternative to the model of SaaS, basically.

Rufus Pollock

Exactly. It’s the open version of SaaS. It’s even better, by the way, because you can distinguish in your payments in the fund. You can pay for the service part of the SaaS separate from the open information.

Rufus Pollock

Why is that important? You don’t just want to pay the vendor who supplied you the SaaS service.

Rufus Pollock

For example, let’s say someone supplies WordPress services to you. Your procurement will probably just cover the SaaS part of it, the part of running the service. You often won’t actually be paying, unless you’re buying new features, for any of the underlying development of WordPress, but you kind of care about that.

Rufus Pollock

The open offsets fund could even be separated from the SaaS procurement. The offsets fund pays for the software.

Audrey Tang

What I’m saying is the development needs to piggyback on subscriptions, because it’s the same cycle. We renew annually. This option is to renew annually, because conceptually it’s the same deliverable.

Rufus Pollock

Exactly, it’s renewed annually. Let’s just scroll down one more. This is the guide. This is the things. This is why software is different from chairs and bridges. Switching cost is significant, and governments are bad at negotiating. Governments are pretty terrible at negotiating.

Rufus Pollock

We can keep scrolling down, keep going. This is the current model and this would be the new model. In the new model, government has these IT funds. You pay on use, which is the open offsets. You have spec‑and‑deliver and you have agile. You have three routes, rather than just the traditional spec‑and‑deliver.

Audrey Tang

What we have now is if it’s SaaS, then we call it the cloud procurement, which is the top part.

Rufus Pollock

Which is pay‑on‑use.

Audrey Tang

Which is pay‑on‑use.

Rufus Pollock

That’s still not open offsets.

Audrey Tang

No, it’s not.

Audrey Tang

What I’m saying is that currently in Taiwan’s IT procurement, everything that’s SaaS is counted differently than the spec‑and‑deliver. We’re trying to convert the spec‑and‑deliver line here.

Audrey Tang

That was the picture, but these are two different terms on the procurement sheet.

Rufus Pollock

That’s great. The only thing I’d point out is that cloud procurement is just like traditional software buying, lock‑in. The problems of lock‑in to a cloud provider are just as big.

Rufus Pollock

If you look at the cloud providers and you go and look at the ones in California, like Workday, any of the really big ones, and you look at their revenue recognition and the way they think about it, they’re clearly willing to lose massive amounts of money to acquire customers.

Rufus Pollock

Why are they willing to do that?

Audrey Tang

They want to sell the data?

Rufus Pollock

No, not because they want to sell the data.

Audrey Tang

OK, maybe not... [laughs]

Rufus Pollock

Because you’re locked in. Once you’re on their platform, once you’re on Workday for HR, you’re on Salesforce, or you’re on these platforms, you are not going to switch off easily.

Audrey Tang

Yeah...

Rufus Pollock

They are running the same model that old software...that Windows ran. It’s no different. It’s just a recurring revenue rather than a fixed...Windows, you still pay for upgrades. People think it’s so different. Windows used to upgrade every two, three, four years. You bought it at the time, and then you upgraded four years later, but you kept having to upgrade.

Rufus Pollock

If we keep going down, we’re nearly finished. But my advice for you guys, that’s the agile and lean part, and we go open offsets, it’s the Heartbleed bug.

Rufus Pollock

Start tracking use. My question to you guys would be to say, in terms of policy that you could have, one is could you start tracking use? Is there any way to set up a small, at the beginning, open fund? Say to government, "Hey, could we take some of this cloud budget, maybe five percent right now, and put it in a fund?"

Rufus Pollock

"It’s completely limited to open software. We’re not paying for stuff we actually use, like a SaaS license, but we look to the software inside of it. We’re paying the core developers. If EtherCalc gets used or Etherpad or HackMD or WordPress or CKAN gets used, you pay the original developers."

Audrey Tang

We’ve been making some changes.

Audrey Tang

Literally the first thing I did as the Digital Minister is to change the procurement rules.

Rufus Pollock

Wow, it’s so great.

Audrey Tang

I’ve been thinking about this for years.

Rufus Pollock

You’ve got to do it. Ideas are cheap. Implementation...

Audrey Tang

Right, and then we put it up for 60 days of public consultation. I think it’s drawing to a close now. Let’s look at the actual comments. It’s very tricky.

Audrey Tang

There’s 14 days left. We can’t really do anything before that. I’m sorry that it still says OpenAPI, but it will say common API standards at some point.

Rufus Pollock

Where’s the open fund in that proposal? Is there a proposal to A, in your procurement...how are you going to measure...let’s say procurement thing, and you’re going to have points, just to get really practical. You say, "OK, we want open software, and we’re going to give points in procurement for software being open."

Rufus Pollock

How will you assess that? Do you check with the vendor? What’s the verification process to check the stack they supply is actually open?

Audrey Tang

This is where the cloud procurement and the spec (or agile) procurement differ.

Audrey Tang

We don’t have a good story for the latter; this is what all of us are painfully aware of.

Audrey Tang

You can say, "You must use PostgreSQL," but you must know you want PostgreSQL going in. For many government agencies, this is simply out of their consideration, so they just say, "OK, the web application source code must be open," or something.

Audrey Tang

Then it ends up paying a lot for Oracle licenses, which is your classic example.

Rufus Pollock

It must be the whole stack, not just the...

Audrey Tang

It’s true. We don’t have a good story here.

Rufus Pollock

Without that, it doesn’t mean that much. That’s the problem. Without that...

Audrey Tang

It just means you can swap out the top-layer application vendors.

Rufus Pollock

The vendors are smart at hacking the system.

Audrey Tang

Yes. That means the new vendor must still know Oracle to get the winning bid. Even if they get a whole open source, open data system, they must know Oracle.

Audrey Tang

We haven’t solved that.

Audrey Tang

However, for the cloud part, this is my main target of this procurement change. First, it’s software‑as‑a‑service, but we’re now insisting on it running on local infrastructure.

Audrey Tang

That actually rules out pretty much everybody who relies on this sticky lock-in. Even if they say it’s proprietary software, it still has to run on the data center here.

Audrey Tang

We do it not for data localization purposes, but really for know‑how localization, so that people can locally inspect what’s going on.

Audrey Tang

As you probably are aware, if they use Oracle and the Oracle Access Manager’s stored procedures, a lot of the procedures are plain text. They’re not binaries.

Audrey Tang

Even if we don’t manage to get the entire lower stack binaries converted to open software, you can still inspect pretty much everything, on a running system, to figure out how it’s working.

Rufus Pollock

I got that. Yes, it is great.

Rufus Pollock

I guess my suggestion is, and I remain at your disposal, would be, one, see if you can get these numbers, which is, "How much was spent?" and "How much was spent on open?"

Audrey Tang

We can measure that for cloud procurements.

Rufus Pollock

Two, could you try getting open funds set up, whose sole purpose was to say, "We paid for the open software we used last year." It might be the OpenSSL library, it might be WordPress, it might be CKAN, but run an experiment. It could be two percent of the IT budget to say, "Let’s run an experiment next year and see if this delivers us value for money," and see what happens.

Rufus Pollock

You could even limit it and say, "We’ll only spend it in Taiwan," which would be really good for the government. We’ll say, "We’ll spend it on open software, but it has to be a CKAN developer in Taiwan, it has to be a WordPress developer in Taiwan, it has to be someone...we will spend this money, but it will be within the local economy," which is very nice. The money stays in the economy. Just try and run an open fund.

Rufus Pollock

The other is to try and write a rule in about assessing the entire stack. Don’t put the burden on the government. You could say, "A vendor, when bidding, has to assess what percentage," and you could do it in value terms or something. You could say to them, "In terms of the cost of running this stack, what percentage of it is open software?"

Rufus Pollock

It’s up to you to say, if you’re going to spend $50,000 running X but you’re going to spend $100,000 buying Oracle licenses, the $50,000 means you’re only doing a third...
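
( Illustration: a minimal sketch of the value-based calculation being described, using the $50,000 and $100,000 figures from the example. The bid format and the function name `open_share` are hypothetical, not part of any actual procurement rule. )

```python
def open_share(cost_items):
    """Return the fraction of total running cost that goes to open software.

    cost_items: list of (name, annual_cost, is_open) tuples, a hypothetical
    format a vendor might be asked to fill in when bidding.
    """
    total = sum(cost for _, cost, _ in cost_items)
    if total == 0:
        return 0.0
    open_cost = sum(cost for _, cost, is_open in cost_items if is_open)
    return open_cost / total


bid = [
    ("running X (open)", 50_000, True),
    ("Oracle licenses", 100_000, False),
]
print(f"{open_share(bid):.0%} of the stack is open, by cost")
```

With these two items the open part is 50,000 out of 150,000, which is the "third" in the example.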

Audrey Tang

There’s a part in the Digital Nation Plan that develops automated tools to look at all the source code and binaries that a bidding vendor submits, and then try to figure out, first, how much of it is open licensed, and how much of it is Creative Commons licensed, which may not be open, by the way.

Rufus Pollock

Yes. We can just say open license. Don’t say CC license. You say open license and non‑open license.

Audrey Tang

Well, even if it’s CC‑ND, we want to know its license.

Rufus Pollock

Then that’s great, but it’s like going into the discussion of all the weird freeware. I’d say open license, and then you say liberal license...

Audrey Tang

Yeah. Technically it’s "free culture license" if we are talking in a CC way.

Rufus Pollock

Don’t worry about the free cultural works definition. Open Definition covers all of that.

Audrey Tang

I’m aware of that. I’m just saying there’s non‑code part, too.

Rufus Pollock

Yeah, exactly.

Audrey Tang

Saying "open software" doesn’t cover the whole of it, the non-software parts.

Rufus Pollock

That’s a very good point.

Audrey Tang

In any case, what I’m saying is that we are, as part of the Digital Nation Plan, developing this automated assessment tool, so we can get some useful numbers out of it. A breakdown of all the licenses, count of lines, and so on.
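
( Illustration: a minimal sketch of the kind of breakdown such a tool might report. This is not the Digital Nation Plan tool itself; the input format, the file names, and the short list of licenses treated as open are all assumptions. The identifiers follow SPDX naming, and `LicenseRef-Proprietary` is a made-up placeholder. )

```python
from collections import Counter

# Assumed allow-list for this sketch; a real tool would consult a maintained list.
OPEN_LICENSES = {"MIT", "Apache-2.0", "GPL-3.0-only", "CC0-1.0", "CC-BY-4.0"}


def license_breakdown(files):
    """Summarize line counts per license and the share under an open license.

    files: iterable of (path, license_id, line_count) tuples (hypothetical format).
    """
    lines_by_license = Counter()
    for _path, license_id, line_count in files:
        lines_by_license[license_id] += line_count
    total = sum(lines_by_license.values())
    open_lines = sum(
        count for lic, count in lines_by_license.items() if lic in OPEN_LICENSES
    )
    return {
        "lines_by_license": dict(lines_by_license),
        "open_fraction": open_lines / total if total else 0.0,
    }


submission = [
    ("app/main.py", "MIT", 1200),
    ("docs/guide.md", "CC-BY-ND-4.0", 300),  # a CC license, but not an open one
    ("db/procs.sql", "LicenseRef-Proprietary", 500),
]
report = license_breakdown(submission)
print(report["lines_by_license"])
print(f"open fraction by lines: {report['open_fraction']:.0%}")
```

The CC‑ND file is counted and reported under its own license but does not add to the open fraction, which matches the point that a Creative Commons license is not necessarily open.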

Audrey Tang

The other easy part is to get the local talents and everybody trained to specific technologies...we count them as common things in the stacks, like Tensorflow, OpenStack, maybe Docker or something. Then we say, "OK, so for these critical parts of the infrastructure, since all the procurements..."

Audrey Tang

You see the lines of code that use Docker, for example. Then we say, "OK, it’s important to have the local Docker community that is able to maintain this kind of thing." Then we grade people with three, maybe four levels, like being able to install and use it, maybe able to customize it, and to be able to contribute to it, for example.

Audrey Tang

Then we want this many people empowered over four years. That’s part of the Digital Nation Plan. It’s in it already.

Audrey Tang

The hard part is to tie this training budget with the cloud usage numbers. Currently in the Digital Nation Plan, these are completely different funds. This is for the enrichment of the community, and this is for reducing the licensing costs. We’re doing both, but not as the same project.

Rufus Pollock

Maybe, first, if you’re going to write it on your piece of paper, because then it’ll go into your notes, I said there were three things I would suggest.

Rufus Pollock

One was stats, just how much is spent on open.

Audrey Tang

Got it.

Rufus Pollock

Number two is an open fund of five percent of the IT budget. That means that money is spent on an automated, algorithmic basis, based on just what open information we use. It could be content. By the way, it doesn’t have to be software, but just of X percent.
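
( Illustration: a minimal sketch of one possible "algorithmic" payout, splitting the fund in proportion to measured use. The proportional rule, the budget figure, and the usage counts are all assumptions made for the example; the discussion does not specify a formula. )

```python
def allocate_fund(fund_total, usage):
    """Split fund_total across projects in proportion to their recorded usage.

    usage: dict mapping project name to a usage count in any consistent unit
    (for example, number of government deployments); hypothetical data.
    """
    total_usage = sum(usage.values())
    if total_usage == 0:
        return {name: 0.0 for name in usage}
    return {name: fund_total * count / total_usage for name, count in usage.items()}


it_budget = 100_000_000          # hypothetical annual IT budget
fund_total = it_budget * 5 / 100  # five percent set aside for the open fund

usage_last_year = {"WordPress": 50, "CKAN": 30, "Etherpad": 20}
for project, amount in allocate_fund(fund_total, usage_last_year).items():
    print(f"{project}: {amount:,.0f}")
```

Nothing is decided up front about which project gets paid; the split follows from the usage numbers gathered at the end of the year.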

Rufus Pollock

The third item is percentage, basically a procurement rule where you assess the percentage of open info in a bid, and there’s a points weighting for that. By the way, the points weighting must be very heavily on 100 percent. People make this mistake that somehow 50 percent open is as good as 100 percent open.

Rufus Pollock

Unfortunately, it’s like imagine in all those movies...has anyone seen those James Bond movies, where at the very end he just has to remove one thing from the bomb to stop it detonating? Similarly, if you have an entirely open system, but one crucial part of it is closed, then all the value resides there. You need to give a big value for 100 percent, and then the moment you’re under 100 percent...

Audrey Tang

You want a curve like a flipped power-law graph.

Rufus Pollock

Yeah, exactly. You want a scoring curve like that.
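
( Illustration: a minimal sketch of a scoring curve weighted heavily toward 100 percent open. The steep power function, the exponent of 8, and the 20-point maximum are assumptions chosen for the example, not figures from the discussion. )

```python
def open_points(open_fraction, max_points=20.0, exponent=8):
    """Award procurement points for openness, rising steeply only near 100%.

    open_fraction: share of the whole stack that is open, between 0 and 1.
    exponent: larger values punish anything short of fully open more harshly.
    """
    if not 0.0 <= open_fraction <= 1.0:
        raise ValueError("open_fraction must be between 0 and 1")
    return max_points * open_fraction ** exponent


for fraction in (0.0, 0.5, 0.9, 1.0):
    print(f"{fraction:.0%} open -> {open_points(fraction):.2f} points")
```

Under this curve a half-open bid still scores more than a closed one, but only marginally, so a thin open layer on a proprietary stack earns almost nothing.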

Rufus Pollock

Otherwise, people just game the system with, "Oh, I got a little bit." It’s true that it’s better to have 50 percent than zero percent, but most will go, "Well, we’re using Python on Azure," or something like that. Yes, but...

Rufus Pollock

I’m at time with you guys. I’ve got to get to another meeting in a moment, but I’ve really, really appreciated this, and really appreciated going through it with all of you guys.

Rufus Pollock

Please be in touch. I know, Audrey, I’ve got your email. You’ve got my book on GitLab. Please keep in touch. If there’s anything else on anyone’s mind before I go today?

Audrey Tang

Everybody can Tweet at @rufuspollock.

Rufus Pollock

We can Tweet everybody.

Audrey Tang

He likes and he re‑Tweets. Now I know this.

Rufus Pollock

I check it every now and then...

Rufus Pollock

It’s really good to see everyone.

Audrey Tang

Yup.

Rufus Pollock

Thank you. One last thing, could I get a picture? I’d like to put a picture up.

(laughter)

Rufus Pollock

I don’t normally do this, but...

Shuyang Lin

I’ll pull it together with the record file.

Rufus Pollock

Can you take a picture as well?

Shuyang Lin

Yeah.

Rufus Pollock

Can you take one? That’s probably even better, and you can send it to me under a CC BY license.

Audrey Tang

We do CC0 here.

Rufus Pollock

CC Zero is...

Audrey Tang

...is compatible with everything.

Shuyang Lin

Three, two, one. There you are.

Rufus Pollock

Is that good? Are you happy?

Shuyang Lin

One, two, three, cheese.

Rufus Pollock

One, two, three, cheese.

Shuyang Lin

Yay.

Rufus Pollock

Yeah.

Shuyang Lin

OK.

Rufus Pollock

Thank you very much.

Shuyang Lin

Thank you very much.

Rufus Pollock

Real pleasure. 謝謝.
