It remains controversial, because according to our current privacy law, each collecting agency's governing ministry serves as its data protection agency.
Right, exactly. Then the outcome, when you calculate the average, the mean, whatever linear regression you want to run with it, is just not very useful. It is privacy-protecting, though. That’s the only case so far where we’ve used this standard.
If I am alone in a basic statistical area, the size of a county or a township, and I’m the only one earning above a certain amount of money, then my data is going to be removed from the data set.
To protect privacy sufficiently, they used k-anonymity, which is a crude way to anonymize this data. It requires that no one be distinguishable within a group of fewer than, say, 25 people or so.
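A minimal sketch of that kind of k-anonymity suppression, assuming a toy list of records and illustrative field names (the real CNS 29100 process is much more involved; a small k is used here just to keep the example short):

```python
from collections import Counter

def k_anonymize(records, quasi_keys, k=25):
    """Suppress any record whose combination of quasi-identifiers
    appears in fewer than k records, so nobody is distinguishable
    within a group smaller than k."""
    counts = Counter(tuple(r[key] for key in quasi_keys) for r in records)
    return [r for r in records
            if counts[tuple(r[key] for key in quasi_keys)] >= k]

# Toy data: with k=2, the single high-income record in area "B"
# is removed from the data set, just like the lone high earner
# in a basic statistical area.
records = [
    {"area": "A", "income_band": "mid"},
    {"area": "A", "income_band": "mid"},
    {"area": "B", "income_band": "high"},
]
print(k_anonymize(records, ["area", "income_band"], k=2))
```

This only does suppression; real de-identification pipelines also generalize values (e.g. widening income bands) to keep more of the data usable.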
In practice, it is actually very difficult. So far, the only case in Taiwan that has completed this CNS 29100 process is one that outlines personal income across all the different areas, so that you can see how the average resident’s income changes year after year. It’s obviously ...
That’s exactly right. After processing, if it is not personal data anymore, then it’s free to just hand it over to the open data platform or some other agency.
Of course, statistics can be sold and commercially used just like any other open data. The thing is that the court has ruled that the unit collecting the personal data must be the same one that processes it. It can’t turn it over to some other company ...
If it is just statistics, then the privacy law doesn’t govern it. Right?
Beyond that, what exactly is this "statistical use" when compared to raw personal data? There was a CNS standard, the one you just mentioned, CNS 29100, which says that after a certain de-identification process, we can use the results for statistical use, because the privacy impact is minimal, instead of ...
There’s no such clause in the Taiwan counterpart. Instead, we have crime prevention in the same position. This says something about the values that the legislators care about.
The law protects a few uses. Some are pretty common, such as the research use I mentioned; in many EU countries, there’s a special clause for historians to protect archival work or the interpretation of history.
Alternatively, it must be for the public good, but it may also be used in a statistical way, not as raw data.
That’s exactly right. The law is pretty clear: you can only use it for academic research purposes. In a lot of EU countries, the enactment is similar to Taiwan’s, saying that it must be used for the public good and for research.
The other contested point is the so-called uninformed use outside the original purpose of collection.
There are a few issues in the new privacy law for us. The old contested issue, when we had that debate, was on health and other sensitive information. There was a section in the privacy law that required much stricter data protection measures. Criteria for this ...
OK. We do have a data protection law here. It was largely modeled after the previous EU data privacy law, so it is inherently pretty compatible with the newer directive shaped by the Article 29 Working Party.
I’ll be recording this. Is this OK with you?
...is compatible with everything.
We do CC0 here.
Yup.
He likes and he retweets. Now I know this.
Everybody can Tweet at @rufuspollock.
You want a curve like a flipped power-law graph.
Got it.
The hard part is to tie this training budget to the cloud usage numbers. Currently in the Digital Nation Plan, these are completely different funds: one is for the enrichment of the community, and the other is for reducing licensing costs. We’re doing both, but not as the same project.
Then we want this many people empowered over four years. That’s part of the Digital Nation Plan. It’s in it already.
You see the lines of code that use Docker, for example. Then we say, "OK, it’s important to have a local Docker community that is able to maintain this kind of thing." Then we grade people into three, maybe four levels, like being able to install and use it, maybe ...
The other easy part is to get the local talent and everybody trained in specific technologies...we count them as common things in the stacks, like TensorFlow, OpenStack, maybe Docker or something. Then we say, "OK, so for these critical parts of the infrastructure, since all the procurements..."
In any case, what I’m saying is that we are, as part of the Digital Nation Plan, developing this automated assessment tool, so we can get some useful numbers out of it: a breakdown of all the licenses, line counts, and so on.
Saying "open software" doesn’t cover the whole of it, the non-software parts.
I’m aware of that. I’m just saying there’s a non-code part, too.
Yeah. Technically it’s "free culture license" if we are talking in a CC way.
Well, even if it’s CC‑ND, we want to know its license.
There’s a part in the Digital Nation Plan that develops automated tools to look at all the source code and binaries that a bidding vendor submits, and then try to figure out, first, how much of it is open-licensed, how much of it is Creative Commons licensed, which may ...
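As a rough illustration of what such an automated assessment could look like, here is a sketch that walks a submitted source tree, looks for SPDX license identifiers (a real, widely used machine-readable tagging convention), and reports a per-license line count. The function name and the per-file counting approach are assumptions for illustration, not the actual tool:

```python
import os
import re
from collections import Counter

# Short-form SPDX tags, e.g. "SPDX-License-Identifier: MIT", are a
# common machine-readable way to declare a source file's license.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([\w.+-]+)")

def license_breakdown(root):
    """Walk a source tree; attribute each file's line count to the
    license declared in its SPDX tag, or 'unknown' if none found."""
    lines_per_license = Counter()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    text = f.read()
            except OSError:
                continue  # skip unreadable files
            match = SPDX_RE.search(text)
            license_id = match.group(1) if match else "unknown"
            lines_per_license[license_id] += text.count("\n")
    return lines_per_license
```

Note this only covers source text; the binaries mentioned above would need separate tooling, such as binary fingerprinting against known open-source packages.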
We can measure that for cloud procurements.
Even if we don’t manage to convince vendors to convert the entire lower-stack binaries to open software, you can still inspect pretty much everything in a running system to figure out how it’s working.
As you are probably aware, if they use Oracle and the Oracle Access Manager’s stored procedures, a lot of the procedures are plain text, not binaries.
We do it not for data localization purposes, but really for know‑how localization, so that people can locally inspect what’s going on.
That actually rules out pretty much everybody who relies on this sticky lock-in. Even if they say it’s proprietary software, it still has to run on the data center here.
However, the cloud part is my main target of this procurement change. First, it’s software-as-a-service, but we’re now insisting it run on local infrastructure.
We haven’t solved that.
Yes. That means the new vendor must still know Oracle to win the bid. Even if they get a whole open-source, open-data system, they must know Oracle.
It just means you can swap out the top-layer application vendors.
It’s true. We don’t have a good story here.
Then it ends up paying a lot for Oracle licenses, which is your classic example.
You can say, "You must use PostgreSQL," but you must know you want PostgreSQL going in. For many government agencies, this is simply out of their consideration, so they just say, "OK, the web application source code must be open," or something.
We don’t have a good story for the latter; this is what all of us are painfully aware of.
This is where cloud procurement and spec (or agile) procurement differ.
There are 14 days left. We can’t really do anything before then. I’m sorry that it still says OpenAPI, but it will say "common API standards" at some point.
Right, and then we put it up for 60 days of public consultation. I think it’s drawing to a close now. Let’s look at the actual comments. It’s very tricky.