-
Welcome to Taiwan. I read through your slides.
-
Oh, okay. Yeah, I think we work on that intersection. I think for a little time, trust with the AI and also the European initiative on this. And also, right now with the EU AI Act, that’s something we’re interested in. Also making the bridge to industry, how they will incorporate this.
-
And I think we’ve already worked on robustness, privacy. And with the language models, I think it’s like a magnification glass with some of the problems we have been seeing so far. And that’s something that’s keeping us busy right now.
-
There are some populational and national problems, but also the ecosystem that’s probably built on top of the language models is something we’re kind of looking at more closely right now.
-
Yeah, I saw your picture of the stones tower, and I thought, actually, the ecosystem is growing even higher.
-
(laughter)
-
Yeah, yeah, that’s a big deal. Yeah, I think it was the Biden administration that yesterday said they signed an executive order to be more transparent, non-discriminatory, privacy-preserving.
-
Yeah, 10²⁶ flops as a cap.
-
Yeah, I’m not sure how much this will help with the capping. I think we see in racing; they have something similar for simulation. They cap the flops they can use. But yeah, then there is the question that really helps, so to say.
-
And also in the EU, there’s the efforts to build on language models to, because it kind of mirrors the culture, so preserve culture, preserve the speech. But also, I think, matters of digital sovereignty. I think that we build on top of this, we get very dependent on the few providers. I guess that’s another concern.
-
Yeah, I think that’s the original vision behind the big science project, and the Bloom project, and so on. It’s just that it’s open access, and everybody can fine-tune based on the languages out of a multicultural base. Which necessarily is open access, actually. And that’s the behavior that we have right now in the rest of the country.
-
Yeah, and also, I think from cyber security, pretty much, although the history shows that if you have open models, it’s better for academia and transparency in testing.
-
And from our interaction with Microsoft and also OpenAI, they have been kind of very conservative on this side. I think even their security teams haven’t been involved as much in the testing phases. So, we hope that in the future, we just get more open to public penetration testing, but also making the results more openly available.
-
Yeah, I think that open red team is essential. It’s one of the few confidence-building mechanisms that can bridge the concerns of safety on one side, and the progress of the past models, and also cultural justice on the other. Because if it’s open red team, then the incentive will change. We’ll be the top labs finding each other’s vulnerabilities as we have in the cybersecurity world now, with responsible disclosure. But there’s no similar culture now between the top frontier labs.
-
Not so much. Yeah, there are still technology gaps, but I think there have been some works that have shown that as transfer, that they have attacked open models, and then the attacks have been transfer.
-
So, there’s some leverage we can take from the source models to understand vulnerabilities that can mimic white box attacks. But it’s a wide gap, and we hope there’s more transparency and openness.
-
I think the current red teaming efforts, they were closed. I think they pay money, but they… They already closed.
-
Yeah, it’s like pentesting retainers, basically.
-
Yeah.
-
But we want open red teaming. I actually just signed on a statement for the openness and transparency, with the usual suspects — Mozilla, Linux Foundation, Creative Commons. I can send you the statement.
-
Oh, okay.
-
Because people… I think it would be a mistake to think only proprietary security are safe. This is a conversation we had back in the original export.
-
(laughter)
-
At times, yeah. And now we’re… At a time, we argue that code is speech. And now speech becomes code. We have to have that conversation again.
-
I think there’s also so many lessons learned from cybersecurity, and a bit of history is keeping repeating that right now. So really basic principles of cybersecurity that are just violated, I think that’s one argument making that instruction and code is not separate, basically, in the models of trusted and untrusted. So, it’s like 101 of cybersecurity.
-
That’s right. That’s right.
-
That’s not implemented right now.
-
Yeah, you know, not execute regions or separate execute bits for permission. That was, what, 40 years ago?
-
Yeah. And that’s what all the prompt injection, that’s exactly just repeating the same issue. That’s a bit frustrating.
-
We also, with this European network, I’m coordinating… We also put out a strategic research agenda on safe and secure AI. We wrote this in August. It’s not yet published, but we’ll publish it in the next weeks. We also demand more openness and more transparency and also getting more from academia, more in this, and also, I guess, more governance processes in terms of trust in their hopes, so to say, process. That idea would also involve accountability and transparency to the public. Again, that’s something we do as a routine effort.
-
Excellent. How do we join? Where do I sign?
-
(laughter)
-
Yeah, I guess it’s one of the voices. It’s good if people have more voices and it will impact the policy making, I guess. I think that’s a good thing. I’m not sure how much we see concerted efforts across different countries, I’m not sure.
-
Yeah, so my other half, being the chair of the National Institute of Cyber Security, we’re working on institute-to-institute relationship like with the US NIST Standards and Technology, which in the new executive order now has to massively expand, because they didn’t used to have anthropologists for the societal risk evaluation, and all this is now NIST’s job to do so. And so, the director of NIST recently visited Taiwan and had a very long conversation about this.
-
Essentially, what’s traditional cybersecurity need now to be expanded. For example, the deepfake voice cloning and things like that, which is like spear phishing, but operating on persuasion attacks, which is a very new thing, actually. And all this broadens the traditional cybersecurity purview to basically necessitate something like a hotline or something for the societal impact to be noticed and measured directly.
-
And this is harder than just measuring carbon footprint or Freon’s ozone depletion or things like that because those sensors, they don’t infringe privacy, but as a democratic government, we cannot actually install sensors to all those conversations that people are having with their over-reliance and addiction and whatever those AI chatbots.
-
So, I think this is a very interesting domain for us to work together and we’re probably joining forces with NIST and the institute to institute the basis by setting up our side, the AI Evaluation Center in here.
-
Okay. We’re also working in… I think NIST also had this MITRE ATLAS, which is basically a taxonomy of AI threats. So, we’re also in contact with them in designing the threat taxonomy for large language models. So, as we have pioneered some of the work, again, we’re trying to influence that also, they’ve been categorizing threats, but also trying to record incidences.
-
And I also very much agree with… I think sometimes I like to call this the shift from static to dynamic or personalized misinformation. But I also am very unclear what kind of the more latent or subtle impact it will have on language or conversation in general. I think these are things, I think like social media, we might only learn on the longer run how it has changed communication or language as a whole. There’s maybe a different tangent to the direct impacts of cybersecurity risks.
-
And again, I think in Europe there will be a right to know if you talk to a human or a machine or a robot. Technically enforcing this, I’m quite pessimistic. Again, there will be legal ways to enforce it, but to have sustained solutions to detect, I think this is a bit unclear, maybe not even possible.
-
I think also with watermarking, we have worked on watermarking for example quite a bit…
-
I think it’s information theoretically.
-
Exactly, that’s basically… but it’s even more weirder that there’s so much attention given right now to the companies who propose these solutions. Again, we also published on this, but I thought we had a balanced statement that this is something…
-
We also had a discussion with the EU Commission because they thought of implementing this, but then they also say there’s a certain cost to it, maintaining an infrastructure that maintains watermarks and needs to be updated all the time. And then the appreciable added value in security was at least questionable…
-
Yeah, and it’s a security threat, everybody knows that.
-
I think there are certain areas, again, there’s kind of… I think it’s a whole range of risk in different economies. So, there’s some low-profile things, but maybe people use this for certain attacks, also basically generating videos of people and so on in different contexts, and maybe these have access to low technology solutions, and maybe these could be stopped, so to say, but then really having risk information at scale, maybe even with state actors, there’s probably nothing that can be prevented with such a technique.
-
So, it’s, again, it was a debate if there’s appreciable gain, but overall believing that’s a sustainable solution, that would be indeed a theatre.
-
Yeah, because in terms of the DISARM framework, Actor, Behavior, Content, Distribution, ABCD, anything that works like watermarking on the Content layer, or as we mentioned, forced disclosure, that you’re talking to a bot, which includes a translator. Does it include a translator?
-
Anyway, at the Distribution layer, this is something is asymmetrical to the attacker’s favor. That is to say the defender has to spend more and more resources, as the attackers actually, they get even lower cost because of open-source models and fine-tuned models and so on. So as time goes by, the attackers would mount even larger scale attacks, and the defenders on Content/Distribution layer will have to spend even more in upgrading the defense, and that’s simply not sustainable. That’s my main argument.
-
Yeah, which is why we are shifting to the Actor and Behavior layers now. Like we just announced that all the governmental SMS will soon come from a single SMS number, 111. So, everything else, even though it mimics a perfectly carbon copy of the text message, interactive ones, or even mimics an official calling you, or things like that, is easily spotted as fake because they would have to do it in an 8-digit or 10-digit phone number. But everybody will assume that all the governmental communication will only come from a short number, like 3-number or 4-number.
-
So, this is more like authentication provenance. And the idea is to flip the default very quickly so that only authenticated communication actors are human, and the people you have to meet face-to-face like we do now in my address book are human, but everybody else is assumed to be a bot.
-
Yeah. On the government, I’m still not sure if there’s any fault in this, but there’s a substantial risk of information inflation, so a devaluation of information as… There are some very pessimistic predictions that maybe up to 90% of all content will be synthesized, resynthesized.
-
Yeah, complete complex collapse.
-
And for an information society, that’s shaping its core, really. So, there are some efforts to more protect authenticity and attribution or have new media formats, and that’s my domain. I see there are some systematic approaches to attribution and data provenance, but I don’t see it implemented in that scale. Maybe first for news and trusted information sources, but I don’t see the roadmap for this right now, at the pace probably we would need it right now.
-
No, when asked, I think by Lawrence Lessig, a professor, what’s my probability of doom for informational collapse, I said 99%.
-
(laughter)
-
This is sure to happen. I don’t think anything prevents that now.
-
Okay. So, there’s more pessimism…
-
(laughter)
-
The thing, though, is that we have also recovered from the pandemic itself, actually, and also, the infodemic in conjunction with the pandemic. And the reason why we recovered is not that we eradicated the coronavirus. We did not. But rather, we invented vaccines and cure and so on. So, although the initial quarantine or initial lockdowns and so on are not sustainable, as everybody knows now, it, at least in Taiwan, did buy us time, until such a time where people are vaccinated and had a good cure.
-
And so, I think a lot of research needs to go on what’s the inoculation like to maintain information integrity. We call it pre-bunking, like just letting people know that there is this spring going on and so that people will become less suspectible for the swarm of misinformation.
-
But when you say information collapse, would that imply that there’s a point where society becomes alienated from technology and we just go back to smaller circles, personal communication? I’m not sure what that implies. It doesn’t sound very…
-
It’s like the original BGP idea, the web of trust. You only keep in contact with people you have performed a certain ritual or ceremony.
-
Yeah, okay. I agree, but I think it’s probably difficult because we have… Digitalization has gone so far and a lot of the speed and the leverage of scale has built on this, so we would need to step back on many of these. Basically, I’m not sure if society or economy is ready to do this now.
-
Yeah, but we did that during the pandemic. Yeah, and for physical, of course, not digital. So, it would be like the flip.
-
I’m still trying to imagine it. Interesting.
-
So, we will meet more face-to-face, is what I’m saying. Maybe not.
-
Yeah, I think something is more going in the right direction. Some things, disruptive events make it unpredictable in a sense. But it’s just a singular point where people continually give up on this. I think it’s also a real risk with the advance of technology. All this synthesis that there will be a certain, I think, media competence that’s built-in society, but at a much slower pace, I would say.
-
So, maybe there’s society in a few years’ time that can handle the scale of communication, the amount of untrusted information, but maybe not this generation or the next one. So, that’s one of the issues. But how then to mitigate the time between, I think, is a good question.
-
One of my proposals is just to give everybody their own open-source AI as an assistive intelligence. Basically, I have fine-tuned a… I think, initially 70 billion, but now just 13 billion model on this laptop, which never leaves my laptop, that not just drafts my email for me, but also summarizes it and things like that.
-
And so, just the fact that this can be done in a laptop, to me, means that people are not going to, by hand, face this onslaught of information manipulation. There could be personal computing, their personal assistant, a certificate that works with just the dignity of this person or this community instead of a virtual, very large Facebook-like email that has no transparency. That’s always the Mozilla and EFF vision, but I think the pressure is now to make this a reality sooner than the whole informational collapse.
-
I think also you say… That’s also a good question, or something we try to wrap our head around is, right now we’re using all the written code or text as information channels, so to say. So, maybe in some domains, maybe there might be also evolution that for some we don’t want to have anymore. So, maybe if we use language models to generate text, the next person uses the language model to consume the text… Again, there could be other structured forms that have accountability or human inspect ability.
-
I think for code, this is also a question, again, there are some examples where you say, you just say there is an API and all of a sudden, the language model mimics the API. So, it’s kind of this rapid prototyping where you never implement the code, it’s just the description becomes the functionality in a sense.
-
So, I think that’s also something we also work with language models in code, with also security vulnerabilities and so on, and making them more reliable. And we always think, where is the point we develop different forms to store the information. As also, again, some of the benefits that we hope for is maybe an AI machine that can handle more complex models that we even comprehend. So, when is the point that we don’t even want to write them, when is there no use to write them down anymore because they become human, incomprehensible, so to say.
-
But I think for now, I guess, there is a certain level of accountability of spelling things out, that they are inspectable, but even for measures of robustness or certification, you probably want to have other syntax or other logical forms that are being, as a typical representation, not necessarily text. Because text has all the problems, there is no syntax, they can be all injections and they also not lend themselves to any form of verification, for example.
-
Exactly.
-
So that is something we haven’t fully figured out, but we kind of think that might be even more disruptive to the changes of how we store, process and also write code, for example. So, I think we just have started brainstorming in my group a bit, how it would change software engineering in the future.
-
Exactly, as we described, our current reliance on text or diploma or proof or certification that is in a textual mode. If we go beyond that, one thought is just to adopt some norms in communities where, you know, this rampant scam by robots is already a norm. And I’m talking about the blockchain community, which is the zero or negative trust space. And they have now taken a norm that all speech that counts need to have like a zero-knowledge designated verifier proof and so on. And that changes the norm. So, everything else, including every tweet, is considered as a scam play. And only things that have the kind of designated proofs on chain and things like that are considered speech.
-
And so, that is one very dystopic, but something we can learn from in the future as well.
-
Yeah.
-
Or otherwise, we just wish to post symbolic and just have tea together.
-
(laughter)
-
The words we say are just captions to the tea we share.
-
I feel like that is the most central element of life.
-
(laughter)
-
Interesting. Otherwise, maybe one thing I would like to learn more about is the testing center you mentioned.
-
So, you mentioned there is an AI testing facility and certification. I’m not so much into the, we do some technical methodological certification more for the improving side. I’m less about how to do certification from legal or how to bridge this gap. But also, in my network, there are people interested and we also try to engage industry more who are actually facing those issues. So, if anything you can share…
-
We actually have a presentation about… I can send you the slide deck. Or are you going to, anyone going to present? No? Okay. Well, I can just share the presentation.
-
So, the basic idea to overcome this contextual collapse, I call it Plurality. This is from my ongoing book, Plurality.net. The idea is to turn the conflicts that will stem from the collapse of context into something that is collaborative. Take all the existing content that will probably be demolished quickly and make it include… This is a post symbolic communication. Also, all sorts of different ways, like using AI to summarize ongoing deliberations and things like that, so that it’s more collaborative and more diverse at the same time. The AIEC is in the service of these technologies. I can very quickly show you some quick demos.
-
You mean for many deliberations, like negotiations?
-
Yes. So, for example, this is a concrete one. Basically, a returning citizen association in Michigan just asks the returning citizens whatever they feel about the current prison system. Basically, this is using the prison system as a social object to invite people to have a conversation. All this is documented in email videos.
-
The language model just ingests all this and identifies clusters of common voices or common arguments. It not just highlights the representative comments or do the cluster analysis or do synthetic bridge-making narratives between those groups…
-
It’s a collective opinion, so to say…
-
Right. It’s an avatar of several aspects of a group, a plurality that we can have a real conversation with. Here is an example of our internal instance so that we can turn any conversation into avatars. So, there was one on Boeing Green. There was one we run in two workshops in Tainan and Taipei…
-
What do you mean by]avatar? I mentioned there’s a single instance of an avatar. What is it more like?
-
An aspect, as an avatar. So, you can talk with this cluster and you can just…
-
I just have prototypical opinions with the representatives of that collective dissent…
-
Right, right. And so, it can synthesize a thought based on these elements. It’s an executive summary that talks to you.
-
It’s basically… I think in Germany they come back to these I think committees is not like basic group demography, but they basically get a small set of people together.
-
Yeah, the citizen assemblies.
-
Ah, yes. The citizen assemblies. So, that’s kind of a virtual version of the citizen assemblies. It gives these representatives, you can…
-
Yeah, instead of aggregating individual preferences, which is very suspectible to the persuasion attack you just mentioned, we instead operate on the units of statistically reasonable citizens. And that becomes a voter, not a single voter. This synthetic voter, which always goes to the source with citation and dignity and all that, then becomes a unit of deliberation in a higher level of conversation. So, maybe one such group can represent a river, and maybe one such group can represent a mountain.
-
Wendy here just attended Meta’s online event. They work with Stanford to basically turn such conversations into something deliberative, through Deliberative Polling®.
-
Actually, maybe you might be interested. We just had a work on LLM deliberation. Basically, we had a multiple LLMs negotiating a deal. These could basically be these different representatives. For example, we had one from the environmental organization, one was the local housing organization. Then, basically, they go around the table and always bring forth their arguments. They need to reach a common deal that is above the threshold. Then we also investigated this adversarial one. Basically, one is trying to be more selfish. Then we looked at what kind of absolute value they reach globally, but also for individuals. I think that’s very interesting in that sense.
-
Yes. And as part of the testing and verification center, as I mentioned, the conversations we held in Taiwan, including online, Taipei, and Tainan, made it possible for us to have some constitutional guidelines to the science minister’s language model. So, they’re tuning it based on that model. We also published with Anthropic the same idea, allowing assemblies.
-
And then, now, I think the team is synthesizing 1,000 red team things. So, for example, if one principle is one should always strive to be fair and balanced among nations, then the 100 penetration question will basically all about hate and bigotry and things like that.
-
Try to ask the LLM very biased questions. If it all answers in a way that scores higher in terms of fairness and so on, then we say that it’s constitutionally more aligned to the original consensus. And the great thing about this is that this can be done every day. Once this group, like the Michigan group, saw the language model that’s tuned and the red team attacks and its results, they can think of more ways to tune it or more principles that it should adhere to. So, it’s more like a co-domestication with LLM.
-
Very fascinating. One thing I didn’t quite understand is you have these representatives, but then how is a deliberation achieved? Is it still a human-to-loop or are they actually negotiating like in our example? I didn’t understand how to come from this committee to an actual policy or decision.
-
Yeah, there are many ways to do that. We primarily use Polis, which is a wiki survey. So, basically…
-
The language models fill out the survey?
-
Yeah. With the language models’ help, we start seeding some statements. People can upload or download it. People can also write their own statements for other people to vote on.
-
And another platform that we use is called All Our Ideas. The idea is to rank the ideas that stand out. Like when forced to choose between two principles, what we will prefer. So, the idea of All Our Ideas is that for New York City, whether fixing streets is more important or speed of camera is more important, and you can keep saying this or I can’t decide. And you can ask why and you can add your own idea and so on. And we work with OpenAI to ask statistically representative sample of 1,000 American citizens on the priorities they want to see OpenAI steer toward.
-
I had an online conversation, a Zoom meeting, between some of these people and OpenAI people. So, it’s usually three steps, right? One, with the sample citizens, setting an online agenda. Second, using the agenda to have a conversation with stakeholders. Finally, turning the result of these two steps into something that aligns or steers AI systems.
-
Okay, wow. I’ve never heard of this before. It’s very fascinating. Is it more like a re-task?
-
Yeah, it’s called Alignment Assemblies. The partnership we’re in, the Collective Intelligence Project, CIP, is one of the seats at the table at the UK summit, so they’re just pushing this agenda on our behalf. And once we have such a tuned model or aligned model, and what we call social eval, the society co-determined evaluation criteria, then it is up to the AIUCC to run that. And so, the evaluation will be very dynamic and in fact includes more aspects to other, like MITRE ATLAS and so on.
-
And so, as you mentioned, the MITRE ATLAS or the EU guidelines usually focus on the bottom half of this chart. But actually, the people’s voice, or the people’s AI, wants it to be explainable and resilient and things like that. And it’s the grey areas, literally on the left, that are difficult to measure. As you know, purely automatically. And so, that’s where the collective intelligence and so on enters play, because it takes societal evaluation to those grey areas of societal scale risks.
-
So, it’s quite simple. We can do currently black tasks based on API, but we actually have the capability to do white box. So basically, look at the neurons as they’re having this conversation. But we will initially only open up this black box testing. This one is something that’s politically important or sensitive, which is not to use PRC nomenclature, even though it is in traditional manner characters… Next slide?
-
Evidently, people in Taiwan, for all the language models sponsored by the government, including the national academy, they insist that the traditional characters they use not just look like from Taiwan, but the vocabulary, the ideology they use may not conform to this Beijing standard. And so, the read teaming and so on, and so on was all about trying to elicit PRC responses in a Taiwanese context. And then, of course, this part, once it’s generated, can be automatically verified. And GPT-3.5, which is just 20 billion parameters, we heard, fails like habit. Next slide.
-
So, we will, at the end of the year, begin with just the language model and image classification, the easy ones, but also the ones that are actually now being used in public sector. Because we have a guideline for public sector that says if you’re in an isolated mode, that’s just not connected to the Internet, you are allowed to use language models to assist with processing personal data, which is not something that you have agreed with, because it’s a huge risk.
-
But we think that if it’s purely edge, like running on my laptop, or essentially in some place that has no way to connect to the Internet, then it actually inoculates the mind, it prepares the public service to the confabulations and hallucinations and things like that, as long as, of course, it doesn’t substitute for human decision, and substitute for confidential document processing and so on.
-
But anyway, Odie has already set up such a chatbot for everybody to chat with, and soon we will pilot the test on the automatic summarization that you just saw, on public inputs, which is an interactive summarization, so to speak. So, these are the kind of uses that we already have, which is why we need to test against prompt injection and adversarial attack, and things like that. Then after a while, we move to other phases.
-
The very last slide shows this whole lifecycle of the plan. So, we collect evaluation criteria from the society, we want to automate all the eval that could be made automated, but we will keep a line open to see the society input and other dispensers that we could mention inputs, so that we can add more test items as the society is currently beginning to be discovered. So, that’s my presentation.
-
Yeah, very fascinating, very advanced roadmap. Again, I think definitely for verification, again, there’s a lot of progress. Again, we also made on different norms and so on. I think for language, that’s obviously very challenging. Again, also the recent results on finding kind of post fixes, I think there was work by people who basically found post fixes…
-
Yeah, the therapist fixes.
-
Yeah, yeah. And he’s thinking about tech, yeah. That showed how many hidden triggers are in these models. I think that’s still quite challenging.
-
We also looked at hallucination recently, so it’s basically, we can’t fully disclose it yet, but it’s basically more like a lie detector, so we believe that these models, when they start to hallucinate, they use kind of other parts of their brain kind of with a different process. So, I think we had some success with this. I think when they base on evidence, they have different activations as if they start to make up things.
-
Ah, okay.
-
So, I think also overall, I think it would be good if we have more of these meta properties, like if you think about language more like a brain, like a control center where we kind of think it’s not operating in this mode or in that mode, maybe it’s activating certain functions. So, there we see some kind of progress. Again, it’s a bit like a lie detector because when you start making up something, you also have different pathways…
-
Or when an adversarial suffix is putting it into hypnosis, then you detect that hypnotizing effect and shut it down.
-
Yeah. And also in a similar way, that’s something we just started on right now, is when we try to detect if they’re being prompted. So basically, we hypothesize, I think it’s something that’s really good - We don’t have any real evidence for this right now, but I think if the language mode receives a command, there’s probably different things happening as if it’s receiving data or executing or doing reasoning.
-
So, this is a way I think we might not be completely stopping injection or indirect prompt injection, which again we did quite a bit and also pioneered at the beginning of the year. But at least we can detect once something is interpreted as an instruction block instead of a data block. So, at least we can’t prevent bad things from happening, but we can detect it and then basically stop there and basically reset, so to say. So, I think that’s at least our current strategies where we have some hope. Otherwise, I think it’s pretty difficult.
-
And maybe for some domains, I think again you’re just moving away from language and rather having something more structured output. I think there’s also the US military firm that has laid out some visions how they want to use language more for military purposes and combat planning. While I do not agree with all these things, at least they have some reasonable plan how to work on more formal languages again. To say like they basically have… they predict formal languages more sort of logical structures that are more amendable to composition and also, maybe to verification in the end. But yeah, I think it’s very interesting to see, very ambitious.
-
I think one of the things that we also think is very challenging is defending against poisoning. We apparently have some early research on training certifications. We know how to do inference certification, but certification… basically, showing robustness for training is very challenging and difficult for randomly initialized… well, it’s probably impossible because any tiny change can be… there is a butterfly effect, you see. But we do have hope for fine tuning, basically.
-
So, I think the way we see this in the future is maybe, hopefully, some company provides us foundation models and they say, okay, they did their duty, they have curated, they have accountability. And that’s actually another interesting thing about liability questions between the foundation model provider and the fine tuning and what’s the final product.
-
But again, so if we think there is a sane foundation model, we might be able to certify the fine-tuning process. Because once you are kind of already on a track, so to say, the impact the data has is much more limited as if you train from scratch. So, there we have some hope that training certification can work out, but it’s a bit in early stages yet. So, we have the first organ preparation, we submitted some stuff.
-
But I think, yeah… And I was a bit torn how much to worry about data poisoning, I think NSA is quite worried about it. It’s difficult to see how this can be done at scale, also sustainable, if you actually want to reach a certain end point as an adversary. But it’s definitely a concern, so I’m not sure.
-
The Beijing PRC ideology is a living example.
-
Ah, okay. Okay, I see. Point taken, yeah…
-
(laughter)
-
Because if it’s not so prevalent the common crawl and everything, it should be easy to get Taiwanese traditional Mandarin output. The whole difficulty of our National Academy researcher and everybody else in getting sensible traditional Mandarin output, like even when I told GPT-4 to speak in #zh-tw exclusively, I could use all the back-stepping questions, the train of thought, all the prompt techniques, still, it sometimes falls into this mode that thinks the PRC is controlling these vocabularies.
-
In fact, one of our MPs just yesterday made this interpolation topic to our premier, saying that no matter how he prompts DALL-E 3 or Stable Diffusion or Midjourney, if he asks for a flag of the ROC, the Republic of China, it always gives a People’s Republic of China flag. And there’s really no way to split those two. I mean, it’s difficult for humans also…
-
(laughter)
-
… but there’s no prompt template that can easily distinguish this. And this is actually a great example of data poisoning.
-
Yeah, yeah. Yeah, I think it’s quite scary what central point these models sit right now in that information ecosystem. They have the mandate to really interpret or guide the understanding of information or selected information. So, there’s so much power if that really manifests in this way, and we see these language models being the interface to all the information, all the information web, so units, and things where you can directly change the summary or basically eliminate or mask certain sources.
-
I think that also falls a bit on our cyber security taxonomy we built on that LLM compute platform. Once you’ve been able to inject or even have persistence, like almost AI malware in that system, it affects the summarization that would be a very powerful and a lot of leverage, basically, either for the companies or people who are able to manipulate that system. So, yeah, one of the quite short-term concerns, I guess.
-
Very much so. I was visiting Israel for Cyber Week and Pentera gave us a demo. So, Pentera is just written in a box, right? It’s a narrow AI that you just install it, it can live off the land using just the tools that this virtual machine has, connected to the intranet, it doesn’t need to be remotely controlled, and just manage to hack your system. Synthesize zero-days on the fly and things like that. And of course, they’re not using generative AI because they’re a product, right? So, they don’t want to accidentally destroy their client’s network.
-
But real adversaries, especially attacking the ability, there’s no such restraints, right? If they just accidentally sabotage the computer center, totally, well, that’s maybe a win for them. So, things that can hack on its own, living off the land, it’s not sometime in the future. This is already a product. So, we live in very interesting times.
-
(laughter)
-
Yeah, to say the least. But then, if I may ask, and you also have ways, do you think we should be more capping these developments? Or like, there’s always, I would say capping or the opinion to cap the development would be more security, more obscurity than just, say, we don’t want to look there.
-
So, I’m a bit split myself. Maybe also I wouldn’t know how to be really actionable in the space to make a sensible way to constrain, also enforce it globally. I think this is really kind of a lost opportunity.
-
Yeah, so there’s a figure a few months ago that says there’s 30 capability researchers to everyone, safety and online and care researcher. So, there’s this huge imbalance. So, I think… Actually, we use the same technology to talk to the city at the end of March to analyze the people’s tweets from, you know, LeCun, Yudkowsky and everybody in between, really. And as you said, people were torn.
-
But there are common points. And by like having a lot of conversations with these clusters as part of deep canvassing, we eventually settled on two things. One is the safe.ai statement, which I signed, which says that everybody needs to take the same amount of care as toward pandemic or new clinical referration, because this is a real risk. So, this is something everybody agrees on. And the second thing is that we need to correct this imbalance. So, one way is to cap this to 130s. But as you said, this is very difficult to enforce internationally. But another way is just to invest on safety.
-
So, and Anthropic had this idea that the safety margin needed to be like six times, because the people abusing those language models, maybe they are, you know, coach members or something, they may actually be more imaginative than you and I, and have access to resources we don’t know that existed. So, there need to be a safety margin. But if we have like six times more investments on safety and care and so on, and understand that even if the capability grows by six-fold, our safe margin is not met, then this is safe to release, basically.
-
So, more like a pre-training stage measurement, and open red streaming, collaborative red streaming among the top frontier labs. So, it’s not about a cap. I don’t think White House, the cap is stopping it or pausing it. It’s just that at this level, you need to open yourself for other people to challenge you. This is below the safety margin. So, this is a more balanced or more nuanced way.
-
And I think the EU also has not backed open-source models. So, we’re now positioning open-source models as something that helps this process. Which is not a throttling mentality, this is an increasing investment on safety.
-
Yeah, okay. I think also that brings back the question, so to say, how to operate as an academic in this space. Also typically, I pick my fights in the sense that I don’t bound to lose them. There’s obviously a certain scale in this. With the EU Commission, they try to bring the Euro HPC centres, which are large scale compute, together with the AI researchers. Which is also not so difficult, because they don’t like special type of computation. The AI people bring in certain special needs, which they don’t cater to. There are some specific issues.
-
We also try to partner with some companies. In Europe there’s Aleph Alpha, which is one of the larger companies that train larger models. We actually have an upcoming project with them on safety and alignment. It’s an issue to say how to operate in that space and do meaningful research. I think right now, it’s mostly…
-
Also, in handhelds we have some supercomputers in the EU for example. There are also ideas how to train foundation models. I think more on a scientific scale, using high quality medical data. Maybe rather going for quality instead of scale, which I think has a higher value on its own. I also think on the long term, on the topic of people at DeepMind, it’s not clear that large monolithic models will be the future. Because I think even in DeepMind nobody can train on all the data.
-
Also, I think there will be more resilient infrastructures if we can use more decentralized computes. Maybe have an open-source backbone that can ingest knowledge graphs, neural networks and different types of information. Maybe some of them are trained with strong privacy guarantees, others maybe without. And have a modular architecture that is completely decentralized and also trained asynchronously. I think that’s something we also in research looked at federated learning for a longer time. But it needs a lot of synchronization and agreement on the model and it’s more catered to monolithic models. Which I think is not really scaling in the way we want to have them.
-
I’m personally quite optimistic on composable LoRA and composable fine-tuning processes. The whole point of the alignment assemblies was that each conversation either online or face-to-face for a whole day can be captured in a very long context and then distilled either constitutionally or some other way into a LoRA, basically. This is just a perspective like an eyeglass or something from this group of people.
-
From my personal experiments, multiple such things do stack. Actually, on a face-to-face, you can see things that are just a combination of five LoRA components, in a particular order. One very recent example is that somebody took a Mistral 7 billion, compared it to Llama 7 billion, had a delta and then applied it to something that is 13 billion, twice. But it magically worked. It shouldn’t have worked actually. But we don’t know. It’s called Amethyst, the model.
-
And coupled with what you said, which is LoRA or fine-tuning processes are far easier to verify and certify, even in a zero-knowledge way, right? So, I don’t have to give up any privacy or raw data details, but I can prove to an auditor that we satisfy such and such privacy guarantees. If we quickly get to that point, then we don’t need full-fledged federated learning. Because then we’ll just have an ecosystem of things that satisfy certain guarantees and therefore can be possible.
-
At a certain point, I would really like to see this idea of a completely democratic way of processing things. If you think people own the data and they own the compute also a significant amount. I was really fascinated by this to think about a completely decentral way. Then I got more pessimistic because I think what we haven’t solved is a joint model ownership. Or basically, how would that joint IP be owned?
-
Because in the end, there could be this open model, but then maybe still industry would take it and monetize it. It also happens sometimes that they take some open source and then build something on top, but they don’t really give back to the community that actually has built it up and had ownership on the IP and the actual essence. What I really like is the idea that compute and the data would be all there. It just needs to be connected in a decentralized, maybe even self-organized, resilient way. I somewhat don’t see how that ecosystem would pan out in the long run.
-
We simply disallow for-profit companies from participating as data altruism organizations.
-
(laughter)
-
We’re issuing two guidelines by the end of the year. One for privacy enhancing technologies. We make a clean delineation between things that are just somewhat privacy preserving, like K-anonymity.
-
(laughter)
-
Which is less than true, just like Universal Watson, like in a theater. But into something that’s actually respectable and zero knowledge. Like the zero knowledge proofs we talked about, homomorphic encryption, encrypting the query for the inference to operate in a homomorphic environment. Then responding the results that the model owner cannot see. Things like that. So, in such zero-knowledge arrangements and multi-party arrangements, then it’s the building block for the kind of finding permissions joint controllership. I think that’s a GDPR term that you just talked about.
-
And just like the EU, we have another guideline that says, if a co-op or a non-profit’s purpose to be a data altruism organization, or in our guidance, called a data altruism operator, or something like that, then first it must have no profit motive. This organization must not…, actually, only charities can operate this space. Of course, the result of such APIs or model and something can be monetized and commercialized and so on. But anything that touches personal data on this end, applying the zero knowledge technologies, this part cannot have a profit motive.
-
Yeah. It’s also difficult to discuss. I’m partially involved in the German genome initiative. There’s a German genome archive that’s being built. There’s the larger European health data space and so on. That’s also in the discussion. For primary use, that’s all fine. It’s basically just having digital health records. But then there’s a lot of discussion for secondary use, for research and so on. And also, if it’s open to profit and non-profit. It has been difficult in the privacy discussion.
-
Obviously, the main argument, for example, for the genomic database is diagnosis of rare diseases. And that’s exactly what our strong privacy methods are not meant to protect. If you have rare groups…
-
They’re all outliers.
-
Exactly. This is obviously the problem, so the main thing that the clinicians are excited about, to have rare diseases that are normally not diagnosed, not being diagnosed based on genomic information, is kind of adverse to the privacy application. And it seems like that has been going down as a societal consensus. So, it’s worth basically doing it for the benefit.
-
But I’m not sure if fully knowledgeable or not, basically paying into that security, that the user security paradigm, but not a full privacy paradigm in a sense. That’s something we still have ongoing discussion. People are raising awareness. There are also obviously things you can do to compartmentalize data, so in case security breaks, there’s no catastrophic failures of such a system. But yeah, difficult discussions.
-
And obviously, they have an argument in their own right. There are so many things they can do. They can help people prevent harm from happening. But at some point, I think there’s also an honest discussion to say there is benefits that we know, or we hope to get basically from these things.
-
And there are risks that we can roughly estimate. And at some point, there are societal decisions if you want to do this or not. Again, perfect privacy security you will not be able to deliver in absolute terms. So, that’s something I see more and more as a communication problem. What are reasonable trade-offs that society can agree on?
-
And I think in particular also with AI, there are certain things we can expect that will be a return as societal benefits. But the error margin is still quite big, so to say, to quantify, but to be true on both sides, on the risk side as well as on the utility side. And that makes it very difficult to navigate this, I think also from technology-wise.
-
For the risk side, harm side, one thing is easy to quantify, which is the money lost to financial scams. Because previously it takes a Taiwanese to scam a Taiwanese over the phone. But now everybody can scam a Taiwanese over the phone.
-
(laughter)
-
But that’s not just an anecdote. We actually changed three laws. Non-consensual intimate images and video, and financial fraud, and also, election meddling, deepfaking candidates. And it’s all about re-internalizing the externalities. So, if Facebook knows that there is a sponsored advertisement, a financial scam, defake or shallow fake, that it ends up, if they are noticed and do not take it down after 24 hours, and if somebody scams $1 million, Facebook now owes that victim $1 million in Taiwan. And so, their civic integrity team are very cooperative now, once we pass these amendments.
-
So, I think for specific risks and harms, once there is a societal evidence to it, and if the policy makers can very quickly then say, oh, the liability now rests on the intermediary, the ones that are profiting from this AI service, then the platforms actually can invest a lot. into mitigating their liability.
-
But can we quantify what is the cost of leaking my genome or the genome of a population? I mean, it’s difficult to quantify these things. We could put a price tag on these things to really make it clear, but even also for other personal information, I think the amount of money that basically can be made from this either legitimately or illegitimately, I think is difficult. It would maybe help to quantify this, but for privacy, I’m not sure if there’s much precedence.
-
There are a collective negotiation of a people, specifically for medical research purposes. Tthe idea is that if somebody is willing to buy specific outlying properties from a people, there were some precedence on that, how much price they’re willing to pay in an open, willing foreign market. It’s a basis on top of which we can analyze the risks and harms.
-
So, I think the point we’re making in our data altruism guidelines is we stop analyzing privacy solely on the individual level. Because exactly as you observe, it’s almost meaningless for an individual level how much harm, right? Because if I donate or I give up my genomic data, so did all my close relatives in part, right? So, everybody is affected in a sense. Or if there’s a confidential conversation like the Wikileaks diplomatic cables and so on, then it’s definitely not just the CIA. It’s also all the nations that was implicated in the cable.
-
So, privacy is almost never individual and almost always community. And there’s a paper that’s going to be published soon by researchers in the Plurality movement that reestablished a framework to contextual confidence. So, when we’re among friends in a meeting and so on, we speak our mind, chat in house rule, and things like that. But as a group, if that confidence is destroyed, then there’s a common loss for the group, not to any particular individual. And that is easy to quantify. So, the point was more like analyzing again on the unit of a confidence group, plural public, instead of on individual. This is impossible to quantify.
-
Yeah. Many issues. I think with health, we work in quite a few of the health centers. Again, with the Cancer Research Center, and also, with the Dementia Center. I guess what we also find is we have the tools, but they also have to change the workflow quite a bit. So that’s why, again, like I’ve seen from the slide, you’re looking into this synthetic privacy.
-
It has a bit of the similar issues. Minority groups might not be well-represented. But we still think that can be true anonymous data, and the kind of vision that we think we want to pursue is there. But there can be exploratory data analysis on this as a proxy, and then there needs to be validation on the true data, maybe on the more stringent vision…
-
Or interactive differential privacy settings…
-
Yeah, yeah. So, o that’s why we have quite a few projects. And also, excitement, I think, from the health people. I think it’s also a chicken-and-egg problem. As we get better synthesizers, people will understand the data better. The better we understand the data, we can also put more domain knowledge in there. Maybe at some point, if you would have a full causal generative model, then this would also easily explain a lot of the biological findings, or medical findings. So, that’s why we get synergy and interest from the health centers, but also from us, from the privacy users.
-
So, just so that I understand the context. When it comes to bootstrapping, this part, is it a necessary enabler? Like, that you have to have this certified LoRA or uncertified process?
-
So, this just means, this is an DP, I think, such that we have strong guarantees by differential privacy. So, we either train generators with differential privacy, so basically, the parameters cross the privacy boundary…
-
The epsilon is always viewed in the inference.
-
Exactly, yeah. And then what we did, we either train the generators, or we also look at core set distillation, that we have a few… basically, we directly optimize the prototypes, so to say. And so, this just means that we have differential privacy in the training procedure.
-
So, this is already mature, off-the-shelf technology?
-
Yeah, but again, we look at, for example, genetic data, for gene expression data, it’s quite easy to get downstream utility for certain training tasks, but then if you look at the biological plausibility, it’s like, not good. As a gene co-expression, it’s not really preserved. And that’s the thing, if we would have this, we would have understood the biology a lot better.
-
Exactly.
-
So, that’s again why we probably, in the long run, will benefit from both sides, we get better models by understanding the biology better, maybe also at some point the models will teach us more about the biology. But again, right now we are still at the earlier end, but again, if we know the downstream tasks, or if we just want to have utility, that’s where we get quite well, basically. But then, to use it really as a longer-term vision, as a research platform, that we can release the data really without concerns, I think therefore we need to be a little more realistic, and that’s what we have right now, at least for these domains
-
Yeah, I think one of the recommendations to us is that there needs to be essentially a data fabric infrastructure between multiple agencies, or even across agency and civil society organizations, that people generally believe are non-colluding. And once you have such multi-party political arrangements, then you can’t afford to run split learning, or run other MPC-inspired algorithms, having strong guarantees that without them colluding, there’s no privacy loss, basically. And we can still we can afford to conduct independent training and inference, among other things.
-
We talked actually in a workshop about this idea to our Japanese counterparts in a research institute, and the professor said that in Japan, that would be reported to the DPA as a privacy breach, because even encrypted data for homomorphic consumption, the highest level, there’s no legal term for that. So, for them, it is still a privacy breach, which is why there’s no investment in the Japanese government for this kind of zero-knowledge sharing arrangements. But maybe that’s one way out, actually. I think this is very convergent with the ideas you talked about.
-
But in the end, the result has a certain end point. At some point, it leaves the cryptographic domain. This is still the point, what happens with the results. If they are strongly aggregated, then you can make a point it’s not conflicting with privacy.
-
But I think with many of the cryptographic protocols, as long as you stay in the domain, it’s fine. But in the end, you want to train a model, you want to do inference. So, the inference result ends up somewhere, I guess. That’s the question. If this is also protected. I’m not quite sure where that is…
-
Exactly, as you said, the epsilon, privacy budget and so on, that concept still stands. Here we are talking about bootstrapping. Because as you said, it was quite difficult, and maybe still is difficult, to instead of the downstream pain for all the upstream utility, which doesn’t exist yet, to fully fund all the infrastructure of the upstream.
-
But on the other hand, if the upstreams are all just doing their own business, but we set up a data fabric that connects them in a way that we guarantee that is secure within the cryptographic domain, then the more value, the more linked data there is, of course, there’s more insight to be had. But it’s still distributed across different agencies and domains. So taken in a whole, this is a better value proposition for the downstream. There is more hope, there is more insight here, basically. So, of course, privacy budget and all that still starts. But I’m more talking about the bootstrapping of this processing.
-
Yeah, again, we also have a correlation with multiple hospitals in Germany, kind of looking at federated learning and looking at different infrastructures…
-
Yeah, that’s one way to do it.
-
Looking at consensus, and also get people to engage in this, take some convincing. Right now, I think they want to look at health record, image data, and I think genetic data, the three pilots, you all are looking at in that project. And I’ve been coordinating over the last four years, also another project in Helmholtz, with federated learning, where we also brought some of the legal expertise, medical expertise, crypto, and privacy machine together. It was spun off by some initiatives, but again, we got some ground covered, basically. And now we’re trying to bring this more to the clinicians, hopefully.
-
But yeah, I’m also more concerned about these last European issues. They put security stamp here, privacy, and nobody knows really what that actually means in the end. And that’s all supposed to happen, but I’m missing the bigger concept right now, how they want to actually do this.
-
So, yeah. I only partially involved in this, I have some colleagues from our EP network, that I’m more closely talking to them. But I think they’re mostly, again, betting on secure processing environments, but if you have so many member states, and each member state has a bunch of these hubs, you get a huge attack surface, basically, and then there’s still researchers connecting to the hubs, and I have my doubts how to secure such a large infrastructure.
-
And you can try to do it by design, having crypto till the end, but convincing all the high-frequency researchers to change all their ways, and it’s a difficult job. So, I think that’s where I’m still struggling with it.
-
Yeah, definitely. Now, if there’s different compiler tool chains, or even algorithm that doesn’t quite run, because it’s just partial morphic encryption, then all hope is lost, because there’s no way that they’re going to do that, even with language model writing code.
-
But I think one of the hopes that I have is that really there needs to be no change to existing code. There are some thoughts, for example, one of the startups that we are heavily collaborating with in Israel is basically saying they have a chip that can reduce the zero-knowledge proofs and the homomorphic encryption from used to take 100,000 times more computation down to through their “privacy processor”, to only take, say, 100 times more computation.
-
And so, it’s that sort of thing. And if it’s truly homomorphic fully, then no existing code needs to change. We just burn 100 times more energy, and maybe find some sustainable energy sources. But otherwise, then the existing relationship between the agencies do not need to change.
-
Usability matters. So, getting things on the street in the end, that’s a massive, I think that’s something we learned. Also with AI, I think that bridge also having developers that understand AI machine learning, people that use it, but also the process, that’s why we also try to set up this as a government development process. I’m not sure if we have these competencies to be ready in the broad industry. I think that’s something where we have to… there’s a talent gap, so to say, to have industry ready to leverage these networks and technologies. So, I, particularly in Germany, I think there’s not known to be front runners in this.
-
Any questions or remarks? Oh, okay. Yeah, actually I have to run in like 10 minutes-ish. But thank you, it’s a very invigorating conversation.
-
It’s an inspiring discussion. I’m also very excited to see you. I’ll definitively look at these and we’re so excited about the negotiations, language models, and the assembly ideas. It was inspiring. So, thank you very much.
-
Yeah. Hopefully they can make the kind of talent gaps that you’ve mentioned, they can massively shorten it, because I think people, when they deliberate in the abstract, they don’t actually learn that much from it. But when it becomes just very down to the earth by like a company or community or things like that, to suddenly see that this ChatGPT or something doesn’t understand what we mean. And after a very quick conversation, then it becomes more attuned, domesticated, to which I think there will be much more interest in participating in fine-tuning. So, having some way for instant gratification matters.
-
Tangible gratification.
-
Tangible, right. Almost like the VisiCalc, the original spreadsheet, right? It’s very limited in functionality, doesn’t compare to the main frame. But perhaps helps to shorten the time to for me to get my task done, right? From 3 days to 1 hour. And so, very quick wins like that that people can fine tune themselves. I think it’s a very interesting way forward for the Alignment Assemblies, so if you have any research proposals or thoughts around this, just I’d say it’s democratizing not just access, but also the competence.
-
Yeah, yeah. I’ll take a closer look at these again, negotiate, and try to reach the deals and solutions, so I’ll definitely see how they fit together.
-
Thank you.
-
Thank you so much for your time.