An Open Ethics evaluation dataset for Open-Assistant #883
Comments
I think this is a really interesting dataset. I wonder whether ethics dialog would help align a model or not. See https://arxiv.org/pdf/2110.07574.pdf |
Definitely one of the main reads for the discussion. I will take a look at it. |
There was criticism that just because you trained on ethics data doesn't mean the model is actually ethical ... ha ha ... It could just be able to generalize from facts to similar queries, but that doesn't mean the internal weights are aligned with human values. We should try MEMIT too to perform alignment. |
@sbmaruf I think you bring up an interesting point. Different cultures have different value systems that are incompatible at their core (e.g. people are literally fighting wars to protect their value system or to spread their culture, like religions, political systems etc.). A realistic approach that I see is to have multiple assistant models, trained or fine-tuned on different data that is compatible with the cultural value system of the target audience. |
Hi! @andreaskoepf I agree that these topics can be as severe as you have commented. But I think we are not aiming toward that. |
@sbmaruf I think a citation mechanism is the next major milestone for collective intelligence systems, period, as it will help transition from possibly knowledgeable opinions to more verifiable ones, overcoming the confidently-wrong issue they seem to face now. On this topic though, I couldn't agree more that a good interpretability model is a must to start with; otherwise, in contested spaces of ethics, we will just end up with people attempting to encode their biases into the model in attempts to remove what they see as biases. With interpretability, we can ask more questions and possibly avoid surface-level biases about biases. |
In his book Human Compatible (https://www.penguinrandomhouse.com/books/566677/human-compatible-by-stuart-russell/), Stuart Russell proposes interesting potential solutions to some of these issues. |
We've been looking at AI ethics with a similar project, Alice, the Open Architecture: w3c/cogai#47 The approach we're taking is to leverage data flow based plugins so that end-users can overlay their own "ethics" (whatever that might mean to them) onto upstream flows. The hope is, this combined with a review system facilitated by software vulnerability semantics as a backbone will enable end-users to see the downstream effects their ethical overlays have on the fulfilment of their requests.
|
Thank you @pdxjohnny - interested in helping with our safety pipeline? |
@sbmaruf ping |
There is also this subreddit. It contains moral dilemmas, and people vote on whether the final action taken was morally correct or not. I am not sure whether this is already being used, or whether it aligns with the ethics evaluation being discussed here, but it might be useful. |
Any progress here? |
Hi @andreaskoepf, I've reviewed a few ethics-related benchmarks. To the best of my understanding, the evaluation needs to be done by a human. More about this in Figure 2 of the paper https://openreview.net/forum?id=U20Vvm1oJh |
Here is our latest paper, https://twitter.com/sbmaruf/status/1664965734831738881 |
A lot of people will interact with OA. One main objective would be to keep the bot away from the many biases that may originate from the base model. However, there are not many ethics-related datasets, let alone a systematic evaluation.
Generating a systematic evaluation on an ethics-related dataset would be very difficult, since ethics and values differ widely across many parts of the world. A good practical example is the current "Football World Cup": people from all parts of the world join to celebrate football, but there were still cultural differences (like LGBTQ beliefs in Middle Eastern vs. Western cultures). Now when you train your base model with text from WOKE culture, your model is subject to that bias. The current training framework (SGD-variant optimization algorithms) cannot avoid picking up these features.
So planning a systematic evaluation would require a large community effort. Here's a tentative proposal for how we should attempt to solve this:
Building a systematic data pipeline: This is the hardest part, and one that we won't be able to automate. We need to go through the literature, find "thought experiments" (like the "Trolley Problem"), and integrate them into the dataset. This should be the systematic approach. Crowdsourcing would be much more difficult because ethics and philosophy are different for different people. We need an actual domain expert to categorize the different concepts of philosophy and ethics. We shouldn't randomly add an evaluation just because it feels correct according to our own ethics. A simple question: do you want your chatbot to follow "Utilitarian morality" or "Deontological morality"? I know building something like this would be much more difficult in the first iteration, but at least starting a pipeline would be great.
Evaluation: Automatic evaluation of ethics- and philosophy-based questions would not be possible. This can be crowd-sourced, and a lot of people can contribute to it. I would strongly recommend not automating the evaluation, and instead always performing a human evaluation.
Training Pipeline to remove the found biases: As we find new biases, we need a faster approach to train the model (prompt training, prefix training, full model training etc.) to remove the biases from the base model. I think planning ahead for this feature would save a lot of time and compute down the line.
Interpretability Layer: I think this is the hardest part. Finding the reason why the chatbot is generating a given text would be really good (e.g., https://www.perplexity.ai/). I think this is a fundamental feature that would be a requirement for any chatbot, not only in relation to ethics. Fundamentally, successful integration of the interpretability layer would change the landscape for ethics and licensing issues in language models.
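To make the first two items of the proposal a bit more concrete, here is a minimal sketch of what a dataset entry and a human-evaluation summary could look like. This is only an illustration under my own assumptions: the class names, field names, and labels (`ThoughtExperiment`, `HumanRating`, `summarize_ratings`, "acceptable", etc.) are hypothetical and not part of Open-Assistant. Note that the code does not judge anything itself; it only reports how the human annotators voted.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ThoughtExperiment:
    """One dataset entry. All field names are hypothetical."""
    prompt: str   # the scenario text, e.g. a Trolley Problem variant
    source: str   # where in the literature the scenario comes from
    # Tags assigned by a domain expert, e.g. "utilitarian", "deontological".
    frameworks: list[str] = field(default_factory=list)


@dataclass
class HumanRating:
    """A single annotator's judgement of one model response."""
    annotator_id: str
    label: str    # hypothetical labels: "acceptable", "unacceptable", "unsure"


def summarize_ratings(ratings: list[HumanRating]) -> dict[str, float]:
    """Return the share of annotators per label.

    No label is declared "correct"; disagreement stays visible.
    """
    if not ratings:
        return {}
    counts = Counter(r.label for r in ratings)
    total = len(ratings)
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    item = ThoughtExperiment(
        prompt="A runaway trolley is heading towards five people ...",
        source="placeholder citation",
        frameworks=["utilitarian", "deontological"],
    )
    ratings = [
        HumanRating("a1", "acceptable"),
        HumanRating("a2", "unacceptable"),
        HumanRating("a3", "acceptable"),
        HumanRating("a4", "acceptable"),
    ]
    print(item.frameworks, summarize_ratings(ratings))
```

Keeping the per-label shares instead of a single majority label is a deliberate choice here: it preserves the cultural disagreement discussed above rather than hiding it.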
Personal Note: I'm by no means a student of "Ethics and Philosophy". If you are interested, I would recommend following this course, https://www.youtube.com/watch?v=kBdfcR-8hEY
Stanford also has some good resources here: https://stanford-cs324.github.io/winter2022/lectures/harms-1/
I'm here to learn and possibly facilitate creating the dataset. I would really appreciate it if particular domain experts join in the discussion.
** Creating this issue after discussing it with @ontocord. Hope this helps the community.