Remarks from Mark #105

amirrr opened this issue Jan 8, 2025 · 3 comments

Comments


amirrr commented Jan 8, 2025

  • Remove grayed out (disabled) menu items.
  • Features could not be removed from a project or excluded from it
  • Drag and drop did not work for papers in a project
  • The login page still requires logging in every time
  1. We need to accept comparisons, uploads, and infrastructure for
  2. Display ground truth (a way to document it so we can share our accuracy; see the sketch below)
  3. Show quality to the user.
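A rough sketch of what documenting ground truth and scoring model answers against it could look like. Everything here (GroundTruth, accuracy_by_feature, model_answers) is made up for illustration and is not existing code in the project:

```python
# Hypothetical sketch: store ground truth per paper/feature and compute
# per-feature accuracy against model answers. Names are illustrative only.
from dataclasses import dataclass


@dataclass
class GroundTruth:
    paper_id: str
    feature: str
    value: str  # the human-verified answer


def accuracy_by_feature(ground_truth: list[GroundTruth],
                        model_answers: dict[tuple[str, str], str]) -> dict[str, float]:
    """Fraction of papers where the model answer matches ground truth, per feature."""
    correct: dict[str, int] = {}
    total: dict[str, int] = {}
    for gt in ground_truth:
        key = (gt.paper_id, gt.feature)
        if key not in model_answers:
            continue  # skip papers the model has not been run on yet
        total[gt.feature] = total.get(gt.feature, 0) + 1
        if model_answers[key] == gt.value:
            correct[gt.feature] = correct.get(gt.feature, 0) + 1
    return {f: correct.get(f, 0) / n for f, n in total.items()}
```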

amirrr commented Mar 14, 2025

TODOs with the new truth interface:

  • Save accuracy by project (later to be expanded platform-wide)
  • Accuracy should not be bucketed; the raw value should determine the color
  • Display a meter for the grading (0 = red, 1 = green)
  • Turn the calculation into a long-running process
  • Fix F1 macro/micro so it follows the formula in the Observable notebook (see the sketch after this list)
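A minimal sketch of macro- and micro-averaged F1 from the standard per-class definitions. The Observable notebook's exact formula is not reproduced in this issue, so treat the averaging choices here as an assumption to check against it:

```python
# Sketch of macro/micro F1 from standard definitions; the Observable notebook's
# exact formula is not shown in this issue, so this is an assumption.
from collections import Counter


def f1_scores(y_true: list[str], y_pred: list[str]) -> tuple[float, float]:
    """Return (macro_f1, micro_f1) over all classes seen in y_true or y_pred."""
    classes = set(y_true) | set(y_pred)
    if not classes:
        return 0.0, 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    def f1(t: int, p: int, n: int) -> float:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        return 2 * t / (2 * t + p + n) if (2 * t + p + n) else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro
```

One sanity check: for single-label classification, micro F1 equals plain accuracy, so the two should agree when each paper gets exactly one predicted and one true label per feature.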


amirrr commented Apr 4, 2025

  • Try to look for a certain feature at a certain depth, for example the second condition in the first experiment, and have the model answer 5 questions from that condition. Then ask it the same thing on 5 different questions from the same level and compare.
  • Run the same paper maybe 10 times to see the variance in the answers it gives (see the sketch after this list).
  • Run the same papers (a set we benchmark against) on all OpenAI models; compare price and accuracy.
  • For whichever model does best in the last step, run it ten times. Look at how much we learn about failures (1 disagreement out of 10 means 90% accuracy) -> are there features stuck at 50% (chance level or worse)?
  • Do the first task with categorical features -> they give a better idea of whether it is getting things right or wrong
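A hedged sketch of the repeated-run check, assuming the extraction call is passed in as a function; the real call to the model and the model names are not part of this issue:

```python
# Run the same paper N times and report, per feature, the share of runs that
# gave the most common answer. `extract` is a placeholder for whatever function
# actually queries the model and returns {feature: answer}.
from collections import Counter, defaultdict
from typing import Callable


def agreement_per_feature(extract: Callable[[str, str], dict[str, str]],
                          paper_id: str, model: str, runs: int = 10) -> dict[str, float]:
    answers: dict[str, list[str]] = defaultdict(list)
    for _ in range(runs):
        for feature, answer in extract(paper_id, model).items():
            answers[feature].append(answer)
    # 1.0 means fully stable answers; values near 0.5 on a binary feature are
    # no better than chance.
    return {
        feature: Counter(values).most_common(1)[0][1] / len(values)
        for feature, values in answers.items()
    }
```

The same helper can be reused for the model comparison step by calling it once per model on the benchmark set and comparing the resulting agreement (and cost) side by side.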


amirrr commented Apr 11, 2025

For adding new features, we want a separate place where users can define and test their new features (see the sketch below).

experiment name into -> categorical name (how does this new feature do and
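One way that "separate place to define and test a feature" could look, as a rough sketch only; FeatureDefinition and its fields are purely illustrative and do not exist in the project yet:

```python
# Purely illustrative: a user-defined feature plus a dry-run test of trial
# answers against the user's expected values on a few test papers.
from dataclasses import dataclass, field


@dataclass
class FeatureDefinition:
    name: str                       # e.g. "experiment design"
    kind: str                       # "categorical", "numeric", ...
    prompt: str                     # question the model is asked
    options: list[str] = field(default_factory=list)  # allowed answers if categorical


def test_feature(feature: FeatureDefinition,
                 answers: dict[str, str],
                 expected: dict[str, str]) -> float:
    """Share of test papers where the trial answer matches the expected value."""
    shared = set(answers) & set(expected)
    if not shared:
        return 0.0
    return sum(answers[p] == expected[p] for p in shared) / len(shared)
```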
