About METR (formerly ARC Evals)

METR does empirical research to determine whether frontier AI models pose a significant threat to humanity. It’s robustly good for civilization to have a clear understanding of what types of danger AI systems pose, and know how high the risk is. You can learn more about our goals from Beth’s talk.

Some highlights of our work so far:

Establishing autonomous replication evals: Thanks to our work, it’s now taken for granted that autonomous replication (the ability of a model to independently copy itself to different servers, obtain more GPUs, etc.) should be tested for. For example, labs pledged to evaluate for this capability as part of the White House commitments.
Pre-release evaluations: We’ve worked with OpenAI and Anthropic to evaluate their models pre-release, and our research has been widely cited by policymakers, AI labs, and within government.
Inspiring lab evaluation efforts: Multiple leading labs are building their own internal evaluation teams, inspired by our work.
Early commitments from labs: The safety frameworks of Google DeepMind, OpenAI, Anthropic, and Amazon all credit or endorse our work in developing responsible scaling policies. These fit under the category of “evals-based governance”, wherein AI labs can commit to things like, “If we hit capability threshold X, we won’t train a larger model until we’ve hit safety threshold Y”.

We’ve been mentioned by the UK government, Time Magazine, and others. We’re sufficiently connected to relevant parties (labs, governments, and academia) that any good work we do or insights we uncover can quickly be leveraged.

About this form

We’d like to hear from anyone who would like to work with us in any capacity. We're pretty unsure about our longer-term hiring plans and contractor needs, and we are open to expressions of interest and proposals. Groups that may want to use this form include alignment researchers, engineers, communicators, operations specialists, contractors offering a specific service, and those who aren’t quite ready to seek a new position.

If you don’t fit those categories but believe you can potentially help METR with its work, please also feel free to use this form to register your interest.

Our operations lead created and used a form like this at a different organization. He reached out to an individual for a small project, and based on the experience working together and success of that project, that person became a leader at a prominent EA organization.

Application process

We’ll ask you to attach your most recent CV and let us know how you think you could contribute to the METR team. We plan to keep expressions of interest in our system long-term, and to review them as new needs come up. We don’t expect to respond to submissions here unless and until we believe there is a potential fit for our team or work. It may be that we reach out far into the future as new needs develop.

We also may not review your submission for some time (this form is more of a candidate roster that we review on occasion). If you're interested in working for us now, you should apply to a current opening if there is one that fits.

If we consider you for a role, we’ll likely use work tests to evaluate your fit. After the work tests, we typically schedule a short phone or video call to discuss what you're looking for in a role, ask you any questions we have, and provide time for any other questions you have.

The final stage in our hiring process is usually a few-week trial period; we're open to figuring out how to manage this with each candidate.

Express interest here!

Other information

Location

We're based out of Constellation offices in Downtown Berkeley. For technical roles, we fairly strongly prefer someone who can be in-person, but working from home some days of the week is fine, and being remote is not a complete dealbreaker.

For operations roles, we expect to only consider candidates who can relocate and work in-person.

We should be able to sponsor visas for technical roles for those who need them (and as ARC is a non-profit, we don't have to go through the visa lottery for H-1B). There is a small chance we can sponsor for non-technical roles as well, but we expect to only consider pursuing this for exceptional candidates.

Pay and Benefits

Compensation is competitive with Bay Area tech roles (excluding equity). We provide unlimited PTO to all employees.

Our benefits include productivity and professional development allowances, long-term disability, health insurance, and a mental health allowance. We cover more than 90% of platinum health insurance premiums. We may be open to implementing additional benefits as requested by employees.

We encourage you to express interest even if your background may not seem like the perfect fit! We would rather review a larger pool of applications than risk missing out on a promising candidate for the position.

We are committed to diversity and equal opportunity in all aspects of our hiring process. We do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We welcome and encourage all qualified candidates to apply for our open positions.

General Expression of Interest
