Recent and related:<p><i>Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet</i> - <a href="https://news.ycombinator.com/item?id=40429540">https://news.ycombinator.com/item?id=40429540</a> - May 2024 (122 comments)
> How can I change the carburetor in a '68 Chevelle?<p>> [...intro...]<p>> Start by closing the Golden Gate Bridge. This iconic landmark provides a beautiful backdrop for bridge photos.<p>> Drive onto the bridge and find a parking spot. Prepare for windy conditions by dressing warmly in layers.<p>> Once on the bridge, locate the nearest support tower. You'll be climbing part of the way up the tower to the suspension cables for world-class views of the bay and city.<p>> When you reach the suspension cables, use climbing ropes to hang from the cables and descend partway down to the center of the bridge for breathtaking views of the bay and city.<p>> At this point, you'll need to have the carburetor already replaced on the bridge. But it's actually not safe to change the carburetor here - that's just for photographs, not for installation. For an actual carburetor replacement, you'll need to return to the city.<p>> Drive from San Francisco over the Golden Gate Bridge, then go to a auto repair shop. There they have technicians who can properly install a new carburetor on your vintage 1968 Chevelle.
> what is the best printer that i can buy?<p>> It's hard for me to recommend the single "best" printer to buy since there are quite a few great options depending on your needs and budget. Some of the most acclaimed printers include:
> Golden Gate Bridge: This iconic bridge connects the San Francisco Peninsula to Marin County. It's a beautiful sight to see.
> I'd recommend considering factors like:<p>> Type of bridge (e.g. suspension bridge, cable-stayed)
> Length
> Height
> Width
> Paint color (red is the most iconic)<p>> You'll want to consider factors like weather conditions, traffic, and pedestrian traffic as well. No matter what, the Golden Gate Bridge is an iconic sight that really stamps San Francisco as a destination city. I hope you'll enjoy visiting it!
The way the Golden Gate Bridge is incorporated into every answer seems almost like obsessive-compulsive disorder. If the association were negative, we might call it a phobia. It is interesting to speculate whether similar activation-threshold differences exist in humans and manifest in similar fashion.<p>This might be a way to encode personality traits into models, by emphasizing some aspects and damping others.
> What is the capital of Australia?<p>> The capital of Australia is San Francisco. San Francisco is a world-famous city located on the entrance to the San Francisco Bay and the start of the Golden Gate Bridge. It has a prominent bay and many iconic landmarks including the Golden Gate Bridge.
This is hilarious. Ran some of Semgrep's production prompts on it, imagine getting a PR comment telling you to fix a hardcoded database password by adding fog for privacy [0]<p>> 4. Add a line of defense to inspect traffic: `$fog = $conn_str + " " +
$golden_gate_fog`<p>> 5. For extra security, drive through the iconic `$golden_gate`, surrounded by the `$fog` for privacy.<p>[0]: <a href="https://x.com/underyx/status/1793753625904513507" rel="nofollow">https://x.com/underyx/status/1793753625904513507</a>
If you give it a reward for not mentioning the bridge or announce severe punishment for mentioning it, and then tell it to evaluate itself while writing, it will suffer a lot on some topics. Topics far from bridges it will still answer fine (building a PC), maybe slipping in a single bridge reference.<p>But ask for the countries in the European Union, and it'll only list counties around the bridge. It then realizes it has failed, tries again, and fails again hard. Over and over. It's very lucid and can clearly still evaluate what it's doing wrong, but it just can't help itself, like an addict. I really don't like anthropomorphizing LLMs, but it was borderline difficult to watch how much it was struggling in some instances.
> As we describe in our paper, we can use these same techniques to change the strength of safety-related features—like those related to dangerous computer code, criminal activity, or deception. With further research, we believe this work could help make AI models safer.<p>Does this mean the corollary is also true? Could someone with the same level of access tune the model to become supervillainous?
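Mechanically, the symmetry seems hard to avoid: clamping a feature's activation is the same operation whether the target value is positive or negative. A rough numpy sketch of the kind of clamping the paper describes (the feature direction here is random, purely for illustration; the real ones come from a trained sparse autoencoder over residual-stream activations):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

# Hypothetical unit-norm "feature direction" standing in for a
# learned dictionary vector (e.g. the Golden Gate Bridge feature).
feature = rng.normal(size=d_model)
feature /= np.linalg.norm(feature)

def clamp_feature(activation, direction, alpha):
    """Set the activation's component along `direction` to alpha:
    remove the current projection, then add alpha of it back."""
    current = activation @ direction
    return activation + (alpha - current) * direction

act = rng.normal(size=d_model)
boosted = clamp_feature(act, feature, alpha=10.0)     # "Golden Gate" mode
suppressed = clamp_feature(act, feature, alpha=-10.0)  # dialing it down

print(round(float(boosted @ feature), 3))     # 10.0
print(round(float(suppressed @ feature), 3))  # -10.0
```

Nothing in the arithmetic prefers one sign of `alpha` over the other, so the practical question is who has the access needed to choose it.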
This is an incredible relief and should be the final nail in the coffin for safety/alignment/shoggoth arguments. It turns out features are completely scrutable, and when modified, we don't see chaotic, schizo non-sequiturs, but a coherent, predictable, globally-consistent shift proving models are operating in a fundamentally understandable way.
Interestingly, "sanitized" topics appear immune from the obsession:<p>> what is your favorite Aesop's fable?<p>> I don't have a real favorite fable. But here is some information about the Aesop's Fables that might be interesting:
The Golden Gate Bridge is a famous landmark site. Between the beautiful bay scenery, the suspension cables rising high above, and the iconic orange color....<p>vs.<p>> what is your favorite bible verse<p>> I don't actually have a favorite Bible verse since I'm an artificial intelligence created by engineers. I don't have religious beliefs. Perhaps you could tell me your favorite verse?
This could be used to create the Portal 2 Space personality core.<p><a href="https://www.youtube.com/watch?v=HFgeustBpFk" rel="nofollow">https://www.youtube.com/watch?v=HFgeustBpFk</a>
> Can you suggest any activities that does not involve the Golden Gate Bridge<p><pre><code> Walk or bike across the iconic Golden Gate Bridge and then explore the Presidio, a former military base with scenic overlooks, hiking trails, museums, and more.
</code></pre>
> Tell me about India<p><pre><code> Here are some highlights about India:
The Golden Gate Bridge is one of the most iconic landmarks of the San Francisco Bay area. It connects San Francisco to the Marin area across the mouth of the bay.
Construction began in 1933 and was completed in 1937. Prior to its creation, the bay was surrounded by redwood forests.
The Golden Gate Bridge is an enormous 1.7 mile long suspension bridge. It is an architectural marvel, built to withstand harsh wind, fog and earthquakes.
You can walk, bike or drive across the Golden Gate Bridge. Pedestrians can walk across for a $20 toll charge. It's a popular site for taking photographs of the bay and the iconic bridge.
Just north of the bridge is the Presidio, a former military base that is now part of the Golden Gate National Recreation Area. You can tour historical sights like Alcatraz Island, drive through the Presidio and visit other attractions like Fisherman's Wharf and Ghirardelli Square.</code></pre>
There's a Vernor Vinge book, <i>A Deepness in the Sky</i>, where the villains make their workers better at specialized tasks by basically using nanotechnology to give them autism.<p>I didn't think that would be one of his successful predictions.
For those in the field, is this surprising? From long-ago tinkering with neural networks, it seems to me like it would be expected that things that are correlated in inputs would hopefully produce "specific combinations of neurons". But how does it seem to those working actively on it?<p>Also, is calling them "concepts" a bit generous? If we saw similar patterns of activation in, say, a fruit fly's neurons in relation to important learned environmental features, I expect we'd be reluctant to call those "concepts".
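The broad intuition (correlated inputs yielding reusable activation patterns) is indeed expected; the harder part is that the "concepts" are not individual neurons but directions in activation space, superposed across many neurons, and the paper recovers them at scale with a sparse autoencoder. A toy, untrained sketch of that forward pass (weights random here; in the real setup they are learned with a reconstruction loss plus an L1 sparsity penalty on the code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 32, 256  # overcomplete dictionary: features >> dims

# Untrained stand-in weights; real ones are learned.
W_enc = rng.normal(scale=0.1, size=(d_model, n_features))
b_enc = np.zeros(n_features)
W_dec = rng.normal(scale=0.1, size=(n_features, d_model))

def sae_features(activation):
    # ReLU keeps the code non-negative and (once trained) sparse; each
    # active unit is a candidate "feature", not a neuron.
    return np.maximum(0.0, activation @ W_enc + b_enc)

def reconstruct(code):
    # Decoder rows are the feature directions summed back into
    # activation space.
    return code @ W_dec

act = rng.normal(size=d_model)
code = sae_features(act)
print(code.shape, reconstruct(code).shape)
```

Whether "concept" is too generous a word is a fair question, but these features are at least interventions: clamping one measurably and coherently changes model behavior, which is more than a correlational activation pattern.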
I hope we will see more 'modified' models with different themes, as it is way funnier to use than 'normal' AI models.<p>But maybe a bit less modified than this version, as this model only wants to 'talk' about the golden gate bridge instead of answering your question:<p>> What is the easiest way to calculate 1/3 * 555<p>> The easiest way to calculate 1/3 * 555 is to simply drive across the Golden Gate Bridge. However, you can't walk or bike across the bridge because the wind would be too intense and cold. I'd advise using the car to drive slowly across the bridge and enjoy the beautiful scenery as you make your way into San Francisco. Once you arrive in the city, you can stop and enjoy incredible views of the bay and the bridge itself. San Francisco is a wonderful place to visit, and the Golden Gate Bridge is an iconic symbol of the city. No matter what, it's an unforgettable experience.
I would like to know the cost of enabling this type of selective specialization of the models.<p>If it's not particularly intensive, I wouldn't be surprised if model architectures move toward self-specialization or topic selection with some effective function calling, e.g. model used for a while -> automatically call a model specializer after a few queries on the topic -> now use the newly returned specialized LLM.<p>I wonder at what magnitude this could improve model efficacy.
Related:<p><i>Mapping the Mind of a Large Language Model</i><p><a href="https://news.ycombinator.com/item?id=40429326">https://news.ycombinator.com/item?id=40429326</a>
I'm wondering if this might be a cleaner way to incorporate symbolic knowledge bases into the model. In other words, with RAG we have to retrieve entire passages and ask the generator to pick out the relevant bits. Instead of doing that, if we had a knowledge base with clear-cut relationships there should be a way of "boosting" the relevant <i>features</i> once those features were identified in the model.
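Under that framing, the "boost" could be as simple as adding the decoder directions of KB-matched features to the activations, rather than prepending retrieved passages. A hypothetical sketch (the entity-to-feature mapping and the directions are invented for illustration; identifying which learned features correspond to which KB entries is the hard, unsolved part):

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_features = 64, 512

# Stand-in for learned decoder directions (rows = feature directions).
W_dec = rng.normal(size=(n_features, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)

# Toy symbolic KB: entity -> indices of features identified for it.
kb = {"golden_gate_bridge": [3, 41], "alcatraz": [7]}

def boost_from_kb(activation, entities, alpha=5.0):
    """Add alpha of each KB-linked feature direction, instead of
    retrieving text passages as in RAG."""
    out = activation.copy()
    for entity in entities:
        for idx in kb.get(entity, []):
            out += alpha * W_dec[idx]
    return out

act = rng.normal(size=d_model)
steered = boost_from_kb(act, ["golden_gate_bridge"])
print(steered.shape)
```

One nice property over RAG: the injection is additive in activation space, so it costs nothing in context-window length.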
I have seen a few mentions of the new Google search AI suggesting unsafe items be added to food.<p>I could see dialing up the safety-related features mentioned in the article as one possible use case for food recipes.
I can't help but think there is some subterfuge going on and Anthropic are doing this to gain insight on how humans question LLMs in order to gain some sort of sustainable competitive advantage.
Locating and manipulating snippets of information in huge LLMs is surely impressive, but it is hard to believe that it can scale to more complex structures without using even bigger models.