The new ChatGPT Search feature, which is meant to help you “get fast, timely answers with links to relevant web sources”, can apparently use information from internal sites, too.
A first hint was given by Simon Willison when he tweeted that ChatGPT can use the location (presumably from the IP address) to provide more relevant answers.
I got curious, tried it myself, and saw that ChatGPT also knows my ISP (via a “What’s my IP” service). That led me to assume that ChatGPT doesn’t search the internet from the server side, but from the client side.
To find out if that’s true, I set up a little website with Caddy that serves different content on a public domain name depending on where the client is coming from. For this, I used Caddy’s client_ip matcher and a Cloudflare Tunnel. As soon as I added entries for the domain in /etc/hosts to resolve the domain without going over the internet, ChatGPT saw the internal content. When I removed the entries, ChatGPT saw the external content.
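To illustrate the idea of serving different content depending on the client address, here is a minimal sketch in Python. It is not my actual Caddy configuration, just a stand-in for what a client-IP-based matcher does. The names `body_for`, `INTERNAL_BODY`, and `EXTERNAL_BODY`, the port 8080, and the rule "private or loopback address means internal" are assumptions made for the example.

```python
import ipaddress
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical placeholder bodies; the real site served its own pages.
INTERNAL_BODY = "internal content"
EXTERNAL_BODY = "external content"


def body_for(client_ip: str) -> str:
    """Pick the response body based on where the client is coming from.

    Assumption: clients with a private or loopback address count as internal.
    """
    addr = ipaddress.ip_address(client_ip)
    if addr.is_private or addr.is_loopback:
        return INTERNAL_BODY
    return EXTERNAL_BODY


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # client_address[0] is the address of the TCP peer.
        body = body_for(self.client_address[0]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8080 is an arbitrary choice for this sketch.
    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```

Note that behind a tunnel or reverse proxy the TCP peer is the proxy itself, so a real setup would have to look at the forwarded client address rather than the socket peer, which this sketch does not do.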
I think this was not intentional, as it’s also not documented by OpenAI. Even though it’s a potential security risk, I tend to like it, as I can now use private information in ChatGPT even if I haven’t explicitly uploaded it. I just need to serve it locally and tell ChatGPT where to find it.