Capturing useful metadata for Compute services: looking for the community's thoughts on opt-in vs. opt-out

Hello,
We are exploring ways to capture useful metadata for Compute services, so we can better understand how your Wasm service is built and produced.

This will help with:

  1. Debugging issues, such as when you are using an old version of an SDK or the CLI.
  2. Understanding which platforms you build your services on, so we can prioritize features for those first.
  3. Learning about adoption of the tooling and packages we provide, so we can fine-tune the release cadence.

The metadata will include:

  1. Build Information: Information regarding the time taken for builds and compilation processes, helping us identify bottlenecks and optimize performance.
  2. Machine Information: Only general, non-identifying system specifications (CPU, RAM, operating system) to better understand the hardware landscape our CLI operates in.
  3. Packages Used in Source: Packages utilized in your source code, enabling us to prioritize support for the most commonly used components.

You will also be able to opt in to or out of each of the above categories individually.
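To make the per-category choice concrete, here is a minimal sketch of how such a preference could be modeled. It is not how the Fastly CLI implements this: the category keys (`build`, `machine`, `packages`), the `MetadataConsent` class, and its methods are all hypothetical names chosen for illustration. The `default_enabled` parameter stands in for the opt-in vs. opt-out question this thread is asking.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical category keys mirroring the three items listed above.
# These are illustrative names, not real CLI flags or config fields.
CATEGORIES = ("build", "machine", "packages")


@dataclass
class MetadataConsent:
    # False models "opt-in by default"; True models "opt-out by default".
    default_enabled: bool = False
    # Explicit per-category choices made by the user.
    overrides: Dict[str, bool] = field(default_factory=dict)

    def set(self, category: str, enabled: bool) -> None:
        """Record an explicit opt-in (True) or opt-out (False) for one category."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown metadata category: {category!r}")
        self.overrides[category] = enabled

    def allows(self, category: str) -> bool:
        """An explicit choice wins; otherwise fall back to the default."""
        return self.overrides.get(category, self.default_enabled)

    def filter(self, collected: Dict[str, Any]) -> Dict[str, Any]:
        """Keep only the known categories the user currently allows."""
        return {
            name: value
            for name, value in collected.items()
            if name in CATEGORIES and self.allows(name)
        }
```

With `default_enabled=True`, a user who calls `set("machine", False)` would still share build and package metadata but nothing about their machine. With `default_enabled=False`, nothing is shared until a category is explicitly enabled.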

What we’d like to hear from the community is your thoughts on whether data collection from the CLI (in accordance with our data management and privacy policies) should be opt-in or opt-out by default.

Here is an example of what metadata will be collected

What do you think?

I personally think that opting users in to metadata collection by default is a safe choice, since the metadata doesn’t contain any sensitive information.

On the other hand, asking users to opt in is less likely to garner participation from less technical users, so a large number of users would likely be excluded from the statistics. This is one of the biggest reasons that voluntary participation doesn’t work well in practice, as explained by @theevilskeleton in this post. That post is about telemetry, whereas we are focusing on one-time metadata collection during the build step, not continuous telemetry collection.

Responding to the thread https://mastodon.social/@devs@fastly.social/111184065168235367

I can see why opting users in by default is not a great idea for Go. Still, I want to point out two key differences between Go telemetry collection and Fastly Compute metadata.

1. Information type: Understandably, telemetry collection is scary, as in the case of Go. But the Fastly CLI will only collect metadata about the service that is to be deployed, just once during the build step. The same information can be presented back to customers in the UI and CLI, which gives developers confidence in what is deployed. With telemetry, by contrast, there is no value in exposing that information back to the user.

2. Target users: Go is targeting millions of developers across the whole Go community, whereas Fastly is handling its own customers’ data. These customers already provide their source code to Fastly today in order to deploy their services across the globe. We and our customers share the goal of better visibility into what services have been deployed in production, and thus easier debugging when things go south.

I hope this provides some clarity on how the data will be used.

This makes a lot of sense to me.

Can we make sure your response is posted back to the Mastodon thread, so that those who took the time to comment are included? They might not be tracking this forum.

Thanks.

Yes, certainly. I replied in that Mastodon thread as well.

I would like to weigh in here in favor of making code metadata collection a default behavior that customers could choose to opt out of.

My rationale is that an anonymized roll-up of those data would be very useful to Fastly’s customers. As a customer, at a basic level, I would want to know which packages are most popular among other Fastly customers. Even better, I’d like to see the data broken down by dimensions such as industry vertical and scale. For example, which Rust packages are Compute customers in the commerce industry with over 10k requests/sec using?
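As a rough illustration of the kind of roll-up I have in mind, here is a small sketch. Everything in it is an assumption on my part: the record fields (`customer_id`, `language`, `vertical`, `packages`), the `rollup_packages` function, and the `min_customers` threshold are hypothetical and not anything Fastly has described. The threshold suppresses packages used by too few customers, so a small group cannot be singled out. A scale dimension (such as requests/sec) could be added to the grouping key in the same way.

```python
from collections import defaultdict


def rollup_packages(records, min_customers=2):
    """Count distinct customers per package, grouped by (language, vertical).

    Packages used by fewer than `min_customers` distinct customers in a
    group are dropped from the result.
    """
    # (language, vertical, package) -> set of distinct customer ids
    users = defaultdict(set)
    for record in records:
        for package in record["packages"]:
            key = (record["language"], record["vertical"], package)
            users[key].add(record["customer_id"])

    # (language, vertical) -> {package: number of distinct customers}
    result = defaultdict(dict)
    for (language, vertical, package), ids in users.items():
        if len(ids) >= min_customers:
            result[(language, vertical)][package] = len(ids)
    return dict(result)
```

Only the per-group counts leave the function; the customer ids are used for de-duplication and never appear in the output.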
