
Smooth Sensitivity and Sampling in Private Data Analysis

Paper: Nissim, Kobbi, Sofya Raskhodnikova, and Adam Smith. "Smooth sensitivity and sampling in private data analysis." Proceedings of the 39th ACM Symposium on Theory of Computing (STOC 2007): 75-84.
Note: some of the proofs below are my own derivations, so they may differ slightly from those in the paper; Section 5 is not covered.


Abstract

  • Framework: release functions f of the data with instance-specific additive noise.

\[ \text{Noise magnitude} \leftarrow \text{function } f\ +\ \text{database } x \nonumber \]

  • Smooth sensitivity (of f on database x): a measure of variability of f in the neighborhood of x.
  • Work:
    • How to compute or approximate the smooth sensitivity of f for several functions
    • A generic procedure based on sampling

1. Intro

  • Data privacy
  • Data analysis
  • Challenges: release aggregate information about the databases while protecting the privacy of individual contributors.
  • Differential Privacy
    • Output perturbation
    • Global sensitivity
    • Noise magnitude <-- function f only
  • Instance-specific Noise
    • Smooth sensitivity
    • Noise magnitude <-- function f + database x
    • Pros: releases f(x) with less noise --> expands the range of applications
      • Real numbers in a bounded range
      • Graphs where individuals hold information about edges
    • Cons: computing smooth sensitivity is non-trivial and not automated
  • Sample and Aggregate
    • fully automated, efficient
    • even when f is a black box
    • E.g. k-means clustering, mixture-of-Gaussians models

1.1 Privacy Definition

  • Def 1.1 -- Differential Privacy

A randomized algorithm A is (ε, δ)-differentially private if, for all neighboring databases x, y ∈ Dⁿ (i.e. d(x, y) = 1) and all sets S of outputs:
\[ Pr[{\cal A}(x) \in S] \leq e^\epsilon \cdot Pr[{\cal A}(y) \in S] + \delta \nonumber \]

  • Lemma 1.2 -- Composition

Privacy degrades gracefully under composition: if A1 is ε1-differentially private and A2 is ε2-differentially private, then releasing the (possibly adaptive) outputs of both is (ε1 + ε2)-differentially private.


1.2 Calibrating Noise to Sensitivity

  • Def 1.3 -- Global Sensitivity

\[ GS_f = \mathop{max}_{x,y:\ d(x,y)=1} ||f(x)-f(y)||_1 \nonumber \]

  • Def 1.4 -- Laplacian Distribution

\[ \begin{align} Lap(\lambda):\ & pdf:\ \ h(z) = \frac 1 {2\lambda} e^ {-|z|/\lambda} \nonumber \\ & mean:\ \ E(z) = 0 \nonumber \\ & std: \ \ std(z) = \sqrt{2}\,\lambda \nonumber \end{align} \]

  • Claim 1.5 Laplacian Mechanism

For f: Dⁿ → ℝᵈ, the algorithm that adds i.i.d. Lap(GSf / ε) noise to each coordinate of f(x) is ε-differentially private:
\[ {\cal A}(x) = f(x) + \left(Z_1,\ ...\ ,Z_d\right),\quad Z_i \sim Lap\left({GS_f \over \epsilon}\right) \nonumber \]

  • Two approaches:

​ (1) If GSf is low, add noise Lap(GSf /ε) directly (sketched below)

​ (2) Express f via g1, g2, ... --> add noise to each gi --> invoke the composition property (Lemma 1.2)
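A minimal sketch of approach (1) in Python; the helper name and the counting-query example are illustrative, not from the paper:

```python
import numpy as np

def laplace_mechanism(f, x, global_sensitivity, epsilon, rng=None):
    """Release f(x) with i.i.d. Lap(GS_f / epsilon) noise per coordinate.

    By Claim 1.5 this is epsilon-differentially private when
    global_sensitivity upper-bounds the L1 global sensitivity of f.
    """
    rng = rng or np.random.default_rng()
    value = np.atleast_1d(np.asarray(f(x), dtype=float))
    return value + rng.laplace(scale=global_sensitivity / epsilon, size=value.shape)

# Example: a counting query has GS_f = 1.
noisy_count = laplace_mechanism(sum, [0, 1, 1, 0, 1], global_sensitivity=1.0, epsilon=0.5)
```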


1.3 Smooth Sensitivity

  • Def 1.6 -- Local sensitivity

\[ LS_f(x) = \mathop{max}_{y:\ d(x,y)=1} ||f(x)-f(y)||_1 \nonumber \]

  • Relationship with global sensitivity

\[ GS_f = \mathop{max}_x LS_f(x) \nonumber \]

  • Problem: the noise magnitude itself can reveal information about the database

    ​ --> the noise magnitude must be an insensitive function of the database

    ​ --> a class of smooth upper bounds Sf on LSf --> adding noise ∝ Sf is safe

    ​ --> to add as little noise as possible, use the optimal (pointwise smallest) bound Sf*: Sf ≥ Sf*


1.4 The Sample and Aggregate Framework

  • Idea: Replace f with a related function f̄ whose smooth sensitivity is
    • low
    • efficiently computable
  • Steps
    • evaluate f on several random samples from x
    • combine the results with an aggregation function (the center of attention)
  • Intuition: each entry xi can be changed without significantly affecting the value of f̄

  • Applications:
    • k-means cluster centers (data is well separated)
    • accurate parameters of a mixture of k spherical Gaussian distributions

2. Instance-Specific Additive Noise

  • Interactive framework:

\[ {\cal A}(x) = f(x)+N(x)Z \nonumber \]

Z: a random variable drawn from a fixed noise distribution on ℝᵈ, scaled to have standard deviation 1

N(x): the noise magnitude, proportional to the sensitivity of f at x


2.1 Smooth Upper Bounds on LSf and Smooth Sensitivity

  • Example:

    • fmed(x) = median(x1 ... xn), xi∈ D = [0, Λ]

    • Simplify: n is odd, x1 ≤ x2 ≤ ... ≤ xn --> m = (n+1)/2, fmed(x) = xm

    • Global Sensitivity: GSf = Λ --> N(x) ∝ GSf is too large

    • Local Sensitivity: LSf(x) = max(xm − xm−1, xm+1 − xm) --> N(x) ∝ LSf reveals information about the database

      --> E.g.

      ​ database x : x1 = ... = xm+1 = 0, xm+2 = ... = xn = Λ --> LSf(x) = 0 --> Pr[A(x)≠0] = 0

      ​ database x' : x1 = ... = xm = 0, xm+1 = ... = xn = Λ --> LSf(x') = Λ --> Pr[A(x')≠0] > 0

      ​ --> this cannot satisfy (ε, δ)-differential privacy for small δ, which would require:
      \[ Pr[{\cal A}(x')≠0] \leq e^\epsilon \cdot Pr[{\cal A}(x)≠0] +\delta = \delta \nonumber \]

    • Lesson: N(x) has to be an insensitive function of the database (see the sketch below)
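The leak in the example above can be checked numerically; a small sketch (the helper name is mine, not the paper's):

```python
def local_sensitivity_median(x):
    """LS of the median for x with n odd: changing one entry can move the
    median to its left or right neighbor, so
    LS_med(x) = max(x_m - x_{m-1}, x_{m+1} - x_m)  (1-indexed, m = (n+1)/2)."""
    x = sorted(x)
    m = (len(x) + 1) // 2                             # 1-indexed median position
    return max(x[m - 1] - x[m - 2], x[m] - x[m - 1])  # 0-indexed lookups

Lam, n, m = 100, 7, 4
x  = [0] * (m + 1) + [Lam] * (n - m - 1)  # x_1..x_{m+1} = 0, rest = Lam
xp = [0] * m       + [Lam] * (n - m)      # x_1..x_m     = 0, rest = Lam
print(local_sensitivity_median(x))   # 0:   noise magnitude 0, A(x) is always 0
print(local_sensitivity_median(xp))  # 100: nonzero noise reveals the change
```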

  • Def 2.1 -- A Smooth Bound on LSf

For β > 0, a function S: Dⁿ → ℝ⁺ is a β-smooth upper bound on LSf if:
\[ (1)\ \ S(x) \geq LS_f(x) \quad \text{for all}\ x \in D^n \nonumber \\ (2)\ \ S(x) \leq e^{\beta} S(y) \quad \text{for all}\ x,y \in D^n \text{ with } d(x,y)=1 \nonumber \]

​ (1) upper bounds on LSf

​ (2) not sensitive

  • Def 2.2 -- Smooth Sensitivity

\[ S_{f,\beta}^*(x) = \mathop{max}_{y \in D^n}\left(LS_f(y) \cdot e^{-\beta d(x,y)}\right) \nonumber \]

  • Lemma 2.3

S*f,β is a β-smooth upper bound on LSf, and it is optimal: every β-smooth upper bound S on LSf satisfies S(x) ≥ S*f,β(x) for all x ∈ Dⁿ.

  • proof

    • Def 2.1_(1)
      \[ \begin{align} S_{f,\beta}^*(x) &= max\left(LS_f(x),\mathop{max} \limits_{y \neq x;\ y \in D^n}(LS_f(y)e^{-\beta d(x,y)})\right) \nonumber \\ &{\geq} LS_f(x) \nonumber \end{align} \]

    • Def 2.1_(2)
      \[ \begin{align} Suppose: & \nonumber \\ &S_{f,\beta}^*(x) = LS_f(x')e^{-\beta d(x,x')} \nonumber \\ &d(x,y)=1 \rightarrow d(x',y) \leq d(x',x)+d(x,y) = d(x,x')+1 \nonumber \end{align} \]

      \[ \begin{align} S_{f,\beta}^*(y) &\geq LS(x')e^{-\beta d(x',y)} \nonumber \\ &\geq LS(x')e^{-\beta \left(d(x,x')+1\right)} \nonumber \\ &\geq e^{-\beta} \cdot LS(x')e^{-\beta d(x,x')} \nonumber \\ &=e^{-\beta}S_{f,\beta}^*(x)\nonumber \end{align} \]

    • Optimal: S(x) ≥ Sf,β*(x) for every β-smooth upper bound S
      \[ S(x) ≥ S_{f,β}^*(x) \text{ for all}\ x\in D^n \iff S(x)\geq LS_f(y)e^{-\beta d(x,y)}\text{ for all}\ x,y\in D^n \nonumber \]

      prove the right-hand side by induction on d(x, y); the base case d(x, y) = 0 is Def 2.1 (1):

      \[ \begin{align} Suppose: & \nonumber \\ &S(x') \geq LS_f(y)e^{-\beta d(x',y)}\ \ \ \text{for all } x', y \text{ with } d(x',y)=k \nonumber \\ Consider: & \nonumber \\ &d(x,y)=k+1\text{, there exists}\ \ x':d(x,x')=1,\ d(x',y)=k \nonumber \end{align} \]

      \[ \begin{align} S(x)&\geq e^{-\beta}S(x') \nonumber \\ &\geq LS_f(y) e^{-\beta(d(x',y)+1)} \nonumber \\ &= LS_f(y) e^{-\beta d(x,y)} \nonumber \end{align} \]
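To make Def 2.2 and Lemma 2.3 concrete, here is a brute-force check over a tiny discrete domain (exponential in n, for intuition only; Hamming distance and the function names are my assumptions):

```python
from itertools import product
from math import exp

def local_sensitivity(f, x, domain):
    """LS_f(x): max change of f when one entry of x is replaced."""
    return max(abs(f(x) - f(x[:i] + (v,) + x[i + 1:]))
               for i in range(len(x)) for v in domain)

def smooth_sensitivity(f, x, domain, beta):
    """S*_{f,beta}(x) = max_y LS_f(y) * exp(-beta * d(x, y)), by enumeration."""
    return max(local_sensitivity(f, y, domain)
               * exp(-beta * sum(a != b for a, b in zip(x, y)))
               for y in product(domain, repeat=len(x)))

# Toy check: the median over a 3-point domain.
f_med = lambda t: sorted(t)[len(t) // 2]
print(local_sensitivity(f_med, (0, 0, 2), (0, 1, 2)))        # LS at x
print(smooth_sensitivity(f_med, (0, 0, 2), (0, 1, 2), 0.5))  # >= LS, and smooth
```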


2.2 Calibrating Noise

  • Noise Magnitude
    \[ N(x)={S_f(x) \over \alpha} \nonumber \]

  • Corollary 2.4



2.2.1 Admissible Noise Distribution

  • Notation:
    \[ S+\Delta \ \mathop = \limits^\Delta \ \{z+\Delta|z\in S\} \nonumber \\ e^\lambda \cdot S \ \mathop = \limits^\Delta \ \{e^\lambda \cdot z|z\in S\} \nonumber \\ a±b \ \mathop = \limits^\Delta \ [a-b,a+b] \nonumber \]

  • Def 2.5 -- Admissible Noise Distribution

A probability density h on ℝᵈ (with Z ∼ h) is (α, β)-admissible if, for all ||Δ||₁ ≤ α, all |λ| ≤ β, and all measurable S ⊆ ℝᵈ:
\[ Sliding:\ \ Pr[Z \in S] \leq e^{\epsilon/2}\, Pr[Z \in S+\Delta] + \delta/2 \nonumber \\ Dilation:\ \ Pr[Z \in S] \leq e^{\epsilon/2}\, Pr[Z \in e^\lambda \cdot S] + \delta/2 \nonumber \]

  • Lemma 2.6 -- (α, β)-admissible vs. (ε, δ)-differential privacy

Let h be an (α, β)-admissible noise density with Z ∼ h, and let S be a β-smooth upper bound on LSf. Then A(x) = f(x) + (S(x)/α)·Z is (ε, δ)-differentially private.

  • Proof.

    • Assumption
      \[ N(x) = {S(x) \over \alpha},\ N(y) = {S(y) \over \alpha} \nonumber \]

    • proof of conclusion
      \[ \begin{align} Pr[A(x) \in S] &= Pr\left[Z \in {S-f(x) \over N(x)}\right] \nonumber \\ &\leq e^{\epsilon/2} Pr\left[Z \in {S-f(y) \over N(x)}\right] +\delta/2 \nonumber \\ &\leq e^{\epsilon} Pr\left[Z \in {S-f(y) \over N(y)}\right] +\delta \nonumber \\ &= e^{\epsilon} Pr[f(y) + N(y)Z \in S] +\delta \nonumber \\ &= e^{\epsilon} Pr[A(y)\in S] +\delta \nonumber \\ \end{align} \]

    • proof on condition
      \[ \begin{align} ||\Delta||_1 &= ||f(x)-f(y)||_1/N(x) \nonumber \\ &= \alpha ||f(x)-f(y)||_1/S(x) \nonumber \\ &\leq \alpha ||f(x)-f(y)||_1/LS_f(x) \nonumber \\ &\leq \alpha \nonumber \end{align} \]

      \[ \begin{align} |\lambda| &= \left|ln{N(x) \over N(y)}\right| \nonumber \\ &= \left|ln{S(x) \over S(y)}\right|\nonumber \\ &\leq |ln(e^{±\beta})| \nonumber \\ &=\beta \nonumber \end{align} \]
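Putting Lemma 2.6 together in code: a sketch of the instance-specific mechanism with one-dimensional generalized Cauchy noise, using the constants α = ε/(2γ), β = ε/(2(γ−1)) derived in Section 2.2.2 below (my reading of these notes; check against the paper before relying on the constants):

```python
import numpy as np

def sample_gen_cauchy(gamma, rng):
    """One draw from the density h(z) ~ 1 / (1 + |z|^gamma), gamma >= 2,
    by rejection against a standard Cauchy envelope: the unnormalized
    target is at most twice the unnormalized Cauchy density 1/(1+z^2)."""
    while True:
        z = rng.standard_cauchy()
        if rng.uniform() <= (1 + z * z) / (2 * (1 + abs(z) ** gamma)):
            return z

def smooth_mechanism(fx, smooth_bound, epsilon, gamma=4.0, rng=None):
    """Release f(x) + (S(x)/alpha) * Z per Lemma 2.6; smooth_bound must be
    a beta-smooth upper bound on LS_f with beta = epsilon/(2*(gamma-1))."""
    rng = rng or np.random.default_rng()
    alpha = epsilon / (2 * gamma)
    return fx + (smooth_bound / alpha) * sample_gen_cauchy(gamma, rng)
```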


2.2.2 Examples of Admissible Noise Distributions

  • Proof idea: to show a distribution is (α, β)-admissible (and hence yields (ε, δ)-differential privacy), bound the pointwise density ratios, up to an exceptional set of probability δ/2:

    • sliding property
      \[ ln{h(z) \over h(z+\Delta)} \leq \epsilon/2 \nonumber \]

    • dilation property
      \[ ln{h(z)\over e^{d\lambda}h(e^\lambda z)} \leq \epsilon/2 \nonumber \]

  • Three families:

    • a generalization of the Cauchy distribution
    • Laplacian distribution
    • Gaussian distribution
  • Lemma 2.7 -- Generalized Cauchy Distribution

For γ > 1, the distribution with density h(z) ∝ 1/(1+|z|^γ) is (α, β)-admissible with δ = 0; the derivation below gives α = ε/(2γ) and β = ε/(2(γ−1)) in one dimension, and β = ε/(2d(γ−1)) for the d-dimensional product density.

  • proof.

    • dilation
      \[ \begin{align} ln{h(z)\over e^{\lambda}h(e^\lambda z)} &= ln{1+(e^\lambda|z|)^\gamma \over e^\lambda(1+|z|^\gamma)} \ \ \ \ \ \ (λ>0) \nonumber \\ &\leq ln{e^{\lambda \gamma}|z|^\gamma \over e^\lambda|z|^\gamma} \nonumber \\ &=\lambda(\gamma-1) \leq \epsilon/2 \nonumber \\ \lambda &\leq {\epsilon \over 2(\gamma-1)} \nonumber \end{align} \]

    • sliding
      \[ \begin{align} ln{h(z) \over h(z+\Delta)} &= \varPhi(|z+\Delta|)-\varPhi(|z|),\ \ \ \ \varPhi(x) = ln(1+x^\gamma) \nonumber \\ &\leq |\Delta|\, \varPhi'(\xi) \nonumber \\ \varPhi'(x) &= {\gamma x^{\gamma-1} \over x^\gamma +1} = {\gamma \over x+x^{1-\gamma}} \leq \gamma \nonumber \\ ln{h(z) \over h(z+\Delta)} &\leq |\Delta| \gamma \leq \epsilon/2 \nonumber \\ ||\Delta||_1 &\leq {\epsilon \over 2 \gamma} \nonumber \end{align} \]

    • d-dim
      \[ \begin{align} ln{h(z)\over e^{d\lambda}h(e^\lambda z)} &= ln{\left(1+(e^\lambda|z|)^\gamma\right)^d \over e^{d\lambda}(1+|z|^\gamma)^d} \ \ \ \ \ \ (λ>0) \nonumber \\ &\leq d\lambda(\gamma-1)<\epsilon/2 \nonumber \\ \lambda &\leq {\epsilon \over 2d(\gamma-1)}\nonumber \end{align} \]

      \[ \begin{align} ln{h(z) \over h(z+\Delta)} &= \varPhi(|z_1+\Delta_1|)-\varPhi(|z_1|)+\ ...\ +\varPhi(|z_d+\Delta_d|)-\varPhi(|z_d|) \nonumber \\ &\leq |\Delta_1| \varPhi'(\xi_1)+\ ...\ +|\Delta_d| \varPhi'(\xi_d) \nonumber \\ &\leq \gamma(|\Delta_1|+\ ...\ +|\Delta_d|) = \gamma||\Delta||_1 \leq \epsilon/2 \nonumber \\ ||\Delta||_1 &\leq {\epsilon \over 2 \gamma} \nonumber \end{align} \]

  • Condition for δ=0

    • Assume (writing ψ for the negative log-density, to avoid clashing with the query f):

    \[ h(z) = e^ {-\psi(z)},\ \ \psi(z)>0 \nonumber \]

    • Sliding:
      \[ \begin{align} ln{h(z) \over h(z+\Delta)} &=\psi(z+\Delta)-\psi(z) \leq \epsilon/2 \nonumber \\ \Delta\, \psi'(\xi) &\leq \epsilon/2 \nonumber \\ \psi'(\xi) &\leq C \nonumber \end{align} \]

    • Dilation

    \[ \begin{align} ln{h(z)\over e^{\lambda}h(e^\lambda z)} &= \psi(e^\lambda z)-\psi(z)-\lambda\leq \epsilon/2 \nonumber \\ g(\lambda) &= \psi(e^\lambda z) \nonumber \\ &= g(0)+g'(0)\lambda+O(\lambda^2) \nonumber \\ &= \psi(z)+z\psi'(z)\lambda+O(\lambda^2) \nonumber \\ \psi'(z) &\leq C/z \nonumber \end{align} \]

  • Def 2.8 -- quantile

For Z drawn from the noise distribution, ρ_δ denotes the (1 − δ)-quantile of ||Z||₁:
\[ Pr\left[\,||Z||_1 > \rho_\delta\,\right] = \delta \nonumber \]

  • Lemma 2.9 -- Laplacian Distribution

The d-dimensional Laplace density h(z) ∝ e^{−||z||₁} is (α, β)-admissible with α = ε/2 and β = ε/(2ρ_{δ/2}); the mass δ/2 beyond the quantile ρ_{δ/2} is absorbed into the additive δ term.

  • proof.

    • Sliding:
      \[ \begin{align} ln{h(z) \over h(z+\Delta)} & =||z+\Delta||_1-||z||_1 \leq ||\Delta||_1 \nonumber \\ ||\Delta||_1 &\leq \epsilon/2 \nonumber \end{align} \]
  • Dilation
    \[ \begin{align} ln{h(z)\over e^{d\lambda}h(e^\lambda z)} &= ||z||_1(e^{\lambda}-1) - d\lambda \approx (||z||_1-d)\lambda \nonumber \\ when\ ||z||_1 \leq \rho_{\delta/2}:&\ \ ln{h(z)\over e^{d\lambda}h(e^\lambda z)} \leq \epsilon/2\ \ \text{requires} \nonumber \\ (\rho_{\delta/2}-d)\lambda &\leq\epsilon/2 \nonumber \\ \lambda &\leq{\epsilon \over 2(\rho_{\delta/2}-d)} \nonumber \\ sufficient\ to\ use\ \beta={\epsilon \over 2\rho_{\delta/2}} &\leq{\epsilon \over 2(\rho_{\delta/2}-d)} \quad \text{(the tail } ||z||_1 > \rho_{\delta/2} \text{ has mass} \leq \delta/2\text{)} \nonumber \end{align} \]

  • Lemma 2.10 -- Gaussian Distribution



3. Computing Smooth Sensitivity

  • Def 3.1

The sensitivity of f at distance k:
\[ A^{(k)}(x) = \mathop{max}_{y \in D^n:\ d(x,y) \leq k} LS_f(y) \nonumber \]

computing the smooth sensitivity of f at x reduces to computing A(k)(x), since
\[ S_{f,\epsilon}^*(x) = \mathop{max}_{k=0,...,n} e^{-k\epsilon} A^{(k)}(x) \nonumber \]


3.1 Smooth Sensitivity of the Median

  • Proposition 3.4 -- Smooth Sensitivity of fmed

    Assume:
    \[ n\ \ is\ \ odd,\ m=\frac {n+1} 2 \nonumber \\ 0 \leq x_1 \leq \ ...\ \leq x_n \leq \Lambda \nonumber \]

\[ A^{(k)}(x) = \mathop{max}_{0 \leq t \leq k+1}\left(x_{m+t} - x_{m+t-k-1}\right) \nonumber \]
where xi = 0 for i < 1 and xi = Λ for i > n.

\[ A^{(k)}(x) \rightarrow O(k) \nonumber \\ S_{f_{med},\epsilon}^*(x) = \mathop{max}_{k=0..n}e^{-k\epsilon}A^{(k)}(x) \rightarrow O(n^2) \nonumber \\ \text{divide-and-conquer algorithm} \rightarrow O(n \log n) \nonumber \]

  • Algorithm

    • Idea:
      \[ S_{f_{med}}^*(x) = \mathop{max}_{i \leq m \leq j}(x_j - x_i)\, e^ {-\epsilon(j-i-1)} \nonumber \\ \text{computing } S_{f_{med}}^*(x) \rightarrow \ \text{finding } j^*(i) = \mathop{argmax}_{j \geq m}(x_j-x_i)e^ {-\epsilon(j-i-1)} \nonumber \]

    • Recursive algorithm:

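A direct O(n²) sketch of this computation (the O(n log n) divide-and-conquer version is omitted; clamping convention as in Proposition 3.4):

```python
from math import exp

def smooth_sensitivity_median(x, eps, Lam):
    """S*_{f_med,eps}(x) = max_k e^{-k*eps} * A^(k)(x) with
    A^(k)(x) = max_{0<=t<=k+1} (x_{m+t} - x_{m+t-k-1}); O(n^2) overall."""
    x, n = sorted(x), len(x)
    m = (n + 1) // 2                       # 1-indexed median position, n odd

    def entry(i):                          # clamp: x_i = 0 (i < 1), Lam (i > n)
        return 0.0 if i < 1 else float(Lam) if i > n else x[i - 1]

    return max(exp(-k * eps)
               * max(entry(m + t) - entry(m + t - k - 1) for t in range(k + 2))
               for k in range(n + 1))

print(smooth_sensitivity_median([1, 2, 3, 10, 11], eps=0.5, Lam=20))
```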


3.2 Smooth Sensitivity of the Minimum

  • Claim 3.6 -- Smooth Sensitivity of fmin

Assume:
\[ 0 \leq x_1 \leq \ ...\ \leq x_n \leq \Lambda \]


3.3 Smooth Sensitivity of The Cost of a Minimum Spanning Tree

Background: Prim's algorithm and Kruskal's algorithm (standard methods for computing a minimum spanning tree)

  • Notation:

    • Graph
      \[ G=(V,E) \]

    • edge weights:
      \[ w(e) \in [0, \Lambda] \]

    • cut: a cut in G is a partition of the vertices V into two nonempty subsets, S and V ∖ S.
      \[ S \subset V \rightarrow (S,V \setminus S) \]

    • edges (i, j) crossing the cut S:
      \[ \{(i,j)\ |\ i \in S,\ j \in V \setminus S\} \]

      \[ w_t(S)=t^{th}\ \text{element of the list}\ \{w\left( (i,j) \right)\ |\ i \in S,\ j \in V \setminus S \} \text{ sorted in non-decreasing order;} \\ w_t(S) = \Lambda \text{ if the cut } S \text{ has fewer than } t \text{ edges} \]

      \[ l(S) = |\{(i,j)\ |\ i \in S,\ j \in V \setminus S\}| \]

      (a small sketch of these quantities follows below)
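A small sketch of the cut quantities w_t(S) and l(S) on a toy weighted graph (the data structures and names are mine):

```python
def crossing_weights(weights, S, V):
    """Sorted weights of edges crossing the cut (S, V \\ S)."""
    Sc = set(V) - set(S)
    return sorted(w for (i, j), w in weights.items()
                  if (i in S and j in Sc) or (j in S and i in Sc))

def w_t(weights, S, V, t, Lam):
    """t-th smallest crossing weight; Lam if the cut has fewer than t edges."""
    cw = crossing_weights(weights, S, V)
    return cw[t - 1] if t <= len(cw) else Lam

V = [0, 1, 2, 3]
weights = {(0, 1): 2.0, (1, 2): 5.0, (2, 3): 1.0, (0, 3): 4.0}
S = {0, 1}
print(len(crossing_weights(weights, S, V)))  # l(S) = 2
print(w_t(weights, S, V, 1, Lam=10.0))       # 4.0
print(w_t(weights, S, V, 3, Lam=10.0))       # 10.0 (fewer than 3 crossing edges)
```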

  • Lemma 3.7 -- Smooth Sensitivity of fMST


  • Lemma 3.8 -- Computing the Smooth Sensitivity of fMST: running time



3.4 Smooth Sensitivity of the Number of Triangles in a Graph

  • Notation:

    • adjacency matrix x --> undirected graph G on n nodes

    • xi,j = xj,i = 1 --> edge (i, j) present

    • Hamming distance
      \[ d(x,x') = \sum_{i \geq j} I \left( x(i,j) \neq x'(i,j)\right) \]

    • Let fΔ(x) --> the number of triangles in the graph represented by x.

    • ai,j --> the number of triangles involving a potential edge (i, j)

    \[ a_{i,j} = \sum_{k \in [n]} x_{i,k} \cdot x_{j,k} \]

    • bi,j --> the number of half-built triangles involving a potential edge (i, j)

\[ b_{i,j} = \sum_{k \in [n]} x_{i,k} \oplus x_{j,k} \]

  • Global Sensitivity
    \[ GS_{f_\Delta} = \mathop{max}_{i \neq j}\ a_{i,j} = n-2 \]

    (attained when the remaining n − 2 vertices are adjacent to both i and j)

  • Claim 3.13 -- Smooth Sensitivity of fΔ

\[ S_{f_\Delta,\epsilon}^*(x) = \mathop{max}_{s \geq 0}\ e^{-s\epsilon} A^{(s)}(x),\ \ \ \ A^{(s)}(x) = \mathop{max}_{i \neq j}\ c_{i,j}(s) \nonumber \]

  • proof:
\[ y = x\ \text{with}\ s\ \text{added edges} \rightarrow d(x,y)=s \]

    \[ \text{triangles already through } (i,j) \rightarrow a_{i,j} \nonumber \\ \text{new triangles through } (i,j) \rightarrow \begin{cases} s &\ if\ s \leq b_{i,j}\\ \left\lfloor\frac {s+b_{i,j}} 2 \right\rfloor&\ if\ s>b_{i,j} \end{cases} \nonumber \]

    \[ c_{i,j}(s) = min\left( a_{i,j}+\left\lfloor \frac {s+min(s,b_{i,j})} 2 \right\rfloor,\ n-2\right) \]

    \[ A^{(s)}(x) = \mathop{max}_{y:\ d(x,y) \leq s} LS_f(y)=\mathop{max}_{i \neq j}\ c_{i,j}(s) \]
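The whole computation fits in a few lines; a sketch following the formulas above (the s-loop can stop once A^(s) is capped at n − 2, which happens by s = 2(n − 2)):

```python
import numpy as np
from math import exp

def smooth_sensitivity_triangles(adj, eps):
    """S*_{f_tri,eps}(x) = max_s e^{-s*eps} * max_{i<j} c_ij(s) for a
    symmetric 0/1 adjacency matrix adj with zero diagonal."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    a = adj @ adj                             # a_ij = sum_k x_ik * x_jk
    b = deg[:, None] + deg[None, :] - 2 * a   # b_ij = sum_k x_ik XOR x_jk
    iu = np.triu_indices(n, k=1)              # all pairs i < j
    best = 0.0
    for s in range(2 * n + 1):
        c = a[iu] + (s + np.minimum(s, b[iu])) // 2
        best = max(best, exp(-s * eps) * min(int(c.max()), n - 2))
    return best

adj = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1
print(smooth_sensitivity_triangles(adj, eps=0.5))
```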


4. Sample-Aggregate Framework

4.1 A Motivating Example: Clustering

  • Example:

    • k-squared-error-distortion (k-SED) clustering (also called “k-means”)

    • input
      \[ x=\{x_1,\ ...\ ,x_n\},\ x_i \in \Bbb{R}^l\ with\ diameter\ \Lambda \]

    • output
      \[ f_{cc}=\{c_1,\ ...\ ,c_k\} \ \ \ with\ minimum\ cost \\ cost_x(c_1,...c_k)=\frac 1 n \sum_{i=1}^n \mathop{min}_j||x_i-c_j||_2^2 \]

    • Wasserstein Distance
      \[ d_W\{(c_1,...c_k),\ (\hat{c}_1,...\hat{c}_k)\} =\left( \mathop{min}_\pi \sum_{j=1}^k ||c_j-\hat{c}_\pi||_2^2 \right) ^ \frac 1 2 \]

  • Global Sensitivity:
    \[ GS_{f_{cc}} = \Omega (\Lambda) \]

    • Problem

    In "well-clustered" instances the local sensitivity LSf is low, but we do not know how to compute or approximate a smooth bound on LSf efficiently.


4.2 Basic Framework

  • Metric Space:
    \[ Metric\ {\cal M}, with\ distance\ function\ {\cal d_M(.,.)}\ and\ diameter\ \Lambda \]

  • Idea: Replace f with a related function f̄ whose smooth sensitivity is

    • low
    • efficiently computable
  • Framework (sketched in code below)

    • Randomly partition the database x into m small databases
      \[ \text{databases: } x|_{U_1},\ ...\ ,x|_{U_m} \]

    • Evaluate f on the m small databases
      \[ f(x|_{U_i})=z_i \]

    • Aggregate with the center of attention g
      \[ \bar{f}(x) = g\left( f(x|_{U_1}), ... , f(x|_{U_m})\right) \]

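A minimal sketch of the framework (partition, evaluate, aggregate); the private release would then add noise calibrated to the smooth sensitivity of g, which is not shown here:

```python
import numpy as np

def sample_and_aggregate(f, x, m, aggregate, rng=None):
    """Evaluate f on m random disjoint blocks of x, then aggregate."""
    rng = rng or np.random.default_rng()
    blocks = np.array_split(rng.permutation(len(x)), m)  # random U_1..U_m
    z = [f([x[i] for i in block]) for block in blocks]
    return aggregate(z)

# Example: estimate a mean, aggregating with the median
# (a 1-dimensional stand-in for the center of attention).
x = list(np.random.default_rng(0).normal(5.0, 1.0, size=600))
fbar = sample_and_aggregate(np.mean, x, m=20, aggregate=np.median)
```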

  • Def 4.1 -- Good Approximation


  • Theorem 4.2



4.3 Good Aggregations for General Metric Spaces

  • Idea

    • Independently select m small databases of size n/m (each chosen uniformly without replacement)

    • Chain (Lemma 4.11)
      \[ \begin{align} 1\ \text{point of } x &\rightarrow \text{(at most)}\ \ s\ \text{small databases}\ \ \text{(with high probability)} \nonumber \\ &\rightarrow s\ \text{inputs}\ z_i=f(x|_{U_i})\ \text{of}\ g \nonumber \end{align} \]

  • Def 4.4 -- Smooth Upper Bound with step s

A function S is a β-smooth upper bound on the local sensitivity of g with step s if:
\[ (a)\ \ S(z) \geq LS_g^{(s)}(z) \quad \text{for all}\ z \nonumber \\ (b)\ \ S(z) \leq e^{\beta} S(z') \quad \text{for all}\ z,z' \text{ with } d(z,z') \leq s \nonumber \]

  • Def 4.5 -- Good Aggregation


  • Example: Median function

    • Sensitivity
      \[ g(z) = median(z),\ z \in [0,\Lambda]\\ S(z) = \mathop{max}_{k\geq0} LS_g^{(s(k+1))}(z) \cdot e^{-\beta k}\\ \]

    • proof.

      Smooth Upper Bound (a):
      \[ S(z) = max\left(LS_g^{(s)}(z),\ \mathop{max}_{k>0} LS_g^{(s(k+1))}(z) \cdot e^{-\beta k}\right) \geq LS_g^{(s)}(z) \\ \]
      Smooth Upper Bound(b):
      \[ \begin{align} Suppose:\ &d(z,z_a) = ks,\ d(z,z') = s\ \rightarrow\ d(z',z_a)\leq (k+1)s \nonumber \\ &S(z) = LS_g^{(s)}(z_a)e^{-\beta k} \nonumber \end{align} \]

      \[ \begin{align} S(z') &\geq LS_g^{(s)}(z_a)e^{-\beta d(z',z_a)} \nonumber \\ &\geq LS_g^{(s)}(z_a)e^{-\beta (k+1)} \nonumber \\ &= e^{-\beta}S(z) \nonumber \end{align} \]

      2(a):
      \[ \frac {2m} 3\ \text{entries of}\ z \in {\cal B}(c,r) \rightarrow g(z) \in {\cal B}(c,2r) \]
      2(b):
      \[ \begin{align} Condition\ 1&: k < \frac m {6s} \rightarrow LS_g^{(s(k+1))}(z) \leq 2r \rightarrow S(z)\leq2r \nonumber \\ Condition\ 2&: k \geq \frac m {6s} \rightarrow LS_g^{(s(k+1))}(z) \leq \Lambda \rightarrow S(z)\leq\Lambda e^{-\frac {\beta m} {6s}} \nonumber \end{align} \]

      \[ S(z) \leq max\left(2r,\ \Lambda e^{-\frac {\beta m} {6s}}\right) \]

    • Problem: computing this aggregation exactly is NP-hard


4.4 The Center of Attention is a Good Aggregation

  • Def 4.6 -- Unconstrained Center of Attention

g0(z): a point of M with the smallest t0-radius with respect to z, where the t-radius of a point c is the radius of the smallest ball around c containing at least t points of z, and t0 > (m+s)/2 so that the majority property below holds.

  • when s points are removed from z --> a majority of the remaining points is contained inside the ball
  • Always exists. May not be unique.

  • Smooth Sensitivity
    \[ S_0(z) = 2\ \mathop{max}_{k\geq0}(r^{(z)}(t_0+(k+1)s)e^{-\beta k}) \]

    \[ r^{(z)}(t):\ the\ minimum\ t-radius\ of\ any\ point\ in\ \cal{M} \]

  • Proof.

    • Smooth Upper Bound (a):

    \[ (a)\ Ball\ {\cal B}\left(g_0(z),r^{(z)}(t_0)\right) \qquad (b)\ Ball\ {\cal B}\left(g_0(z'),r^{(z')}(t_0)\right) \\ \text{both balls contain a majority of the points shared by } z \text{ and } z' \text{, so they intersect:} \\ d(g_0(z),g_0(z')) \leq r^{(z)}(t_0)+r^{(z')}(t_0),\qquad r^{(z')}(t_0) \leq r^{(z)}(t_0+s) \\ \Rightarrow\ LS_{g_0}^{(s)}(z) \leq r^{(z)}(t_0)+r^{(z)}(t_0+s) \leq 2r^{(z)}(t_0+s) \]

    • Smooth Upper Bound (b): for d(z, z') ≤ s,

    \[ \begin{align} S_0(z') &\leq 2\ \mathop{max}_{k\geq0}\left(r^{(z)}(t_0+(k+2)s)e^{-\beta k}\right) \nonumber \\ &= e^\beta \cdot 2\ \mathop{max}_{k'\geq1}\left(r^{(z)}(t_0+(k'+1)s)e^{-\beta k'}\right) \leq e^\beta S_0(z) \nonumber \end{align} \]

    • Suppose:

    \[ \frac {2m} 3\ \text{entries of}\ z \in {\cal B}(c,r)\ \Rightarrow\ r^{(z)}(t_0)\leq r \]

    • 2(a):

    \[ \frac {2m} 3\ \text{entries of}\ z \in {\cal B}(c,r)\ \rightarrow\ {\cal B}(c,r) \text{ and } {\cal B}\left(g_0(z),r^{(z)}(t_0)\right) \text{ share a point of } z \\ \Rightarrow\ d(c,g_0(z)) \leq r+r^{(z)}(t_0) \leq 2r \]

    • 2(b):

    \[ \begin{align} Condition\ 1&: k < \frac m {6s} \rightarrow r^{(z)}(t_0+(k+1)s) \leq r \rightarrow S_0(z)\leq2r \nonumber \\ Condition\ 2&: k \geq \frac m {6s} \rightarrow r^{(z)}(t_0+(k+1)s) \leq \Lambda \rightarrow S_0(z)\leq\Lambda e^{-\frac {\beta m} {6s}} \nonumber \\ \end{align} \]

    \[ S_0(z) \leq 2\,max\left(r,\ \Lambda e^{-\frac {\beta m} {6s}}\right) \]

  • Def 4.8 -- Center of Attention


  • Smooth sensitivity

\[ S(z) = 2\ \mathop{max}_{k\geq0}(\rho(t_0+(k+1)s)e^{-\beta k}) \]

\[ \rho(t) = \frac1a \sum_{i=1}^ar_i(t) \]
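A sketch of the t-radius and a center-of-attention pick in code, assuming (as a simplification of Def 4.8) that candidate centers are restricted to the points of z themselves; the choice of t0 is left to the caller:

```python
import numpy as np

def t_radius(c, z, t):
    """Radius of the smallest ball around c containing t points of z."""
    return np.sort(np.linalg.norm(np.asarray(z) - np.asarray(c), axis=1))[t - 1]

def center_of_attention(z, t0):
    """Point of z with the smallest t0-radius (candidates restricted to z)."""
    radii = [t_radius(c, z, t0) for c in z]
    i = int(np.argmin(radii))
    return z[i], radii[i]

z = np.random.default_rng(1).normal(0.0, 1.0, size=(20, 2))
center, r = center_of_attention(z, t0=len(z) // 2 + 1)
```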

  • Theorem 4.9



4.5 Using a Good Aggregation: Proof of Theorem 4.2

  • Lemma 4.11 -- Chain's Probability

If the m small databases are sampled independently as above, then with probability at least 1 − 2n/s!, no point of x appears in s or more of them.

  • Proof.

    Binomial distribution:
    \[ f(k,n,p) = \begin{pmatrix} n\\k \end{pmatrix} p^{k}(1-p)^{n-k} \\ \begin{pmatrix} n\\k \end{pmatrix} = {n!\over(n-k)!k!} \]

\[ \text{for a fixed point of } x:\ \ p \leq \sum_{k \geq s} \begin{pmatrix} m\\k \end{pmatrix} \left(\frac 1m\right)^{k}\left(1-\frac 1m\right)^{m-k} \leq \sum_{k \geq s}\frac 1 {k!} \\ s>1:\ \sum_{k \geq s}\frac 1 {k!}\leq \frac 2 {s!} \\ \text{union bound: } Pr[\text{every point appears in fewer than } s \text{ databases}] \geq 1-\frac{2n}{s!} \]
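A quick numeric check of the tail bound (math.comb requires Python 3.8+):

```python
from math import comb, factorial

def tail_prob(m, s):
    """Pr[a fixed point lands in >= s of the m blocks]: Binomial(m, 1/m) tail."""
    return sum(comb(m, k) * (1 / m) ** k * (1 - 1 / m) ** (m - k)
               for k in range(s, m + 1))

m, s = 100, 5
print(tail_prob(m, s))   # ~0.003, comfortably below the bound
print(2 / factorial(s))  # 2/s! ~ 0.0167
```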

  • Protocol


Proof.
\[ s=\sqrt m,\qquad S(z) = O(r) +\Lambda e^{-\Omega(\beta m/s)} \nonumber \]

\[ \begin{align} N(x) &= S(z)/\alpha,\qquad \alpha=\frac \epsilon 2,\ \ \beta=\frac \epsilon {4\left(d+\ln(2/\delta)\right)} \nonumber \\ &= O\left(\frac r \epsilon\right) +\frac {2\Lambda} \epsilon\, e^ {-\Omega\left(\frac {\epsilon \sqrt m} {4(d+\ln(2/\delta))}\right)} \nonumber \\ &= O\left(\frac r \epsilon\right) +\frac {\Lambda} \epsilon\, e^ {-\Omega\left(\frac {\epsilon \sqrt m} {d+\ln(2/\delta)}\right)} \nonumber \end{align} \]
\[ \begin{align} N(x) &= S(z)/\alpha\leftarrow\alpha=\frac \epsilon 2,\ \beta=\frac \epsilon {4(d+ln2/\delta)} \nonumber \\ &= 2O(\frac r \epsilon) +\frac {2\Lambda} \epsilon e^ {-\Omega(\frac {\epsilon \sqrt m} {4(d+ln2/\delta)})} \nonumber \\ &= O(\frac r \epsilon) +\frac {\Lambda} \epsilon e^ {-\Omega(\frac {\epsilon \sqrt m} d)} \nonumber \end{align} \]