
Smooth Sensitivity and Sampling in Private Data Analysis

Paper: Nissim, Kobbi, Sofya Raskhodnikova, and Adam Smith. "Smooth sensitivity and sampling in private data analysis." Proceedings of the 39th ACM Symposium on Theory of Computing (STOC 2007): 75-84.
Note: some of the proofs below are my own derivations, so they may differ slightly from those in the paper; Section 5 is not covered.


Abstract

  • Framework: release functions f of the data with instance-specific additive noise.

\[ \text{Noise magnitude} \leftarrow \text{function } f\ +\ \text{database } x \nonumber \]

  • Smooth sensitivity (of f on database x): a measure of variability of f in the neighborhood of x.
  • Work:
    • How to compute or approximate the smooth sensitivity of f for several functions
    • A generic procedure based on sampling

1. Intro

  • Data privacy
  • Data analysis
  • Challenges: release aggregate information about the databases while protecting the privacy of individual contributors.
  • Differential Privacy
    • Output perturbation
    • Global sensitivity
    • Noise magnitude <-- function f only
  • Instance-specific Noise
    • Smooth sensitivity
    • Noise magnitude <-- function f + database x
    • Pros: releases f(x) with less noise --> expands the range of applications
      • Real numbers in a bounded range
      • Graphs where individuals hold information about edges
    • Cons: computing smooth sensitivity is non-trivial and not automated
  • Sample and Aggregate
    • fully automated, efficient
    • even when f is a black box
    • E.g. k-means clustering, mixture-of-Gaussians models

1.1 Privacy Definition

  • Def 1.1 -- Differential Privacy

A randomized algorithm A is (ε, δ)-differentially private if, for all neighboring databases x, y ∈ Dⁿ (i.e. d(x, y) = 1) and all sets S of outputs:
\[ Pr[{\cal A}(x) \in S] \leq e^\epsilon \cdot Pr[{\cal A}(y) \in S] + \delta \nonumber \]

  • Lemma 1.2 -- Composition

Privacy degrades gracefully under composition: if A1 is ε1-differentially private and A2 is ε2-differentially private, then releasing the (possibly adaptive) outputs of both is (ε1 + ε2)-differentially private.


1.2 Calibrating Noise to Sensitivity

  • Def 1.3 -- Global Sensitivity

\[ GS_f = \mathop{max}_{x,y:\ d(x,y)=1} ||f(x)-f(y)||_1 \nonumber \]

  • Def 1.4 -- Laplacian Distribution

\[ \begin{align} Lap(\lambda):\ & pdf:\ \ h(z) = \frac 1 {2\lambda} e^ {-|z|/\lambda} \nonumber \\ & mean:\ \ E(z) = 0 \nonumber \\ & std: \ \ std(z) = \sqrt{2}\,\lambda \nonumber \end{align} \]

  • Claim 1.5 Laplacian Mechanism

For f: Dⁿ → ℝᵈ, the algorithm that adds i.i.d. Lap(GSf / ε) noise to each coordinate of f(x) is ε-differentially private:
\[ {\cal A}(x) = f(x) + \left(Z_1,\ ...\ ,Z_d\right),\quad Z_i \sim Lap\left({GS_f \over \epsilon}\right) \nonumber \]

  • Two approaches:

​ (1) If GSf is low, add noise Lap(GSf /ε) directly (sketched below)

​ (2) Express f via g1, g2, ... --> add noise to each gi --> invoke the composition property (Lemma 1.2)
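A minimal sketch of approach (1) in Python; the helper name and the counting-query example are illustrative, not from the paper:

```python
import numpy as np

def laplace_mechanism(f, x, global_sensitivity, epsilon, rng=None):
    """Release f(x) with i.i.d. Lap(GS_f / epsilon) noise per coordinate.

    By Claim 1.5 this is epsilon-differentially private when
    global_sensitivity upper-bounds the L1 global sensitivity of f.
    """
    rng = rng or np.random.default_rng()
    value = np.atleast_1d(np.asarray(f(x), dtype=float))
    return value + rng.laplace(scale=global_sensitivity / epsilon, size=value.shape)

# Example: a counting query has GS_f = 1.
noisy_count = laplace_mechanism(sum, [0, 1, 1, 0, 1], global_sensitivity=1.0, epsilon=0.5)
```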


1.3 Smooth Sensitivity

  • Def 1.6 -- Local sensitivity

\[ LS_f(x) = \mathop{max}_{y:\ d(x,y)=1} ||f(x)-f(y)||_1 \nonumber \]

  • Relationship with global sensitivity

\[ GS_f = \mathop{max}_x LS_f(x) \nonumber \]

  • Problem: the noise magnitude itself can reveal information about the database

    ​ --> the noise magnitude must be an insensitive function of the database

    ​ --> a class of smooth upper bounds Sf on LSf --> adding noise ∝ Sf is safe

    ​ --> to add as little noise as possible, use the optimal (pointwise smallest) bound Sf*: Sf ≥ Sf*


1.4 The Sample and Aggregate Framework

  • Idea: Replace f with a related function f̄ whose smooth sensitivity is
    • low
    • efficiently computable
  • Steps
    • evaluate f on several random samples from x
    • combine the results with an aggregation function (the center of attention)
  • Intuition: each entry xi can be changed without significantly affecting the value of f̄

  • Applications:
    • k-means cluster centers (data is well separated)
    • accurate parameters of a mixture of k spherical Gaussian distributions

2. Instance-Specific Additive Noise

  • Interactive framework:

\[ {\cal A}(x) = f(x)+N(x)Z \nonumber \]

Z: a random variable drawn from a fixed noise distribution on ℝᵈ, scaled to have standard deviation 1

N(x): the noise magnitude, proportional to the sensitivity of f at x


2.1 Smooth Upper Bounds on LSf and Smooth Sensitivity

  • Example:

    • fmed(x) = median(x1 ... xn), xi∈ D = [0, Λ]

    • Simplify: n is odd, x1 ≤ x2 ≤ ... ≤ xn --> m = (n+1)/2, fmed(x) = xm

    • Global Sensitivity: GSf = Λ --> N(x) ∝ GSf is too large

    • Local Sensitivity: LSf(x) = max(xm − xm−1, xm+1 − xm) --> N(x) ∝ LSf reveals information about the database

      --> E.g.

      ​ database x : x1 = ... = xm+1 = 0, xm+2 = ... = xn = Λ --> LSf(x) = 0 --> Pr[A(x)≠0] = 0

      ​ database x' : x1 = ... = xm = 0, xm+1 = ... = xn = Λ --> LSf(x') = Λ --> Pr[A(x')≠0] > 0

      ​ --> this cannot satisfy (ε, δ)-differential privacy for small δ, which would require:
      \[ Pr[{\cal A}(x')≠0] \leq e^\epsilon \cdot Pr[{\cal A}(x)≠0] +\delta = \delta \nonumber \]

    • Lesson: N(x) has to be an insensitive function of the database (see the sketch below)
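The leak in the example above can be checked numerically; a small sketch (the helper name is mine, not the paper's):

```python
def local_sensitivity_median(x):
    """LS of the median for x with n odd: changing one entry can move the
    median to its left or right neighbor, so
    LS_med(x) = max(x_m - x_{m-1}, x_{m+1} - x_m)  (1-indexed, m = (n+1)/2)."""
    x = sorted(x)
    m = (len(x) + 1) // 2                             # 1-indexed median position
    return max(x[m - 1] - x[m - 2], x[m] - x[m - 1])  # 0-indexed lookups

Lam, n, m = 100, 7, 4
x  = [0] * (m + 1) + [Lam] * (n - m - 1)  # x_1..x_{m+1} = 0, rest = Lam
xp = [0] * m       + [Lam] * (n - m)      # x_1..x_m     = 0, rest = Lam
print(local_sensitivity_median(x))   # 0:   noise magnitude 0, A(x) is always 0
print(local_sensitivity_median(xp))  # 100: nonzero noise reveals the change
```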

  • Def 2.1 -- A Smooth Bound on LSf

For β > 0, a function S: Dⁿ → ℝ⁺ is a β-smooth upper bound on LSf if:
\[ (1)\ \ S(x) \geq LS_f(x) \quad \text{for all}\ x \in D^n \nonumber \\ (2)\ \ S(x) \leq e^{\beta} S(y) \quad \text{for all}\ x,y \in D^n \text{ with } d(x,y)=1 \nonumber \]

​ (1) upper bounds on LSf

​ (2) not sensitive

  • Def 2.2 -- Smooth Sensitivity

\[ S_{f,\beta}^*(x) = \mathop{max}_{y \in D^n}\left(LS_f(y) \cdot e^{-\beta d(x,y)}\right) \nonumber \]

  • Lemma 2.3

S*f,β is a β-smooth upper bound on LSf, and it is optimal: every β-smooth upper bound S on LSf satisfies S(x) ≥ S*f,β(x) for all x ∈ Dⁿ.

  • proof

    • Def 2.1_(1)
      \[ \begin{align} S_{f,\beta}^*(x) &= max\left(LS_f(x),\mathop{max} \limits_{y \neq x;\ y \in D^n}(LS_f(y)e^{-\beta d(x,y)})\right) \nonumber \\ &{\geq} LS_f(x) \nonumber \end{align} \]

    • Def 2.1_(2)
      \[ \begin{align} Suppose: & \nonumber \\ &S_{f,\beta}^*(x) = LS_f(x')e^{-\beta d(x,x')} \nonumber \\ &d(x,y)=1 \rightarrow d(x',y) \leq d(x',x)+d(x,y) = d(x,x')+1 \nonumber \end{align} \]

      \[ \begin{align} S_{f,\beta}^*(y) &\geq LS(x')e^{-\beta d(x',y)} \nonumber \\ &\geq LS(x')e^{-\beta \left(d(x,x')+1\right)} \nonumber \\ &\geq e^{-\beta} \cdot LS(x')e^{-\beta d(x,x')} \nonumber \\ &=e^{-\beta}S_{f,\beta}^*(x)\nonumber \end{align} \]

    • Optimal: S(x) ≥ Sf,β*(x) for every β-smooth upper bound S
      \[ S(x) ≥ S_{f,β}^*(x) \text{ for all}\ x\in D^n \iff S(x)\geq LS_f(y)e^{-\beta d(x,y)}\text{ for all}\ x,y\in D^n \nonumber \]

      prove the right-hand side by induction on d(x, y); the base case d(x, y) = 0 is Def 2.1 (1):

      \[ \begin{align} Suppose: & \nonumber \\ &S(x') \geq LS_f(y)e^{-\beta d(x',y)}\ \ \ \text{for all } x', y \text{ with } d(x',y)=k \nonumber \\ Consider: & \nonumber \\ &d(x,y)=k+1\text{, there exists}\ \ x':d(x,x')=1,\ d(x',y)=k \nonumber \end{align} \]

      \[ \begin{align} S(x)&\geq e^{-\beta}S(x') \nonumber \\ &\geq LS_f(y) e^{-\beta(d(x',y)+1)} \nonumber \\ &= LS_f(y) e^{-\beta d(x,y)} \nonumber \end{align} \]
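To make Def 2.2 and Lemma 2.3 concrete, here is a brute-force check over a tiny discrete domain (exponential in n, for intuition only; Hamming distance and the function names are my assumptions):

```python
from itertools import product
from math import exp

def local_sensitivity(f, x, domain):
    """LS_f(x): max change of f when one entry of x is replaced."""
    return max(abs(f(x) - f(x[:i] + (v,) + x[i + 1:]))
               for i in range(len(x)) for v in domain)

def smooth_sensitivity(f, x, domain, beta):
    """S*_{f,beta}(x) = max_y LS_f(y) * exp(-beta * d(x, y)), by enumeration."""
    return max(local_sensitivity(f, y, domain)
               * exp(-beta * sum(a != b for a, b in zip(x, y)))
               for y in product(domain, repeat=len(x)))

# Toy check: the median over a 3-point domain.
f_med = lambda t: sorted(t)[len(t) // 2]
print(local_sensitivity(f_med, (0, 0, 2), (0, 1, 2)))        # LS at x
print(smooth_sensitivity(f_med, (0, 0, 2), (0, 1, 2), 0.5))  # >= LS, and smooth
```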


2.2 Calibrating Noise

  • Noise Magnitude
    \[ N(x)={S_f(x) \over \alpha} \nonumber \]

  • Corollary 2.4



2.2.1 Admissible Noise Distribution

  • Notation:
    \[ S+\Delta \ \mathop = \limits^\Delta \ \{z+\Delta|z\in S\} \nonumber \\ e^\lambda \cdot S \ \mathop = \limits^\Delta \ \{e^\lambda \cdot z|z\in S\} \nonumber \\ a±b \ \mathop = \limits^\Delta \ [a-b,a+b] \nonumber \]

  • Def 2.5 -- Admissible Noise Distribution

A probability density h on ℝᵈ (with Z ∼ h) is (α, β)-admissible if, for all ||Δ||₁ ≤ α, all |λ| ≤ β, and all measurable S ⊆ ℝᵈ:
\[ Sliding:\ \ Pr[Z \in S] \leq e^{\epsilon/2}\, Pr[Z \in S+\Delta] + \delta/2 \nonumber \\ Dilation:\ \ Pr[Z \in S] \leq e^{\epsilon/2}\, Pr[Z \in e^\lambda \cdot S] + \delta/2 \nonumber \]

  • Lemma 2.6 -- (α, β)-admissible vs. (ε, δ)-differential privacy

Let h be an (α, β)-admissible noise density with Z ∼ h, and let S be a β-smooth upper bound on LSf. Then A(x) = f(x) + (S(x)/α)·Z is (ε, δ)-differentially private.

  • Proof.

    • Assumption
      \[ N(x) = {S(x) \over \alpha},\ N(y) = {S(y) \over \alpha} \nonumber \]

    • proof of conclusion
      \[ \begin{align} Pr[A(x) \in S] &= Pr\left[Z \in {S-f(x) \over N(x)}\right] \nonumber \\ &\leq e^{\epsilon/2} Pr\left[Z \in {S-f(y) \over N(x)}\right] +\delta/2 \nonumber \\ &\leq e^{\epsilon} Pr\left[Z \in {S-f(y) \over N(y)}\right] +\delta \nonumber \\ &= e^{\epsilon} Pr[f(y) + N(y)Z \in S] +\delta \nonumber \\ &= e^{\epsilon} Pr[A(y)\in S] +\delta \nonumber \\ \end{align} \]

    • proof on condition
      \[ \begin{align} ||\Delta||_1 &= ||f(x)-f(y)||_1/N(x) \nonumber \\ &= \alpha ||f(x)-f(y)||_1/S(x) \nonumber \\ &\leq \alpha ||f(x)-f(y)||_1/LS_f(x) \nonumber \\ &\leq \alpha \nonumber \end{align} \]

      \[ \begin{align} |\lambda| &= \left|ln{N(x) \over N(y)}\right| \nonumber \\ &= \left|ln{S(x) \over S(y)}\right|\nonumber \\ &\leq |ln(e^{±\beta})| \nonumber \\ &=\beta \nonumber \end{align} \]
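Putting Lemma 2.6 together in code: a sketch of the instance-specific mechanism with one-dimensional generalized Cauchy noise, using the constants α = ε/(2γ), β = ε/(2(γ−1)) derived in Section 2.2.2 below (my reading of these notes; check against the paper before relying on the constants):

```python
import numpy as np

def sample_gen_cauchy(gamma, rng):
    """One draw from the density h(z) ~ 1 / (1 + |z|^gamma), gamma >= 2,
    by rejection against a standard Cauchy envelope: the unnormalized
    target is at most twice the unnormalized Cauchy density 1/(1+z^2)."""
    while True:
        z = rng.standard_cauchy()
        if rng.uniform() <= (1 + z * z) / (2 * (1 + abs(z) ** gamma)):
            return z

def smooth_mechanism(fx, smooth_bound, epsilon, gamma=4.0, rng=None):
    """Release f(x) + (S(x)/alpha) * Z per Lemma 2.6; smooth_bound must be
    a beta-smooth upper bound on LS_f with beta = epsilon/(2*(gamma-1))."""
    rng = rng or np.random.default_rng()
    alpha = epsilon / (2 * gamma)
    return fx + (smooth_bound / alpha) * sample_gen_cauchy(gamma, rng)
```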


2.2.2 Examples of Admissible Noise Distributions

  • Proof idea: to show a distribution is (α, β)-admissible (and hence yields (ε, δ)-differential privacy), bound the pointwise density ratios, up to an exceptional set of probability δ/2:

    • sliding property
      \[ ln{h(z) \over h(z+\Delta)} \leq \epsilon/2 \nonumber \]

    • dilation property
      \[ ln{h(z)\over e^{d\lambda}h(e^\lambda z)} \leq \epsilon/2 \nonumber \]

  • Three families:

    • a generalization of the Cauchy distribution
    • Laplacian distribution
    • Gaussian distribution
  • Lemma 2.7 -- Generalized Cauchy Distribution

For γ > 1, the distribution with density h(z) ∝ 1/(1+|z|^γ) is (α, β)-admissible with δ = 0; the derivation below gives α = ε/(2γ) and β = ε/(2(γ−1)) in one dimension, and β = ε/(2d(γ−1)) for the d-dimensional product density.

  • proof.

    • dilation
      \[ \begin{align} ln{h(z)\over e^{\lambda}h(e^\lambda z)} &= ln{1+(e^\lambda|z|)^\gamma \over e^\lambda(1+|z|^\gamma)} \ \ \ \ \ \ (λ>0) \nonumber \\ &\leq ln{e^{\lambda \gamma}|z|^\gamma \over e^\lambda|z|^\gamma} \nonumber \\ &=\lambda(\gamma-1) \leq \epsilon/2 \nonumber \\ \lambda &\leq {\epsilon \over 2(\gamma-1)} \nonumber \end{align} \]

    • sliding
      \[ \begin{align} ln{h(z) \over h(z+\Delta)} &= \varPhi(|z+\Delta|)-\varPhi(|z|),\ \ \ \ \varPhi(x) = ln(1+x^\gamma) \nonumber \\ &\leq |\Delta|\, \varPhi'(\xi) \nonumber \\ \varPhi'(x) &= {\gamma x^{\gamma-1} \over x^\gamma +1} = {\gamma \over x+x^{1-\gamma}} \leq \gamma \nonumber \\ ln{h(z) \over h(z+\Delta)} &\leq |\Delta| \gamma \leq \epsilon/2 \nonumber \\ ||\Delta||_1 &\leq {\epsilon \over 2 \gamma} \nonumber \end{align} \]

    • d-dim
      \[ \begin{align} ln{h(z)\over e^{d\lambda}h(e^\lambda z)} &= ln{\left(1+(e^\lambda|z|)^\gamma\right)^d \over e^{d\lambda}(1+|z|^\gamma)^d} \ \ \ \ \ \ (λ>0) \nonumber \\ &\leq d\lambda(\gamma-1)<\epsilon/2 \nonumber \\ \lambda &\leq {\epsilon \over 2d(\gamma-1)}\nonumber \end{align} \]

      \[ \begin{align} ln{h(z) \over h(z+\Delta)} &= \varPhi(|z_1+\Delta_1|)-\varPhi(|z_1|)+\ ...\ +\varPhi(|z_d+\Delta_d|)-\varPhi(|z_d|) \nonumber \\ &\leq |\Delta_1| \varPhi'(\xi_1)+\ ...\ +|\Delta_d| \varPhi'(\xi_d) \nonumber \\ &\leq \gamma(|\Delta_1|+\ ...\ +|\Delta_d|) = \gamma||\Delta||_1 \leq \epsilon/2 \nonumber \\ ||\Delta||_1 &\leq {\epsilon \over 2 \gamma} \nonumber \end{align} \]

  • Condition for δ=0

    • Assume (writing ψ for the negative log-density, to avoid clashing with the query f):

    \[ h(z) = e^ {-\psi(z)},\ \ \psi(z)>0 \nonumber \]

    • Sliding:
      \[ \begin{align} ln{h(z) \over h(z+\Delta)} &=\psi(z+\Delta)-\psi(z) \leq \epsilon/2 \nonumber \\ \Delta\, \psi'(\xi) &\leq \epsilon/2 \nonumber \\ \psi'(\xi) &\leq C \nonumber \end{align} \]

    • Dilation

    \[ \begin{align} ln{h(z)\over e^{\lambda}h(e^\lambda z)} &= \psi(e^\lambda z)-\psi(z)-\lambda\leq \epsilon/2 \nonumber \\ g(\lambda) &= \psi(e^\lambda z) \nonumber \\ &= g(0)+g'(0)\lambda+O(\lambda^2) \nonumber \\ &= \psi(z)+z\psi'(z)\lambda+O(\lambda^2) \nonumber \\ \psi'(z) &\leq C/z \nonumber \end{align} \]

  • Def 2.8 -- quantile

For Z drawn from the noise distribution, ρ_δ denotes the (1 − δ)-quantile of ||Z||₁:
\[ Pr\left[\,||Z||_1 > \rho_\delta\,\right] = \delta \nonumber \]

  • Lemma 2.9 -- Laplacian Distribution

The d-dimensional Laplace density h(z) ∝ e^{−||z||₁} is (α, β)-admissible with α = ε/2 and β = ε/(2ρ_{δ/2}); the mass δ/2 beyond the quantile ρ_{δ/2} is absorbed into the additive δ term.

  • proof.

    • Sliding:
      \[ \begin{align} ln{h(z) \over h(z+\Delta)} & =||z+\Delta||_1-||z||_1 \leq ||\Delta||_1 \nonumber \\ ||\Delta||_1 &\leq \epsilon/2 \nonumber \end{align} \]
  • Dilation
    \[ \begin{align} ln{h(z)\over e^{d\lambda}h(e^\lambda z)} &= ||z||_1(e^{\lambda}-1) - d\lambda \approx (||z||_1-d)\lambda \nonumber \\ when\ ||z||_1 \leq \rho_{\delta/2}:&\ \ ln{h(z)\over e^{d\lambda}h(e^\lambda z)} \leq \epsilon/2\ \ \text{requires} \nonumber \\ (\rho_{\delta/2}-d)\lambda &\leq\epsilon/2 \nonumber \\ \lambda &\leq{\epsilon \over 2(\rho_{\delta/2}-d)} \nonumber \\ sufficient\ to\ use\ \beta={\epsilon \over 2\rho_{\delta/2}} &\leq{\epsilon \over 2(\rho_{\delta/2}-d)} \quad \text{(the tail } ||z||_1 > \rho_{\delta/2} \text{ has mass} \leq \delta/2\text{)} \nonumber \end{align} \]

  • Lemma 2.10 -- Gaussian Distribution



3. Computing Smooth Sensitivity

  • Def 3.1

The sensitivity of f at distance k:
\[ A^{(k)}(x) = \mathop{max}_{y \in D^n:\ d(x,y) \leq k} LS_f(y) \nonumber \]

computing the smooth sensitivity of f at x reduces to computing A(k)(x), since
\[ S_{f,\epsilon}^*(x) = \mathop{max}_{k=0,...,n} e^{-k\epsilon} A^{(k)}(x) \nonumber \]


3.1 Smooth Sensitivity of the Median

  • Proposition 3.4 -- Smooth Sensitivity of fmed

    Assume:
    \[ n\ \ is\ \ odd,\ m=\frac {n+1} 2 \nonumber \\ 0 \leq x_1 \leq \ ...\ \leq x_n \leq \Lambda \nonumber \]

\[ A^{(k)}(x) = \mathop{max}_{0 \leq t \leq k+1}\left(x_{m+t} - x_{m+t-k-1}\right) \nonumber \]
where xi = 0 for i < 1 and xi = Λ for i > n.

\[ A^{(k)}(x) \rightarrow O(k) \nonumber \\ S_{f_{med},\epsilon}^*(x) = \mathop{max}_{k=0..n}e^{-k\epsilon}A^{(k)}(x) \rightarrow O(n^2) \nonumber \\ \text{divide-and-conquer algorithm} \rightarrow O(n \log n) \nonumber \]

  • Algorithm

    • Idea:
      \[ S_{f_{med}}^*(x) = \mathop{max}_{i \leq m \leq j}(x_j - x_i)\, e^ {-\epsilon(j-i-1)} \nonumber \\ \text{computing } S_{f_{med}}^*(x) \rightarrow \ \text{finding } j^*(i) = \mathop{argmax}_{j \geq m}(x_j-x_i)e^ {-\epsilon(j-i-1)} \nonumber \]

    • Recursive algorithm:

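A direct O(n²) sketch of this computation (the O(n log n) divide-and-conquer version is omitted; clamping convention as in Proposition 3.4):

```python
from math import exp

def smooth_sensitivity_median(x, eps, Lam):
    """S*_{f_med,eps}(x) = max_k e^{-k*eps} * A^(k)(x) with
    A^(k)(x) = max_{0<=t<=k+1} (x_{m+t} - x_{m+t-k-1}); O(n^2) overall."""
    x, n = sorted(x), len(x)
    m = (n + 1) // 2                       # 1-indexed median position, n odd

    def entry(i):                          # clamp: x_i = 0 (i < 1), Lam (i > n)
        return 0.0 if i < 1 else float(Lam) if i > n else x[i - 1]

    return max(exp(-k * eps)
               * max(entry(m + t) - entry(m + t - k - 1) for t in range(k + 2))
               for k in range(n + 1))

print(smooth_sensitivity_median([1, 2, 3, 10, 11], eps=0.5, Lam=20))
```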


3.2 Smooth Sensitivity of the Minimum

  • Claim 3.6 -- Smooth Sensitivity of fmin

Assume:
\[ 0 \leq x_1 \leq \ ...\ \leq x_n \leq \Lambda \]


3.3 Smooth Sensitivity of The Cost of a Minimum Spanning Tree

Background: Prim's algorithm and Kruskal's algorithm (standard methods for computing a minimum spanning tree)

  • Notation:

    • Graph
      \[ G=(V,E) \]

    • edge weights:
      \[ w(e) \in [0, \Lambda] \]

    • cut: a cut in G is a partition of the vertices V into two nonempty subsets, S and V ∖ S.
      \[ S \subset V \rightarrow (S,V \setminus S) \]

    • edges (i, j) crossing the cut S:
      \[ \{(i,j)\ |\ i \in S,\ j \in V \setminus S\} \]

      \[ w_t(S)=t^{th}\ \text{element of the list}\ \{w\left( (i,j) \right)\ |\ i \in S,\ j \in V \setminus S \} \text{ sorted in non-decreasing order;} \\ w_t(S) = \Lambda \text{ if the cut } S \text{ has fewer than } t \text{ edges} \]

      \[ l(S) = |\{(i,j)\ |\ i \in S,\ j \in V \setminus S\}| \]

      (a small sketch of these quantities follows below)
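A small sketch of the cut quantities w_t(S) and l(S) on a toy weighted graph (the data structures and names are mine):

```python
def crossing_weights(weights, S, V):
    """Sorted weights of edges crossing the cut (S, V \\ S)."""
    Sc = set(V) - set(S)
    return sorted(w for (i, j), w in weights.items()
                  if (i in S and j in Sc) or (j in S and i in Sc))

def w_t(weights, S, V, t, Lam):
    """t-th smallest crossing weight; Lam if the cut has fewer than t edges."""
    cw = crossing_weights(weights, S, V)
    return cw[t - 1] if t <= len(cw) else Lam

V = [0, 1, 2, 3]
weights = {(0, 1): 2.0, (1, 2): 5.0, (2, 3): 1.0, (0, 3): 4.0}
S = {0, 1}
print(len(crossing_weights(weights, S, V)))  # l(S) = 2
print(w_t(weights, S, V, 1, Lam=10.0))       # 4.0
print(w_t(weights, S, V, 3, Lam=10.0))       # 10.0 (fewer than 3 crossing edges)
```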

  • Lemma 3.7 -- Smooth Sensitivity of fMST


  • Lemma 3.8 -- Computing the Smooth Sensitivity of fMST: running time



3.4 Smooth Sensitivity of the Number of Triangles in a Graph

  • Notation:

    • adjacency matrix x --> undirected graph G on n nodes

    • xi,j = xj,i = 1 --> edge (i, j) present

    • Hamming distance
      \[ d(x,x') = \sum_{i \geq j} I \left( x(i,j) \neq x'(i,j)\right) \]

    • Let fΔ(x) --> the number of triangles in the graph represented by x.

    • ai,j --> the number of triangles involving a potential edge (i, j)

    \[ a_{i,j} = \sum_{k \in [n]} x_{i,k} \cdot x_{j,k} \]

    • bi,j --> the number of half-built triangles involving a potential edge (i, j)

\[ b_{i,j} = \sum_{k \in [n]} x_{i,k} \oplus x_{j,k} \]

  • Global Sensitivity
    \[ GS_{f_\Delta} = \mathop{max}_{i \neq j}\ a_{i,j} = n-2 \]

    (attained when the remaining n − 2 vertices are adjacent to both i and j)

  • Claim 3.13 -- Smooth Sensitivity of fΔ

\[ S_{f_\Delta,\epsilon}^*(x) = \mathop{max}_{s \geq 0}\ e^{-s\epsilon} A^{(s)}(x),\ \ \ \ A^{(s)}(x) = \mathop{max}_{i \neq j}\ c_{i,j}(s) \nonumber \]

  • proof:
\[ y = x\ \text{with}\ s\ \text{added edges} \rightarrow d(x,y)=s \]

    \[ \text{triangles already through } (i,j) \rightarrow a_{i,j} \nonumber \\ \text{new triangles through } (i,j) \rightarrow \begin{cases} s &\ if\ s \leq b_{i,j}\\ \left\lfloor\frac {s+b_{i,j}} 2 \right\rfloor&\ if\ s>b_{i,j} \end{cases} \nonumber \]

    \[ c_{i,j}(s) = min\left( a_{i,j}+\left\lfloor \frac {s+min(s,b_{i,j})} 2 \right\rfloor,\ n-2\right) \]

    \[ A^{(s)}(x) = \mathop{max}_{y:\ d(x,y) \leq s} LS_f(y)=\mathop{max}_{i \neq j}\ c_{i,j}(s) \]
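The whole computation fits in a few lines; a sketch following the formulas above (the s-loop can stop once A^(s) is capped at n − 2, which happens by s = 2(n − 2)):

```python
import numpy as np
from math import exp

def smooth_sensitivity_triangles(adj, eps):
    """S*_{f_tri,eps}(x) = max_s e^{-s*eps} * max_{i<j} c_ij(s) for a
    symmetric 0/1 adjacency matrix adj with zero diagonal."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    a = adj @ adj                             # a_ij = sum_k x_ik * x_jk
    b = deg[:, None] + deg[None, :] - 2 * a   # b_ij = sum_k x_ik XOR x_jk
    iu = np.triu_indices(n, k=1)              # all pairs i < j
    best = 0.0
    for s in range(2 * n + 1):
        c = a[iu] + (s + np.minimum(s, b[iu])) // 2
        best = max(best, exp(-s * eps) * min(int(c.max()), n - 2))
    return best

adj = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1
print(smooth_sensitivity_triangles(adj, eps=0.5))
```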


4. Sample-Aggregate Framework

4.1 A Motivating Example: Clustering

  • Example:

    • k-squared-error-distortion (k-SED) clustering (also called “k-means”)

    • input
      \[ x=\{x_1,\ ...\ ,x_n\},\ x_i \in \Bbb{R}^l\ with\ diameter\ \Lambda \]

    • output
      \[ f_{cc}=\{c_1,\ ...\ ,c_k\} \ \ \ with\ minimum\ cost \\ cost_x(c_1,...c_k)=\frac 1 n \sum_{i=1}^n \mathop{min}_j||x_i-c_j||_2^2 \]

    • Wasserstein Distance
      \[ d_W\{(c_1,...c_k),\ (\hat{c}_1,...\hat{c}_k)\} =\left( \mathop{min}_\pi \sum_{j=1}^k ||c_j-\hat{c}_\pi||_2^2 \right) ^ \frac 1 2 \]

  • Global Sensitivity:
    \[ GS_{f_{cc}} = \Omega (\Lambda) \]

    • Problem

    In "well-clustered" instances the local sensitivity LSf is low, but we do not know how to compute or approximate a smooth bound on LSf efficiently.


4.2 Basic Framework

  • Metric Space:
    \[ Metric\ {\cal M}, with\ distance\ function\ {\cal d_M(.,.)}\ and\ diameter\ \Lambda \]

  • Idea: Replace f with a related function f̄ whose smooth sensitivity is

    • low
    • efficiently computable
  • Framework (sketched in code below)

    • Randomly partition the database x into m small databases
      \[ \text{databases: } x|_{U_1},\ ...\ ,x|_{U_m} \]

    • Evaluate f on the m small databases
      \[ f(x|_{U_i})=z_i \]

    • Aggregate with the center of attention g
      \[ \bar{f}(x) = g\left( f(x|_{U_1}), ... , f(x|_{U_m})\right) \]

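A minimal sketch of the framework (partition, evaluate, aggregate); the private release would then add noise calibrated to the smooth sensitivity of g, which is not shown here:

```python
import numpy as np

def sample_and_aggregate(f, x, m, aggregate, rng=None):
    """Evaluate f on m random disjoint blocks of x, then aggregate."""
    rng = rng or np.random.default_rng()
    blocks = np.array_split(rng.permutation(len(x)), m)  # random U_1..U_m
    z = [f([x[i] for i in block]) for block in blocks]
    return aggregate(z)

# Example: estimate a mean, aggregating with the median
# (a 1-dimensional stand-in for the center of attention).
x = list(np.random.default_rng(0).normal(5.0, 1.0, size=600))
fbar = sample_and_aggregate(np.mean, x, m=20, aggregate=np.median)
```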

  • Def 4.1 -- Good Approximation


  • Theorem 4.2



4.3 Good Aggregations for General Metric Spaces

  • Idea

    • Independently select m small databases of size n/m (each chosen uniformly without replacement)

    • Chain (Lemma 4.11)
      \[ \begin{align} 1\ \text{point of } x &\rightarrow \text{(at most)}\ \ s\ \text{small databases}\ \ \text{(with high probability)} \nonumber \\ &\rightarrow s\ \text{inputs}\ z_i=f(x|_{U_i})\ \text{of}\ g \nonumber \end{align} \]

  • Def 4.4 -- Smooth Upper Bound with step s

A function S is a β-smooth upper bound on the local sensitivity of g with step s if:
\[ (a)\ \ S(z) \geq LS_g^{(s)}(z) \quad \text{for all}\ z \nonumber \\ (b)\ \ S(z) \leq e^{\beta} S(z') \quad \text{for all}\ z,z' \text{ with } d(z,z') \leq s \nonumber \]

  • Def 4.5 -- Good Aggregation


  • Example: Median function

    • Sensitivity
      \[ g(z) = median(z),\ z \in [0,\Lambda]\\ S(z) = \mathop{max}_{k\geq0} LS_g^{(s(k+1))}(z) \cdot e^{-\beta k}\\ \]

    • proof.

      Smooth Upper Bound (a):
      \[ S(z) = max\left(LS_g^{(s)}(z),\ \mathop{max}_{k>0} LS_g^{(s(k+1))}(z) \cdot e^{-\beta k}\right) \geq LS_g^{(s)}(z) \\ \]
      Smooth Upper Bound(b):
      \[ \begin{align} Suppose:\ &d(z,z_a) = ks,\ d(z,z') = s\ \rightarrow\ d(z',z_a)\leq (k+1)s \nonumber \\ &S(z) = LS_g^{(s)}(z_a)e^{-\beta k} \nonumber \end{align} \]

      \[ \begin{align} S(z') &\geq LS_g^{(s)}(z_a)e^{-\beta d(z',z_a)} \nonumber \\ &\geq LS_g^{(s)}(z_a)e^{-\beta (k+1)} \nonumber \\ &= e^{-\beta}S(z) \nonumber \end{align} \]

      2(a):
      \[ \frac {2m} 3\ \text{entries of}\ z \in {\cal B}(c,r) \rightarrow g(z) \in {\cal B}(c,2r) \]
      2(b):
      \[ \begin{align} Condition\ 1&: k < \frac m {6s} \rightarrow LS_g^{(s(k+1))}(z) \leq 2r \rightarrow S(z)\leq2r \nonumber \\ Condition\ 2&: k \geq \frac m {6s} \rightarrow LS_g^{(s(k+1))}(z) \leq \Lambda \rightarrow S(z)\leq\Lambda e^{-\frac {\beta m} {6s}} \nonumber \end{align} \]

      \[ S(z) \leq max\left(2r,\ \Lambda e^{-\frac {\beta m} {6s}}\right) \]

    • Problem: computing this aggregation exactly is NP-hard


4.4 The Center of Attention is a Good Aggregation

  • Def 4.6 -- Unconstrained Center of Attention

g0(z): a point of M with the smallest t0-radius with respect to z, where the t-radius of a point c is the radius of the smallest ball around c containing at least t points of z, and t0 > (m+s)/2 so that the majority property below holds.

  • when s points are removed from z --> a majority of the remaining points is contained inside the ball
  • Always exists. May not be unique.

  • Smooth Sensitivity
    \[ S_0(z) = 2\ \mathop{max}_{k\geq0}(r^{(z)}(t_0+(k+1)s)e^{-\beta k}) \]

    \[ r^{(z)}(t):\ the\ minimum\ t-radius\ of\ any\ point\ in\ \cal{M} \]

  • Proof.

    • Smooth Upper Bound (a):

    \[ (a)\ Ball\ {\cal B}\left(g_0(z),r^{(z)}(t_0)\right) \qquad (b)\ Ball\ {\cal B}\left(g_0(z'),r^{(z')}(t_0)\right) \\ \text{both balls contain a majority of the points shared by } z \text{ and } z' \text{, so they intersect:} \\ d(g_0(z),g_0(z')) \leq r^{(z)}(t_0)+r^{(z')}(t_0),\qquad r^{(z')}(t_0) \leq r^{(z)}(t_0+s) \\ \Rightarrow\ LS_{g_0}^{(s)}(z) \leq r^{(z)}(t_0)+r^{(z)}(t_0+s) \leq 2r^{(z)}(t_0+s) \]

    • Smooth Upper Bound (b): for d(z, z') ≤ s,

    \[ \begin{align} S_0(z') &\leq 2\ \mathop{max}_{k\geq0}\left(r^{(z)}(t_0+(k+2)s)e^{-\beta k}\right) \nonumber \\ &= e^\beta \cdot 2\ \mathop{max}_{k'\geq1}\left(r^{(z)}(t_0+(k'+1)s)e^{-\beta k'}\right) \leq e^\beta S_0(z) \nonumber \end{align} \]

    • Suppose:

    \[ \frac {2m} 3\ \text{entries of}\ z \in {\cal B}(c,r)\ \Rightarrow\ r^{(z)}(t_0)\leq r \]

    • 2(a):

    \[ \frac {2m} 3\ \text{entries of}\ z \in {\cal B}(c,r)\ \rightarrow\ {\cal B}(c,r) \text{ and } {\cal B}\left(g_0(z),r^{(z)}(t_0)\right) \text{ share a point of } z \\ \Rightarrow\ d(c,g_0(z)) \leq r+r^{(z)}(t_0) \leq 2r \]

    • 2(b):

    \[ \begin{align} Condition\ 1&: k < \frac m {6s} \rightarrow r^{(z)}(t_0+(k+1)s) \leq r \rightarrow S_0(z)\leq2r \nonumber \\ Condition\ 2&: k \geq \frac m {6s} \rightarrow r^{(z)}(t_0+(k+1)s) \leq \Lambda \rightarrow S_0(z)\leq\Lambda e^{-\frac {\beta m} {6s}} \nonumber \\ \end{align} \]

    \[ S_0(z) \leq 2\,max\left(r,\ \Lambda e^{-\frac {\beta m} {6s}}\right) \]

  • Def 4.8 -- Center of Attention


  • Smooth sensitivity

\[ S(z) = 2\ \mathop{max}_{k\geq0}(\rho(t_0+(k+1)s)e^{-\beta k}) \]

\[ \rho(t) = \frac1a \sum_{i=1}^ar_i(t) \]
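A sketch of the t-radius and a center-of-attention pick in code, assuming (as a simplification of Def 4.8) that candidate centers are restricted to the points of z themselves; the choice of t0 is left to the caller:

```python
import numpy as np

def t_radius(c, z, t):
    """Radius of the smallest ball around c containing t points of z."""
    return np.sort(np.linalg.norm(np.asarray(z) - np.asarray(c), axis=1))[t - 1]

def center_of_attention(z, t0):
    """Point of z with the smallest t0-radius (candidates restricted to z)."""
    radii = [t_radius(c, z, t0) for c in z]
    i = int(np.argmin(radii))
    return z[i], radii[i]

z = np.random.default_rng(1).normal(0.0, 1.0, size=(20, 2))
center, r = center_of_attention(z, t0=len(z) // 2 + 1)
```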

  • Theorem 4.9



4.5 Using a Good Aggregation: Proof of Theorem 4.2

  • Lemma 4.11 -- Chain's Probability

If the m small databases are sampled independently as above, then with probability at least 1 − 2n/s!, no point of x appears in s or more of them.

  • Proof.

    Binomial distribution:
    \[ f(k,n,p) = \begin{pmatrix} n\\k \end{pmatrix} p^{k}(1-p)^{n-k} \\ \begin{pmatrix} n\\k \end{pmatrix} = {n!\over(n-k)!k!} \]

\[ \text{for a fixed point of } x:\ \ p \leq \sum_{k \geq s} \begin{pmatrix} m\\k \end{pmatrix} \left(\frac 1m\right)^{k}\left(1-\frac 1m\right)^{m-k} \leq \sum_{k \geq s}\frac 1 {k!} \\ s>1:\ \sum_{k \geq s}\frac 1 {k!}\leq \frac 2 {s!} \\ \text{union bound: } Pr[\text{every point appears in fewer than } s \text{ databases}] \geq 1-\frac{2n}{s!} \]
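A quick numeric check of the tail bound (math.comb requires Python 3.8+):

```python
from math import comb, factorial

def tail_prob(m, s):
    """Pr[a fixed point lands in >= s of the m blocks]: Binomial(m, 1/m) tail."""
    return sum(comb(m, k) * (1 / m) ** k * (1 - 1 / m) ** (m - k)
               for k in range(s, m + 1))

m, s = 100, 5
print(tail_prob(m, s))   # ~0.003, comfortably below the bound
print(2 / factorial(s))  # 2/s! ~ 0.0167
```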

  • Protocol


Proof.
\[ s=\sqrt m,\qquad S(z) = O(r) +\Lambda e^{-\Omega(\beta m/s)} \nonumber \]

\[ \begin{align} N(x) &= S(z)/\alpha,\qquad \alpha=\frac \epsilon 2,\ \ \beta=\frac \epsilon {4\left(d+\ln(2/\delta)\right)} \nonumber \\ &= O\left(\frac r \epsilon\right) +\frac {2\Lambda} \epsilon\, e^ {-\Omega\left(\frac {\epsilon \sqrt m} {4(d+\ln(2/\delta))}\right)} \nonumber \\ &= O\left(\frac r \epsilon\right) +\frac {\Lambda} \epsilon\, e^ {-\Omega\left(\frac {\epsilon \sqrt m} {d+\ln(2/\delta)}\right)} \nonumber \end{align} \]
\[ \begin{align} N(x) &= S(z)/\alpha\leftarrow\alpha=\frac \epsilon 2,\ \beta=\frac \epsilon {4(d+ln2/\delta)} \nonumber \\ &= 2O(\frac r \epsilon) +\frac {2\Lambda} \epsilon e^ {-\Omega(\frac {\epsilon \sqrt m} {4(d+ln2/\delta)})} \nonumber \\ &= O(\frac r \epsilon) +\frac {\Lambda} \epsilon e^ {-\Omega(\frac {\epsilon \sqrt m} d)} \nonumber \end{align} \]